Simple scan for slow machines: single-threaded file scanning

I have a Syncthing server with 1,000,000 files and 20 clients with 50 folders. The program works very hard. The server is rebooted every day, and when Syncthing starts there is high CPU and HDD load.

  1. How can files be scanned in one thread? The HDD is under high load.
  2. Minimize CPU usage when scanning by comparing only size and modification time.

In this release Syncthing is a good program for a small number of servers and folders. Server configuration: i5-2500K, 16 GB RAM, SSD (for config), HDD (for storage).

With that configuration it already uses only one thread per folder. But it is per folder and this can’t be changed. On startup, those 50 folders are scanned in parallel. With a sufficiently long rescan interval Syncthing will then randomize the following scans to spread them out over time.

The scanning is already based on size / modtime, but for a hard disk when things are not in cache that’s still a lot of reads to do. (Every file stat is accompanied by a database read too.)

You could also start syncthing with -paused, which will pause all devices and folders on start, and then start each folder individually once the previous folder has finished scanning (the status can be checked via the REST/event API), or start the next folder every 10 minutes or so.


Why?

I'd like single-threaded (or a configurable number of threads) file scanning. When many folders are scanned at once, scanning is very slow because HDD performance degrades. Maybe this feature will be in the next release?

Imagine it yourself: 50 threads on an HDD (not an SSD) with 1,000,000 files, plus DB access, plus what the OS needs. The HDD will be very slow, because HDDs handle many concurrent threads badly.

I started a full rescan 2 days ago. Some folders are “Up to Date” but some are “Out of Sync”, and they start scanning and become “Out of Sync” again (the HDD never stops and works hard). The clients are stopped. The data size is 2 TB. When I analyze HDD access with procmon from the Sysinternals tools, I see almost only DB access: 99% DB, 1% file reads. Does my DB stay in memory?

How does the DB logic work?

When I try to start a client, the system freezes. I do not understand what they are doing. HELP

You should obviously keep the database on the SSD. “Out of date” means it’s trying to sync something, unless the folder is in send only mode and someone else has changed their data.

The scanning logic is essentially:

  • Walk the directory tree
  • For each file, get the name, modification time, size, permissions
  • Look up the corresponding entry in the database
  • Compare. If it differs, add to the queue of files to rehash.
  • When done, walk the database instead.
  • For each entry in the database, see if the file still exists on disk.
  • If it does not, mark it as deleted.

In procmon I can see how many accesses go to the DB on the HDD, and SSD lifetime is limited by writes. I see a very large number of DB accesses; could the DB be cached in memory?

How can this happen?

  • Folder Path: s\Profile
  • Global State: 21876 files, 709 directories, ~21.5 GiB
  • Local State: 21821 files, 709 directories, ~21.5 GiB
  • Out of Sync Items: 21871 items, ~21.5 GiB

Do all 21 GB differ in data, size, or something else? For one client I did set it to ignore file permissions to reduce file scanning.

Out of sync could mean metadata (mtimes, permissions etc) are out of sync.

AudriusButkevicius wrote: “Out of sync could mean metadata (mtimes, permissions etc) are out of sync.” Or maybe some ignore patterns are set somewhere.

SSD’s lifetimes are limited in bytes written, not the number of writes. The DB is small, and only makes small incremental updates as files/folders change. Furthermore, any decent modern SSD (consumer or enterprise) supports pretty hefty TBW (in the hundreds of TBs). If you’re not tearing up the SSD with other writes, it won’t be an issue – and if you are tearing up the SSD with other writes, it’s the least of your issues.
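To put TBW ratings in perspective, here is a small Go helper for the arithmetic. The inputs are hypothetical; plug in your drive's rated TBW and the write rate you actually observe. Decimal units (1 TB = 1000 GB) are assumed, and write amplification inside the drive is ignored.

```go
package main

// yearsToTBW estimates how many years of writing at gbPerDay it takes to
// reach a rated endurance of tbwTB terabytes written. Both inputs are
// hypothetical; decimal units are assumed (1 TB = 1000 GB).
func yearsToTBW(tbwTB, gbPerDay float64) float64 {
	days := tbwTB * 1000 / gbPerDay
	return days / 365
}
```

For example, a hypothetical 150 TBW drive receiving 10 GB of writes per day gives `yearsToTBW(150, 10)`, roughly 41 years, far beyond the useful life of most servers.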

PS: Why does your server reboot every day?