Rescan interval for 3 million files

Hi! Here's some info regarding the data being synced: 2,871,103 files, 317,419 folders, ~1.11 TiB of data. The CPU is a 10th-gen Core i7, so no problem there.

I just need some advice regarding the scan interval. IIRC the initial scan took quite a few hours, so I don't think hourly or even daily would be suitable for a full rescan. I currently have it set to 500,000 s, which is around 6 days.

Watch for changes is on.

Would having a long full-rescan interval have negative consequences? What would the issue be, since watch for changes is on?

Thanks!

  1. There's filesystem watching, which picks up changes directly when they happen, so you can get away with much less scanning (see the config sketch after this list).

  2. Scanning isn't the same as hashing: if nothing changes, it just goes quickly over all files. In the first scan everything "changes", as everything is new, and thus everything gets hashed, which takes a while. So further scans will be much faster.
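For reference, both of these are per-folder settings in Syncthing's config.xml (also editable per folder in the GUI). A sketch with example values only; the folder id, label and path are placeholders:

```xml
<!-- rescanIntervalS is the full-rescan interval in seconds;
     fsWatcherEnabled is the "watch for changes" toggle. -->
<folder id="abcd-1234" label="Big Folder" path="D:\Data" type="sendreceive"
        rescanIntervalS="86400" fsWatcherEnabled="true" fsWatcherDelayS="10">
</folder>
```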

Wonderful. Will set the scan to every few hours for now. Thanks for the support, much appreciated.

Question: does a restart of the PC redo the entire hashing process, or will it redo the quicker scan? Thanks.

Absolutely no idea since you don’t mention what sort of storage you’re using, what else the system will be doing, whether Syncthing will be throttled, where the database is being kept, what filesystems, if you’ve made changes to the default Syncthing config, how many devices and folders are being used, and how often they’ll be busy. Et cetera. Your best bet is to find out for yourself, since the differences between systems and configurations can change the times a lot.

The first scan is also a hashing process so it’ll take drastically longer. Once this has completed, rescans only have to check metadata and re-hash changed files. As has been mentioned, Syncthing will automatically pick up changes through the watcher, although this isn’t 100% effective if you’re making lots of changes to many files in a short space of time.
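To illustrate the scan/hash distinction, here's a simplified sketch (not Syncthing's actual implementation): a rescan walks the tree and compares cheap metadata against the stored index, and only files whose size or modification time changed get re-hashed.

```go
// Simplified scan-vs-hash sketch: cheap metadata comparison first,
// expensive hashing only for new or changed files.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

type fileMeta struct {
	size    int64
	modTime int64
}

func main() {
	// Previous scan results; a real system persists these in a database.
	index := map[string]fileMeta{}

	root := "." // example path
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		meta := fileMeta{size: info.Size(), modTime: info.ModTime().UnixNano()}
		if prev, ok := index[path]; ok && prev == meta {
			return nil // metadata unchanged: cheap, no hashing needed
		}
		// New or changed file: this is the expensive part.
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		h := sha256.New()
		if _, err := io.Copy(h, f); err != nil {
			return err
		}
		fmt.Printf("hashed %s: %x\n", path, h.Sum(nil)[:8])
		index[path] = meta
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

The expensive io.Copy into the hasher only runs for new or changed files, which is why the first scan, where everything is new, dominates.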

The longer the rescan interval, the longer it can take before changes missed by the watcher are picked up. If a long rescan is a problem, you could always schedule it for a quiet period via the CLI or REST API. The interval counts from when the last scan completed, so if you automate scans yourself, you shouldn't expect additional rescans in between.
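For example, to trigger a scan during a quiet period, you could call the POST /rest/db/scan endpoint from a scheduled task. A minimal sketch in Go; the API key and folder ID are placeholders (both are shown in the GUI):

```go
// Trigger a rescan of one folder via Syncthing's REST API.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	const (
		apiKey   = "YOUR-API-KEY"   // placeholder; see GUI > Actions > Settings
		folderID = "your-folder-id" // placeholder folder ID
	)
	endpoint := "http://localhost:8384/rest/db/scan?folder=" + folderID

	req, err := http.NewRequest(http.MethodPost, endpoint, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-API-Key", apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("scan request:", resp.Status)
}
```

Scheduling this (or an equivalent HTTP call from any tool) via cron or Windows Task Scheduler lets you control exactly when the heavy scans run.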

A restart of the PC/Syncthing will resume the hashing process from where it left off, although I'm not sure whether it resumes from the last file or from the last chunk of the file it was hashing at the time.

Although, to be frank, I would say that "quickly" is very relative here. In my case, it used to take 10+ minutes to go through ~30,000 files on a very old phone with slow storage and a weak CPU. On fast hardware, it takes seconds at most.

Here, I would bet that the bottleneck will be the storage, unless it is an SSD. If it is an SSD, it shouldn’t take very long regardless of the number of files.

Yes, scanning results are persisted in the database/on disk, rebooting doesn’t reset anything.

It is quick. It is still quick if there’s a lot of data, “quick” for a lot of data just naturally means more time than “quick” for little data. And the same for a slow system: Obviously “quick” there takes longer than on a fast system. It’s nice that Syncthing can be and is used on incredibly slow/old/weird hardware, but in my opinion one shouldn’t give advice based on experiences from such hardware.

Well, we don't know the exact hardware specs, but if the data in question sits on a standard HDD, then I would say that going through all 3 million files is not going to be quick, regardless of the fast CPU and other factors, since the storage will be the bottleneck before anything else.

I’m speaking from my personal experience. For example, I have 450,000 files located on a 5400rpm HDD. Scanning takes a long time, even if there have been no changes, while the CPU usage always stays very low. Of course, the OS being Windows may also play a role here, but I haven’t done any extensive testing in this regard.

Then let’s say “as quick as quick can be under the given circumstances”. As imsodin says, slow hardware is slower than fast hardware. I’d take that one step further and say that if you’re running 5400 RPM disks, performance is obviously not a concern so it doesn’t really matter how quick quick is, precisely.


On the receive-only side, I have over 4M files spread across 7 drives. All are set to 86400 s with the watcher off. On the sending end, it's also 86400 s but with the watcher on. This has worked well for many years. I don't think there is a need for Syncthing to have the scan interval set to 60 by default when the watcher is also enabled.

The key thing is to ensure the database is on an SSD, else it will be painfully slow when checking/scanning millions of files.
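If the database currently sits on the HDD, it can be moved. A sketch, assuming Syncthing v1.5+ where the config and database directories can be split; the paths are examples only:

```
syncthing --config="%LOCALAPPDATA%\Syncthing" --data="D:\syncthing-db"
```

You'd typically move the existing database folder to the new location first, with Syncthing stopped.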

Wonderful.

The storage drive is a 3.5" HDD. Windows and the database are on an SSD.

Basically, I just needed to confirm that later rescans won't take as long as the initial scan. This has been confirmed, thanks for your help.

