I noticed that after every update the initial scans and metadata updates are very slow, because all 20 folders are scanned at the same time, causing an I/O bottleneck (normal HDD, not SSD). This time it took several hours to scan ~500 GB and ~2.5 million files, although none of them had changed in the meantime (I always make sure of that before updating). Hash speed is ~130 MB/s, and if I do a manual scan of all folders one by one it usually takes about 15 minutes to scan all files.
Would it be possible to do the scans sequentially? I found #4888 with code which looks like a solution, but I don’t know why it was not implemented.
We could special-case hack it for initial scans, essentially delaying startup until they are done one by one. I don’t find that particularly attractive though… These are just metadata walks, so there shouldn’t be any particular penalty to doing more than one in parallel, as I understand things. As opposed to when there are things to actually hash, which is when the parallelism hurts (on spinning disks).
I wonder if part of it is just the database being in cache vs not?
I’m not a programmer, so I only understand parts of the points described in the issue.
Another problem is that while the multiple scans slowly progress, index updates are constantly being sent to other devices. This causes even more database accesses and additional slowdown. Then, since everything is maxed out, the GUI dies as well and clicks on “pause folder” are no longer sent to the backend. It all happens at the same time, which is what causes the problem.
The short of it, for the issues with that PR, is that there are things that depend on scans happening to make progress. If we block scans, we block progress. We don’t want to prevent folder A from pulling a change because it needs to do a scan first and folder B is already scanning.
It only happens after updates/startup; during normal use there are no problems, as scans almost never hit at the same time. Personally I would not mind if there were no pulls, transfers or even connections during the initial scans, since sequential scanning is a lot faster here. It might be a while until I can afford a 4 TB SSD, so I appreciate any solutions you have.
I don’t know the right terminology in Go, but it might make sense to move the file reader into its own goroutine and send it requests over a channel to scan files, which it would then queue up and answer.
This way only one file would be read at a time (this could always be increased to one per physical device).
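The idea above can be sketched in Go: a single worker goroutine drains a request channel, so however many folders queue work concurrently, only one touches the disk at a time. All names here (`scanRequest`, `scanWorker`, `runScans`) are illustrative and not Syncthing’s actual internals.

```go
package main

import (
	"fmt"
	"sync"
)

// scanRequest pairs a folder name with a reply channel.
type scanRequest struct {
	folder string
	done   chan string
}

// scanWorker processes requests strictly one at a time, so only one
// folder hits the disk even if many callers queue work concurrently.
func scanWorker(requests <-chan scanRequest) {
	for req := range requests {
		// Stand-in for the real metadata walk / hashing pass.
		req.done <- "scanned " + req.folder
	}
}

// runScans queues one request per folder from concurrent callers and
// collects the serialized results.
func runScans(folders []string) []string {
	requests := make(chan scanRequest)
	go scanWorker(requests)

	results := make([]string, len(folders))
	var wg sync.WaitGroup
	for i, folder := range folders {
		wg.Add(1)
		go func(i int, f string) {
			defer wg.Done()
			done := make(chan string)
			requests <- scanRequest{folder: f, done: done}
			results[i] = <-done
		}(i, folder)
	}
	wg.Wait()
	close(requests)
	return results
}

func main() {
	for _, r := range runScans([]string{"docs", "photos", "backups"}) {
		fmt.Println(r)
	}
}
```

Raising the concurrency to “one per physical device” would just mean starting one such worker per device and routing requests accordingly.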
I’ve got an SSD, but have noticed slowness in Syncthing, but also other apps that try and scan lots of files at the same time (e.g. if I open a few copies of PyCharm on big projects at once).
If we block scans, we block progress. We don’t want to prevent folder A from pulling a change because it needs to do a scan first and folder B is already scanning.
Except that the workaround to this problem (pause all the folders, then unpause one, wait for it to finish scanning, unpause the next one) ALSO blocks all progress, as well as being really time-consuming and annoying. And not working around this problem, letting Syncthing scan all the folders at once and build up a massive disk I/O queue, ALSO blocks all progress AND stops anything else that uses the same physical drive from doing anything either.
I don’t see how the issue you raise means the fix should be rejected, when it makes things better and doesn’t make anything worse.
If literally what it did internally was start all the folders paused and unpause them one at a time when they were ready that would be totally fine by me.
I’m seriously considering having to ditch Syncthing over this, at least in one use case. One of the things I’m using Syncthing for is to move backups from a production SQL Server to a backup server (and my development environments). When Syncthing decides to rescan all the folders simultaneously it flattens the backup drive (which is a separate device from data and logs!), and somehow that also means that SQL Server grinds to a halt. If I’m lucky I can get onto the server to stop Syncthing; if I’m not, I have to reboot it and then stop the Syncthing service before the scan starts up again. Having your backup solution break the server that it’s supposed to be backing up is a bit of a red flag.
We do essentially that (except per folder): on Windows and macOS only one thread will hash files, while on Unixes we default to one per CPU core. However, I think the problems people are suffering from here are just the metadata scans, so not really related to the hashing.
It makes unknown things worse, because it’s something untested that breaks assumptions in the code. It’s also something of a niche problem, and not actually the problem of the thread starter here.
But the code is out there, the diff is small, and you could test it to see if it makes your life better. The PR is by Audrius, so I think it has a decent chance of being correct, but also that the only testing it’s been put through is that it compiles and didn’t immediately blow up his computer… Our concerns were mostly theoretical, but no one had enough energy to clear them up. Someone actually running it in production and saying it’s awesome goes a long way towards getting something merged.
When I unpaused all the folders, they all went to “Waiting to Scan” before switching rapidly to “Up to Date”. Same with “Rescan All”: everything went green after a couple of seconds.
So this looks very encouraging. But how do I get it into a state where it does an actual “read every byte” rescan to test that? Stop Syncthing, wait for a file to be added to every folder (which will happen within 30 minutes), then restart Syncthing?