Looking for ways to speed up scanning on large folders

Hello,

I am using Syncthing to sync folders from on-premises to the cloud. All synced folders are part of a single, read-only NFS mount; selected subfolders under the mount are added for syncing, and their sizes vary. The solution works really well and files are synced. Thanks to the developers and community for such a fantastic application.

Although Syncthing eventually syncs all files and folders, I have noticed that syncing is slow for large folders. The last scan (per the last-update timestamp on the Syncthing page) also does not appear to pick up newly created files.

I tried kicking off a scan manually and observed that scanning takes a long time for large folders (those with a lot of files). It makes sense that the entire folder may have to be traversed to find differences, and that may simply take more time.

I am trying to find out whether there are ways to speed up scanning when the folders live on an NFS drive. I read some posts about using inotify watchers, but I am not sure they work on an NFS filesystem, since they depend on notifications from the kernel.

Another option I explored was using ignore patterns on folders, the idea being that ignoring some patterns or big files could save on scan time. However, there are restrictions on creating a .stignore file under each synced folder (the NFS mount is read-only).

Does anyone have some thoughts or ideas on speeding up scan times for large folders?

Thanks

Filesystem notifications would be great, but they don’t work across NFS. This leaves you with full, periodic scans. For a large tree that’s a lot of metadata to fetch, and over NFS this is quite slow. The NFS client has metadata caching that you may be able to tweak (enable or prolong the cache time). You might even get better performance by doing more frequent scans, if that means the metadata is kept in cache as opposed to timing out between every scan.
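
One way to check whether the NFS metadata cache is helping is to time a few metadata walks back to back. The Python sketch here is only an approximation of a scan, not Syncthing's scanner, and the function names are hypothetical: it lists every directory and stats every entry, which is the kind of metadata traffic a scan generates. If later walks are much faster than the first, cached metadata is being reused; if every walk is equally slow, the cache is likely expiring between walks, and the Linux NFS client's attribute-cache mount options (`acregmin`, `acregmax`, `acdirmin`, `acdirmax`, `actimeo`, described in nfs(5)) are the knobs to look at.

```python
import os
import time


def walk_tree(root: str) -> int:
    """Stat every entry under root and return the number of non-directory entries.

    Hypothetical helper: it mimics the metadata traffic of a scan, nothing more."""
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                # One metadata fetch per entry; symlinks are not followed.
                entry.stat(follow_symlinks=False)
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    count += 1
    return count


def time_walks(root: str, repeats: int = 3) -> list[float]:
    """Walk root several times in a row and return each duration in seconds.

    Much faster repeat walks suggest the client-side metadata cache is warm;
    equally slow walks suggest it expires (or is disabled) between walks."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        walk_tree(root)
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `time_walks("/mnt/nfs/some/folder")` returns three durations in seconds; the path is a placeholder for one of your synced folders.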

Better yet, if you can, run Syncthing on the box that actually has the files instead of accessing them over NFS.

Ignoring individual files won’t help. If you can ignore large parts of the tree, such as directories at the top level, that would help, as there would be less to scan. Don’t use any negative patterns (beginning with !), as Syncthing will then need to do a full scan regardless of the other ignore patterns.
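
To see why ignoring a whole directory saves time while a `!` pattern defeats the savings, here is a simplified Python sketch. It is an illustration under stated assumptions, not Syncthing's ignore engine: the function name is hypothetical, patterns are matched with `fnmatch` against paths relative to the folder root, a path is ignored if it or any parent directory matches, and any `!` pattern simply wins (Syncthing's real ignore matching is richer than this). With no negated patterns, ignored directories are pruned and never listed; with a negated pattern the walk must still enter every directory, because something deep inside might be re-included.

```python
import fnmatch
import os


def scan_with_ignores(root: str, patterns: list[str]) -> tuple[int, int]:
    """Walk root honouring simplified ignore patterns.

    Returns (files_kept, dirs_listed). Simplified semantics, see lead-in."""
    negated = [p[1:] for p in patterns if p.startswith("!")]
    plain = [p for p in patterns if not p.startswith("!")]

    def matches(rel: str, pats: list[str]) -> bool:
        return any(fnmatch.fnmatch(rel, p) for p in pats)

    def is_ignored(rel: str) -> bool:
        # In this sketch a negated ("!") pattern always re-includes the path.
        if matches(rel, negated):
            return False
        parts = rel.split(os.sep)
        # Ignored if the path itself or any parent directory matches.
        return any(matches(os.sep.join(parts[:i]), plain)
                   for i in range(1, len(parts) + 1))

    files_kept = 0
    dirs_listed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirs_listed += 1
        rel_dir = os.path.relpath(dirpath, root)
        if not negated:
            # Safe to prune: nothing under an ignored directory can come back,
            # so those directories are never listed. This is the real saving.
            dirnames[:] = [d for d in dirnames
                           if not is_ignored(os.path.normpath(os.path.join(rel_dir, d)))]
        # With negated patterns we keep descending everywhere and filter per file.
        files_kept += sum(1 for f in filenames
                          if not is_ignored(os.path.normpath(os.path.join(rel_dir, f))))
    return files_kept, dirs_listed
```

The `dirs_listed` count is the interesting number: with `["big"]` the `big` subtree is never listed, while adding any `!` pattern brings every directory back into the walk even though the same files end up ignored.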

Thanks for your valuable suggestions. The scan interval was set to 10 minutes. Running Syncthing on the NFS server will certainly help, as you mentioned. I was also thinking of breaking large folders down into smaller subfolders, expecting that they could then be scanned in parallel. Is there any way to increase the number of threads dedicated to scanning a particular folder?

The directory-walk part of scanning is, as you guessed, single-threaded, and there is no way to change that at the moment. If nearly all the time is spent waiting on network round trips, it may indeed be faster to split it into multiple folders that scan in parallel.
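
Splitting one large folder into several folders gives you several independent, single-threaded walks whose network waits can overlap. Here is a rough Python sketch of that idea, with hypothetical names; it does not reflect how Syncthing actually schedules folder scans. Each folder is walked sequentially in its own thread, and the threads only help when the walks are dominated by I/O latency such as NFS round trips (Python threads release the GIL during those blocking calls). On local SSDs or spinning disks the parallel version may be no faster, or even slower.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_files(root: str) -> int:
    """Single-threaded walk of one folder, standing in for one folder's scan."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def scan_folders_in_parallel(roots: list[str], workers: int = 4) -> dict[str, int]:
    """Walk several independent folders concurrently, one walk per thread.

    Each individual walk stays sequential; the concurrency comes only from
    having several folders, which mirrors splitting one big folder up."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(count_files, roots)
        return dict(zip(roots, counts))
```

The total work is unchanged; the per-folder results add up to the same count as a single walk over the parent directory. Only the waiting overlaps.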

(Parallel walks are not a win on SSDs and a total fail on spinning disks, so I don’t see a good reason to parallelize this in the general case.)

Sure, thanks Jakob!
