Optimizing Syncthing (scanning) for weak machines with an S3 backend

Running Syncthing on a k8s cluster, with object storage (S3) mounted as the data folder.

There is the topic Optimising Syncthing for low end hardware, which looks into some of the settings.

Limiting hashers / copiers / max concurrent scans is working fine so far.
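For reference, these knobs can also be set through Syncthing’s REST config API instead of the GUI. Below is a minimal sketch in Python; the API address, API key and folder ID are placeholders, and it assumes “max concurrent scans” maps to the global maxFolderConcurrency option in current versions:

```python
# Sketch: lower scan-related parallelism via Syncthing's REST config API.
# Placeholders: address, API key and folder ID must match your setup.
import requests

API = "http://127.0.0.1:8384"            # Syncthing GUI/REST address (placeholder)
HEADERS = {"X-API-Key": "YOUR_API_KEY"}  # shown in the GUI under Settings (placeholder)
FOLDER_ID = "data"                       # the folder backed by the S3 mount (placeholder)

# Per-folder settings: a single hasher and copier routine.
requests.patch(
    f"{API}/rest/config/folders/{FOLDER_ID}",
    headers=HEADERS,
    json={"hashers": 1, "copiers": 1},
).raise_for_status()

# Global option: only one folder scanning/syncing at a time.
requests.patch(
    f"{API}/rest/config/options",
    headers=HEADERS,
    json={"maxFolderConcurrency": 1},
).raise_for_status()
```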

The biggest bottleneck I see is scanning, which is very taxing on a remotely mounted file system: a folder with around 30k files takes almost an entire day to complete a scan. I increased the number of hashers in the hope that it would help, but it’s still very slow.

This is especially a problem after node upgrades, which cause k8s to restart the containers. Everything gets rescanned.

What is worth trying to speed things up, especially considering a backend that isn’t the best fit for full scans? I/O is less of a concern because it’s not a real disk, but bandwidth may be.

The linked topic is mine, but I’ve never used Syncthing with remote/mounted storage, so all that’s written there was based solely on my experience with slow hardware locally. I’ve got a feeling you may not be able to do much in this specific case, as Syncthing isn’t really optimised/designed for using such mounted storage to begin with (even if it technically is possible to use it).

Dang, that’s sad to hear. It’s working so well for syncing and keeping my stuff up to date; it’s just this bottleneck that I want to somehow resolve.

You would need to run a separate Syncthing instance on the server where the network storage is mounted from, and use that to transfer data instead of the network mount. Syncthing is designed to work on synchronized local copies of the same data, which is rather orthogonal to a traditional network mount where only the data currently needed goes over the wire. Combining them as you are doing really gets you the worst of both worlds performance-wise. I guess you should re-evaluate your architecture and choose one or the other solution for data transfer, not both.

“Scanning” can mean two different things depending on what precisely is going on.

The first phase looks at just metadata – directory listings, sizes, modification times, and nowadays inode change times as well. I guess that can be expensive by itself, especially if the metadata isn’t cached locally but requires S3 round trips for each item. Local caching is the only thing that would speed that up, I think. We do a lot of these lookups and work on the assumption that local file metadata queries are reasonably quick. When that assumption doesn’t hold, everything slows to a crawl.
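To get a feel for how expensive that first phase is on the mount, it can help to time a metadata-only walk yourself. A rough sketch, with the mount path as a placeholder; it does roughly what the metadata pass needs – a stat per entry – without reading any file contents:

```python
# Sketch: time a metadata-only walk (names, sizes, mtimes) over the mounted folder.
# This approximates the cost of the first scan phase without hashing any content.
import os
import time

ROOT = "/data/synced-folder"  # placeholder: the S3-backed mount point

start = time.monotonic()
count = 0
for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        st = os.stat(os.path.join(dirpath, name))  # one metadata round trip per file
        _ = (st.st_size, st.st_mtime)
        count += 1
elapsed = time.monotonic() - start
print(f"stat'ed {count} files in {elapsed:.1f}s "
      f"({count / max(elapsed, 1e-9):.1f} files/s)")
```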

The second part is reading contents, if any of the metadata changed. That shouldn’t happen normally, but if some part of the fake metadata presented by the mount does change from scan to scan, then you’ll be stuck in eternal scanning purgatory.
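One way to check for that is to take two metadata snapshots of the same folder a little apart, without touching the files, and diff them. A rough sketch, again with the mount path as a placeholder; if sizes or modification times drift for untouched files, that would explain the perpetual rescanning:

```python
# Sketch: check whether the mount presents stable metadata between two scans.
# Files whose (size, mtime) change without being written will keep getting rehashed.
import os
import time

ROOT = "/data/synced-folder"  # placeholder: the S3-backed mount point

def snapshot(root):
    """Map relative path -> (size, mtime) for every file under root."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            result[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return result

first = snapshot(ROOT)
time.sleep(60)  # wait a while without touching the files
second = snapshot(ROOT)

changed = [p for p, meta in first.items() if p in second and second[p] != meta]
print(f"{len(changed)} of {len(first)} files changed metadata despite no writes")
for path in changed[:10]:
    print(" ", path, first[path], "->", second[path])
```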

