Tried to sync a huge directory tree (100 million files)

Hi,

I’m new here. I really like Syncthing, up until the point where I try to sync some huge directory trees. Then the scan never ends; I waited for two weeks without success. During the scan I looked at the leveldb performance, and this seems to be the root cause of the slowness.

Has anybody tried this before? Do you have any hints on where to look for optimisation?

Or is there a different idea around to make this work? I would like to dig into it.

These questions should help a Syncthing newbie get a direction.

Thanks in advance

meno

That’s a lot of files, and a long way outside the usual usage profile. My guess is that you might be the first, and that there are a lot of optimizations in the opposite direction of your use case.

There are people who have 30M files according to https://data.syncthing.net, but yes, the overhead per file is large, so 100M files might take ages. You should make sure the database is on an SSD, and potentially the files themselves too. Also, there is a scan progress indicator which, with that many files, you might want to disable in the advanced settings: building a list of files just to indicate progress is a lot of work in this case, so turning it off might speed things up.
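For reference, a minimal sketch of what that looks like in config.xml (the folder id and path are placeholders; the same option is exposed per folder in the web GUI under advanced settings):

```xml
<!-- Sketch: folder entry with scan progress disabled; id/path are placeholders. -->
<folder id="big-tree" path="/data/big-tree">
    <!-- A negative value disables scan progress reporting;
         0 means the default reporting interval. -->
    <scanProgressIntervalS>-1</scanProgressIntervalS>
</folder>
```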

There might be some nonlinear scaling going on somewhere. In addition to what Audrius says, I’ve done a few tests with 5-10 million files, and scans (initial, with hashing) complete in a few hours on my rather standard iMac. But having the database on fast storage is quite essential, I’m sure.

@fastandfearless Profile, and we might be able to offer hints beyond the obvious.
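If you haven’t profiled Syncthing before: it has a built-in pprof endpoint that can be enabled via the STPROFILER environment variable. A rough sketch, assuming port 9090 is free (the listen address and the 30-second duration are just examples):

```sh
# Start Syncthing with the built-in profiler listening on port 9090.
STPROFILER=":9090" syncthing

# In another shell, capture a 30-second CPU profile with the Go toolchain.
go tool pprof http://localhost:9090/debug/pprof/profile?seconds=30

# Heap profile, to see where the memory goes.
go tool pprof http://localhost:9090/debug/pprof/heap
```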


I tried to sync over a million small files on my Asus Eee PC. Syncthing cannot handle it. Maybe ST should merge small files into some virtual “superfiles”?

Please explain what you mean by this. Does Syncthing run out of memory? Does it crash? Or is it just slow? Also, please post the specs of your hardware.

Scanning never ends. ASUS Eee PC 1021. Intel Atom Z520, 2 GB RAM, HDD.

Yeah… So that’s a shitload of files on very underpowered hardware.

And definitely make sure you don’t run with the default 60s rescan interval, or it will indeed spend 24/7 scanning, as there is no chance the file metadata and database fit in the cache.

As suggested, please set the rescan interval to a longer period, e.g. 1209600 seconds (= 14 days). Also make sure the file watcher is disabled, and then start the scan. Please wait for the initial scan to finish before you lower the settings again.
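For reference, a minimal sketch of those two settings in config.xml, assuming they are set as folder attributes as in current Syncthing versions (folder id and path are placeholders; both can also be changed in the web GUI):

```xml
<!-- Sketch: rescan every 14 days, file watcher off; id/path are placeholders. -->
<folder id="big-tree" path="/data/big-tree"
        rescanIntervalS="1209600"
        fsWatcherEnabled="false">
</folder>
```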

I run 3 million files/folders on an Athlon 240e and 2 GB RAM without any problems.