How is the performance with a lot of files?

I have a server with several terabytes of data, somewhere around 10 million files. I’m wondering if Syncthing would work well on such a scale, especially as we have a lot of very small files, and if anyone has experience with an installation of similar or larger size.

The files would be transferred via LAN, and no user would actually sync all of the directories. Each user would only sync their own folder to their computer.

Is Syncthing viable for this case?

I guess you’d have to break it down into many smaller folders (aka “repos” in old Syncthing terms) to keep the index sizes reasonable and reduce the amount of information exchanged.

Furthermore, you might benefit from using the external inotify extension, and having repo rescan disabled or set to something crazy large.
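To make the per-user split concrete, here is an illustrative fragment of what the folder entries in Syncthing’s config.xml could look like (the folder IDs, paths, and device IDs are made up for this example; `rescanIntervalS` is in seconds, so 86400 means one rescan per day):

```xml
<!-- Illustrative only: one folder per user directory, shared with
     that user's device, with a long rescan interval. IDs and paths
     are placeholders. -->
<folder id="alice-home" path="/data/users/alice" rescanIntervalS="86400">
    <device id="DEVICE-ID-OF-ALICES-LAPTOP"></device>
</folder>
<folder id="bob-home" path="/data/users/bob" rescanIntervalS="86400">
    <device id="DEVICE-ID-OF-BOBS-LAPTOP"></device>
</folder>
```

The same settings can be made per folder in the web GUI, so hand-editing the config is not required.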

But I guess @calmh might have a better insight on this.

Similar question: Large repositories -- general questions and some issues

This means splitting it into folders (in the syncthing sense) per user directory, as that’s the level things are synced at - you can’t really sync a part of a folder.

For the individual users this should be perfectly fine. The central point will carry a heavier load. I’d speculate that very few changes would originate on the central server, as opposed to on the clients, so the rescan interval on the server should be set to something high - 24 hours or so. A rescan means walking the full file tree and inspecting modification times, which is not a trivial thing for 10 million files.

It’ll require hundreds of megs of RAM for this for sure, although I’m not sure how much exactly. I’m curious, though. :wink:

Hi Jakob, I’m new to SyncThing, so pardon me if this is a newbie question. Can SyncThing take advantage of filesystem change notification APIs, to avoid the exhaustive rescanning you mentioned? In particular, on OS X there’s the File System Events API (“FSEvents”), which can notify programs of file-system changes at the directory or file level.

Apparently, using this API is not without its challenges (see here, for example), but it would seem to be just what SyncThing needs, no? Of course, maybe the question then shifts to “How well does the FSEvents API operate on multi-terabyte directories?” But I’d presume it would still be a win compared to exhaustive rescanning.

I’m not Jakob (sorry), but I think you are looking for this? I’m not using it myself, though, so you’ll have to wait for someone who knows more about it. There are some issues with OS X - see the troubleshooting section at the end of the page I linked.

EDIT: I think this is also what AudriusButkevicius meant by “Furthermore, you might benefit from using the external inotify extension, and having repo rescan disabled or set to something crazy large.”

Thanks, that clarifies things greatly. I’ll keep this in mind when I test things out.

I have been testing for 24 hours and I have been a bit naughty because my first test is quite the test…

In my opinion, it doesn’t seem to handle large amounts of files very well. For my first test I am trying to sync 940k files (around 137GB) between two machines over the internet (both have a 1 Gbps pipe), and after 24 hours only 60k files and 4GB have been transferred. It doesn’t seem to be a problem with the speed between the nodes: some transfer happens at 20 or 30 Mbps, then it all stops and both machines use the CPU heavily until there is another burst of data.

Hope it helps; suggestions are welcome.

So 2 tips:

  1. Don’t keep the UI open in a tab, as that requires frequent database access plus a lock on the model.
  2. Try letting it finish scanning before you add nodes, as the two processes might be competing.

Also try disabling compression, as it might help a little; more info here:
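For reference, compression is a per-device setting (it can also be toggled in the GUI under the device’s settings). An illustrative config.xml fragment, with a placeholder device ID:

```xml
<!-- Illustrative only: compression is set per device.
     Valid values are "always", "metadata" and "never". -->
<device id="PLACEHOLDER-DEVICE-ID" compression="never">
</device>
```

On a fast LAN the CPU cost of compressing data can outweigh the bandwidth saved, which is why turning it off may help here.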

I am just testing Syncthing as well, with my 290GB photo collection, and indeed with the GUI open the speed drops to zero. After closing it (on just the receiving side was enough for me) I get a decent speed.

Is the speed slow after the initial sync, or during the initial sync?