Scaling to hundreds of users

Hi all! I am currently using Syncthing to share a bunch of scientific data (~250 GB, 30,000 files) between just a few machines. I wonder how well this would scale to, say, 100 nodes?

That is, are there any limitations that would prevent such use? For instance, does the required memory or startup time increase a lot with each added node? (Or any other issue you can think of?)

I also wonder if anyone has tried pushing Syncthing to its limits, and what the limit was, if any?

Sorry if I missed any previous discussion about this; I'd appreciate any replies.

Someone is syncing nearly 15 TB across most likely over 300 devices, though, as you can see, that requires a lot of RAM.

Well, this is both encouraging and worrying! I guess that RAM requirement is related to the huge number of files or the total size of the data? In that case, my much smaller database might be OK with a much smaller amount of RAM.

However, does RAM usage also grow significantly (e.g., linearly) with each added node?

Possibly, because we have to store each device's index, which makes the growth linear.

Actually, it's the index database that grows linearly, but given that Go does its own memory management, you can expect some additional memory growth from having to deal with more data between GC invocations.
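To get a feel for that linear growth, here's a rough back-of-envelope model. The ~200 bytes of metadata per index entry is a made-up figure for illustration, not a Syncthing number:

```python
# Rough model: one index entry per file per device,
# so total index size grows linearly with the device count.
files = 30_000           # file count from the original post
bytes_per_entry = 200    # hypothetical average metadata size per entry

def index_bytes(num_devices: int) -> int:
    """Estimated index database size for a given number of devices."""
    return files * num_devices * bytes_per_entry

for n in (3, 10, 100):
    print(f"{n:3d} devices: ~{index_bytes(n) / 1e6:.0f} MB")
```

So even if the per-entry cost is off by a factor of a few, the point stands: going from a handful of devices to 100 multiplies the index size accordingly.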

The stats in the 100% column should be read as “the heaviest user has x”. It doesn’t mean that all of the rows in that column are for the same user!


Thanks Audrius and jpjp.

So, if I understand correctly: at first the network has few nodes, and each node is OK with little RAM. Then the network grows, and some nodes with small amounts of RAM may have to drop out. So eventually the network consists of only big, badass nodes.

Thanks, I was just wondering about this. It would be nice to see the full set of stats from each of the heaviest users! (heaviest in each category)

Or only sync to a few ‘more important’ nodes that have good connectivity.

Maybe you could try a hierarchical approach? If you have, e.g., 100 nodes, group them by ten; then have nodes 00, 10, 20, etc. share "Folder A" with each other, then add a shared folder "Folder B" with the same local path (this may need a symlink on the filesystem, if Syncthing doesn't allow using the same path for two shared folders) and share that with nodes x1-x9.

In short: share the same folder twice, once with the "masters" of each group of ten, and once with nodes x1-x9 of that group.
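As a sketch, a "master" node's config.xml might then contain something like the following. The device IDs and paths are placeholders, and the second path assumes the symlink workaround mentioned above, since Syncthing may refuse two folders on the exact same path:

```xml
<!-- Hypothetical fragment of a master node's config.xml -->
<folder id="folder-a" label="Folder A (masters)" path="/data/science">
    <device id="MASTER-10-DEVICE-ID"></device>
    <device id="MASTER-20-DEVICE-ID"></device>
    <!-- ...remaining masters... -->
</folder>
<folder id="folder-b" label="Folder B (group 0)" path="/data/science-alias">
    <device id="NODE-01-DEVICE-ID"></device>
    <device id="NODE-02-DEVICE-ID"></device>
    <!-- ...nodes 03-09... -->
</folder>
```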

That should limit the RAM usage to 1/10th on all "slave" nodes, and still only 1/5th on the "master" nodes (which may then need to be a bit beefier, RAM-wise) - and they would still all be syncing with each other.

Well, that’s a theory at least; I have absolutely no idea if it would work in real life. :wink:

@Kirr now that five years have passed, do you have any more experience sharing large scientific data sets via Syncthing? I recently began working with astrophysics and cosmology communities who frequently synchronize large and dynamic data sets, and I have been looking for an excuse to experiment with Syncthing for some of their use cases. Another technology I’ve been thinking about is Hypercore.