Scaling to hundreds of users

Hi all! I am currently using Syncthing to share a bunch of scientific data (~250 GB, 30,000 files) between just a few machines. I wonder how well this would scale to, say, 100 nodes?

That is, are there any limitations that would prevent such use? For instance, does the required memory or startup time increase a lot with each added node? (Or any other issue you can think of?)

I also wonder if anyone has tried pushing Syncthing to its limits, and what the limit was, if any?

Sorry if I missed any previous discussion about this; I'd appreciate any replies.

Someone is syncing nearly 15 TB across most likely over 300 devices, though, as you can see, that requires a lot of RAM.

Well, this is both encouraging and worrying! I guess that RAM requirement is related to the huge number of files or the total size of the data? In that case, my much smaller database might be OK with a much smaller amount of RAM.

However, does RAM usage also grow significantly (e.g., linearly) with each added node?

Possibly, because we have to store each device's index, which makes the growth linear.

Actually, it's the index database that grows linearly, but given that Go does its own memory management, you can expect some additional memory growth from having to deal with more data between GC invocations.
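To get a feel for that linear growth, here's a rough back-of-envelope model. The ~200 bytes of metadata per index entry is a made-up figure for illustration, not a Syncthing number:

```python
# Rough model: one index entry per file per device,
# so total index size grows linearly with the device count.
files = 30_000           # file count from the original post
bytes_per_entry = 200    # hypothetical average metadata size per entry

def index_bytes(num_devices: int) -> int:
    """Estimated index database size for a given number of devices."""
    return files * num_devices * bytes_per_entry

for n in (3, 10, 100):
    print(f"{n:3d} devices: ~{index_bytes(n) / 1e6:.0f} MB")
```

So even if the per-entry cost is off by a factor of a few, the point stands: going from a handful of devices to 100 multiplies the index size accordingly.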

The stats in the 100% column should be read as “the heaviest user has x”. It doesn’t mean that all of the rows in that column are for the same user!


Thanks Audrius and jpjp.

So, if I understand correctly: at first the network has few nodes, and each node is OK with little RAM. Then the network grows, and some nodes with small amounts of RAM may have to drop out. So eventually the network consists of only big, badass nodes.

Thanks, I was just wondering about this. It would be nice to see the full set of stats from each of the heaviest users! (heaviest in each category)

Or only sync to a few ‘more important’ nodes that have good connectivity.

Maybe you could try a hierarchical approach? If you have, e.g., 100 nodes, group them by ten; then have nodes 00, 10, 20, etc. share "Folder A" with each other, then add a shared folder "Folder B" with the same local path (this may need a symlink on the filesystem, if Syncthing doesn't allow using the same path for two shared folders) and share that with nodes x1-x9.

In short: share the same folder twice, once with the "masters" of each group of ten, and once with nodes x1-x9 of that group.
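As a sketch, a "master" node's config.xml might then contain something like the following. The device IDs and paths are placeholders, and the second path assumes the symlink workaround mentioned above, since Syncthing may refuse two folders on the exact same path:

```xml
<!-- Hypothetical fragment of a master node's config.xml -->
<folder id="folder-a" label="Folder A (masters)" path="/data/science">
    <device id="MASTER-10-DEVICE-ID"></device>
    <device id="MASTER-20-DEVICE-ID"></device>
    <!-- ...remaining masters... -->
</folder>
<folder id="folder-b" label="Folder B (group 0)" path="/data/science-alias">
    <device id="NODE-01-DEVICE-ID"></device>
    <device id="NODE-02-DEVICE-ID"></device>
    <!-- ...nodes 03-09... -->
</folder>
```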

That should limit the RAM usage to 1/10th on all "slave" nodes, and still only 1/5th on the "master" nodes (which may then need to be a bit beefier, RAM-wise) - and they would still all be syncing with each other.

Well, that’s a theory at least; I have absolutely no idea if it would work in real life. :wink:

@Kirr now that five years have passed, do you have any more experience sharing large scientific data sets via Syncthing? I recently began working with astrophysics and cosmology communities who frequently synchronize large and dynamic data sets, and I have been looking for an excuse to experiment with Syncthing for some of their use cases. Another technology I’ve been thinking about is Hypercore.