Balancing the load in a cluster

rms · December 1, 2020, 5:30pm

I want to use Syncthing to distribute data to a group of up to 40 devices on a private network. One device, the publisher, shares one folder with the other devices (subscribers), and files dropped into that folder get distributed to the others.

In my testing with a small cluster of 3 devices, it’s been working very well. But I want to make sure that I’m properly configuring the cluster so that it doesn’t put too much of a load on the device that is acting as the publisher.

For instance, I have three devices: A, B, and C. Device A is the publisher, sharing a folder in Send Only mode with devices B and C. Devices B and C are the subscribers, sharing the folder from device A in Receive Only mode. Given that devices B and C aren’t aware of each other, at least based on the configuration in Syncthing, will they still communicate with each other as part of the cluster and share blocks when necessary to become synchronized? Or will B and C rely only on A for downloading blocks? If it’s the latter case, then this might not scale very well if I have a lot more devices in the cluster, all relying solely on device A.

Or in order to allow this sharing within the cluster, do I have to add device C to B and B to C and specify that each should share the folder with A and its counterpart?

From what I understand, the default mode when synchronizing a folder is to download blocks in a random order. I would think this is done so each device is starting from a different subset of the blocks which promotes sharing within the cluster. This in turn should help balance network traffic so it’s not all directed at the device that is acting as the publisher. Do I have that right?

I also noticed that by default the download/upload rate limiting is only applicable to internet traffic. If I’m sharing on a private network with up to 40 devices do I need to consider enabling the advanced option “Limit Bandwidth In Lan” so I’m not overwhelming the publisher device and/or saturating the network bandwidth?

Thanks, Jim

AudriusButkevicius · December 2, 2020, 12:06am

These sorts of questions have been asked before on the forum, so I suggest you look for prior posts.

If the folder between B and C is not shared, then there is no data exchanged between them.

At the start everyone will hit A anyway, as that is the only device with the data.

There will be no waiting for others to get some data or something, so you’ll have a thundering herd problem regardless.

If A starts struggling, or as data becomes available elsewhere, peers will probably start getting data from each other.

But having everyone connect to everyone carries the overhead of every device having to maintain a database about every other device, so it becomes an n^2 problem.
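To put rough numbers on the n^2 remark, here is a minimal sketch that only counts device pairs for the two extremes discussed so far (publisher-only sharing versus everyone sharing with everyone). The function names are made up for illustration, and the pair count is only a proxy for the real database and connection overhead, which is not measured here.

```python
def star_links(n: int) -> int:
    """Device pairs when n-1 subscribers share only with one publisher."""
    return n - 1


def full_mesh_links(n: int) -> int:
    """Device pairs when every device shares the folder with every other."""
    return n * (n - 1) // 2


# Cluster sizes taken from the thread: the 3-device test and the 40-device target.
for n in (3, 40):
    print(f"{n} devices: star={star_links(n)} pairs, "
          f"full mesh={full_mesh_links(n)} pairs, "
          f"peers tracked per device in full mesh={n - 1}")
```

In the full mesh each device tracks n - 1 peers, so the total grows roughly with the square of the cluster size, while the star keeps every subscriber at a single peer but puts all n - 1 of them on the publisher.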

You are probably better off with some sort of btree/snowflake/hub and spokes graph, where all children of a parent are interconnected, but you have multiple generations/levels to avoid everyone connecting to everyone and having to maintain a database. This reduces the load but introduces propagation delay because, if I am not mistaken, children will not be able to even start downloading the file until the parent has fully finished.

rms · December 2, 2020, 7:01pm

Thanks for your response. I did some searching before posting this question but due to my inexperience with Syncthing I apparently didn’t search on the right terms.

I hadn’t considered the amount of overhead that would be added to each device by being connected to every other device in the cluster so thank you for pointing that out.

I like the idea of a parent-child type topology, but my concern is that if a parent goes down, none of its children would ever receive the blocks. This would then require some additional effort to monitor for this type of incident and re-balance the tree.

What about having small groups of, say, 3 devices (subscribers) that are connected to each other within Syncthing and are all connected to the publisher device? An orchestrator would manage the publisher and subscribers by allowing only one of these groups to run at a time while the others are paused. When a group finishes, it would be paused, then another group would become active, and so forth. Do you think that could be efficient?
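The rotation described above could be planned roughly like this. It is only a sketch of the scheduling logic: the device names, `make_groups`, and `rotation` are hypothetical, and the actual pause/resume step (for example through Syncthing's REST API) plus the "wait until the group is in sync" check are left as a placeholder comment.

```python
from typing import Iterator, List, Tuple


def make_groups(subscribers: List[str], size: int) -> List[List[str]]:
    """Split the subscriber list into consecutive groups of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [subscribers[i:i + size] for i in range(0, len(subscribers), size)]


def rotation(groups: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (active, paused) device lists, activating one group per step."""
    for i, active in enumerate(groups):
        paused = [dev for j, grp in enumerate(groups) if j != i for dev in grp]
        yield active, paused


# Assumption: 40 devices in total, i.e. 1 publisher plus 39 subscribers.
subscribers = [f"sub{i:02d}" for i in range(1, 40)]
groups = make_groups(subscribers, 3)

for step, (active, paused) in enumerate(rotation(groups), start=1):
    # Placeholder: resume `active`, pause `paused`, then wait until every
    # device in `active` reports the folder as up to date before moving on.
    print(f"step {step}: active={active}, paused={len(paused)} devices")
```

With groups of 3, the publisher never serves more than 3 subscribers at once, at the cost of the last group waiting for all earlier groups to finish.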

imsodin · December 2, 2020, 7:56pm

If we are talking 40 devices as mentioned in the OP I wouldn’t worry about inefficiencies of a fully connected topology.

Just let them have more than one parent.
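As a rough sketch of the multiple-parent idea, the snippet below lays devices out in levels and gives each device up to two parents from the level above, so losing one parent does not cut off its children. Everything here (`assign_parents`, `fanout`, `redundancy`, the `devNN` names) is hypothetical; Syncthing itself has no notion of "parent", only of which devices a folder is shared with, so the result would still have to be translated into folder sharing by hand or by a script.

```python
from typing import Dict, List


def assign_parents(devices: List[str], fanout: int, redundancy: int) -> Dict[str, List[str]]:
    """Map each device to its parents in a layered topology.

    devices[0] is the publisher. Each level holds up to `fanout` times as many
    devices as the level above, and each device gets up to `redundancy`
    distinct parents chosen round-robin from the level above.
    """
    if not devices or fanout < 1 or redundancy < 1:
        raise ValueError("need at least one device and positive fanout/redundancy")
    parents: Dict[str, List[str]] = {devices[0]: []}
    prev_level = [devices[0]]
    pos = 1
    while pos < len(devices):
        level = devices[pos:pos + fanout * len(prev_level)]
        k = min(redundancy, len(prev_level))  # cannot have more parents than exist
        for j, child in enumerate(level):
            parents[child] = [prev_level[(j + r) % len(prev_level)] for r in range(k)]
        prev_level = level
        pos += len(level)
    return parents


devices = [f"dev{i:02d}" for i in range(40)]  # dev00 acts as the publisher
topology = assign_parents(devices, fanout=3, redundancy=2)
for name in ("dev01", "dev04", "dev39"):
    print(name, "<-", topology[name])
```

With 40 devices and a fanout of 3 this gives levels of 1, 3, 9, and 27 devices. The first level can only have the publisher as a parent; every deeper device has two.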

calmh · December 2, 2020, 8:00pm

I’m still mentally working on a design where you set a low and a high water mark for the number of established connections. You could then configure your devices full mesh and say you want each device to keep between (say) 10 and 20 connections. They’d connect out to fulfil the low water mark, then accept incoming connections up to the high water mark. The desired end result should be a decently connected sparse mesh. I need to make some calculations on the probability of ending up with partitioned parts of the cluster, and figure out how to handle the scenario when you have a thousand clients that should be “balanced” onto a handful of top level devices… but it should be doable.
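The water mark idea can be explored with a toy simulation. This is not how Syncthing behaves today and not the actual design; it is a sketch under simple assumptions: devices dial random peers in a random order until they reach the low mark, and a peer refuses once it is at the high mark. `build_sparse_mesh` and `is_connected` are made-up names, and `is_connected` is a plain breadth-first search for checking whether a run ended up partitioned.

```python
import random
from collections import deque
from typing import Dict, Set


def build_sparse_mesh(n: int, low: int, high: int, seed: int = 0) -> Dict[int, Set[int]]:
    """Simulate devices dialing out to `low` peers; peers accept up to `high`."""
    rng = random.Random(seed)
    links: Dict[int, Set[int]] = {d: set() for d in range(n)}
    order = list(range(n))
    rng.shuffle(order)
    for device in order:
        candidates = [p for p in range(n) if p != device]
        rng.shuffle(candidates)
        for peer in candidates:
            if len(links[device]) >= low:
                break  # low water mark reached, stop dialing
            if peer in links[device] or len(links[peer]) >= high:
                continue  # already connected, or peer is at its high water mark
            links[device].add(peer)
            links[peer].add(device)
    return links


def is_connected(links: Dict[int, Set[int]]) -> bool:
    """Return True if every device is reachable from every other device."""
    if not links:
        return True
    start = next(iter(links))
    seen = {start}
    queue = deque([start])
    while queue:
        for peer in links[queue.popleft()]:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return len(seen) == len(links)


mesh = build_sparse_mesh(n=40, low=10, high=20, seed=1)
degrees = [len(peers) for peers in mesh.values()]
print("min/max connections:", min(degrees), max(degrees))
print("single connected cluster:", is_connected(mesh))
```

Running this over many seeds and cluster sizes would give an empirical feel for how often partitions show up, which is the calculation mentioned above.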

rms · December 2, 2020, 11:49pm

Interesting. In that case, I have one location with 18 devices on the network where I can give this a trial run. I’ll see how everything holds up.

Multiple parents – I like it!
