I’m quite new to Syncthing and I’m trying to deploy it on a large cluster (1000 nodes).
I want to keep a folder of 10 files, about 100 MB in total, synced across all the nodes from a single entry point (the master). The files are refreshed every 5 minutes.
All the nodes are on the same LAN.
This cluster must be resilient to outages, i.e. we can lose random machines at any time, and this shouldn’t impact the syncing process (of course this is a corner case, but it’s really important to have a reliable solution).
Currently I can’t run tests on this cluster directly, as it’s a production environment.
At first I tried to set up a BitTorrent swarm with some home-made scripts: after detecting file changes, they create a new torrent and send it to each node.
But meanwhile I discovered Syncthing, which supports the whole syncing process natively. I was also impressed by its performance, its transfer speeds and the P2P ratios it achieves.
Then I realized that in such an environment we can’t just connect all the nodes together, as Syncthing keeps an active connection to each active node and maintains a local database entry for each device and its synchronization state.
BitTorrent, on the other hand, handles this problem by using a tracker and sharing state through the DHT.
So if I want to stick with Syncthing, the proper way is to organize my nodes in groups or in a tree-like structure. This structure should be updated often, so that no node ends up isolated after losing all its connected peers.
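To illustrate the kind of structure I have in mind, here is a minimal sketch, not a tested solution. It swaps the tree for a simpler ring with shortcuts (a circulant layout): every node sorts the list of live nodes and connects to its nearest neighbours on each side. The function name `ring_peers`, the `degree` parameter and the node names are all hypothetical, and it assumes every node can obtain the same list of live nodes from somewhere.

```python
def ring_peers(nodes, node, degree=6):
    """Return the peers of `node` in a ring-with-shortcuts layout.

    `nodes` is the list of currently live node names (hypothetical input).
    Each node links to degree // 2 neighbours on each side of its position
    in the sorted list, so links are symmetric and no central state is kept.
    """
    ordered = sorted(nodes)
    n = len(ordered)
    i = ordered.index(node)
    peers = []
    for step in range(1, degree // 2 + 1):
        # Neighbour `step` positions ahead, then `step` positions behind.
        for j in ((i + step) % n, (i - step) % n):
            candidate = ordered[j]
            # Skip ourselves and duplicates (both happen in tiny clusters).
            if candidate != node and candidate not in peers:
                peers.append(candidate)
    return peers


if __name__ == "__main__":
    cluster = [f"node{i:04d}" for i in range(1000)]
    print(ring_peers(cluster, "node0000"))
    # After an outage, recompute from the list of nodes that are still alive.
    alive = [name for name in cluster if name != "node0001"]
    print(ring_peers(alive, "node0000"))
```

Because the result depends only on the sorted list of live nodes, each node could recompute its own peers after an outage without anyone tracking the whole structure. The price is that files cross the ring in many hops, so a real layout would probably need longer shortcuts as well.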
At the moment, there is no simple way to add/remove devices using the REST API. The workarounds are:
- Have every device know all the others, but keep all of them paused except a few on each node, so we can pause/unpause them using the API.
- Rewrite the whole configuration file for each impacted node every time, push it using the API and restart the service.
Both workarounds reduce the number of active connections. The first one uses a little more space in the database (but normally the information on a device is only updated while there is an active connection with it, i.e. while the device is unpaused). In the second case, I don’t know whether the information on a device is purged when the device is deleted.
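For the first workaround, this is roughly what I imagine running on each node. It is only a sketch: it assumes the `POST /rest/system/pause` and `POST /rest/system/resume` endpoints with a `device` query parameter and the `X-API-Key` header, as I understand them from the REST documentation, so please check them against your Syncthing version. The helper names, device IDs, address and key are hypothetical. The code only builds the requests, it doesn’t send anything.

```python
import urllib.parse
import urllib.request


def pause_request(base_url, api_key, device_id, paused):
    """Build (but do not send) a request that pauses or resumes one device."""
    action = "pause" if paused else "resume"  # assumed endpoint names
    query = urllib.parse.urlencode({"device": device_id})
    url = f"{base_url.rstrip('/')}/rest/system/{action}?{query}"
    return urllib.request.Request(
        url, method="POST", headers={"X-API-Key": api_key}
    )


def plan_requests(base_url, api_key, known_devices, wanted_peers):
    """Resume the wanted peers and pause every other known device."""
    wanted = set(wanted_peers)
    return [
        pause_request(base_url, api_key, device, device not in wanted)
        for device in known_devices
    ]


if __name__ == "__main__":
    # Hypothetical values; real device IDs are much longer.
    requests_to_send = plan_requests(
        "http://127.0.0.1:8384", "my-api-key",
        known_devices=["DEVICE-A", "DEVICE-B", "DEVICE-C"],
        wanted_peers=["DEVICE-B"],
    )
    for req in requests_to_send:
        print(req.get_method(), req.full_url)
        # To actually apply it: urllib.request.urlopen(req)
```

With 1000 known devices this means up to 1000 calls per node on every change, so in practice it would make sense to send requests only for the devices whose state actually has to change.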
What I’m afraid of is ending up with a big mess just to keep each node always connected to, let’s say, 5 to 10 others, while also having to keep track of the actual structure.
I discussed this on the IRC channel and thought it would be better to share it here.
So, has anyone here had experience with such large clusters? Or simply, what do you think about this use case, and would BitTorrent be more suitable?
Thanks a lot