(Not so efficient?) Sync algorithm

bilz · April 14, 2023, 10:18pm

Hello,

First, thanks a lot for maintaining the software for such a long time.

I noticed the following behavior a long time ago, and I wanted to check whether there is any ongoing development to improve it.

Let’s consider 3 machines, all connected to each other, with a folder shared between them. The download order is random. We have:

desktop
server1
server2

The desktop has a limited bandwidth, while both servers have a 100 Mbps connection.

Create a bunch of files on the desktop, e.g. 1000 files of 10 MB each. After the folder scan, it seems we have the following:

server1 <-- desktop --> server2

It means that the desktop sends to both server1 and server2, but the servers do not communicate with each other. This is not efficient, since we do not take advantage of the large bandwidth between the servers. Although the two servers are downloading different files, they do not share them with each other unless the desktop is turned off.

As a workaround, I can change the folder sharing configuration so that the desktop only shares the folder with server1. It gives:

desktop --> server1 --> server2

In this architecture, the limited desktop bandwidth is not split in two, and we take advantage of the bandwidth between the servers. But that’s not great… Ideally, we should have something like what Resilio / BitTorrent Sync does:

server1 <-- desktop --> server2
   ^                       ^
   |_______________________|

The 3 machines should share their file lists on a regular basis and take advantage of multiple sources. The example here uses 3 machines, but you can easily imagine what happens when changes must be propagated to tens of machines.
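
To put rough numbers on the difference, here is a minimal back-of-the-envelope sketch. It is not Syncthing code. The 1000 files of 10 MB come from the example above; the desktop uplink of 1.2 MB/s and the function names are assumptions made only for illustration, and the model assumes the server-to-server links are fast enough that the desktop uplink is the only bottleneck.

```python
# Back-of-the-envelope model: how long does the desktop spend uploading?
# Assumption: the desktop uplink is the only bottleneck.

TOTAL_BYTES = 1000 * 10 * 10**6   # 1000 files of 10 MB each (from the post)
UPLINK_BYTES_PER_S = 1_200_000    # hypothetical desktop uplink: 1.2 MB/s


def copies_sent_by_desktop(topology: str, n_servers: int) -> int:
    """Number of full copies of the data that must leave the desktop."""
    if topology == "star":
        # The desktop uploads every file separately to every server.
        return n_servers
    if topology in ("chain", "shared"):
        # Every file leaves the desktop once; the servers pass it on.
        return 1
    raise ValueError(f"unknown topology: {topology}")


def desktop_upload_seconds(topology: str, n_servers: int) -> float:
    """Time the desktop needs to push its share of the data."""
    copies = copies_sent_by_desktop(topology, n_servers)
    return TOTAL_BYTES * copies / UPLINK_BYTES_PER_S


if __name__ == "__main__":
    for topology in ("star", "chain", "shared"):
        hours = desktop_upload_seconds(topology, n_servers=2) / 3600
        print(f"{topology:>6}: {hours:.1f} h of desktop upload")
```

Under these assumptions the star layout costs the desktop one full upload per server, so the gap grows linearly with the number of machines, while the chain and the fully shared layout both need a single upload.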

Is this correct? If so, I guess you are already aware of the limitation. Is there any ongoing development to improve the feature?

Thanks a lot

calmh · April 15, 2023, 5:03am

The servers do communicate with each other if you tell them to, by pairing them, no? In Syncthing the topology is up to you to define, using your knowledge of the underlying performance and constraints.

bilz · April 15, 2023, 9:06am

Yes, the servers do communicate with each other. As a concrete example, this is what happened yesterday:

I updated the desktop with hundreds of files of 2-10 MB each, all different but in a single shared folder.
The desktop started sending files randomly to both servers, at ~600 kB/s on each connection. After a few minutes, different files had been completed on each of the 2 servers.
The servers, despite being connected to each other, did not send files to each other.
After ~30 minutes, I stopped the desktop => suddenly, the servers started exchanging the files they had received at max speed (~100 Mbps).

My feeling is that if I hadn’t stopped the desktop, it would have sent all the files twice (once for each server).

To work around this behavior, I disconnected the desktop from server2. This way, the desktop sends to server1, which sends to server2 right away.

To put it differently, it seems that Syncthing can only receive files for a given folder from one source at a time. Does that make sense?

Nummer378 · April 15, 2023, 9:42am

This sounds very similar to New files are not shared between downloading devices · Issue #8208 · syncthing/syncthing · GitHub.

Syncthing is capable of pulling from multiple devices simultaneously, but only if it knows that a given file block is available somewhere.
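
As a toy illustration of that condition (this is not Syncthing’s implementation; the `BlockIndex` class and its method names are hypothetical), the sketch below models one device’s view of who has announced which blocks. A peer only becomes a usable source for a block once its announcement has been recorded, even if it already holds the data.

```python
class BlockIndex:
    """Toy model of one device's view of which peers announced which blocks."""

    def __init__(self):
        # block hash -> set of device names that announced having it
        self._holders = {}

    def announce(self, device, block_hashes):
        """Record that `device` announced that it has these blocks."""
        for block_hash in block_hashes:
            self._holders.setdefault(block_hash, set()).add(device)

    def sources_for(self, block_hash, connected_devices):
        """Connected devices known to have the block, in sorted order."""
        known = self._holders.get(block_hash, set())
        return sorted(known & set(connected_devices))


# server1's view: the desktop has announced two blocks.
view = BlockIndex()
view.announce("desktop", ["block-a", "block-b"])
connected = ["desktop", "server2"]

# server2 may already hold block-b, but it has not announced it yet,
# so from server1's point of view the desktop is the only source.
before = view.sources_for("block-b", connected)

# Once server2's announcement arrives, both devices can serve the block.
view.announce("server2", ["block-b"])
after = view.sources_for("block-b", connected)
```

In this model, a missing or delayed announcement between the two servers would produce the pattern described above: each server keeps pulling from the desktop alone.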

I suspect that there is a bug somewhere in Syncthing’s code, where newly received files/blocks do not get announced between downloading devices. I haven’t been able to narrow down the root cause of this, though.

calmh · April 15, 2023, 11:12am

Similar but not the same. I agree that in your described case (one single large file) there is something not necessarily working exactly as intended, and even the intended design is far from optimal. I started reproducing/working on it at one point, but it took longer than I had available and then it went to the side…

The case here is for lots of comparatively small files, which should be handled more efficiently (though not “optimally”, as Syncthing doesn’t take the entire distribution of data into account, just working file by file). If it isn’t, that’s a bug – a different one compared to your case. Maybe we’re holding – and working from – an old database snapshot for too long, or something.