How does Syncthing choose a relay?

I now think that this issue is mostly moot. As you correctly pointed out, the reason it picked a high-latency relay is probably not that the device decided that's what's best for it, but that the remote device decided that.

As was suggested on the other ticket, before we make any changes I'd like some more concrete evidence that whatever we do actually helps.

About the base latency argument of @calmh: It is true that you skip the smaller buckets for short latencies if the line you use carries a base latency penalty. Some really exotic links get high penalties (like satellite links), but all the rest should stay well below 40ms. Depending on the provider, we see here 1-6ms on fibre, 8-12ms on coax or copper, around 20ms on mobile, and I don't remember the figure for dial-up… Add maybe 3-5ms for home Wi-Fi. So I think 100ms is quite a corner case.

Maybe a base latency penalty does need to be taken into account. But then it must be measured on the first hop. Why not subtract it? If that penalty really exists and is relevant enough, this is probably a good improvement to the square-root bucket sizes idea.
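For concreteness, here is a minimal Go sketch of how I imagine that combination, assuming the square-root idea means bucket boundaries at i² * 10ms (0, 10, 40, 90, 160ms, …) and that an estimated first-hop base latency is available to subtract. This is not the actual Syncthing code; `sqrtBucket`, the 10ms scale factor and the way the base latency is obtained are all assumptions for illustration.

```go
// Hypothetical sketch of the proposal, not the actual Syncthing code.
// Bucket boundaries grow quadratically (0, 10, 40, 90, 160 ms, ...), i.e. the
// bucket index is the square root of the latency in units of 10 ms, and an
// estimated first-hop base latency is subtracted before bucketing.
package main

import (
	"fmt"
	"math"
	"time"
)

// sqrtBucket returns the bucket index for a measured relay latency, after
// subtracting the base latency of the local line (both assumed to be known).
func sqrtBucket(latency, baseLatency time.Duration) int {
	adjusted := latency - baseLatency
	if adjusted < 0 {
		adjusted = 0
	}
	// Boundaries at 0, 10, 40, 90, 160 ms, ... so low latencies are split
	// finely and high latencies coarsely.
	return int(math.Sqrt(float64(adjusted.Milliseconds()) / 10))
}

func main() {
	base := 20 * time.Millisecond // e.g. a mobile uplink
	for _, l := range []time.Duration{25 * time.Millisecond, 60 * time.Millisecond, 180 * time.Millisecond} {
		fmt.Printf("%v -> bucket %d\n", l, sqrtBucket(l, base))
	}
}
```

With the base latency subtracted, a 25ms measurement over a 20ms mobile uplink still lands in the lowest bucket instead of being pushed out of it.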

On the geolocation approach: Since Syncthing always uses the network, all packets travel over the wire, and the time distance there is exactly what latency probes measure. So this gives precisely what you're looking for if you want "closest to the client". Geolocation is not very accurate and doesn't take the topology of the network into account.

@AudriusButkevicius

Only one case leads to the situation you describe with the square-root approach: when there is just one relay nearby and all others are far away. But this problem exists with the 50ms approach as well: when the latency gap between the nearest relay and all other relays exceeds 50ms, you get the same situation. Because only two buckets in the suggested change are narrower than 50ms and all other buckets are bigger, you might even run into the problem you describe more often with 50ms buckets.

You could solve this by having more relays in a better distribution (unfortunately out of scope), by always taking at least two or more relays (which goes against the bucket idea; see the sketch below), or by measuring throughput. I could also imagine an ongoing rating of all relays that is taken into account when choosing one. That would need a lot more coding than changing only one line.
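To illustrate the "at least two relays" variant: a rough sketch, assuming the relays have already been sorted into latency buckets from nearest to farthest. `candidateRelays`, `minCandidates` and the example hostnames are invented for illustration and are not part of Syncthing.

```go
// Hypothetical sketch of the "at least two relays" mitigation: start with the
// lowest-latency bucket as before, but if it holds fewer than minCandidates
// relays, merge in the following buckets until enough candidates exist, then
// pick one at random. Not the actual Syncthing implementation.
package main

import (
	"fmt"
	"math/rand"
)

// candidateRelays flattens the buckets (already sorted from nearest to
// farthest) into a candidate set of at least minCandidates relays.
func candidateRelays(buckets [][]string, minCandidates int) []string {
	var candidates []string
	for _, bucket := range buckets {
		candidates = append(candidates, bucket...)
		if len(candidates) >= minCandidates {
			break
		}
	}
	return candidates
}

func main() {
	buckets := [][]string{
		{"relay-near.example"},                         // single close relay
		{"relay-far-1.example", "relay-far-2.example"}, // everything else is far
	}
	cands := candidateRelays(buckets, 2)
	fmt.Println("chosen:", cands[rand.Intn(len(cands))])
}
```

The lonely nearby relay then only gets half of the choices instead of all of them, at the cost of sometimes picking a farther relay.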

Staying with the simple bucket approach, square-rooted or not, this problem won't be solved; it exists for both. My intention was to propose an improvement to the existing code that doesn't introduce lines and lines of new code, also because I wanted to get it to happen. A bigger change needs much more time, but who am I telling this to anyway, you know this already… :slight_smile:

Right, and I am saying that just switching to sqrt will make it worse in certain scenarios, so it sounds like it's a no-go anyway?

I was thinking of running some tests to collect data showing why I assume bucket sizes should grow with latency. Maybe I'll get convinced of something else too; I don't claim my suggestion is an improvement for everyone… :slight_smile:

In geography they say that social interaction decreases somewhat with distance. Since Syncthing is file sharing between people who tend to interact in other ways too (why else would they share files?), one could assume that local connections are used more often. Can you guys see such a correlation in Syncthing's statistical data?

The idea would be: if it is true that local connections are used much more often, then a latency-based relay choice that prefers close (and, when available, very close) relays is the right one.

I would like to sample all relays from different locations with distinct profiles and compare how the buckets get filled with either method. Unfortunately I'm missing access to some regions… Does any one of you have access to a box in South America, Africa, South-East Asia or eastern Russia to run a bash one-liner? Even if not, I can start in Central Europe, Japan and North America.
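For anyone who prefers Go over a shell one-liner, here is roughly what I have in mind: dial each relay a few times, keep the best TCP connect time as the latency estimate, and see which bucket it lands in under either scheme. The relay addresses below are placeholders (the real list would come from the public relay pool), and the bucket formulas repeat the assumptions from the earlier sketch.

```go
// Rough sketch of the measurement: dial each relay address a few times, take
// the best TCP connect time as the latency estimate, and sort it into both
// bucket schemes for comparison. Addresses are placeholders.
package main

import (
	"fmt"
	"math"
	"net"
	"time"
)

func measure(addr string, samples int) (time.Duration, error) {
	best := time.Duration(math.MaxInt64)
	for i := 0; i < samples; i++ {
		start := time.Now()
		conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
		if err != nil {
			return 0, err
		}
		conn.Close()
		if d := time.Since(start); d < best {
			best = d
		}
	}
	return best, nil
}

func main() {
	relays := []string{"relay-a.example:22067", "relay-b.example:22067"} // placeholders
	for _, addr := range relays {
		lat, err := measure(addr, 3)
		if err != nil {
			fmt.Println(addr, "error:", err)
			continue
		}
		fixed := int(lat.Milliseconds() / 50)                      // fixed 50 ms buckets
		sqrt := int(math.Sqrt(float64(lat.Milliseconds()) / 10))   // square-root buckets
		fmt.Printf("%s: %v -> 50ms bucket %d, sqrt bucket %d\n", addr, lat, fixed, sqrt)
	}
}
```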

Thank you guys for coming here to discuss the issue.
