How does Syncthing choose a relay?

I’m in Hungary, yet after starting Syncthing three times, it chose three different Czech relays, even though the relay list shows closer ones (both geographically and network-wise). Why is that? How does Syncthing choose a relay server?

It checks the latency of relays and puts them in 50ms buckets, trying the lowest-latency ones first. Which relay is chosen within those buckets is random.
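The bucketing described above can be sketched roughly as follows. This is only an illustration in Python, not Syncthing's actual code (Syncthing itself is written in Go); the function name `order_relays`, the relay names, and the sample latencies are all made up for the example.

```python
import random
from collections import defaultdict

BUCKET_MS = 50  # bucket width mentioned in the thread


def order_relays(latencies_ms, bucket_ms=BUCKET_MS, rng=random):
    """Return relay names in the order they would be tried.

    latencies_ms maps a relay name to its measured latency in ms.
    Relays are grouped into latency buckets, buckets are visited from
    lowest to highest, and the order inside each bucket is random.
    """
    buckets = defaultdict(list)
    for relay, latency in latencies_ms.items():
        buckets[int(latency // bucket_ms)].append(relay)

    ordered = []
    for index in sorted(buckets):
        group = buckets[index]
        rng.shuffle(group)  # random order within a bucket
        ordered.extend(group)
    return ordered


# Hypothetical measurements, for illustration only.
sample = {"relay-a": 7.0, "relay-b": 24.0, "relay-c": 49.9, "relay-d": 63.0}
print(order_relays(sample))
```

With these sample values, the first three relays can come out in any order because they share the 0-50 ms bucket, while `relay-d` always comes last.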

Thanks! So it probes all relays during startup. Is it possible to view this information?

BTW, for one of the Czech servers, ping looks like this:

--- 128.0.191.10 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9012ms
rtt min/avg/max/mdev = 21.264/23.800/25.014/1.094 ms

While for the two Hungarian servers:

--- 195.228.252.133 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9010ms
rtt min/avg/max/mdev = 5.210/6.898/13.202/2.267 ms

--- 193.227.196.10 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9012ms
rtt min/avg/max/mdev = 11.333/13.028/15.285/1.232 ms

All are below 50 ms, so they land in the same bucket, and with random selection the countries with the most servers win (here the Czech Republic, which has more available relays than any neighboring country, and they are close network-wise). I understand the rationale behind the random selection within buckets; it’s a wise choice (it needs no coordination and gives good balancing).

But 50 ms basically covers the whole of Europe, even from the eastern side (considering that most of the servers are in western Europe), so the more distant servers will outnumber the closest ones.

50 ms corresponds to a radius of roughly 5,000-8,000 km on good networks, which means I will mostly communicate over international lines instead of local ones.
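As a rough check on that radius: a round-trip time gives an upper bound on distance, because the signal covers the path twice. The sketch below assumes a propagation speed of about 200,000 km/s in optical fibre (roughly two thirds of the speed of light) and uses the vacuum speed of light as the absolute limit. Both constants and the function name `max_distance_km` are assumptions for the illustration; real routes are longer than a straight line and add queuing delay, so actual distances are smaller.

```python
C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum
C_FIBRE_KM_S = 200_000.0     # assumed typical speed in fibre (~2/3 c)


def max_distance_km(rtt_ms, speed_km_s):
    """Upper bound on one-way distance for a given round-trip time."""
    one_way_s = rtt_ms / 1000.0 / 2.0  # half the round trip, in seconds
    return speed_km_s * one_way_s


for rtt in (20, 50):
    fibre = max_distance_km(rtt, C_FIBRE_KM_S)
    vacuum = max_distance_km(rtt, C_VACUUM_KM_S)
    print(f"{rtt} ms RTT: up to {fibre:.0f} km in fibre, {vacuum:.0f} km in vacuum")
```

Under these assumptions a 50 ms round trip bounds the distance at about 5,000 km in fibre and about 7,500 km in vacuum, while 20 ms gives about 2,000 km and 3,000 km.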

Could you please consider a smaller bucket size, like 20 ms? I guess that could be a good tradeoff between overly small buckets and the current long-distance transfers.
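To see what the bucket size changes, here is a small sketch that groups the three relays pinged earlier in this thread by their average RTT. It assumes that Syncthing's own latency measurement would give numbers similar to ICMP ping, which may not be the case; the helper names `bucket_of` and `group_by_bucket` are made up for the example.

```python
# Average RTTs (ms) taken from the ping output quoted in this thread.
avg_rtt_ms = {
    "128.0.191.10 (CZ)": 23.800,
    "195.228.252.133 (HU)": 6.898,
    "193.227.196.10 (HU)": 13.028,
}


def bucket_of(rtt_ms, bucket_ms):
    """Index of the latency bucket an RTT falls into."""
    return int(rtt_ms // bucket_ms)


def group_by_bucket(rtts, bucket_ms):
    """Map bucket index -> list of relays in that bucket."""
    groups = {}
    for relay, rtt in rtts.items():
        groups.setdefault(bucket_of(rtt, bucket_ms), []).append(relay)
    return groups


for size in (50, 20):
    print(f"bucket size {size} ms:", group_by_bucket(avg_rtt_ms, size))
```

With 50 ms buckets all three relays share bucket 0, so the choice among them is random. With 20 ms buckets the two Hungarian relays stay in bucket 0 and the Czech one moves to bucket 1, so it would only be tried after them.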

Are you suggesting that crossing national borders is the real problem, rather than the not-quite optimal choice of servers?

Because there can easily be cases where the best servers are, in fact, across borders, even if that’s not true for you.

Well, it depends on whose problem it is. For the network operators, if Syncthing traffic were a big chunk of the total, the national/international split would be the real problem.

For the end user (me), the real metric is throughput (optimal server selection). I thought it was clear that I was talking about that, sorry. :slight_smile:

To illustrate this, I’ve written a small script that sends and receives data through Syncthing’s testutil (which acts as a bidirectional pipe and can be fed from stdin and consumed from stdout, so it seems fine for this kind of test).

So I took my Syncthing log from today and tested all the relays it had used. All speeds are in MiB/s (mebibytes per second), and I’m on a symmetric gigabit residential connection. I placed the two Hungarian (domestic) servers at the top, but please note that these were not selected by Syncthing:

country  | relay           | top speed | avg speed
---------+-----------------+-----------+----------
HU       | 195.228.252.133 | 29.24     | 27.26
HU       | 193.227.196.10  | 26.17     | 11.38 (session-rate=10)
UA       | 45.137.155.56   | 5.20      | 3.07
CZ       | 128.0.191.10    | 2.45      | 1.50
CZ       | 95.82.169.36    | 0.44      | 0.20
DE       | 85.214.100.39   | 8.07      | 4.44 (global-rate=8.34, session-rate=3.81)
CZ       | 185.8.166.21    | 0.31      | 0.13
NL       | 51.15.105.255   | 14.30     | 11.55 (global-rate=30)
DE       | 142.93.102.4    | 29.77     | 23.85

So Syncthing selected a relay 7 times (I restarted it intentionally), and nearly all of the selected relays were hugely slower (by one or even two orders of magnitude) than the domestic ones.
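The size of the gap can be worked out directly from the table. The sketch below divides the average speed of the faster Hungarian relay by the average speed of each relay Syncthing selected; the numbers are copied from the table, and the function name `slowdown` is just for the example.

```python
# Average speeds (MiB/s) copied from the table above.
best_domestic = 27.26  # HU 195.228.252.133

selected = {
    "45.137.155.56 (UA)": 3.07,
    "128.0.191.10 (CZ)": 1.50,
    "95.82.169.36 (CZ)": 0.20,
    "85.214.100.39 (DE)": 4.44,
    "185.8.166.21 (CZ)": 0.13,
    "51.15.105.255 (NL)": 11.55,
    "142.93.102.4 (DE)": 23.85,
}


def slowdown(reference, speed):
    """How many times slower `speed` is than `reference`."""
    return reference / speed


# List the selected relays from slowest to fastest.
for relay, speed in sorted(selected.items(), key=lambda kv: kv[1]):
    print(f"{relay:22s} {slowdown(best_domestic, speed):7.1f}x slower")
```

The ratios range from about 1.1x for the faster German relay to about 210x for the slowest Czech one.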

Well, we could call that not-quite optimal, yes. :frowning:

Relays are a measure of last resort. You should try to fix the reason why you can’t connect directly, not how the relay is chosen. I could start up a relay with a 10 kb/s limit next door to you and you’d still have the same problem, so the proposed solution doesn’t make much sense.

Would the solution for this user be to set up a direct connection over a VPN or SSH?

It’s up to the user to decide what the solution is. I don’t think dropping the bucket size to 25 ms would fix this, as it could still pick a bad relay within 25 ms.

If you have a strong preference, you can force Syncthing to connect to a specific relay; how to do that is explained in the docs.

Well, with the arbitrary limits in place, you are right about the current scheme. Using relays without any central coordination/balancing is a roll of the dice.

I still think decreasing the radius may give better results if relays tend to perform well (as in this case, where even the rate-limited Hungarian relay is mostly better than the chosen ones), but of course I don’t have any data to support this…

Direct connections don’t work here (double NATs, corporate firewalls etc).

So this isn’t a priority, understood. Thanks for answering and for the great product!

In theory Syncthing could use several relays simultaneously, say up to 5, and periodically remove the worst of them to try a new one.

Sure, and then you have to implement TCP in software to stitch together multiple data streams with different latencies, guarantee in-order delivery, handle failures, etc. It’s much easier said than done.

Again, relays are a last resort to get something connected; there are no guarantees around their performance.

As Syncthing only makes one connection to a device, it cannot use several relay servers simultaneously.