Long connection period for remote devices with 1.16.0-rc.2

1.16.0-rc.2 on two Linux laptops works as expected on the local WLAN, but if one laptop tries to connect remotely, it only connects after about 20-30 minutes and throughput is slow: 15-20kps, where I normally get 1-2mbs over VDSL. (A mobile data connection behaves similarly.)

Revert both laptops to 1.15.1 and connection time is around 30 seconds using the same remote location and setup. Speed is also as expected.

Upgrade to 1.16.0-rc.1 and all is good again. The issue seems to be with rc.2, possibly #7551.

Anyone able to reproduce this effect?

#7551 (QUIC: failed to determine receive buffer size: doesn't have a SyscallConn) is about QUIC connections and possibly affects the transfer speed, not connection times.

There’s no relevant change between 1.16.0-rc.1 and 1.16.0-rc.2 (there’s hardly any change at all, except a fix for a new web UI bug).

Just to be sure: You described downgrading and upgrading back to rc.1, not having problems. Have you also tried rc.2 again after downgrading and does the problem still occur?

If yes, what type of connection do you get on rc.2 when the problem occurs? (A screenshot with the affected remote device expanded would contain the relevant info.)

Which kind of connection is established? TCP, QUIC or via relay?
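If a screenshot is awkward to get, the same information should be available from the local instance's REST API. This is a minimal sketch, assuming the `/rest/system/connections` endpoint returns a `connections` object keyed by device ID, where each entry has `connected`, `type` and `address` fields. The device IDs and addresses in the sample payload are made up for illustration, and the helper name `summarize_connections` is hypothetical.

```python
import json


def summarize_connections(payload: str) -> dict:
    """Map each currently connected device ID to its connection type."""
    data = json.loads(payload)
    summary = {}
    for device_id, info in data.get("connections", {}).items():
        # Skip devices that are configured but not connected right now.
        if info.get("connected"):
            summary[device_id] = info.get("type", "unknown")
    return summary


# Hypothetical response body; real device IDs are much longer.
SAMPLE = json.dumps({
    "connections": {
        "DEVICE-A": {"connected": True, "type": "quic-server",
                     "address": "[2001:db8::1]:22000"},
        "DEVICE-B": {"connected": True, "type": "relay-client",
                     "address": "203.0.113.5:22067"},
        "DEVICE-C": {"connected": False, "type": "", "address": ""},
    }
})

if __name__ == "__main__":
    for device, kind in summarize_connections(SAMPLE).items():
        print(device, kind)
```

A type beginning with `tcp-`, `quic-` or `relay-` would answer the question directly.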

@bt90 I think it’s via QUIC. I’m trying to confirm. With 1.15.1 on both sides the connection is TCP, but I think the delayed, slow connection is QUIC if one side is on rc.2. I have no access to the remote device overnight. More testing needed.

Thanks

Issue #7471 (Improve UDP hole punching) would explain the long delay for the connection to be established.

That’s not a change in a recent RC version though, right?

The connection delay for QUIC is nothing new. Just pointing out that it’s already a tracked issue with an explanation why this might happen if stateful firewalls or certain NAT types are involved.

This discussion though is specifically “it’s slow to connect in 1.16.0-rc.2, but fine in 1.15.1/1.16.0-rc.1”.

The current state of our UDP hole punching is that it works, but the delay is a bit of a gamble. Without knowing how often OP tested this, the observed connection delay could just be bad luck and not related to the underlying performance degradation.

I seem to be unable to reproduce the long delay. Both sides, local and remote, are now on 1.16.0-rc.2 and are connecting reliably and quickly (~30 secs). However, each time I test, the remote only uses relay-client TCP (IPv4).

The long delay in connecting, reported in my first post, was definitely with quic-server (IPv6) on the remote laptop, using both VDSL and mobile data. For interest, I tested with a WireGuard VPN (Mullvad) on and off, with no difference.

Can I force a QUIC connection instead of TCP for further tests?

You can disable relaying to get TCP+QUIC, and then debug why TCP doesn’t connect (port forwarding, firewalls, etc). As noted it’s not totally unexpected that QUIC can take a while to get a connection in some NAT scenarios, but while you can troubleshoot that you’d be much better served by fixing things so you can get a direct TCP connection running instead.


It would still be interesting if QUIC is the culprit here. I expect it to be a little bit slower than TCP but not that much.

There are now two laptops (A and B) and an RPi3 in the mix, all running 1.16.0-rc.3. All have relaying disabled. All in a mesh setup.

The Pi and A are connected via WLAN with local IPv4 addresses (192.168.etc). The Pi quickly connected to B via IPv6 addresses in a QUIC client/server combination. B is using mobile data to simulate ‘remote’ WAN.

It took more than an hour for A and B to finally connect via IPv6 QUIC. (Edit: the connection has now dropped between A and B, while the Pi - B combination seems stable.)

Firewalls (UFW) allow Syncthing traffic as mentioned in the docs.

If you don’t explicitly create firewall rules, a successful connection in this scenario is only possible using UDP hole punching.
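To illustrate why hole punching depends on timing, here is a toy model of two stateful firewalls. It is a sketch under stated assumptions, not how Syncthing or any particular firewall is implemented: the 30-second state timeout, the class and function names, and the documentation-range addresses are all made up for the example, and network latency is ignored.

```python
class StatefulFirewall:
    """Toy model: inbound UDP from a peer is allowed only if we sent a
    packet to that same peer within the last `timeout` seconds."""

    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout
        self._last_outbound = {}  # remote (ip, port) -> time of last send

    def send(self, remote, now: float) -> None:
        # An outbound packet creates (or refreshes) the state entry.
        self._last_outbound[remote] = now

    def allows_inbound(self, remote, now: float) -> bool:
        last = self._last_outbound.get(remote)
        return last is not None and now - last <= self.timeout


def punch(send_a_at: float, send_b_at: float, timeout: float = 30.0) -> dict:
    """Each side sends one packet; report whose packet was let through."""
    fw_a, fw_b = StatefulFirewall(timeout), StatefulFirewall(timeout)
    addr_a, addr_b = ("198.51.100.1", 22000), ("203.0.113.1", 22000)
    delivered = {"a": False, "b": False}
    # Process the two sends in chronological order.
    for t, side in sorted([(send_a_at, "a"), (send_b_at, "b")]):
        if side == "a":
            fw_a.send(addr_b, t)
            delivered["a"] = fw_b.allows_inbound(addr_a, t)
        else:
            fw_b.send(addr_a, t)
            delivered["b"] = fw_a.allows_inbound(addr_b, t)
    return delivered


if __name__ == "__main__":
    print(punch(0, 5))    # attempts close together
    print(punch(0, 100))  # attempts far apart
```

In this model the first packet is always dropped, and the second one only gets through if it is sent before the first side's state entry expires. If the two sides' attempts are too far apart, neither packet arrives and both have to try again.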

Could you modify the option reconnectionIntervalS as mentioned in #7471 and set it to a lower value, such as 20?
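The option can be changed under Advanced settings in the GUI. For scripting the change across several devices, the following is a minimal sketch that assumes the REST config endpoint `/rest/config/options` accepts a `PATCH` with a JSON body and an `X-API-Key` header. The base URL and API key are placeholders, and the helper name `build_options_patch` is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def build_options_patch(base_url: str, api_key: str,
                        options: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates global options."""
    body = json.dumps(options).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rest/config/options",
        data=body,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


# Placeholder address and key; substitute the values from your own instance.
req = build_options_patch("http://127.0.0.1:8384", "YOUR-API-KEY",
                          {"reconnectionIntervalS": 20})

# Uncomment to send the request to a running instance:
# urllib.request.urlopen(req)
```

The same request shape should apply to other global options, for example `relaysEnabled`.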

@bt90 Intervals set to 20 seconds on all three devices. Relaying disabled. Same setup as in #12. Uptime now 30 mins with no connections.

EDIT: UFW now disabled for test.

Did the connection drop at the same time as the other was established? I guess Syncthing listens on, and the connection is established at, whatever port is “punched open”. Once a connection is established, that port is occupied, and for a second connection it would need to listen on and punch a different port. It doesn’t do that, as it knows nothing about established connections.

Also this is all quite interesting to experiment with, but the practical problem (reliable connection) could be solved

The connection being dropped could be caused by the firewall if not enough packets were sent for it to be considered ESTABLISHED.

I can’t be sure, but yes very close.

I have had reliable connections for years, but I now suspect they went through relays. How do I prioritise TCP connections as @calmh suggests?

Thanks for your help.

BTW, 20mins without UFW and still connections.

@rustycanb you’d have to configure your firewall as described in the Firewall Setup page of the Syncthing documentation.


What @bt90 said, plus, to prevent a misunderstanding: “firewall” is often interpreted as the program running on your computer, while the main thing to configure is the router/NAT (port forwarding).