Good day everyone.
I have a situation where I can maintain a solid QUIC connection over a 9000 mile link for about 36 hours… Then the devices disconnect and no matter how long I wait, they never reconnect.
If I then change the sync protocol listen address and specify a different QUIC listen port, the connection comes back 3 minutes later and is maintained for a day and a half…
quic://0.0.0.0:28272, tcp://0.0.0.0:22000
I have another machine also 8500 miles away that maintains a QUIC connection with this same machine forever… No issues.
I suspect the bad machine (at the far end) has something weird going on with its NAT (edge?) device, where it eventually closes the hole-punched port. Maybe I can find a way to change the listen address automatically. Worst case, I lose connections for a few minutes every night… “Default” doesn’t help me, because that’s basically the same as fixing the port at 22000, and it will drop again after 36 hours… Is that right?
Anyone have any ideas? FYI I have no control over the remote firewall. The computer there is mine, but the network it’s on is not mine…
Do you control the firewall on one side, and can you open a port there?
Otherwise, I’m not sure. The concept of a maximum connection lifetime exists in some NAT implementations, for sure, but what constitutes a “connection” in UDP land isn’t obvious. Maybe, after deciding that a connection should be timed out, the NAT needs to see no packets for a while before it forgets about it. If you can figure out how long that is, maybe increasing the reconnection interval beyond it, to something like an hour or two, will get you a connection again after that long a time.
Another thing you could try is to set the listen address to quic://0.0.0.0:0. That zero in the port number should cause the system to allocate us a random port. Then schedule a nightly restart to get a new port.
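For anyone curious what port 0 does at the operating-system level, here is a minimal Python sketch. It is not Syncthing code; the helper name `bind_udp_ephemeral` is made up for this example. It only shows the general socket behaviour: binding to port 0 makes the OS pick a free port.

```python
import socket


def bind_udp_ephemeral(host="127.0.0.1"):
    """Bind a UDP socket to port 0 so the OS assigns a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))  # port 0 means "let the OS choose"
    return sock


sock = bind_udp_ephemeral()
port = sock.getsockname()[1]  # the port the OS actually assigned
print(f"OS assigned UDP port {port}")
sock.close()
```

The assigned port typically differs each time a new socket is bound this way, which is why a restart would be expected to yield a new port.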
As a last option, perhaps using a relay on neutral territory that both sides can connect to.
Thanks very much for this. We do control one side, and I have a meeting with the IT security team about forwarding a port this afternoon. I’m curious to try 0.0.0.0:0 and see how that works. It’s exactly what I was looking for. Of course, if IT agrees to open the port, then we are set. Appreciate the help very much.
So I tried this and I got an interesting result. It seems the port number is changing very often. Every few minutes the discovery server is announcing a different port number. 15 minutes after making this change and restarting, I have 40 entries in the discovery list on the remote host. (20 with the local IP and 20 with the IP outside the NAT). So this doesn’t really work.
I created a listen address string with multiple QUIC addresses, and Syncthing seems to accept it. From the logs, there is activity on multiple ports, but the connections haven’t been reestablished yet, and I’m trying not to stop and restart the remotes manually, as I want to see if they find a way to reconnect on their own. My new connect string is something like this:
FYI This strategy seems to be working well.
Syncthing opens multiple ports, and it seems the remote system is able to connect with one of the ports when the other port has become “stale”. I suspect two ports is enough. But I’m running 4 now as shown above and I’m gonna leave it like that for a while.
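For anyone wanting to try the same approach, here is a small Python helper that builds a comma-separated listen string in the same format as the one quoted earlier in the thread. The function name and the extra port number in the usage line are placeholders picked for illustration; they are not the values actually used above.

```python
def build_listen_addresses(quic_ports, tcp_port=22000, host="0.0.0.0"):
    """Build a listen string with one QUIC entry per port plus one TCP entry."""
    if len(set(quic_ports)) != len(quic_ports):
        raise ValueError("QUIC ports must be unique")
    for port in list(quic_ports) + [tcp_port]:
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    entries = [f"quic://{host}:{port}" for port in quic_ports]
    entries.append(f"tcp://{host}:{tcp_port}")
    return ", ".join(entries)


# 28272 is the port mentioned earlier; 28273 is a hypothetical second port.
print(build_listen_addresses([28272, 28273]))
# quic://0.0.0.0:28272, quic://0.0.0.0:28273, tcp://0.0.0.0:22000
```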
I’m not sure why this happens. What type of NAT is STUN reporting?
STUN should run every 16 seconds or so and keep that port in use, so the port shouldn’t change on the outside. The fact that it does suggests something is going wrong.