It seems that Syncthing gives up communication too quickly

RogerH · March 23, 2015, 9:57pm

I have computers on a LAN: Linux, Windows 7, Windows XP. I use the old XP machine as a print and file server. I synchronize directories on these computers with one another and with Windows 7 computers on the internet. BitTorrent Sync (Beta) has no problems, but Syncthing does.

Topology: cabled LAN with a Cisco Linksys router, ADSL connection to the internet. A subset of the topology looks like this:

Fuji(W7)--Router--Alina(XP)
Router--Internet--Opti(W7)

The router forwards port 22000 (TCP) and port 21025 (UDP). Often, but not always, Syncthing finds a UPnP device, albeit with a malformed UUID string, but it uses it. The Windows Firewall allows Syncthing to use TCP and UDP.

Syncthing on Fuji and Alina see each other and can sync. Syncthing on Fuji and Opti see each other and can sync. Syncthing on Alina and Opti cannot establish contact.

As Fuji can find Opti but Alina cannot, and the connections are equivalently set up, the problem can hardly be in Opti but should be in Alina.

Alina is a slower computer than Fuji. That may be the point, if the Syncthing protocol times out before a response has been received or processed. On Syncthing startup, the console window says nothing about unsuccessful connection attempts, but I can see that the Alina port that Syncthing uses to communicate with Opti’s port 22000 reaches the state Sent but never the state Established. I guess that the expected response from Opti does not arrive in time. As the remote IP address is known, at least some kind of basic communication must have been established.
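One way to test the timeout theory independently of Syncthing is to attempt a plain TCP connect from Alina to Opti's port 22000 with a generous timeout and see whether the handshake completes, is refused, or gets no reply at all. The following is only a minimal sketch: the function name `probe_tcp` and the host name in the usage comment are placeholders invented for illustration, and nothing here reflects how Syncthing itself connects.

```python
import socket
import time
from typing import Tuple


def probe_tcp(host: str, port: int, timeout: float = 30.0) -> Tuple[bool, str, float]:
    """Attempt one TCP connect and return (ok, reason, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection either completes the TCP handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            ok, reason = True, "connected"
    except socket.timeout:
        # No reply arrived within `timeout` seconds.
        ok, reason = False, "timed out waiting for a reply"
    except ConnectionRefusedError:
        # The remote end answered, but actively rejected the connection.
        ok, reason = False, "refused by the remote end"
    except OSError as exc:
        # Anything else: unreachable network, name resolution failure, etc.
        ok, reason = False, "failed: %s" % exc
    return ok, reason, time.monotonic() - start


# Hypothetical usage (replace the placeholder host with Opti's public address):
# ok, reason, seconds = probe_tcp("opti.example.net", 22000, timeout=60.0)
# print(ok, reason, round(seconds, 1))
```

If the probe still times out with a long timeout, the cause is more likely the forwarding path than the speed of the machine.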

Unfortunately, now and then Fuji cannot find Opti either, maybe for the same, never clarified, reason.

Timeout problems also occur on Fuji(W7) from time to time, although Opti is up and running. For example:

[D2ISY] 09:39:14 INFO: Connection to XYZ… closed: WSARecv tcp 192.168.1.100:22000: An existing connection was forcibly closed by the remote host.
[D2ISY] 09:43:33 INFO: Connection to XYZ… closed: WSARecv tcp 192.168.1.100:22000: An established connection was aborted by the software in your host machine.
[D2ISY] 19:42:41 INFO: Connection to 5HHF6XN-4IIFHDD-3DL5EJM-5EIC42T-QSBDROY-N7H3LL4-ZTT5U26-UNIYOQB closed: ping timeout

A few minutes later the connection is reestablished. Now and then a message flashes in the GUI. It has a red bar at the top and says something about a lost connection, but it disappears after 1 second or less, so I have never been able to read the whole text. But the connection is not lost, and no message appears in the console window.
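To see how often each kind of disconnect happens, the console output can be saved to a file and the "closed" lines tallied by reason. The sketch below is an assumption-laden illustration: the regular expression is inferred only from the three lines quoted above, not from any documented Syncthing log format, and the names `CloseEvent`, `parse_closes`, and `tally_reasons` are made up for this example.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple

# Pattern inferred from the quoted lines only (an assumption, not a spec):
# [SHORTID] HH:MM:SS INFO: Connection to <peer> closed: <reason>
CLOSE_RE = re.compile(
    r"^\[(?P<short_id>[A-Z0-9]+)\] (?P<time>\d{2}:\d{2}:\d{2}) INFO: "
    r"Connection to (?P<peer>\S+) closed: (?P<reason>.+)$"
)


class CloseEvent(NamedTuple):
    time: str    # timestamp as printed in the console
    peer: str    # device ID (possibly abbreviated) of the remote side
    reason: str  # everything after "closed: "


def parse_closes(lines: Iterable[str]) -> List[CloseEvent]:
    """Extract connection-closed events; lines that do not match are skipped."""
    events = []
    for line in lines:
        match = CLOSE_RE.match(line.strip())
        if match:
            events.append(CloseEvent(match["time"], match["peer"], match["reason"]))
    return events


def tally_reasons(events: Iterable[CloseEvent]) -> Counter:
    """Count how many times each close reason occurred."""
    return Counter(event.reason for event in events)
```

Feeding it the saved console output line by line gives a quick picture of whether ping timeouts or forcibly closed connections dominate.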

What I have seen makes me suspect that the underlying problem is that Syncthing has one or more timeouts that are too short. In contrast to VoIP, file synchronization isn’t time-critical, so why not implement more patience?

When using file synchronization programs, the topology and the involved directories are basically static. The same fellow computers are expected to sync with one another in the same way every day. Sometimes, they cannot come into contact. Then it’s interesting to find out why they failed. The reason why they gave up should be displayed in the console window and logged, at least for the first few failing attempts, so the problem can be investigated.

Based on this experience I suggest that the development team considers the possibility to

Use longer timeouts than currently (v0.10.29), maybe optionally settable by the user.
Supply diagnostic information about why connection attempts fail.

Please do not say just that I should upgrade from outdated hardware. In the world there are thousands or millions of XP computers that are potential Syncthing hosts.

AudriusButkevicius · March 23, 2015, 10:36pm

I don’t think we use a timeout at all. There is a keepalive timeout, but I feel that the kernel decides when a TCP connection has been closed. That is usually 30 seconds, which is more than enough.

Regarding Opti-Alina not connecting: you need to forward each external port to the same internal port number. Given that you have two devices behind the router, that is impossible unless one of the nodes uses a different protocol port.

If one side notices a forcefully closed connection, the other side always tells why (or also sees a forceful disconnection and reports that). The fact that you are seeing messages on one side and not the other implies that it’s not even connecting.

Make sure you don’t have rate limiting set too low, as that can limit the protocol so much that it eventually times out.

Regarding Opti-Fuji, UPnP has some flaws on Windows which sometimes prevent it from working. I suggest you switch UPnP off altogether and rely on manual port mappings instead, as that’s much more reliable. That could also be the reason why it sometimes fails to connect.

Another scary factor is the fact that you get red windows popping up in the GUI (I assume from a remote machine), which to me implies that the machine is under load (and unable to respond) or that you should verify if your router is doing a good job. Given you are connected via Wifi, there might be Wifi contention which causes high latency or even packet loss in a network of low quality.

RogerH · March 27, 2015, 7:12pm

Thank you for your prompt and informative answer. Since I got it I have experimented and made the computers sync with each other. The crux of the matter was that if I have more than one computer in my LAN, they must use different ports for the “Sync Protocol Listen Addresses” in the Settings, and the port forwarding in the router must be set up correspondingly (which in practice means that these computers must have static IP addresses in the LAN). NAT and port forwarding is a complex area, which most computer users are not very familiar with. I suggest that you point out the port number requirement explicitly in the documentation, and also what has to be done in the router.
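The port rule described above (one distinct listen port per LAN computer, with the router forwarding the same port number to that computer's static address) can be written down as a small consistency check. This is just a sketch under stated assumptions: the `Forward` record and `check_plan` function are invented for illustration, Alina's address and port 22001 in the usage comment are hypothetical values, and the check does not talk to any router or to Syncthing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Forward:
    device: str         # label for the LAN computer
    lan_ip: str         # static LAN address the router forwards to
    listen_port: int    # port in that device's Sync Protocol Listen Address
    external_port: int  # port the router accepts on its internet side


def check_plan(plan: List[Forward]) -> List[str]:
    """Return human-readable problems; an empty list means the plan is consistent."""
    problems: List[str] = []
    owner_of_port: Dict[int, str] = {}
    owner_of_ip: Dict[str, str] = {}
    for fwd in plan:
        # The external port must be forwarded to the same internal port number.
        if fwd.external_port != fwd.listen_port:
            problems.append(
                f"{fwd.device}: external port {fwd.external_port} "
                f"differs from listen port {fwd.listen_port}"
            )
        # One external port can only be forwarded to one device.
        if fwd.external_port in owner_of_port:
            problems.append(
                f"{fwd.device}: external port {fwd.external_port} "
                f"is already forwarded to {owner_of_port[fwd.external_port]}"
            )
        else:
            owner_of_port[fwd.external_port] = fwd.device
        # Each device needs its own static LAN address.
        if fwd.lan_ip in owner_of_ip:
            problems.append(
                f"{fwd.device}: LAN address {fwd.lan_ip} "
                f"is already used by {owner_of_ip[fwd.lan_ip]}"
            )
        else:
            owner_of_ip[fwd.lan_ip] = fwd.device
    return problems


# Hypothetical plan (Alina's address and port 22001 are made-up examples):
# plan = [
#     Forward("Fuji", "192.168.1.100", 22000, 22000),
#     Forward("Alina", "192.168.1.101", 22001, 22001),
# ]
# print(check_plan(plan))
```

With both computers left on the default port 22000, the check reports a conflict, which matches the original symptom that only one of the two LAN machines could be reached from outside.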

Moreover, the conceptual framework is not fully clear, which often makes the information hard to understand. In the documentation and the log messages it is not evident what is meant by host, local host, remote host, computer, and node. And on which side of a router is “behind”? These terms should be defined and used consistently.

Concerning the need for port forwarding: BTSync works fine without manual port forwarding. As BTSync is moving in a direction where I don’t want to follow, Syncthing is a much more attractive alternative, but it is more difficult to set up. Would it be possible to modify Syncthing’s communication method so that it does not need manual setup of port forwarding in the router? I believe that simpler or automated setup would have a significant impact on Syncthing’s future popularity.

Thank you for a nice program and valuable help

Roger Hansson