Infrequent and irregular disconnected devices

I’m having a recurring issue, well, more of a niggle than an issue. Eleven devices have a send-only relationship with one “central” device: each of them backs up a folder to a NAS across the WAN.

The NAS is Openmediavault with the Syncthing plugin, the 11 remote devices are Windows boxes.

This setup has been absolutely rock solid for nearly a year, digesting 1.5 million files with versioning. Great job, devs!

Infrequently and irregularly, a random device will show as Disconnected in the NAS’s web interface, and the NAS will also show as Disconnected on the client side. That device remains Disconnected until I restart Syncthing on the device or on the NAS; sometimes I need to do both to get the connection back up.

The question is not why a disconnect happens, as that is most likely a connectivity issue (ISP disconnect…). The niggle is: why does Syncthing not reconnect in some rare cases? At this point I would like to make sure I’m gathering the right log data to troubleshoot this further. Which STTRACE facility should I be running, if any? On which device(s)?

The general log, without any debug options, should show the reasons for disconnects. You can enable STTRACE=connections for more verbose output.
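A minimal sketch of starting Syncthing with connection tracing enabled, assuming the binary is called "syncthing" and is on the PATH. STTRACE takes a comma-separated list of debug facilities; the helper names below are hypothetical. If Syncthing runs as a service (as it likely does on the NAS), the variable would instead go into the service definition, for example an Environment= line in a systemd unit.

```python
import os
import shutil
import subprocess


def syncthing_env(facilities=("connections",), base=None):
    """Return a copy of the environment with STTRACE set to the given facilities."""
    env = dict(os.environ if base is None else base)
    # STTRACE is a comma-separated list of debug facilities, e.g. "connections".
    env["STTRACE"] = ",".join(facilities)
    return env


def run_syncthing_with_trace(binary="syncthing", facilities=("connections",)):
    """Start Syncthing with tracing enabled (assumes the binary is on PATH)."""
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    # Output goes to this process's stdout/stderr; redirect it to a file if needed.
    return subprocess.Popen([path], env=syncthing_env(facilities))
```

Usage: run_syncthing_with_trace() returns a Popen handle; call .wait() on it to keep the script running for as long as Syncthing does.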

This is what was logged on the client side around the time the “client” was last seen by the central Syncthing host:

[JDF5B] 08:21:24 INFO: Connection to JKZ4TOI-VTZPV4T-7FBFGD4-BZ2KPWW-3XCSH23-YNCRCTO-N44WZIZ-DFM6BQ4 at 10.4.7.11:49349-83.166.144.57:22067/relay-server closed: read timeout
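To pull the close reason and connection type out of lines like the one above, a small parser could look like this. It is a sketch whose regular expression is modelled only on the single INFO line quoted here; other Syncthing versions may word the message differently, and the ClosedConnection type is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern based on the "Connection to ... at ... closed: ..." line quoted above.
CLOSED_RE = re.compile(
    r"INFO: Connection to (?P<device>[A-Z0-9-]+) at (?P<addr>\S+) closed: (?P<reason>.+)$"
)


class ClosedConnection(NamedTuple):
    device: str
    address: str
    conn_type: str
    reason: str


def parse_closed(line: str) -> Optional[ClosedConnection]:
    """Extract device ID, address, connection type and close reason; None if no match."""
    m = CLOSED_RE.search(line)
    if m is None:
        return None
    addr = m.group("addr")
    # The address ends in "/<type>", e.g. "/relay-server" in the log above.
    address, sep, conn_type = addr.rpartition("/")
    if not sep:
        address, conn_type = addr, ""
    return ClosedConnection(m.group("device"), address, conn_type, m.group("reason").strip())
```

Applied to the first log line above, this yields connection type "relay-server" and reason "read timeout"; the "Restarting" line does not match and returns None.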
[JDF5B] 10:32:39 INFO: Restarting

On the NAS side, it turns out I had STTRACE=connections active, but all it shows is a reconnect loop until I restarted the client side, as seen in the snippet above. Here is the output of cat syslog | grep JDF5B > JDF5B.log: https://www.hastebin.com/upavavijef.sql

(The clocks on those two machines agree to within 5 seconds.)

So it seems it’s failing to connect directly and tries to connect over a relay instead. You should probably try to set up port forwarding on the routers for better connectivity.

From the log, it seems to try to connect over a relay at some point, but it takes a while.

Understood: giving each host a precise target for its connection attempts would be the ideal way to resolve connectivity issues.

Unfortunately, the best I can do is provide a fixed ipwan address; port forwarding is something I would like to avoid in this particular setup.

Do you think providing the fixed ipwan of the central device to all clients would help with this?

I am not even sure what ipwan is.

Sorry, I mean an IP address on the WAN side…

Having an IP is not enough. For A and B to establish a direct connection, one of them has to be reachable from the internet (not behind a NAT) or have a port forwarded through its NAT.
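The rule above can be written as a tiny decision function. This is a deliberately simplified model, not Syncthing's actual connection logic: it ignores automatic NAT traversal (Syncthing can try UPnP/NAT-PMP), firewalls and CGNAT, and the Peer type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    behind_nat: bool       # True if the device sits behind a NAT router
    port_forwarded: bool   # True if a port is forwarded through that NAT


def accepts_inbound(peer: Peer) -> bool:
    """A peer can accept incoming connections if it has no NAT or has a forwarded port."""
    return (not peer.behind_nat) or peer.port_forwarded


def direct_connection_possible(a: Peer, b: Peer) -> bool:
    """At least one side must be reachable from the internet for a direct connection."""
    return accepts_inbound(a) or accepts_inbound(b)


def expected_path(a: Peer, b: Peer) -> str:
    """'direct' if one side can accept the connection, otherwise fall back to 'relay'."""
    return "direct" if direct_connection_possible(a, b) else "relay"
```

In this model, a fixed WAN IP alone does not change anything: a NAS behind NAT without a forwarded port still leaves both sides on "relay", which matches the point made above.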

If you don’t have that, then you are at the mercy of relay connections, which can break, fail to work, etc.

Yes, I know… and I feared that supplying just an IP instead of the “dynamic” address setting wouldn’t cut it.

Ah well, I’ll just keep an eye on the receiving host, where the web interface shows me a nice overview of all the devices’ connection status.

Thanks for your input, @AudriusButkevicius !!

Your log does have something suspicious: it disconnects from the relay and then takes a long while before it tries to connect via the relay again, but looking at the code I can’t see why this would happen. I guess your log is filtered, so it’s missing the messages that would explain it.

Hmm, I just re-read your post:

But that’s the issue: it never reconnects to the central host…?

Your log is filtered, so the reasons why it doesn’t reconnect immediately are not visible in it.
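The problem with grep for a device ID is that the lines explaining why nothing happened may not mention that ID at all. Full, unfiltered logs are what is really needed here; if you must trim, a sketch like the following (the same idea as grep -C) at least keeps the surrounding lines around each match. The function name is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def grep_with_context(lines: Iterable[str], needle: str, context: int = 5) -> Iterator[str]:
    """Yield lines containing `needle` plus up to `context` lines before and after each match."""
    before = deque(maxlen=context)  # recent lines not yet emitted
    after_remaining = 0             # trailing-context lines still to emit
    for line in lines:
        if needle in line:
            yield from before       # leading context
            before.clear()
            yield line
            after_remaining = context
        elif after_remaining > 0:
            yield line              # trailing context, never duplicated
            after_remaining -= 1
        else:
            before.append(line)
```

Lines already emitted as trailing context are not kept in the leading-context buffer, so overlapping windows around nearby matches do not produce duplicates.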

We both came to the same conclusion about filtering at the same time :P

Feb 19 10:48:37 backup-nas syncthing[888]: [JKZ4T] DEBUG: connected to JDF5BY7-BSGD3RV-VT26EB2-XC53DF3-YGIVJRY-UYUSZCU-MNTFTMD-UJNMIAH 200 using 10.5.1.4:43794-213.239.205.247:22067/relay-client 200
Feb 19 10:48:37 backup-nas syncthing[888]: [JKZ4T] DEBUG: discarding 0 connections while connecting to JDF5BY7-BSGD3RV-VT26EB2-XC53DF3-YGIVJRY-UYUSZCU-MNTFTMD-UJNMIAH 200
Feb 19 10:48:37 backup-nas syncthing[888]: [JKZ4T] INFO: Established secure connection to JDF5BY7-BSGD3RV-VT26EB2-XC53DF3-YGIVJRY-UYUSZCU-MNTFTMD-UJNMIAH at 10.5.1.4:43794-213.239.205.247:22067/relay-client (TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305)
Feb 19 10:48:37 backup-nas syncthing[888]: [JKZ4T] INFO: Device JDF5BY7-BSGD3RV-VT26EB2-XC53DF3-YGIVJRY-UYUSZCU-MNTFTMD-UJNMIAH client is "syncthing v0.14.44" named "KATE-SERVER-ANTW" at 10.5.1.4:43794-213.239.205.247:22067/relay-client
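The "Established secure connection" line above shows that this reconnect went over a relay ("/relay-client"). A rough way to see how often that happens across a whole log is to count established connections by type. This sketch only knows the line format quoted above; it treats any type starting with "relay-" as a relay connection and lumps everything else under "other". The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Modelled on the "Established secure connection" INFO line quoted above.
ESTABLISHED_RE = re.compile(
    r"Established secure connection to (?P<device>[A-Z0-9-]+) at (?P<addr>\S+)"
)


def connection_kinds(lines: Iterable[str]) -> Counter:
    """Count established connections as 'relay' (type starts with 'relay-') or 'other'."""
    kinds = Counter()
    for line in lines:
        m = ESTABLISHED_RE.search(line)
        if m is None:
            continue
        # The connection type is the suffix after the last "/" in the address.
        conn_type = m.group("addr").rpartition("/")[2]
        kinds["relay" if conn_type.startswith("relay-") else "other"] += 1
    return kinds
```

A log where nearly every established connection is counted as "relay" is a hint that direct connections are not getting through, as discussed above.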

It does reconnect after a while.

Only AFTER I restart Syncthing. If I don’t, it doesn’t.

Restart which side, JKZ4T or JDF5B?

Anyway, I can’t tell you much either way, as you only provided logs from one side, and those logs are filtered, so they don’t give the full picture.

If a relay connection breaks, we only redial once every 10 minutes. You can adjust that by setting relayReconnectIntervalM to something like 1.
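The usual way to change relayReconnectIntervalM is the Advanced settings dialog in the web GUI. For scripted changes across many devices, a minimal sketch of editing config.xml could look like this, assuming the option lives under the top-level <options> element and that Syncthing is stopped while the file is rewritten. The function name is hypothetical, and ElementTree drops the XML declaration and comments when re-serializing.

```python
import xml.etree.ElementTree as ET


def set_relay_reconnect_interval(config_xml: str, minutes: int) -> str:
    """Return config.xml text with options/relayReconnectIntervalM set to `minutes`."""
    root = ET.fromstring(config_xml)
    options = root.find("options")
    if options is None:  # assumed location of the setting; created if missing
        options = ET.SubElement(root, "options")
    elem = options.find("relayReconnectIntervalM")
    if elem is None:
        elem = ET.SubElement(options, "relayReconnectIntervalM")
    elem.text = str(minutes)
    return ET.tostring(root, encoding="unicode")
```

Usage: read config.xml into a string, pass it through set_relay_reconnect_interval(text, 1), write the result back, then start Syncthing again.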

Restarting either side does the trick. In this particular case I restarted JDF5B; the log containing the restart is in my second post.

I don’t mind posting full logs; I just didn’t want to overwhelm you, since there’s a lot of info in there unrelated to JDF5B.

If the reconnect doesn’t happen even after a couple of hours, would decreasing relayReconnectIntervalM help?

Here’s the full log from JDF5B: https://www.hastebin.com/utihoyotis.swift (line 135 shows the restart). Unfortunately, STTRACE wasn’t active yet.

No.

Anyway, you should run both sides with STTRACE=connections and provide full logs; otherwise it’s a pointless witch hunt. We’re big boys and can filter out the stuff we need ourselves.