infrequent & irregular disconnected devices

and here’s full syslog : syslog.txt (1.2 MB)

This was my original question. Turns out I had previously enabled STTRACE=connections on server, but not on clients. Have done so now.

We need logs from both sides ideally. I am walking out the door, and for a week won’t be next to a computer, so ping here on Sunday to remind me to look at this.

1 Like

I have more detailed logs,from both sides of another won’t-re-connect event. From Device Last Seen at 2018-02-19 18:21:49 to now (2018-02-21 12:39) the connection hasn’t been re-established, for just this one host.

I know this host has had fully functional WAN connection the whole time. After taking the logs, I have restarted ST on TXZZ, and the connection was re-established a couple of minutes after.

TXZZIYN_syncthing.log (2.5 MB) JKZ4TOI_syslog.log (6.7 MB)

Could be the deadlock I “fixed” in .45-rc.3.

any way to know for sure? Could just wait for .45 to go into stable release i guess…

Try it, I guess?

well no, i’m facing a random event, spread across 12 hosts…

$ ping -v -c 1 audriusbutkevicius.syncthing.net

1 Like

Snooze for 6 hours, as I just touched down in London after a red eye flight.

2 Likes

So it seems there is some bug where one of the dials gets always deferred, yet I can’t understand how that happens.

before we go investing any further time : should i hold out until .45 drops onto all of the machines, or do you not think it is the same bug which Jacob mentionned a few posts back ?

Relevant log lines:

Feb 19 18:23:12 backup-nas syncthing[888]: [JKZ4T] INFO: Connection to TXZZIYN-3IUWV6F-GBS3YOA-AQOPQZG-72BUFG3-5CA46F3-MQGJAAJ-N4724Q2 at 10.5.1.4:38106-158.58.170.183:22067/relay-client closed: reading length: EOF
...
Feb 19 18:26:53 backup-nas syncthing[888]: [JKZ4T] DEBUG: Reconnect loop for TXZZIYN-3IUWV6F-GBS3YOA-AQOPQZG-72BUFG3-5CA46F3-MQGJAAJ-N4724Q2 [kcp://81.82.197.78:22020 relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de tcp://81.82.197.78:22000]
Feb 19 18:26:53 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it as sleep is 24s, next dial is at 2018-02-19 18:32:08.273358548 +0100 CET m=+367148.943424076 and current time is 2018-02-19 18:25:13.321655807 +0100 CET m=+366733.991721332
Feb 19 18:26:53 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de as sleep is 24s, next dial is at 2018-02-19 18:26:13.321655807 +0100 CET m=+366793.991721332 and current time is 2018-02-19 18:25:13.321655807 +0100 CET m=+366733.991721332
...
Feb 19 18:29:57 backup-nas syncthing[888]: [JKZ4T] DEBUG: Reconnect loop for TXZZIYN-3IUWV6F-GBS3YOA-AQOPQZG-72BUFG3-5CA46F3-MQGJAAJ-N4724Q2 [kcp://81.82.197.78:22020 relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de tcp://81.82.197.78:22000]
Feb 19 18:29:57 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it as sleep is 24s, next dial is at 2018-02-19 18:32:08.273358548 +0100 CET m=+367148.943424076 and current time is 2018-02-19 18:28:17.381528428 +0100 CET m=+366918.051593950
Feb 19 18:29:57 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de as sleep is 24s, next dial is at 2018-02-19 18:29:17.381528428 +0100 CET m=+366978.051593950 and current time is 2018-02-19 18:28:17.381528428 +0100 CET m=+366918.051593950
...
Feb 19 18:33:02 backup-nas syncthing[888]: [JKZ4T] DEBUG: Reconnect loop for TXZZIYN-3IUWV6F-GBS3YOA-AQOPQZG-72BUFG3-5CA46F3-MQGJAAJ-N4724Q2 [kcp://81.82.197.78:22020 relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de tcp://81.82.197.78:22000]
Feb 19 18:33:02 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it as sleep is 24s, next dial is at 2018-02-19 18:32:08.273358548 +0100 CET m=+367148.943424076 and current time is 2018-02-19 18:31:22.00442208 +0100 CET m=+367102.674487602
Feb 19 18:33:02 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de as sleep is 24s, next dial is at 2018-02-19 18:32:22.00442208 +0100 CET m=+367162.674487602 and current time is 2018-02-19 18:31:22.00442208 +0100 CET m=+367102.674487602
...
Feb 19 18:36:06 backup-nas syncthing[888]: [JKZ4T] DEBUG: Reconnect loop for TXZZIYN-3IUWV6F-GBS3YOA-AQOPQZG-72BUFG3-5CA46F3-MQGJAAJ-N4724Q2 [kcp://81.82.197.78:22020 relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de tcp://81.82.197.78:22000]
Feb 19 18:36:06 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de as sleep is 24s, next dial is at 2018-02-19 18:35:26.063607296 +0100 CET m=+367346.733672818 and current time is 2018-02-19 18:34:26.063607296 +0100 CET m=+367286.733672818
...
Feb 19 18:39:11 backup-nas syncthing[888]: [JKZ4T] DEBUG: Reconnect loop for TXZZIYN-3IUWV6F-GBS3YOA-AQOPQZG-72BUFG3-5CA46F3-MQGJAAJ-N4724Q2 [kcp://81.82.197.78:22020 relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de tcp://81.82.197.78:22000]
Feb 19 18:39:11 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://158.58.170.183:22067/?id=GYX5UJS-GA7HGD6-RIYZ3TI-JW44E5I-K3GX6RT-WP4LBSC-V45W4I5-WGTORQE&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=tuxli.ch - domflow.it as sleep is 24s, next dial is at 2018-02-19 18:44:26.063607296 +0100 CET m=+367886.733672818 and current time is 2018-02-19 18:37:30.960459499 +0100 CET m=+367471.630525024
Feb 19 18:39:11 backup-nas syncthing[888]: [JKZ4T] DEBUG: Not dialing relay://31.220.45.3:22067/?id=OHWS367-33NI2O3-GRQQVWG-6SHK655-ETRIJCC-GYTLNCT-2CBUXX5-XFFHCQ4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=https://blog.veloc1ty.de as sleep is 24s, next dial is at 2018-02-19 18:38:30.960459499 +0100 CET m=+367531.630525024 and current time is 2018-02-19 18:37:30.960459499 +0100 CET m=+367471.630525024

The device advertises two relays, yet is available on the 31. one that is always deferred.

This is no way fixed in the new version, yet I suspect I know how this happens.

anything I can/should do ?

I’ll propose a pull request shortly, I guess you can try running with that build before it reaches a release.

ok, i’ll try to get that build on all hosts when it’s available. Or should I put it just on TXZ ?

Ok, so the bug happens when two of your devices connect to the same relay, as the next dial time is decided per address, which happens to be the same for two devices connected to the same relay.

Can you open a bug on github for tracking this?

2 Likes

done

Good catch

1 Like