Devices mark each other as connected and disconnected at the same time

This is basically a continuation of https://forum.syncthing.net/t/stalled-transfers-with-one-device-shows-as-connected-and-the-other-as-disconnected/16334.

There are two devices on the same WLAN. They use static IP addresses, with the device addresses in Syncthing left at the default “dynamic”.

The WLAN has both a 2.4 GHz and a 5 GHz network set up under the same name and password. One of the two devices switched from the 2.4 GHz network to the 5 GHz one about 45 minutes ago.

After that happened, Syncthing on the other device changed the device’s state to “Disconnected”. However, Syncthing on the first device, where the network switched to 5 GHz, still thinks that the two devices are connected.

  • Device 1 (no network change):

    Device 2 is marked as “Disconnected”. 45+ minutes have passed, but the state has not changed.

    These messages are repeated in the log:

    2021-05-13 01:01:28 Failed to exchange Hello messages with 7LDIBZ4 at [::]:22000-10.0.0.3:22000/quic-client/TLS1.3-TLS_AES_128_GCM_SHA256: deadline exceeded
    

  • Device 2 (switched from 2.4 to 5 GHz):

    Device 1 is still marked as “Connected”. The files are not syncing, though.

    These are the last relevant entries in the log:

    2021-05-13 00:18:51 Established secure connection to D4DZUGP at 10.0.0.3:22000-10.0.0.2:22000/tcp-server/TLS1.3-TLS_CHACHA20_POLY1305_SHA256
    2021-05-13 00:18:51 Replacing old connection 10.0.0.3:22000-10.0.0.2:22000/tcp-server/TLS1.3-TLS_CHACHA20_POLY1305_SHA256 with 10.0.0.3:22000-10.0.0.2:22000/tcp-server/TLS1.3-TLS_CHACHA20_POLY1305_SHA256 for D4DZUGP
    

I think that now that I know when this happens, I can probably reproduce the problem. Can something be done about this kind of situation, though? I can pause and unpause Device 1 on Device 2, which will force it to reconnect, but this is just a manual workaround…

Edit: The devices have finally reconnected for real after being stuck in that state for about an hour. This means that the problem does eventually fix itself, although 1 hour seems a little excessive.

The 1h might be TCP keepalive kicking in if syncthing doesn’t send any application level ping traffic.
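
For readers unfamiliar with TCP keepalive, here is a minimal Python sketch of how an application can ask the OS to probe an idle connection sooner than the system default. It is only an illustration, not Syncthing's code; `enable_keepalive` and its parameter values are made up for this example, and the per-connection tuning constants are platform-specific, so each one is guarded.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=15, probes=4):
    """Turn on TCP keepalive and, where the platform allows, tune its timers.

    Hypothetical helper for illustration; the values are arbitrary examples.
    """
    # Ask the OS to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Tuning options differ between platforms, so only set those that exist.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the drop
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Without tuning like this, the idle time before the first probe is whatever the operating system defaults to, which can be long enough for a dead connection to linger.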

What’s more interesting is that the other device seems unable to establish any connection while actively trying to dial.

If you’re able to reproduce this, could you check whether the devices can reach each other? E.g. check whether ping goes through and the web UI is accessible.

Edit: How does Syncthing handle “duplicate” connections? E.g. would a half-closed connection A<->B be properly replaced if the device that broke it attempts to reconnect?

I didn’t try pinging then, but Device 1 could definitely access Device 2, as I had the Web GUI of the latter open on the former all the time when the issue was going on.

We send a ping every 45-90s if the connection is otherwise silent. If that send fails, the connection is dropped. I don’t think we set any deadline though, so if the connection gets into some “hung” state without returning an error, that could explain what you see.
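
To make the deadline point concrete, here is a minimal Python sketch of a ping write that gives up after a fixed time instead of blocking indefinitely. This is not Syncthing's implementation (Syncthing is written in Go); `send_ping`, the payload, and the 5-second default are all assumptions made for the example.

```python
import socket


def send_ping(conn, payload=b"ping", deadline_s=5.0):
    """Send a ping, dropping the connection if the write misses its deadline.

    Hypothetical helper for illustration. Returns True if the ping was
    written, False if the write failed or timed out (the socket is closed).
    """
    # Bound how long the write may block; without this, a "hung"
    # connection could stall the sender without ever raising an error.
    conn.settimeout(deadline_s)
    try:
        conn.sendall(payload)
        return True
    except OSError:
        # Covers both a timed-out write (socket.timeout is an OSError)
        # and an outright send failure: treat the connection as dead.
        conn.close()
        return False
```

With a deadline in place, a peer that silently stops reading turns into a detectable failure once the send buffer fills, rather than a write that never returns.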

Just for the record, I have a feeling that this issue may actually have nothing to do with connectivity, but is rather caused by https://forum.syncthing.net/t/syncthing-stuck-gui-seemingly-responsive-but-nothing-is-happening/16828. The GUI on one side reports the other as disconnected, while the other one says otherwise, but only because the GUI has been stalled due to GC. Once GC has finished, everything unblocks, and the devices reconnect.

The devices discussed here are the same as the ones listed in the other topic.
