Connectivity to other devices 99% unstable

Hi,

One of the devices I help administer in a cluster of ~10 devices sharing one big (1Tb+) folder seems unable to connect reliably to other devices. A lot of the logs report failures to set up connections to other devices. A common pattern is (after a fresh restart) :

  • All devices appear disconnected (see log for failures)
  • Suddenly one device connects
  • They start exchanging indexes (up to ~5%)
  • Connection then drops
  • All devices appear disconnected for some time again (5min to 1hour)

What has been tested :

  • Manually port mapping 22000/tcp to the device (simple home router)
  • device set into DMZ for full exposure and router firewall bypass
  • Network and internet connection seems stable (ping’d some other devices without noticing anything strange)

Here are the redacted logs

2020-09-03 21:58:48 My ID: _MY_ID_
2020-09-03 21:58:50 Single thread SHA256 performance is 8.9 MB/s using minio/sha256-simd (7.6 MB/s using crypto/sha256).
2020-09-03 21:58:52 Hashing performance is 8.26 MB/s
2020-09-03 21:58:53 Overall send rate is unlimited, receive rate is unlimited
2020-09-03 21:58:53 Using discovery server https://discovery.syncthing.net/v2/?noannounce&id=_ANNOUNCE_ID_
2020-09-03 21:58:53 Using discovery server https://discovery-v4.syncthing.net/v2/?nolookup&id=_ANNOUNCE_ID_
2020-09-03 21:58:53 Using discovery server https://discovery-v6.syncthing.net/v2/?nolookup&id=_ANNOUNCE_ID_
2020-09-03 21:58:53 TCP listener ([::]:22000) starting
2020-09-03 21:58:53 Relay listener (dynamic+https://relays.syncthing.net/endpoint) starting
2020-09-03 21:58:53 QUIC listener ([::]:22000) starting
2020-09-03 21:58:53 ...
2020-09-03 21:58:53 GUI and API listening on [::]:7070
2020-09-03 21:58:53 Access the GUI via the following URL: http://127.0.0.1:7070/
2020-09-03 21:58:53 My name is "_MY_NAME_"
2020-09-03 21:58:53 Device BOAL7RP is "somename" at [dynamic]
2020-09-03 21:58:53 Device FI2QNC7 is "somename" at [dynamic]
2020-09-03 21:58:53 Device KMIBINI is "somename" at [dynamic]
2020-09-03 21:58:53 Device UP56BS7 is "somename" at [dynamic]
2020-09-03 21:58:53 Device UYGS6US is "somename" at [dynamic]
2020-09-03 21:58:53 Device YWHOXL6 is "somename" at [dynamic]
2020-09-03 21:58:53 Device FEUN355 is "somename" at [dynamic]
2020-09-03 21:58:53 Device SHJMJ7H is "somename" at [dynamic]
2020-09-03 21:58:53 Device SRQKFGJ is "somename" at [dynamic]
2020-09-03 21:58:53 Device V4ATM5R is "somename" at [dynamic]
2020-09-03 21:58:53 Device 4GM7OLJ is "somename" at [dynamic]
2020-09-03 21:58:54 Ready to synchronize _MY_FOLDER_ (sendreceive)
2020-09-03 21:59:03 Listen (BEP/relay): Get "https://relays.syncthing.net/endpoint": net/http: TLS handshake timeout
2020-09-03 21:59:03 Relay listener (dynamic+https://relays.syncthing.net/endpoint) shutting down
2020-09-03 21:59:03 c.S.listenerSupervisor: Failed service 'dynamic+https://relays.syncthing.net/endpoint' (1.000000 failures of 2.000000), restarting: true, error: "{dynamic+https://relays.syncthing.net/endpoint dynamic+https://relays.syncthing.net/endpoint} returned unexpectedly", stacktrace: [unknown stack trace]
2020-09-03 21:59:03 Relay listener (dynamic+https://relays.syncthing.net/endpoint) starting
2020-09-03 21:59:06 New NAT port mapping: external TCP address _IPV4_1_:57391 to local address 0.0.0.0:22000.
2020-09-03 21:59:15 quic://0.0.0.0:22000 detected NAT type: Port restricted NAT
2020-09-03 21:59:15 quic://0.0.0.0:22000 resolved external address quic://_IPV4_1_:22000 (via stun.syncthing.net:3478)
2020-09-03 21:59:15 Listen (BEP/relay): Get "https://relays.syncthing.net/endpoint": net/http: TLS handshake timeout
2020-09-03 21:59:15 Relay listener (dynamic+https://relays.syncthing.net/endpoint) shutting down
2020-09-03 21:59:15 c.S.listenerSupervisor: Failed service 'dynamic+https://relays.syncthing.net/endpoint' (1.759434 failures of 2.000000), restarting: true, error: "{dynamic+https://relays.syncthing.net/endpoint dynamic+https://relays.syncthing.net/endpoint} returned unexpectedly", stacktrace: [unknown stack trace]
2020-09-03 21:59:15 Relay listener (dynamic+https://relays.syncthing.net/endpoint) starting
2020-09-03 21:59:17 Detected 3 NAT services
2020-09-03 21:59:26 Listen (BEP/relay): Get "https://relays.syncthing.net/endpoint": net/http: TLS handshake timeout
2020-09-03 21:59:26 Relay listener (dynamic+https://relays.syncthing.net/endpoint) shutting down
2020-09-03 21:59:26 Entering the backoff state.
2020-09-03 21:59:26 c.S.listenerSupervisor: Failed service 'dynamic+https://relays.syncthing.net/endpoint' (2.367933 failures of 2.000000), restarting: false, error: "{dynamic+https://relays.syncthing.net/endpoint dynamic+https://relays.syncthing.net/endpoint} returned unexpectedly", stacktrace: [unknown stack trace]
2020-09-03 21:59:51 Established secure connection to UYGS6US at [::]:22000-_IPV4_2_:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256
2020-09-03 21:59:51 Device UYGS6US client is "syncthing v1.8.0" named "somename" at [::]:22000-_IPV4_2_:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256
2020-09-03 21:59:51 Device UYGS6US folder _MY_FOLDER_ has a new index ID (0xDACE31D7AF38291E)
2020-09-03 22:00:20 Failed to exchange Hello messages with V4ATM5R at 192.168.1.100:55863-62.171.170.70:22067/relay-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256: EOF
2020-09-03 22:08:17 Failed initial scan of sendreceive folder _MY_FOLDER_
2020-09-03 22:08:19 Connection to UYGS6US at [::]:22000-_IPV4_2_:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256 closed: restarting folder _MY_FOLDER_
2020-09-03 22:10:00 Established secure connection to 4GM7OLJ at 192.168.1.100:49906-_IPV4_3_:22067/relay-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256
2020-09-03 22:10:00 Device 4GM7OLJ client is "syncthing v1.8.0" named "Boursny" at 192.168.1.100:49906-_IPV4_3_:22067/relay-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256
2020-09-03 22:13:44 Exiting backoff state.
2020-09-03 22:13:45 Relay listener (dynamic+https://relays.syncthing.net/endpoint) starting
2020-09-03 22:13:50 Ready to synchronize _MY_FOLDER_ (sendreceive)
2020-09-03 22:13:50 Restarted folder _MY_FOLDER_ (sendreceive)
2020-09-03 22:14:16 Connection to 4GM7OLJ at 192.168.1.100:49906-_IPV4_3_:22067/relay-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256 closed: reading length: read tcp 192.168.1.100:49906->_IPV4_3_:22067: read: connection reset by peer

And a lot of

2020-09-04 07:23:12 Dialing direct result tcp 192.168.1.1:49152: &{{0x247e230}} <nil>
2020-09-04 07:23:27 Dialing direct result tcp discovery-v4.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->_IPV4_4_:443: bind: address already in use
2020-09-04 07:23:27 Dialing direct result tcp discovery-v6.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->[_ipv6_2_]:443: bind: address already in use
2020-09-04 07:28:29 Dialing direct result tcp discovery-v6.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->[_ipv6_2_]:443: bind: address already in use
2020-09-04 07:28:29 Dialing direct result tcp discovery-v4.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->_IPV4_5_:443: bind: address already in use
2020-09-04 07:33:31 Dialing direct result tcp discovery-v6.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->[_ipv6_3_]:443: bind: address already in use
2020-09-04 07:33:31 Dialing direct result tcp discovery-v4.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->_IPV4_4_:443: bind: address already in use
2020-09-04 07:38:37 Dialing direct result tcp discovery-v4.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->_IPV4_5_:443: bind: address already in use
2020-09-04 07:38:37 Dialing direct result tcp discovery-v6.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->[_ipv6_2_]:443: bind: address already in use
2020-09-04 07:43:31 Dialing direct result tcp upgrades.syncthing.net:443: &{{0x2866190}} <nil>
2020-09-04 07:43:42 Dialing direct result tcp discovery-v6.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->[_ipv6_3_]:443: bind: address already in use
2020-09-04 07:43:42 Dialing direct result tcp discovery-v4.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->_IPV4_5_:443: bind: address already in use
2020-09-04 07:48:48 Dialing direct result tcp discovery-v4.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->_IPV4_4_:443: bind: address already in use
2020-09-04 07:48:48 Dialing direct result tcp discovery-v6.syncthing.net:443: <nil> dial tcp 0.0.0.0:22000->[_ipv6_2_]:443: bind: address already in use

Feels like a firewall or a proxy getting in the way. Also check logs of the other side why it think it disconnected.

Also, given the hash rate is 8mb/s I suspect it’s some weak device which just locks up when it needs to do something of reasonable size.

Also this is not great, it doesn’t say why it failed tho, but I suspect there is just too much happening for the device so weak.

Thanks, We will check CPU and mem consumption during startup and network retries.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.