Upgraded devices are unable to relay with each other, but they can relay with the older device

Hi,

Our setup has been working exceptionally well for 3 years. It only became an issue after we attempted to upgrade the OS versions and move to larger hard disks, as we were starting to run out of disk space.

The upgrade process was to set up a 3rd Pi with the new hard disk, sync all the data and devices, decommission the older device it was meant to replace, and then repeat.

Details and logs below.

We love and rely heavily on our Syncthing setup. Any guidance on the below would be greatly appreciated.

Thanks


Setup

We set up 2 Pis and placed them at 2 different offices. These 2 devices connect via a private relay.

Staff have Syncthing set up on their local machines with both Pis added as devices, so as they move between locations, their shared folders are always up to date.

Hardware

  • Location A 16x.xxx.xxx.xxx
  • Location B 15x.xxx.xxx.xxx
  • Pi #1, OS: OpenWrt 21.02.0, syncthing v1.14.0, ID: 6Cxxxx-...
  • Pi #2, OS: OpenWrt 22.03.2, syncthing v1.23.0, ID: R7xxxx-...
  • Pi #3, OS: OpenWrt 22.03.2, syncthing v1.23.0, ID: NFxxxx-...

Current Situation

Two upgrades have been completed and one old one has been decommissioned.

All three devices when on the same local network are able to sync and connect without any issues.

The remaining old device #1 can relay with both new devices from either location.

The 2 new devices are unable to relay with each other, regardless of their location.

Troubleshooting

I have brought the devices together, moved them around and swapped locations, and the result is the same.

I have upgraded the 2 new devices to the latest possible Syncthing version and double-checked that the relay address settings are correct.

All the devices successfully connect to the relay, but the 2 new devices connect about 20 seconds apart, and each disconnects before the other connects, as indicated by the “does not exist” log message.

Relay Logs

syncthing_relaysrv  | 2023/02/12 11:40:23 listener.go:48: Listener accepted connection from 16x.xxx.xxx.xxx:50318 tls true
syncthing_relaysrv  | 2023/02/12 11:40:23 listener.go:117: Message protocol.ConnectRequest from R7xxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx
syncthing_relaysrv  | 2023/02/12 11:40:23 listener.go:166: R7xxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx is looking for NFxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx which does not exist
syncthing_relaysrv  | 2023/02/12 11:40:23 listener.go:223: Closing connection R7xxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx: read tcp 17x.xxx.xxx.xxx:22067->16x.xxx.xxx.xxx:50318: use of closed network connection
syncthing_relaysrv  | 2023/02/12 11:40:45 listener.go:40: Listener failed to accept connection from 127.0.0.1:36683 . Possibly a TCP Ping.
syncthing_relaysrv  | 2023/02/12 11:40:45 listener.go:117: Message protocol.Pong from 6Cxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx
syncthing_relaysrv  | 2023/02/12 11:40:47 listener.go:48: Listener accepted connection from 15x.xxx.xxx.xxx:37546 tls true
syncthing_relaysrv  | 2023/02/12 11:40:48 listener.go:117: Message protocol.ConnectRequest from NFxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx
syncthing_relaysrv  | 2023/02/12 11:40:48 listener.go:166: NFxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx is looking for R7xxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx which does not exist
syncthing_relaysrv  | 2023/02/12 11:40:48 listener.go:223: Closing connection NFxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx: read tcp 17x.xxx.xxx.xxx:22067->15x.xxx.xxx.xxx:37546: use of closed network connection
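
The failed lookups in a relay log like the one above can be pulled out with a short script. This is only a rough sketch, assuming the log line format shown above; `failed_lookups` is a hypothetical helper name and the device IDs in the sample are abbreviated.

```python
import re

# Matches relay log lines of the form
# "<source ID> is looking for <target ID> which does not exist".
LOOKUP_FAILED = re.compile(r"(\S+) is looking for (\S+) which does not exist")


def failed_lookups(log_lines):
    """Return (source, target) pairs for lookups the relay could not match."""
    pairs = []
    for line in log_lines:
        match = LOOKUP_FAILED.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# Sample lines modelled on the log above, with abbreviated device IDs.
sample = [
    "listener.go:166: R7xxxx-... is looking for NFxxxx-... which does not exist",
    "listener.go:117: Message protocol.Pong from 6Cxxxx-...",
    "listener.go:166: NFxxxx-... is looking for R7xxxx-... which does not exist",
]

for source, target in failed_lookups(sample):
    print(f"{source} could not find {target} on the relay")
```

If both directions show up for the same pair of devices, neither side was listening on the relay when the other one dialed.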

Logs from the affected syncthing clients may be more helpful than the relay debug log.

From those logs alone, it looks like neither of the two devices actually joins the relay. They both attempt to dial each other, but when neither is persistently connected to the relay, this can’t work.

Perhaps more clearly: for a device to be reachable via a specific relay, that relay must be configured on the device itself under Actions -> Settings -> Connections -> Sync Protocol Listen Addresses.
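
As a quick way to compare that setting across devices, here is a minimal sketch that checks whether a list of listen addresses contains a `relay://` entry for a given relay host. The host name `relay.example.com`, the `RELAY-ID` placeholder and the helper name `joins_relay` are assumptions for illustration, not values from this thread.

```python
from urllib.parse import urlparse


def joins_relay(listen_addresses, relay_host):
    """Return True if any listen address is a relay:// URL for relay_host."""
    for address in listen_addresses:
        parsed = urlparse(address.strip())
        # hostname is lowercased by urlparse, so compare in lowercase.
        if parsed.scheme == "relay" and parsed.hostname == relay_host.lower():
            return True
    return False


# Hypothetical listen address lists for two devices.
listening_on_relay = [
    "tcp://0.0.0.0:22000",
    "relay://relay.example.com:22067/?id=RELAY-ID",
]
only_local = ["tcp://0.0.0.0:22000"]

print(joins_relay(listening_on_relay, "relay.example.com"))
print(joins_relay(only_local, "relay.example.com"))
```

A device with only the second kind of list can still dial other devices through the relay, but nothing can reach it there.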

I have checked the logs of the devices and could not see anything useful. I only enabled relaying for additional log debugging. Will look again.

I am reasonably confident the connection addresses are configured correctly, and what makes this so odd is that devices #1 & #2 can relay and #1 & #3 can relay, from either location. Only #2 and #3 can’t relay.

#1 is one of the old devices being replaced and #2 and #3 are the new devices.

My only theory is that, as you mentioned, the new devices are not persisting the relay connection but the old one is. So when the new devices connect looking for #1 they find it, but not each other.

The new devices are running v1.23.0 and I see v1.23.1 fixes an issue regarding the TCP accept function. I don’t know enough on the topic to know if that could be impacting this.

Ref: https://github.com/syncthing/syncthing/issues/8325

If you’re affected by this, you will see these warnings in the logs. You also need to be running an ancient Linux kernel for this.

The symptoms can definitely be explained by device #1 joining the relay while #2 and #3 do not join it (perhaps because they are not configured to).

What is your “Sync Protocol Listen Addresses” setting on #2 and #3? How does it compare to #1?

Probably not, but #8749 may be relevant (Syncthing stops listening on a relay after a timeout.)

Yeah but that doesn’t typically happen from the get-go, as this only applies once the relay listener has run into a timeout at least once. From the looks of this, they never even join the relay, so either the network is flaky enough to trigger a timeout right away, or they’re not even attempting to join it.

Logs from the devices in question would answer that question. Hint, hint, @NaX.

Okay, I figured out my mistake. @Nummer378 had it with the first comment. I had checked the remote device settings a dozen times, but forgot to also point each device’s own listen addresses at the relay.

It was only when talking it through with you guys and double-checking the logs that I came up with the theory: #1 was connected, so when the new devices went looking for it, they were able to find it even though they themselves had not joined the relay and were not listening.

But because neither of the new devices had joined the relay and was listening on it, whenever its counterpart came looking it was not there, and they kept missing each other.
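
The behaviour described above can be sketched as a toy model. This is not how the relay server is implemented; `ToyRelay`, `join` and `connect_request` are hypothetical names, and the only assumption taken from this thread is that a dialing device does not need to have joined the relay, while the device it is looking for does.

```python
class ToyRelay:
    """Toy model of a relay: only devices that have joined can be found."""

    def __init__(self):
        self.joined = set()

    def join(self, device_id):
        # A device joins by having the relay in its own listen addresses.
        self.joined.add(device_id)

    def connect_request(self, source, target):
        # The source may dial without having joined,
        # but the target must currently be joined to be found.
        return target in self.joined


relay = ToyRelay()
relay.join("#1")  # only the old device listens on the relay

results = {}
for source, target in [("#2", "#1"), ("#3", "#1"), ("#2", "#3"), ("#3", "#2")]:
    results[(source, target)] = relay.connect_request(source, target)
    outcome = "found" if results[(source, target)] else "does not exist"
    print(f"{source} is looking for {target}: {outcome}")
```

Calling `relay.join("#2")` and `relay.join("#3")` before the loop, which corresponds to adding the relay to each new device’s listen addresses, makes all four lookups succeed.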

Thanks for rubber ducking a silly mistake with me. Also, I just wanted to say that you guys run one of the nicest, most responsive and most pleasant OSS communities I have ever had the pleasure of interacting with. I really appreciate you taking the time to help me with my silly mistake.
