How to diagnose relay servers not connecting?

Hi,

I'm contacting you directly to avoid public confusion. I want to understand why the relay server does not connect correctly when I do not set the listen address on the command line.

The server:

  • No firewall (double checked)
  • Public IP on a local physical interface
  • No NAT at all (also double checked)

The relay:

  • latest binaries fetched from Jenkins

I can provide debug/log/pcap trace and maybe server access.

Just a sample capture showing that it seems to work without the --listen option:

5 0.055635777 195.154.119.12 -> 46.101.130.230 TCP 74 43958 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=36177627 TSecr=0 WS=128
6 0.065973578 46.101.130.230 -> 195.154.119.12 TCP 74 443 → 43958 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=196351511 TSecr=36177627 WS=64
7 0.066031128 195.154.119.12 -> 46.101.130.230 TCP 66 43958 → 443 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=36177630 TSecr=196351511
8 0.066567096 195.154.119.12 -> 46.101.130.230 SSL 249 Client Hello
9 0.076879684 46.101.130.230 -> 195.154.119.12 TCP 66 443 → 43958 [ACK] Seq=1 Ack=184 Win=30080 Len=0 TSval=196351514 TSecr=36177630
10 0.079322896 46.101.130.230 -> 195.154.119.12 TLSv1.2 2962 Server Hello, Certificate
11 0.079353208 195.154.119.12 -> 46.101.130.230 TCP 66 43958 → 443 [ACK] Seq=184 Ack=2897 Win=35072 Len=0 TSval=36177633 TSecr=196351514
12 0.079405100 46.101.130.230 -> 195.154.119.12 TLSv1.2 662 Certificate Status
13 0.079433526 195.154.119.12 -> 46.101.130.230 TCP 66 43958 → 443 [ACK] Seq=184 Ack=3493 Win=37888 Len=0 TSval=36177633 TSecr=196351514

But in the relay logs:

2016/10/03 20:14:48 pool.go:17: Joining https://relays.syncthing.net/endpoint
2016/10/03 20:14:48 pool.go:46: https://relays.syncthing.net/endpoint failed to join due to IP address not matching external address. Aborting

How do you start relaysrv?

I simply run ./strelaysrv

I also have IPv6 enabled on the same interface. Do you think it may interfere?

What flags do you pass?

I’m going to move this to a forum post - I’m going to bed, then I’m going to work, so I won’t be around to be your tech support :smile:

By default no flags are passed, and it produces what I described. To make it work, I just added --listen IP:PORT.

Sweet dreams :wink:

By the way it works but you know I want to understand :slight_smile:

Well, I have read the documentation again and it appears that none of the flags are mandatory. So I expect the relay server to work when simply running the binary without any flags.

How can you explain the error in that case?

I’m guessing it’s some IPv4/IPv6 oddness, but I don’t know enough about relaysrv to say.

I have just tried disabling IPv6 and it still does not work.

So, going by all those tests, I assume there are some mandatory options that must be set to make the relay server work correctly (like -listen or -ext-address).

Maybe updating the documentation to say so would avoid confusion.

Let’s wait for the devs to weigh in before jumping to conclusions like that :wink:

Can't wait :smiley:

-ext-address is only needed if you have a firewall rule in place redirecting relaysrv’s listen port (as described in the docs), and -listen defaults to :22070, which should be fine most of the time (and works fine for me), so I do think you’re wrong to say that one of those is required in all cases.

I thought I was wrong, but tests show I have to explicitly set -listen to make it work. As I described, my setup is pretty simple and the network config could not be simpler than it is right now.

I am just trying to understand why, in that case, I had to set the listen param. Maybe it is something somewhere on the system or in the code. I have no idea right now (only some guesses).

The only thing you might need to set is ext-address. The listen address is only useful if you want to force it to listen on a specific interface; otherwise it defaults to all interfaces.

The only cases I can think of that would be solved with a listen address are outgoing requests being routed via some route that has no way to connect back, or firewall rules requiring binding on a specific interface.

Well, so I guess it’s time to get my hands dirty :slight_smile: Let’s look at the code.

Looking at the relay pool server and relay server code, and testing with additional debug output, it appears that when all flags are omitted on the relay server, the relay address is 0.0.0.0 (perfectly fine, as the server listens on all interfaces with a carrier).

The thing is that the relay pool server first compares this address to an empty string (I have no idea when that case can happen, but the interesting part comes next). If the address is not empty, the pool server compares it to the address seen by its HTTP server, which is the remote address of the relay trying to connect, and rejects the relay if they differ. That is what happens here, because “0.0.0.0” != “”.
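To make the failure mode concrete, here is a minimal sketch of the check as I understand it from the description above. It is not the actual strelaypoolsrv code: checkRelayHost and its parameters are hypothetical names, the error text is only modeled on the log message, and the remote address is the relay IP from the capture with the default port 22070 assumed.

```go
package main

import (
	"fmt"
	"net"
)

// checkRelayHost is a hypothetical helper that mirrors the check described
// above; it is NOT the real strelaypoolsrv implementation.
// advertised is the host:port announced by the relay, remoteAddr is the
// host:port the pool's HTTP server sees for the incoming connection.
func checkRelayHost(advertised, remoteAddr string) (string, error) {
	host, port, err := net.SplitHostPort(advertised)
	if err != nil {
		return "", err
	}
	rhost, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return "", err
	}
	if host == "" {
		// No host announced: fall back to the address the pool sees.
		return net.JoinHostPort(rhost, port), nil
	}
	if host != rhost {
		// "0.0.0.0" ends up here, because it is neither empty nor equal
		// to the remote address.
		return "", fmt.Errorf("IP address %q does not match external address %q", host, rhost)
	}
	return advertised, nil
}

func main() {
	remote := "195.154.119.12:43958" // relay address as seen in the capture
	for _, adv := range []string{":22070", "0.0.0.0:22070", "195.154.119.12:22070"} {
		addr, err := checkRelayHost(adv, remote)
		fmt.Printf("advertised=%q accepted=%q err=%v\n", adv, addr, err)
	}
}
```

In this sketch only the empty host and the exact public IP are accepted, which would match what I observe: no flags fails, an explicit --listen IP:PORT works.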

Changing line 330 of https://github.com/syncthing/syncthing/blob/master/cmd/strelaypoolsrv/main.go to

if host == "" || host == "0.0.0.0" {

should fix the issue and align the behavior with the current documentation :wink:
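For review purposes, the proposed condition can be isolated in a small predicate. This is only a sketch: hostIsUnset is a hypothetical name, and the extra handling of the IPv6 wildcard "::" goes beyond what I proposed above, so treat it as an assumption to discuss rather than part of the fix.

```go
package main

import (
	"fmt"
	"net"
)

// hostIsUnset reports whether the announced host should be replaced by the
// address the pool server sees. Hypothetical helper, for illustration only.
func hostIsUnset(host string) bool {
	if host == "" || host == "0.0.0.0" {
		return true // the condition proposed in this post
	}
	// Assumption beyond the proposal: treat any unspecified address,
	// including the IPv6 wildcard "::", the same way.
	ip := net.ParseIP(host)
	return ip != nil && ip.IsUnspecified()
}

func main() {
	for _, h := range []string{"", "0.0.0.0", "::", "195.154.119.12"} {
		fmt.Printf("host=%q unset=%v\n", h, hostIsUnset(h))
	}
}
```

Using net.IP.IsUnspecified avoids hard-coding one wildcard string per address family, but the plain string comparison is the smaller change if only the IPv4 case matters.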

I am waiting for your review.