Relay crashed since it joined the pool

Hi

First time since my win32 relay joigned the pool yesterday, it hanged, when it ran fine for several days as private:

2016/04/16 13:14:23 main.go:147: URI: relay://:22067/?id=IDTHERE-.....&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=1000000&statusAddr=:22070&providedBy=
2016/04/16 13:14:23 main.go:150: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2016/04/16 13:14:23 main.go:151: !!  Joining default relay pools, this relay will be available for public use. !!
2016/04/16 13:14:23 main.go:152: !!      Use the -pools="" command line option to make the relay private.      !!
2016/04/16 13:14:23 main.go:153: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
2016/04/16 13:14:25 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 14:02:28 pool.go:35: Failed to join https://relays.syncthing.net/endpoint due to an internal server error: test failed

2016/04/16 15:02:30 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 15:50:32 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 16:38:34 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 17:26:36 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 18:14:39 pool.go:35: Failed to join https://relays.syncthing.net/endpoint due to an internal server error: test failed

2016/04/16 19:14:40 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 20:02:42 pool.go:35: Failed to join https://relays.syncthing.net/endpoint due to an internal server error: test failed

2016/04/16 21:02:45 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 21:15:21 status.go:19: accept tcp 0.0.0.0:22070: acceptex: The semaphore timeout period has expired.

Any idea please. Thank you

Googling that error message, it sounds like a network driver issue or similar. The usual culprits of firewalls and antivirus are mentioned as well.

Furthermore it hasn’t joined the pool, as you can see from the messages in the log. Your relay is behind a router/firewall and is not accessible from the internet.

No Audrius, it was in the relays map and even proxied some 10MiB. Now I restarted it and it shows in the maps having relayed some GB and hosting sessions and connections.

The log says otherwise though, but perhaps it’s related to the socket errors you are having.

Do you think so ?

...
2016/04/16 15:02:30 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 15:50:32 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/16 16:38:34 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
...

What do you mean? Does joined means failed to join and retry in 48mn ? In the lap time I write this the server died again. Who lies ? The log, the dead server or the maps server hears ghosts? I found I had the Windows firewall service running, I thought it was disabled by 3rd party. After disabled the service, the relay crashed againg when I stopped it… let’s test this conf for a while… bye-bye

This happens when the relaypoolsrv was unable to establish a connection to your relay, and therefore didn’t let it join the pool.

It is correct to say that your relay joined the pool for a while. It is also correct to say that there are points where the relaypoolsrv refused your relay because it didn’t accept a test connection from the relaypoolsrv.

This is the crash you’re referring to I assume? As @calmh said, this is probably Windows fighting a firewall.

In case it wasn’t clear, there are two issues here (which may be related). One is that the relay is only intermittently allowed to join the relay pool, because the relaypoolsrv is only sometimes able to establish a test connection to it. This may be due to a firewall.

The other issue is that your relay crashes, after some fairly low-level Windows networking code fails with “The semaphore timeout period has expired.” when accepting a connection. This, again, may be due to a firewall.

I have no idea what you’re trying to say here.

I couldn’t figure how it could work seens from somewhere and not work from elsewhere. Let’s assume it was not at the same time and it’s a normal intermitent issue. My fw rules in the machine are

Allow Inbound TCP 22067
Allow Inbound TCP 22070
Allow Outbound Anywhere Anything #Anything instead of TCP because of ping because I read the server pings(1)
Deny anything else

and a portforward in the gateway according to my -ext-address & -listen ":22067" parameters.

(1) not sure what to do there: I know ping doesn’t use a port so I beleive these pings are internal to the relaying protocol and the server doesn’t use the OS native ping tool, so I surely can restrict this single outbound line to TCP only?

EDIT : in the time of writing, I set a rule just before the “Deny anything else” that

...
Allow Anything else plus Warns
...

and I got a warning : Remote: web.syncthing.net (82.196.11.4), port https (443), Outbound TCP, local port 2279 ??? this should have match the third line isn’t it? Is this “pings” I talked above?

Thank you

Pings are done via the 22067. The warn line you are seeing is the relay server trying to register with the pool.

Your relay needs to be able to establish a connection to the relaypoolsrv in order to register, which is what is happening here.

“Pings” are test connections on port 22067, made from the relaypoolsrv to your relay.

Do you precisely mean sent to -ext-address then hit the server on -listen ":22067"? Pings in fact are actually some kind of openssl s_client -connect... rather than real icmp?

Yes. When your relay registers with the relaypoolsrv (TCP, outbound, to port 443), the relaypoolsrv establishes a test connection to your relay (TCP, port 22067, as if it were a Syncthing client) to ensure that your relay can be accessed.

Damned ! :toilet: dead again although still running in taskmgr & services.msc :frowning:

2016/04/17 18:32:53 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/17 19:20:56 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/17 20:08:57 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/17 20:56:59 pool.go:52: Joined https://relays.syncthing.net/endpoint rejoining in 48m0s
2016/04/17 21:45:02 pool.go:35: Failed to join https://relays.syncthing.net/endpoint due to an internal server error: test failed

I deal with the service with nssm only (never sc as shown somewhere as nssm v2.24 seems a so good tool). Eventvwr shows nothing at 21:45

I’m really disappointed not been able to give my 2 cents to the infrastructure.

Most likely a firewall randomly blocking incoming connections, again.

Why incoming when Failed to join is logged with a remote target ?

Please read what others spend time typing:

I appreciate your efforts. I read all and reread carefully. It is just I didn’t understand/guess the implicit background. Do I understand well if I guess the process this way?: My relay connects to web.syncthing.net (or 82.196.11.4) port https (443) then waits some time to get an answer not over the just opened connection (which is just some knock-the-door) but instead from an incoming new connection from the target-above:21067. If it gets such it logs a “join success/renew in 48mn”, if it doesn’t get it logs an “internal server error” ?

No. The first part is correct, up to and including the HTTPS connection. What then actually happens is that the relay pool server tries to connect to your relay to verify that it’s possible to do so. It then responds to the original join request (over HTTPS) with a thumbs up or thumbs down depending on the result.

relays.syncthing.net, not web.syncthing.net.

Correct.

Specifically, you get internal server error: test failed - the “test failed” indicates that the relaypoolsrv was unable to establish a connection to your relay.

Maybe just my firewall that does some dig/nslookup ip. I tried from a linux that said the same: web. Surely only a dns alias or A record precedence. I don’t think it’s the issue as it would never work.

Thanks for your time