Hosts remain disconnected, cluster setup?

I tried turning off the Windows Firewall, which solved the problem. So it appears that allowing only the %ProgramFiles%\SyncTrayzor\syncthing.exe application is not enough.

It seems the traffic is not attributed to syncthing.exe, which is why the rule isn’t being applied:

The firewall log shows this:

The Windows Filtering Platform has blocked a packet.

Application Information:
    Process ID: 0
    Application Name: -

Network Information:
    Direction: Inbound
    Source Address: 23.246.204.13
    Source Port: 0
    Destination Address: 208.76.248.162
    Destination Port: 0
    Protocol: 6

Filter Information:
    Filter Run-Time ID: 74332
    Layer Name: Transport
    Layer Run-Time ID: 13
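For anyone wanting to sift through many of these events, here is a minimal Python sketch that pulls the fields out of the event text and flags entries that carry neither a process nor ports. The field list and the helper names (parse_wfp_event, lacks_port_info) are my own assumptions based only on the event quoted above; this is not an official parser for the Windows event log format.

```python
import re

# Field names as they appear in the event text quoted above.
FIELDS = [
    "Process ID", "Application Name", "Direction",
    "Source Address", "Source Port",
    "Destination Address", "Destination Port", "Protocol",
]

def parse_wfp_event(text):
    """Extract the known 'Name: value' fields from a blocked-packet event."""
    result = {}
    for name in FIELDS:
        match = re.search(re.escape(name) + r":\s*(\S+)", text)
        if match:
            result[name] = match.group(1)
    return result

def lacks_port_info(event):
    """True when the event names no owning process and no ports."""
    return (
        event.get("Process ID") == "0"
        and event.get("Source Port") == "0"
        and event.get("Destination Port") == "0"
    )

SAMPLE = (
    "Application Information: Process ID: 0 Application Name: - "
    "Network Information: Direction: Inbound Source Address: 23.246.204.13 "
    "Source Port: 0 Destination Address: 208.76.248.162 Destination Port: 0 "
    "Protocol: 6"
)

event = parse_wfp_event(SAMPLE)
print(event["Source Address"], "->", event["Destination Address"])
print("no process/port info:", lacks_port_info(event))
```

Protocol 6 is the IP protocol number for TCP, so this is a TCP packet that the firewall could not tie to a port or a process.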

I haven’t played with syncthing-utils before; they seem interesting. Build 106 is the earliest available build I could see - I did try decrementing the URL. Using build 106 on Windows x64, stdevice returned no output here, either. I also tried adding the -server string specified in the usage.

Using Wireshark I can see UDP packets being sent to 194.126.249.5:22027 but nothing coming back. I have set my firewall to allow any traffic from 194.126.249.5.

stfinddevice returns exit code 0 for each query.

Try: openssl s_client -connect host:port

N.B. Angle brackets <> conventionally enclose a mandatory argument; you do not type them when you enter the command. Square brackets [] enclose optional arguments.

ping [-t] <host>

This means -t is an optional extra and host is mandatory. Both of these are valid:

ping -t example.com

ping 127.0.0.1

I think the last build that was usable with the current iteration of the protocol was 103. Since Jenkins keeps only 10 builds, that one is now lost, and the available tools speak the new protocol, which is not yet rolled out. That is why it doesn’t work.

Thanks for the support. I’ve found the issue, as posted above: the firewall allowed traffic bound to syncthing.exe, but for some reason traffic was being sent on TCP port 0 (which isn’t bound to syncthing) and was dropped as a result.

I found that I had to manually allow all traffic from the IPs used by my hosts to let them communicate. This works, but an exception has to be made for each host.

Is there any fix for this in the making? It’s impossible to maintain an exception for every IP in a cluster with more than, let’s say, 5 devices, because every device needs this firewall exception, and the rule has to be updated on every device whenever a new device is added.

I’m thinking the issue might be that Syncthing uses its own way of sending an ICMP ping instead of using the ping built into Windows? (The log file tells me it sometimes times out, but a normal ICMP ping doesn’t show any packet loss or latency spikes.)

Syncthing does not send pings, and we always reuse the same port for server connections. For client connections the OS assigns a random port, but that should not be firewalled, so I am not even sure what you are talking about.
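To illustrate the point about client connections: in the sockets API, port 0 is only a request for the OS to pick a free port, and the socket ends up with a real, nonzero port. The sketch below is plain Python, not Syncthing code, and the helper name ephemeral_port is made up for the example.

```python
import socket

def ephemeral_port():
    """Bind to port 0 and return the port the OS actually assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # 0 means "OS, pick a free port"
        return sock.getsockname()[1]

port = ephemeral_port()
print("OS-assigned port:", port)
```

Since a bound socket never keeps port 0, the zeros in a firewall log more plausibly mean that the port was unknown to the logger than that traffic was really sent on port 0. That reading is an assumption, not something confirmed in this thread.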

Well, if syncthing doesn’t send pings, then why do I see: [Q3VME] 00:18:18 INFO: Connection to FWFDCL3 closed: ping timeout? Or is that a SyncTrayzor function?

Anyway just try setting up syncthing between two Windows servers (or set the desktop firewall into the same strict mode) and then log the firewall entries in the event log, then you can see what I’m talking about :wink:

The Windows Filtering Platform has blocked a packet.

Application Information:
    Process ID: 0
    Application Name: -

Network Information:
    Direction: Inbound
    Source Address: 23.246.204.13
    Source Port: 0
    Destination Address: 208.76.248.162
    Destination Port: 0
    Protocol: 6

Filter Information:
    Filter Run-Time ID: 74332
    Layer Name: Transport
    Layer Run-Time ID: 13

As you can see there is some communication which for some reason isn’t being detected as being bound to a port or application and as a result is being blocked (since the firewall on servers by default blocks all incoming traffic that has no rules).

So despite a firewall rule that allows syncthing incoming and outgoing traffic, this traffic is still being blocked, and the two devices cannot communicate with each other.

To work around this problem I made a rule on each device that allows any kind of incoming traffic from the IP of the other device. This works, but it’s impossible to maintain this in a larger cluster.

That is a ping message in the protocol, sent over TCP; it is not an ICMP ping.
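A rough Python sketch of what an application-level ping over an existing stream connection looks like, and how a "ping timeout" comes about when replies stop arriving. The PING/PONG byte strings, the timeout values, and the helper wait_for_reply are invented for illustration; this is not Syncthing's wire format, and socket.socketpair() only stands in for a real TCP connection between two devices.

```python
import socket

def wait_for_reply(sock, timeout):
    """Return True if the peer answers within `timeout` seconds."""
    sock.settimeout(timeout)
    try:
        return sock.recv(4) == b"PONG"
    except socket.timeout:
        return False  # a real client would log "ping timeout" and close

local, remote = socket.socketpair()

# Healthy peer: it reads the keepalive and answers it.
local.sendall(b"PING")
request = remote.recv(4)
remote.sendall(b"PONG")
answered = wait_for_reply(local, 0.5)

# Silent peer (for example, replies dropped on the way): nothing comes back.
local.sendall(b"PING")
timed_out = not wait_for_reply(local, 0.2)

print("answered:", answered, "timed out:", timed_out)
local.close()
remote.close()
```

No ICMP is involved at any point, which is why an ordinary ping between the hosts can look perfectly healthy while the application still reports a ping timeout.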

We clearly know who is making what request where, and if Windows can’t work that out in its own kernel, there is not much we can do, as it’s not in our jurisdiction.

I haven’t seen any other Windows applications that have such issues, so I think it’s a problem with the syncthing implementation.

Just curious whether the new relay functionality in v0.12 would solve my problems.

Well, the problem is exhibited in the firewall, not in syncthing, so it’s not a problem with syncthing. Syncthing just uses the Go standard library for networking, so it could potentially be an issue in golang, but I’d be surprised if you could get them to care about this enough to fix it, as Windows seems to be an outsider there. To be honest, thinking about the underlying issue really tells me it’s a firewall issue: the firewall misclassifies traffic because it’s incapable of dealing with the connections the kernel opens, and it’s not a golang thing. Windows probably expects you to use C# and Visual Basic, and breaks when you don’t.

Relaying will probably just make things worse: it will connect, but at turtle speeds, because the default relays will most likely have bandwidth limits.

Is golang open source? If so, anyone could fix this problem, correct? I’m not quite sure how to report it to them, since I’m not a developer and don’t know where the issue could be. (Making a post on their forum/GitHub about having an issue with an application (SyncTrayzor) that is based on golang and that might have a bug caused by golang is not going to be very useful to them, I guess…)

I thought the relay would only be used to establish the connection? But if I understand you correctly, all the data transfers are sent through the relay?

Yes, it’s open source, and anyone who cares could potentially fix this. If you can provide a reproducible case cut down to a few lines of Go code, you should open a GitHub issue.

Relays relay all traffic, not just help to connect.

C# and VB call directly into the winapi to perform networking, so there’s no way that the kernel can tell the difference between them and any other application using the winapi (such as Go).

Well there is obviously some difference, because ports are shown as 0’s when called from Go.

Some googling suggests that similar Windows Firewall logs (no process ID, source and destination port of 0) can appear in other scenarios, so I wouldn’t necessarily blame Go here. The best lead I’ve got so far is that it’s logging part of a fragmented TCP packet (so the port info in the L4 header isn’t available). Maybe we’re (somehow) sending packets that exceed the MTU?
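To make the fragmentation idea concrete, here is a small Python sketch that reads the flags and fragment offset from an IPv4 header. Only the fragment at offset 0 carries the TCP header, so a filter looking at a later fragment has no ports to read. The helper names and the hand-built demo headers are assumptions for illustration; nothing here shows that fragmentation is what actually happened in this case.

```python
import struct

def fragment_info(ip_header):
    """Return (more_fragments, offset_bytes) from a 20-byte IPv4 header."""
    flags_and_offset = struct.unpack("!H", ip_header[6:8])[0]
    more_fragments = bool(flags_and_offset & 0x2000)  # MF flag
    offset_bytes = (flags_and_offset & 0x1FFF) * 8    # field is in 8-byte units
    return more_fragments, offset_bytes

def has_transport_header(ip_header):
    """Only the fragment at offset 0 contains the TCP/UDP header."""
    return fragment_info(ip_header)[1] == 0

def make_header(flags_and_offset):
    """Build a minimal IPv4 header for the demo (checksum left at 0)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40, 1, flags_and_offset, 64, 6, 0,
        bytes([23, 246, 204, 13]), bytes([208, 76, 248, 162]),
    )

first = make_header(0x2000)  # MF set, offset 0: ports are visible
later = make_header(185)     # offset 185 * 8 = 1480 bytes: no ports

print("first fragment has ports:", has_transport_header(first))
print("later fragment has ports:", has_transport_header(later))
```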

The MTU is usually defined by the NIC settings, at least on Linux. Plus, the first few packets we send are the TLS handshake and the cluster config message, which I doubt are over the MTU.

@gijs007 is it the initial handshake that’s blocked by the firewall, or does the disconnection happen after the connection has been established? What is the MTU reported by the relevant NIC?

The MTU hasn’t been altered on any of the servers; they use the default MTU of 1500.

Perhaps the packets are fragmented when going over the tier 1 networks for WAN traffic, but I doubt that, since a ping like: ping [remote IP] -l 1472 -f works fine. (That sends a ping with a 1472-byte payload plus 28 bytes of IP/ICMP headers, so a 1500-byte packet, with the Don’t Fragment flag set.)
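The arithmetic behind that ping test, written out as a small Python sketch. The header sizes are the standard ones for IPv4 without options (20 bytes), an ICMP echo header (8 bytes), and TCP without options (20 bytes); the function names are made up for the example.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header
TCP_HEADER = 20   # bytes, TCP header without options

def max_ping_payload(mtu):
    """Largest ping payload (-l value) that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def ping_packet_size(payload):
    """Total IP packet size for a ping carrying `payload` bytes."""
    return payload + IPV4_HEADER + ICMP_HEADER

def max_tcp_payload(mtu):
    """Largest TCP payload per packet, assuming no IP or TCP options."""
    return mtu - IPV4_HEADER - TCP_HEADER

print(max_ping_payload(1500))  # 1472
print(ping_packet_size(1472))  # 1500
print(max_tcp_payload(1500))   # 1460
```

A successful ping at 1472 bytes with the Don’t Fragment flag set shows that the path carried a full 1500-byte packet at that moment. It does not rule out fragmentation elsewhere or at other times.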