Hosts remain disconnected, cluster setup?

I’m currently replacing BitTorrent Sync with Syncthing (using SyncTrayzor) because of issues with BitTorrent Sync: high CPU usage and long delays when hosts have been approved.

I’ve added two hosts and both scan the same folder. Both report Global Discovery as OK and the folder index as Up to Date, but the two hosts remain disconnected and can’t see each other (last seen time is “never”).

What I’ve already done:

  • Added each host on the other (so they both know each other’s ID)
  • I’ve added firewall rules on both ends to allow %ProgramFiles%\SyncTrayzor\syncthing.exe to access the internet.
  • Restarted SyncTrayzor and Syncthing

Additionally, I’d like to know how to set up a cluster. When adding my existing host to a new host, I simply check the Introducer checkbox; will that share all the connected devices from my existing host with the new host?

You need to forward ports or enable UPnP, otherwise it will not be able to punch through NAT. For growing the cluster: yes, ideally one device should be marked as introducer on every other device, though there is no easy way to remove a node; that has to be done manually on every device.

The hosts are directly connected to the WAN in a datacenter (there is no NAT or UPnP).

I’ve tried troubleshooting by configuring the listen address (IP and port) in the settings and then entering that same address on the connecting host instead of using dynamic mode. I did this on both ends.

However there is still no connection…

The status tab still shows Address and then a question mark.

The log report says:

[Q3VME] 17:01:49 INFO: Device N5G5JDB is “Los Angeles” at [208.76.248.162:22000]
[Q3VME] 17:01:49 INFO: Device Q3VME7J is “WIN-UE8ASUJKKO0” at [dynamic]
[Q3VME] 17:01:49 INFO: Device FEZLNR5 is “Dallas” at [dynamic]
[Q3VME] 17:01:49 INFO: API listening on 127.0.0.1:8384

Right, there is a tool called stfinddevice which lets you look up which IP is advertised on the discovery server. You can get it from the syncthing-utils job on build.syncthing.net, but make sure you get an old build that still speaks the current version of the protocol (you can try the first available successful build which has artefacts). See if the advertised IPs make sense.

Check with openssl s_client <ip:22000> (might need the word connect before the ip) whether you can connect, to verify that the firewall rules are correct and actually allow connections.

  1. I’ve tried the stfinddevice program but it doesn’t output anything when I add the device-id.

E.G.: C:\Users\gijsv\Desktop>stfinddevice.exe Q3VME7J-4PN4QF4-OAVJZQA-BMLLAPT-5IZMKFI- FKYILMX-IM5WNY4-NBP6BAD

  2. Regarding openssl:

Where can I get the latest openssl for windows?

I tried running it on my currently installed openssl and I’m being told that “The syntax of the command is incorrect.” when running: openssl s_client <127.0.0.1:22000>

I also tried it without the brackets, which outputs the commands for openssl. I saw there was a host command so I’ve tried this:

C:\Program Files\Apache Software Foundation\Apache24\bin>openssl s_client -host 85.17.189.119:22000
WARNING: can’t open config file: c:/openssl-1.0.1p-win64/ssl/openssl.cnf
Loading ‘screen’ into random state - done
connect: No such file or directory
connect:errno=0

It gives me this message for both the local and the external sync host.

  3. I’ve tried disabling the IPv6 stack on all machines; global discovery changes to 1/2, but still no connection is made.

Are you sure you are using the oldest available build of the utils? I am not sure how to use openssl on Windows; you can google around, maybe you are missing the connect keyword. I guess you could also try telnet or something, just to see if you can connect and that the firewall is doing what it says.
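If neither openssl nor telnet is at hand, a plain TCP connect is enough to see whether the firewall lets the connection through. Below is a minimal sketch in Python; the function name can_connect is made up for this example, and 22000 is simply the listen port shown in the log earlier in the thread. It only checks that the TCP handshake completes, it does not speak the Syncthing protocol.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    can_connect is a hypothetical helper name, not part of any Syncthing tool.
    """
    try:
        # create_connection performs the TCP handshake; closing it right away
        # is fine because we only care whether the port is reachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling can_connect("208.76.248.162", 22000) from the other host would then show whether the listen port is reachable from outside.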

I’ve tried turning off the Windows Firewall, which solved the problem. So it appears that allowing just the %ProgramFiles%\SyncTrayzor\syncthing.exe application is not enough.

Seems like the traffic is not bound to syncthing.exe, which is why the rule isn’t being applied:

The firewall log shows this:

The Windows Filtering Platform has blocked a packet.

Application Information:
Process ID: 0
Application Name: -

Network Information:
Direction: Inbound
Source Address: 23.246.204.13
Source Port: 0
Destination Address: 208.76.248.162
Destination Port: 0
Protocol: 6

Filter Information:
Filter Run-Time ID: 74332
Layer Name: Transport
Layer Run-Time ID: 13

I haven’t played with syncthing-utils before; they seem interesting. Build 106 is the earliest available build I could see - I did try decrementing the URL. Using build 106 on Windows x64, stfinddevice returned no output here, either. I also tried adding the -server string specified in the usage.

Using Wireshark I can see UDP packets being sent to 194.126.249.5:22027 but nothing coming back. I have set my firewall to allow any traffic from 194.126.249.5.

stfinddevice returns exit code 0 for each query.

Try: openssl s_client -connect host:port

n.b. angle brackets <> are commonly used to mark a mandatory argument, but you do not type them when you enter the command. Square brackets mark optional arguments.

ping [-t] <host>

Would translate to: -t is an optional extra, host is mandatory. Both of these are valid:

ping -t example.com

ping 127.0.0.1
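The same convention shows up in usage strings that tools generate themselves. As a small illustration (a toy parser written for this example, not the real ping), Python’s argparse prints the optional flag in square brackets; note that argparse shows the mandatory argument without angle brackets, which is a common variation.

```python
import argparse

# Toy command line that mirrors "ping [-t] <host>"; add_help=False keeps the
# usage string limited to the two arguments discussed above.
parser = argparse.ArgumentParser(prog="ping", add_help=False)
parser.add_argument("-t", action="store_true", help="optional flag")
parser.add_argument("host", help="mandatory argument")

# Prints the generated usage line, with -t in square brackets.
print(parser.format_usage().strip())

# Both invocations from the post parse successfully.
print(parser.parse_args(["-t", "example.com"]))
print(parser.parse_args(["127.0.0.1"]))
```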

I think the last build which was usable with the current iteration of the protocol was 103. Since Jenkins keeps only 10 builds, that one is now lost, and the remaining tools speak the new protocol, which is not yet rolled out; that is why it doesn’t work.

Thanks for the support. I’ve found the issue, as posted above: the firewall allowed traffic bound to syncthing.exe, but for some reason traffic was being sent on TCP port 0 (which isn’t bound to syncthing) and was dropped as a result.

I found that I had to manually allow all traffic from the IPs used by my hosts to allow them to communicate. This works, but an exception needs to be made for each host.

Is there a fix for this in the making? It’s impossible to maintain an exception for every IP in a cluster with more than, let’s say, 5 devices, because every device needs this firewall exception and it has to be updated on each device whenever a new device is added.

I’m thinking the issue might be that Syncthing uses its own way of sending an ICMP ping instead of the ping built into Windows? (The log file tells me it sometimes times out, but a normal ICMP ping doesn’t show any packet loss or latency spikes.)

Syncthing does not send pings, and we always reuse the same port for server connections. For client connections the OS assigns a random port, but that should not be firewalled, so I am not even sure what you are talking about.

Well, if Syncthing doesn’t send pings, then why do I see: [Q3VME] 00:18:18 INFO: Connection to FWFDCL3 closed: ping timeout? Or is that a SyncTrayzor function?

Anyway just try setting up syncthing between two Windows servers (or set the desktop firewall into the same strict mode) and then log the firewall entries in the event log, then you can see what I’m talking about :wink:

The Windows Filtering Platform has blocked a packet.

Application Information:
Process ID: 0
Application Name: -

Network Information:
Direction: Inbound
Source Address: 23.246.204.13
Source Port: 0
Destination Address: 208.76.248.162
Destination Port: 0
Protocol: 6

Filter Information:
Filter Run-Time ID: 74332
Layer Name: Transport
Layer Run-Time ID: 13

As you can see there is some communication which for some reason isn’t being detected as being bound to a port or application and as a result is being blocked (since the firewall on servers by default blocks all incoming traffic that has no rules).

So despite having a firewall rule that allows syncthing incoming and outgoing traffic, this traffic is still being blocked and the two devices cannot communicate with each other.

To work around this problem I made a rule on each device that allows any kind of incoming traffic from the IP of the other device. This works, but it’s impossible to maintain this in a larger cluster.
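To make that workaround a bit less painful on a larger cluster, the per-peer rule can at least be generated from one list instead of being clicked together by hand. The sketch below is only that, a sketch: build_allow_rule and the rule name "Syncthing peers" are made up for this example, the two addresses are the ones from the firewall log above, and the output is a netsh advfirewall command that would still have to be run from an elevated prompt on each device. It mirrors the workaround as described (allow all inbound traffic from the listed peers), so it is just as broad.

```python
import ipaddress


def build_allow_rule(peer_ips, rule_name="Syncthing peers"):
    """Build one netsh command allowing inbound traffic from all peers.

    build_allow_rule and the default rule name are hypothetical.
    """
    # Validate every entry so a typo fails here instead of inside netsh.
    validated = [str(ipaddress.ip_address(ip)) for ip in peer_ips]
    # netsh accepts a comma-separated list of addresses for remoteip.
    remote = ",".join(validated)
    return (
        f'netsh advfirewall firewall add rule name="{rule_name}" '
        f"dir=in action=allow remoteip={remote}"
    )


# Example peer list: the two addresses that appear in the firewall log.
peers = ["23.246.204.13", "208.76.248.162"]
print(build_allow_rule(peers))
```

Adding a device then means extending the list and re-running the generated command everywhere, which is still manual but at least the same command on every machine.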

That is a ping message in the protocol, sent over TCP, not an ICMP ping.

We clearly know who is making what request where, and if Windows can’t work it out in its own kernel, there is not much we can do, as it’s not in our jurisdiction.
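To illustrate what a protocol-level “ping timeout” means, here is a minimal sketch of the idea: the connection is considered dead when nothing arrives on the already-open TCP socket within a deadline. This is not Syncthing’s actual code or wire format; peer_is_alive is a made-up name, the single byte stands in for a real protocol message, and a local socket pair stands in for the network connection.

```python
import socket


def peer_is_alive(conn: socket.socket, timeout: float) -> bool:
    """Return True if the peer sends anything within `timeout` seconds.

    Hypothetical helper; a real protocol would parse a proper ping message.
    """
    conn.settimeout(timeout)
    try:
        # An empty read means the peer closed the connection.
        return len(conn.recv(1)) > 0
    except socket.timeout:
        # Nothing arrived in time: this is the "ping timeout" case.
        return False


# A connected socket pair stands in for two devices.
a, b = socket.socketpair()
b.sendall(b"P")               # the peer sends its keepalive byte
print(peer_is_alive(a, 0.2))  # a message was waiting
print(peer_is_alive(a, 0.2))  # the peer stayed silent this time
a.close()
b.close()
```

No ICMP is involved anywhere, which is why a normal ping between the hosts can look perfectly healthy while the log still reports a ping timeout.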

I haven’t seen any other Windows applications that have such issues, so I think it’s a problem with the syncthing implementation.

Just curious whether the new relay functionality in v0.12 would solve my problems.

Well, the problem is exhibited in the firewall, not in syncthing, so it’s not a problem with syncthing. Syncthing just uses the Go standard library for networking, so it could potentially be an issue in golang, but I’d be surprised if you could get them to care about this enough to fix it, as Windows seems to be an outsider there. To be honest, thinking about the underlying issue really tells me it’s the firewall misclassifying traffic because it’s incapable of dealing with the connections the kernel opens, and not a golang thing. Windows probably expects you to use C# and Visual Basic, and breaks when you are not using them.

Relaying will probably just make things worse, as it will connect, but at turtle speeds, because the default relays will most likely have bandwidth limits.

Is golang open source? If so, anyone could fix this problem, correct? I’m not quite sure how to report it to them, since I’m not a developer and don’t know where the issue could be. (Making a post on their forum/GitHub about having an issue with an application (SyncTrayzor) that is based on golang and that might have a bug caused by golang is not going to be very useful to them, I guess…)

I thought the relay would only be used to establish the connection? But if I understand you correctly, all the data transfers are sent through the relay?