v0.14.40 fails to connect

Xeenych · November 9, 2017, 8:28am

After upgrading to v 0.14.40 syncthing cannot see other devices. Connection established adn dropped immediatelly. Log says: broken pipe

Rooling back to 0.14.39

imsodin · November 9, 2017, 9:08am

Please provide a little more detail, starting with the actual log.

calmh · November 9, 2017, 9:17am

Indeed. If you previously could only connect via relays you might get a KCP connection now, which might have issues. Moar data please.

usernamegoeshere · November 9, 2017, 9:38am

Help here the same, today people call and say they needed to mail files back and forth with other friends or family as they wouldnt see their shared folder and their changes on either side.

I am looking at two select machines here, one linux x64 and one windows x64, they both upgraded and arrived at 0.14.40 automagically.

On both sides nothing network-wise has changed, same home router, home gateway, the linux machine has opened up tcp port 22000 ever since initial install on the internet router (portforward) the other user on the windows side is more like a notebook roaming roadwarrior user.

The syncthing process on both sides is running, i see the localhost config and information webpage of it.

The odd stuff is, the linux and the windows machine, on the right lower side, where the other machines/peers are listed, that the last contact was all like one day ago. Just before or when the 0.14.40 release was pushed to the public?

The peers are not paused. Restarted syncthing but they wouldnt connect.

Normal small network home network scenario here, so no fancy network rules or blocking of anything, be it udp or tcp. Only thing has changed methinks is the new 0.14.40 release and maybe we have a race condition or faulty decsision process on what transport udp tcp kcp to use somewhere in the code?

One more thing: the windows machine, (i am administrating the syncthing process there) would show a hovering layer on the syncthing config page, that the anon stat data type or whatever internally has changed and if I wanted to swap over to the new data format or something the like. Maybe this decision is blocking the rest of the functionality when it has not been decided and actively surfed and answered yet? Just wondering.

What can I report more? Thanks for resorting this quickly.

p.s. i have copied over and pseudonymised some log lines that are currently visible on the linux syncthing. it fails with hello messages the log says.

[A1B2C3] 10:35:56 INFO: Failed to exchange Hello messages with XXXXX-BBBBB-AAAAAA-CCCCC (1.2.3.4:7729): i/o timeout [A1B2C3] 10:36:16 INFO: Failed to exchange Hello messages with YYYYYY-FFFFF-BBBBBB-DDDDDDDD (192.168.1.2:22020): i/o timeout [A1B2C3] 10:37:11 INFO: Failed to exchange Hello messages with XXXXX-BBBBB-AAAAAA-CCCCC (1.5.7.1:22067): read tcp 192.168.1.3:59728->1.5.7.1:22067: i/o timeout [A1B2C3] 10:37:31 INFO: Failed to exchange Hello messages with YYYYYY-FFFFF-BBBBBB-DDDDDDDD (192.168.1.2:22000): read tcp 192.168.1.3:53590->192.168.1.2:22000: i/o timeout

more info: restarted the linux syncthing, and two external (other) machines one macosx and one win x64 (but different than above) do connect from externally, both reporting them being as well on 0.14.40, they connect via tcp ipv4, maybe this is another hint. The first x64 machine externally is also having ipv6 address by its ISP.

The log with the failing hello I posted from above is from that failing win x64 machine for example but also from other ones.

usernamegoeshere · November 9, 2017, 9:52am

To sum it up, I see on the linux machine, two disconnected nodes, both are win x64, one win7 and one win10, both have discovered ip addresses when I extend the info boxes of the nodes there, and they show tcp and kcp addresses (partly disconvered etc) but for example the ipv6 address is not shown fully (ellipsized), I guess thats just a display thing obviously, hopefully ;p, but these two machines dont connect to the linux machine, they are the ones that the log lines show where hello fails timeouts.

then there is one external macosx that shows tcp server in tooltip and one ipv4 address, and one more win10 x64 machine, that is connected as well and green status, and has also one ipv4 address (tcp server) in its box nothing else.

HALP

usernamegoeshere · November 9, 2017, 10:04am

Is there a very quick way to make syncthing use only the ipv4 layer of the OS for a quick check to see if my kind-of master node (the linux machine) would then connect to all the external pack of nodes? The options page or config settings could need some features for situations like this

p.s. i went for https://docs.syncthing.net/users/config.html#listen-addresses

and set the listening address to TCP IPv4 wildcard and port → tcp4://0.0.0.0:22000

and restarted syncthing on the linux node. will see if this helps the two external nodes that fail to hello.

usernamegoeshere · November 9, 2017, 10:28am

After having set the linux machine to listening address tcp4://0.0.0.0:22000, and restarting sycthing, i see only the healthy connections from the one macosx and one windows machine, both having showing only ipv4 addresses from externally. the other two failing nodes, that were trying to connect and timeouting hello, dont show up in the logs any more. although even their log lines were showing their ipv4 addresses, but maybe they were preferring their ipv6 address or something went wrong there.

but for many minutes now, they dont show up in the logs any more at all.

currently i can not remote admin all the participating machines and set them to tcp4://0.0.0.0:22000 as well, so thats why i only could try with the kinda central linux machine first, this maybe seems to resolve some part aspect of the problem, but the other affected machines are seemingly cut off now completely.

thats all i can say for now.

calmh · November 9, 2017, 11:06am

Your only little log snippet shows read timeouts for TCP, KCP, and relays. Since it apparently tries both TCP and relaying it’s not just KCP being broken at least.

usernamegoeshere · November 9, 2017, 11:19am

I wonder if the logic, of what connection or technology to use to reach a distant node, is confused with this new release as:

the linux machine has opened its tcp 22000 port completely towards the public internet, proper tcp portforwarding set. Why do external nodes then fail or need to use such stuff as kcp or relays?

Also: the macosx external machine, that successfully connects via ipv4 to the linux one, gets a log entry about every minute, telling that a connection to an already connected node has been made. But only this macosx machine gets that line printed in the logs of the linux machine. The other also functioning external ipv4 win x64 machine, doesnt show these logs but is still connected via a public ipv4 address.

weird.

calmh · November 9, 2017, 11:24am

Yes, odd. Post full logs from both sides and we may be able to draw some conclusions.

usernamegoeshere · November 9, 2017, 11:33am

Well what I did a little while go is went back to default, as listening address on the linux machine. and now I had the chance to grab the one remote windows machine and simply restart its syncthing, now it apparently managed to connect via its ipv4 public address back home to linux box.

Nothing else changed. Maybe some network discovery servers or interfaces and such stuff become confused?

calmh · November 9, 2017, 11:36am

Yeah we can speculate and guess about confusion or whatnot, but without log data we’ll never know.

usernamegoeshere · November 9, 2017, 11:38am

Where do I find the logs on a windows machine actually? On the linux machine I startup the syncthing binary in a terminal with screen and can monitor and read it there.

Are logs being collected and near or mid term archived in some place where I can extract or read up on them from since yesterday right after the initial 0.14.40 release/reboot?

usernamegoeshere · November 9, 2017, 3:35pm

would like to see the op of this thread post his logs as i can not bring in any more logs as the ones i posted before i think.

or does the linux syncthing client log something into some archive files? maybe i could provide then.

Xeenych · November 9, 2017, 4:00pm

I cannot do this, because I rolled back to 0.14.39

imsodin · November 9, 2017, 4:09pm

@Xeenych It would be nice if you could run 0.14.40 temporarily to capture the logs such that there is a chance for this to get fixed - new versions do generally have benefits

@usernamegoeshere Syncthing only logs to command line, so if you run it “manually” in a screen session you need to pipe (tee) it into a file yourself. If the relevant parts are still in the screen buffer you can also access it in “scroll mode”, accessible via “ctrl-a [”

calmh · November 9, 2017, 5:41pm

In Windows there is a log file by default in the configuration directory.

janc1 · November 10, 2017, 11:14am

yes, after update of syncthing on my PC I did see that it is not connected to server. So I did check server and there was not connection from another nodes. We want only local discovery in office so on all nodes we have enabled only “local discovery”.

“NAT traversal”, “gloval discovery” and “relaying” is disabled.

I did find out that on firewall I have open only http port and tcp 22000, so I did enable udp 21027. Then I did restart syncthing on my PC and server {synology} and after while all connection are UP.

Checking the log I can see that syncthing is opening port for KCP

syncthing[2393]: [QIKVK] INFO: TCP listener ([::]:22000) starting
syncthing[2393]: [QIKVK] INFO: KCP listener ([::]:22020) starting

and later

syncthing[2393]: [QIKVK] INFO: kcp://0.0.0.0:22020 detected NAT type: Port restricted NAT
syncthing[2393]: [QIKVK] INFO: kcp://0.0.0.0:22020 resolved external address kcp://176.116.108.197:1026 (via stun.counterpath.net:3478)

and later I can see that my PC is establised connection with server {call oblak} over 22020 port which is not open at firewall and then replacing it with connection over 22000. What does it mean?

syncthing[2393]: [QIKVK] INFO: Established secure connection to 43H33OS-...-AXSNQQX at [::]:22020-192.168.101.3:22020 (kcp-server) (TLS_ECDHE_RS
syncthing[2393]: [QIKVK] INFO: Device 43H33OS-...AXSNQQX client is "syncthing v0.14.40" named "oblak"
syncthing[2393]: [QIKVK] INFO: Connected to already connected device (43H33OS-...AXSNQQX)
syncthing[2393]: [QIKVK] INFO: Established secure connection to 43H33OS-...AXSNQQX at 192.168.101.10:22000-192.168.101.3:53044 (tcp-server) (TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305)
syncthing[2393]: [QIKVK] INFO: Replacing old connection [::]:22020-192.168.101.3:22020/kcp-server with 192.168.101.10:22000-192.168.101.3:53044/tcp-server for 43H33OS-...AXSNQQX
syncthing[2393]: [QIKVK] INFO: Connection to 43H33OS-...AXSNQQX closed: reading length: broken pipe
syncthing[2393]: [QIKVK] INFO: Device 43H33OS-...AXSNQQX client is "syncthing v0.14.40" named "oblak"

calmh · November 10, 2017, 12:10pm

That it established a KCP connection on port 22020 to begin with, then managed a TCP connection and switched to that as it’s better.

imsodin · November 10, 2017, 1:06pm

I guess this is a similar thing as what I usually see with relays: The connection is established quicker than direct (especially) local connections, so connections happen usually like this: unconnected -> relay/kcp -> direct tcp. Not ideal, but not a problem.