Logs from the connection issues cause by 14.40 (and still present in 14.43)

Hi there!

I had my server sync’ing to my father-in-law’s server for quite some time with syncthing, and it worked great. We upgraded to 14.40, and it broke. I commented on the issue here, and I was hoping it would resolve itself with future updates. I’m on 14.43 now, and now that I can grab logs easily from the UI, I figured I’d upload them here to see if they can help.

machine1 (5.9 KB) machine2 (14.1 KB)

Machine2 is the machine in my house. It’s a FreeNAS server that also syncs with a FreeNAS server at my parents’ house.

Machine1 is the machine at my in-laws’ house. It’s an Ubuntu desktop machine, used as a server, with a USB hard drive attached to it. We primarily run Plex with it.

Let me know if there is any additional detail I can include. I hope this issue can be resolved. Thanks so much!

I suspect it’s the slow KCP problem here. You should try to setup TCP port mappings on the router to work around it.

At my father-in-law’s place, he doesn’t have access to the router. But why would this have worked just fine until version .40 and then stopped working? Are there logs or other information I can dig up so we can fix the actual issue?

You only need to set tcp ports up on one side for it to work. This is definitely the best way to go, as it will give you the most stable and fast connection. The reason it worked before is most likely that you connected by relay. Since .40 kcp is enabled by default which takes precedence over relays. Looking at the logs you provided it is able to establish a kcp connection, but then it fails after ~1min on one device and the other device reports tons of connected to already connected devices. Does that already ring a bell @AudriusButkevicius?

In any case it would be good (or at least not hurt :wink: ) if you could collect debug logs from both sides before solving it via tcp port mappings. Just enable the connection facility in the tab next to the logs in the web UI (and probably pause and resume the remote device such that the connection is reset).

So I am pretty sure it’s some sort of delay/window size stuff in KCP, but I don’t have a setup that exhibits these issues to reproduce.

I checked a couple extra boxes in the logging tab, and then I restarted both nodes and grabbed the logs. Anything useful or interesting in there that might help in fixing the 14.40 introduced bug?

local (9.6 KB) remote (5.6 KB)

Thanks!

Send rate limit is 25 KiB/s, receive rate limit is 800 KiB/s

This is probably your problem, the rate is probably too low to even work, especially when dealing with initial data exchanges.

Also, I think you have something misconfigured, it seems the connections are happening to the GUI port:

Established secure connection to 3WAMNRI-AHOWLOL-5DR3MMZ-YSL7HX3-QC5MKTR-EHDTCCH-FKTVKG6-JNFHUQM at 192.168.2.200:31032-72.203.97.200:8043 (tcp-client) (TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305)

Thanks for the feedback. I quadrupled the rate limit to 100kbps. In terms of having something misconfigured, I’m pretty sure I went with the defaults on most things, with the exception of a rate limit. What would I tweak?

So even with the rate quadrupled, the issue is not resolved. This log message continues to loop:

2018-01-14 20:19:17 Device DXJYZRX-LNU5C22-IDE64DR-QPQMV6Y-FAJ7CH6-AVIJQSE-D7ESVCI-HVHCEQJ client is "syncthing v0.14.43" named "Blah"
2018-01-14 20:19:17 Connection to DXJYZRX-LNU5C22-IDE64DR-QPQMV6Y-FAJ7CH6-AVIJQSE-D7ESVCI-HVHCEQJ closed: reading length: EOF
2018-01-14 20:19:32 Established secure connection to DXJYZRX-LNU5C22-IDE64DR-QPQMV6Y-FAJ7CH6-AVIJQSE-D7ESVCI-HVHCEQJ at [::]:22020-76.102.228.103:22020 (kcp-client) (TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305)
2018-01-14 20:19:32 Device DXJYZRX-LNU5C22-IDE64DR-QPQMV6Y-FAJ7CH6-AVIJQSE-D7ESVCI-HVHCEQJ client is "syncthing v0.14.43" named "Blah"
2018-01-14 20:19:32 Connection to DXJYZRX-LNU5C22-IDE64DR-QPQMV6Y-FAJ7CH6-AVIJQSE-D7ESVCI-HVHCEQJ closed: reading length: EOF

This issue didn’t happen prior to 14.40. I know others have been affected as well. On the other machine, this keeps looping:

2018-01-14 20:31:47 Connected to already connected device ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT (existing: [::]:22020-104.6.66.185:22020/kcp-server new: [::]:22020-104.6.66.185:22020/kcp-server)
2018-01-14 20:32:12 Connected to already connected device ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT (existing: [::]:22020-104.6.66.185:22020/kcp-server new: [::]:22020-104.6.66.185:22020/kcp-server)
2018-01-14 20:32:46 Connected to already connected device ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT (existing: [::]:22020-104.6.66.185:22020/kcp-server new: [::]:22020-104.6.66.185:22020/kcp-server)
2018-01-14 20:33:10 Connected to already connected device ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT (existing: [::]:22020-104.6.66.185:22020/kcp-server new: [::]:22020-104.6.66.185:22020/kcp-server)
2018-01-14 20:33:31 Connection to ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT closed: read timeout
2018-01-14 20:33:44 Established secure connection to ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT at [::]:22020-104.6.66.185:22020 (kcp-server) (TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305)
2018-01-14 20:33:44 Device ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT client is "syncthing v0.14.43" named "Bart's Plex Server"
2018-01-14 20:34:16 Connected to already connected device ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT (existing: [::]:22020-104.6.66.185:22020/kcp-server new: [::]:22020-104.6.66.185:22020/kcp-server)
2018-01-14 20:34:27 Connected to already connected device ZFU4GFF-F7XX5V4-CVEJVFQ-GQRSPYH-SXXZ6A5-DAQ2UT5-MIAMQPL-VI3VQAT (existing: [::]:22020-104.6.66.185:22020/kcp-server new: [::]:22020-104.6.66.185:22020/kcp-server)

Let me know if there is more detailed information I can provide regarding this bug. Thanks!

Another thought… Why would the initial 25kbps limit be an issue? This Syncthing node syncs with TWO other machines. I’m having difficulty with one, post 14.40 (worked great beforehand), but the other one had no issue previously and still does not.

The issue is the fact that it uses KCP as the transport. Not tcp and not a relay. I suspect any kind of rate limiting might cause it issues. You can try removing limits all together and see if it works. Your other option would be to make sure you have TCP port mappings setup properly, so that the devices could connect using tcp.

Potentially dumb question… Why did everything work great up until 14.40? Isn’t the fact that everything worked, with no router configuration, just great, and now it doesn’t, indicative of some kind of regression? I’ve posted all the logs I know how to, but I’m happy to post more. I’ve seen others have similar issues in another thread, and I’d love to do my part in helping to solve this issue. Thanks!

Because kcp was enabled by default in 0.14.40 or 41.

Update: Your speculation that this had to do w/ a router setting was incorrect. I wiped out the settings folder on one of the remote machines, and now it’s syncing. It appears that even though 90% of the data was already in sync, something is causing it to try and re-sync everything… It looks like something in the syncthing upgrade process broke it a while back. So yeah it’s working now - just starting over-ish from a sync perspective.

Resyncing doesn’t mean redownloading. It still needs to match metadata even if content is the same.

After two days of sync’ing, with a connection constant the entire time, we’re now back where we were before with the broken pipe errors. Considering that we had over 48 hours of perfect connectivity, I don’t think it’ s router config issue. Anything I can do to help solve this? Again, this never happened pre 14.40.

So you can disable kcp by changing listen addresses on both devices. There was another thread on the forums explaining how to do this.

Out of curiosity, if it was sync’ing for two days without issue, and now it’s not, how would KCP be the issue?

Perhaps, if you have the logs you can check connection types before and now.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.