figure out why sync gets stuck after fetching a few kbyte and never finishes

I struggled hard to get syncthing working with a newly added remote device, and I’d appreciate any help in what to look for in the logs in order to enable me using syncthing more deterministic - currently the success with a new device feels like pure random.

Initial situation: existing host (ubuntu 18.04 LTS server with syncthing v0.14.43-ds1, Linux 32 bit) working fine to sync with another ubuntu host.

Two mobile clients were then attached to that one via syncthing v1.12.1 which is the latest to get via fDroid; on one (Stock Android 6.0) it worked rather straightforward, while I got just mad on the 2nd device (AOSP 10 based LineageOS, same syncthing there) which permanently failed to complete the initial sync.

The basic symptom was that after adding the new android device for getting a directory synced from the ubuntu server, it seemed to start and created lots of *.tmp files but never finished the transfer. Whenever I looked into the files transfer state in the WebGUI, the progress bars never moved onward. I waited for hours and hours, restarting service and entire hosts multiple times and removing and re-inserting the directory links. No improvement, no progress, no completion. That directory is less than 5GB, network is reasonable fast WLAN and ethernet.

When things were stuck, it looked to me like the system is convinced that all folders are up to date, which they clearly were not, as hundreds of files did not yet show up on the receiver device yet.

My current situation is that after some sleepless nights I removed directory links once again and after re-insertion the sync to the new device finally started and actually went on, without me yet having any idea why and how to get done smoothly in future (I’d have to bring in some more devices). So I’m no longer totally roadblocked but just want to reach a state to understand which log entries are hints to problems and which are irrelevant.

I have a hard time seing anything critical in the audit log, thus I’d like to ask for some assistence in interpretation here.

For logging in addition to syslog, I ran the -audit switch to get a log as detailed as possible (the “normal” logfile option never worked for me, the logfile just does not get created - but the audit works); so as I run as a user service, my ~/.config/systemd/user/syncthing.service contains ExecStart=/usr/bin/syncthing -no-browser -no-restart -audit -auditfile=/path/to/my/syncthing-audit.log and I start manually via “systemctl --user start syncthing.service”

1st, the network has been working without dropouts during the trials, and I have all hosts in the same subnet so they should be able to see each other directly. ufw firewall is opened for all syncthing ports on the ubuntu machine, and no extra firewall running on the mobile devices. Anyway I see some alien IP addresses being contacted such that I wonder if I have something going bad.

I see for example

2021-02-07 14:35:10 kcp://0.0.0.0:22020 detected NAT type: Port restricted NAT 2021-02-07 14:35:10 kcp://0.0.0.0:22020 resolved external address kcp://77.0.45.225:22020 (via stun.schlund.de:3478)

and wonder if this is normal when hosts are in same (192.168.1.*) subnet without need to communicate to outside. Also in the WebGUI I see a relay listed in addition to the known direct internal IP.

2nd, I also got (and still get) “protocol error: cluster config message in state 1” log entries for which I could not find an interpretation in forum or issue tracker, for example

2021-02-07 14:38:00 Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: protocol error: cluster config message in state 1

Does this indicate that the endpoints have a general mismatch in protocol? If yes, what could I do about this?

There seem to be other incidents with closed connection, like when I grep syslog for ‘closed’ and this uplink, I see:

Feb 7 02:16:53 miniserv syncthing[30619]: [2AWWF] 02:16:53 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: reading length: read tcp [fe80::f3f3:a785:2224:a5dc%enx00133b010463]:22000->[fe80::aecf:85ff:fed7:6d6e%enx00133b010463]:38420: use of closed network connection

Feb 7 02:21:00 miniserv syncthing[30619]: [2AWWF] 02:21:00 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: protocol error: cluster config message in state 1

Feb 7 02:21:33 miniserv syncthing[30619]: [2AWWF] 02:21:33 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: reading length: read tcp [fe80::f3f3:a785:2224:a5dc%enx00133b010463]:48996->[fe80::aecf:85ff:fed7:6d6e%enx00133b010463]:38420: use of closed network connection

Feb 7 02:22:26 miniserv syncthing[30619]: [2AWWF] 02:22:26 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: reading message: read tcp [fe80::f3f3:a785:2224:a5dc%enx00133b010463]:22000->[fe80::aecf:85ff:fed7:6d6e%enx00133b010463]:38420: use of closed network connection

Feb 7 02:30:02 miniserv syncthing[30619]: [2AWWF] 02:30:02 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: reading message: read tcp 192.168.1.101:22000->192.168.1.165:38420: use of closed network connection

Feb 7 02:34:55 miniserv syncthing[30619]: [2AWWF] 02:34:55 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: protocol error: cluster config message in state 1

Feb 7 02:37:48 miniserv syncthing[30619]: [2AWWF] 02:37:48 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: reading message: read tcp 192.168.1.101:22000->192.168.1.165:38420: read: connection reset by peer

Feb 7 02:50:19 miniserv syncthing[3625]: [2AWWF] 02:50:19 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: reading message: read tcp 192.168.1.101:22000->192.168.1.165:38420: use of closed network connection

Feb 7 02:51:13 miniserv syncthing[3625]: [2AWWF] 02:51:13 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: protocol error: cluster config message in state 1

Feb 7 02:51:15 miniserv syncthing[3625]: [2AWWF] 02:51:15 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: reading length: read tcp [fe80::f3f3:a785:2224:a5dc%enx00133b010463]:22000->[fe80::aecf:85ff:fed7:6d6e%enx00133b010463]:38420: read: connection reset by peer

Feb 7 02:52:11 miniserv syncthing[3625]: [2AWWF] 02:52:11 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: protocol error: cluster config message in state 1

Does this indicate some issue which could cause my problem? Any suggestion how to narrow down further?

Does it harm when I switch off using Relay in the config.xml for a pure-local network setup like mine? I expect it is ok to disable relays as I see log blocks in syslog like:

Feb 7 03:06:41 miniserv syncthing[10111]: [2AWWF] 03:06:41 INFO: Joined relay relay://195.219.163.89:443

Feb 7 03:07:02 miniserv syncthing[10111]: [2AWWF] 03:07:02 INFO: kcp://0.0.0.0:22020 detected NAT type: Port restricted NAT

Feb 7 03:07:02 miniserv syncthing[10111]: [2AWWF] 03:07:02 INFO: kcp://0.0.0.0:22020 resolved external address kcp://77.190.226.206:22020 (via stun.internetcalls.com:3478)

Feb 7 03:07:34 miniserv syncthing[10111]: [2AWWF] 03:07:34 INFO: Disconnected from relay relay://195.219.163.89:443

Feb 7 03:07:34 miniserv syncthing[10111]: [2AWWF] 03:07:34 INFO: Exiting

3rd, I then get log lines like {"id":58,"globalID":58,"time":"2021-02-07T03:06:17.926096021+01:00","type":"FolderSummary","data":{"folder":"mysync-myserv","summary":{"globalBytes":5212201054,"globalDeleted":63,"globalDirectories":1,"globalFiles":1087,"globalSymlinks":0,"ignorePatterns":false,"inSyncBytes":5212201054,"inSyncFiles":1087,"invalid":"","localBytes":5212201054,"localDeleted":61,"localDirectories":1,"localFiles":1087,"localSymlinks":0,"needBytes":0,"needDeletes":0,"needDirectories":0,"needFiles":0,"needSymlinks":0,"sequence":11391,"state":"idle","stateChanged":"2021-02-07T03:06:17.888145219+01:00","version":11391}}}

In this last line I understand that syncthing knows there are 5212201054 bytes on the remote side, and it claims that those are also inSync now and also same volume is local already. Unfortunately the reality at this point in time has been that many .*.tmp files have been created on the receiving target, but they never got renamed to the real filename and the sync never finished.
Such a log line gets repeated more than 600 times, but the correct final filenames never showed up on the target. I tried restarting ubuntu and android clients over and over again, no effect.

4th, syslog for a trial launch contains lines like

Feb 7 03:06:12 miniserv syncthing[10111]: [2AWWF] 03:06:12 INFO: syncthing v0.14.43-ds1 “Dysprosium Dragonfly” (go1.9.3 linux-386) debian@debian 2018-02-06 04:01:09 UTC [noupgrade]

[…]

Feb 7 03:06:15 miniserv syncthing[10111]: [2AWWF] 03:06:15 INFO: TCP listener ([::]:22000) starting

Feb 7 03:06:15 miniserv syncthing[10111]: [2AWWF] 03:06:15 INFO: Relay listener (dynamic+https://relays.syncthing.net/endpoint) starting

Feb 7 03:06:15 miniserv syncthing[10111]: [2AWWF] 03:06:15 INFO: KCP listener ([::]:22020) starting

[…]

Feb 7 03:06:16 miniserv syncthing[10111]: [2AWWF] 03:06:16 INFO: Completed initial scan of readwrite folder “mysync” (mysync-myserv)

Feb 7 03:06:17 miniserv syncthing[10111]: [2AWWF] 03:06:17 INFO: Established secure connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AANN at 192.168.1.101:56004-192.168.1.165:38420 (tcp-client) (TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305)

Feb 7 03:06:17 miniserv syncthing[10111]: [2AWWF] 03:06:17 INFO: Device BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN client is “syncthing v1.12.1” named “Nexus 6P”

Feb 7 03:06:17 miniserv syncthing[10111]: [2AWWF] 03:06:17 INFO: Connection to BSO42M3-XXXXXXX-DAXJ2BC-YG2FYUA-ZZZZZZZ-57MPMC2-WWJOSO5-NRL7AAN closed: protocol error: cluster config message in state 1

[…]

Feb 7 03:06:19 miniserv syncthing[10111]: [2AWWF] 03:06:19 INFO: for quic://192.168.1.165:38420: unknown address scheme “quic”

Feb 7 03:06:19 miniserv syncthing[10111]: [2AWWF] 03:06:19 INFO: for quic://77.190.226.206:38420: unknown address scheme “quic”

Feb 7 03:06:19 miniserv syncthing[10111]: [2AWWF] 03:06:19 INFO: for quic://[some-IPv6-adress%25enx00133b010463]:38420: unknown address scheme “quic”

I cannot determine if any of these, especially that quic stuff, indicate a problem possibly causing the sync getting stuck. Network issues are unlikely as I could connect between the devices back and forth all the time, and network uplink to outside also working. I’ve set up the folder shares in a star topology, each mobile device syncing only to the ubuntu server.

Currently I have the feeling that I cannot get enough insight into operating syncthing in a deterministic way for me, which creates a strong drive towards alternatives, which I think is a pity as the system, once working, seems really smooth+simple. I would be very happy to get more detailed info in the WebGUI to see what is being done right now and what is suboptimal. The raw log file does not help that much to extract the critical lines.

Also the WebGUI status often looks just wrong to me, especially the “last change” entry in the folder view cannot be the actually last changed file, it would be so much better to have a reliable info there (also after the sync has finished) which tells me what actually has been transferred latest. To some extend, my attempts to understand the log lines are an approach to workaround missing useful status info in the WebGUI. In the end, I’d strive to have the rate of synced files (volume) monitored by munin or the like to see in the server dashboard which host has been really updated when.

Any help deeply appreciated, thanks in advance.

This is an ancient version that isn’t compatible with current versions. Upgrade it.

2 Likes

Oh cool. The currently still supported ubuntu 18 LTS unfortunately installs only this version; I’d propose to slightly update the FAQ — Syncthing v1 documentation to mention that Syncthing | Downloads gives the apt source for latest stable; just telling users to use the package manager to upgrade is technically right but when reader not guessing the point with the necessary repo this possibly ends up with such an outdated version as I had. :frowning:

Update is done and seems to have worked smoothly (now v1.13.1, Linux (32-bit Intel/AMD) “Fermium Flea” Build 2021-01-11 (noupgrade)); I’m looking much more optimistic to adding the next device now, although I have still no clue if there is anyway a problem in my configuration.

Sorry for having bothered with old stuff. I really thought those versions are close and just the numbering scheme changed inbetween. :frowning:

Maybe consider during service startup to throw a warning in the log channel when there is a huge version diff between client and server (protocols). Would possibly have saved me many hours of stupid trials.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.