Device stays out of sync until restart

Alex · November 17, 2020, 7:54am

I have a problem with one device that does not sync correctly. This is how it looks after starting the desktop (in the remote device section, local state is up to date for all folders):

Bildschirmfoto von 2020-11-17 08-29-40_2

If I change a file on the server it syncs to the desktop, if I change a file on the desktop it does not sync and under remote devices the server shows this one file as out of sync.

The issue affects most folders but not all, some folders sync correctly from desktop to server (looks like the ones that cause the 2% completion).

Any known issue that could cause this? Any logs with debug options enabled needed? Without debug options I do not see anything suspicious.

After restarting syncthing on the desktop everything seems to work correctly after a while. I think it started with v1.11.1 (never used v1.11.0) and is still present after upgrading to v1.12.0-rc.2. While still on v1.11.1 I tried to do --reset-deltas but it did not help.

imsodin · November 17, 2020, 8:32am

This sounds like issue #7034, which was present in v1.11.0-rc.1, however that issue is fixed since v1.11.0-rc.2.

I don’t really understand this: When you restart the problem goes away, but it sounds like it isn’t really gone. Does it reoccur when you do something later or just spontaneously?

model debug logs to see what the index senders are doing would be helpful. However we’ll probably first have to clarify the questions in the last paragraph and then define some steps to do with model debug on to trigger events that hopefully produce helpful output.

Alex · November 17, 2020, 8:41am

I never used v1.11.0-rcX, only the release, also I did not pause any folders.

It usually is in this broken state after starting the computer in the morning (not sure if always, but it happens often). Restarting syncthing afterwards “fixes” it and it will work until the next start of the desktop (server stays on the whole time).

I can enable model debug on both devices and provide the logs as soon as it happens again (probably tomorrow, maybe after some later reboot of the computer or restart of syncthing, I can try to provoke it).

Alex · November 17, 2020, 11:20am

I could reproduce it now after a reboot of the desktop (with different amount of out of sync items [137.987 items, ~32,5 GiB] shown for desktop)

desktop.log (89.2 KB) Device-ID: AC5PIYS

server.log (859.2 KB) Device ID: BORKF6M

Also tried to reproduce a file that does not sync (called “Leer”, added on desktop), in most folders it synced but in “musik-android” it did not.

If I understand the logs correctly “desktop” never sends an index update for the new file in “musik-android”.

imsodin · November 17, 2020, 12:25pm

Thanks a lot for the logs.

The file that does not sync is an unhandled config racyness issue - the remote being out-of-sync might be the same issue. Lets check once this is fixed (looking into it).

Alex · November 18, 2020, 8:06am

Got the latest nightly which should include your fix, the file that does not sync seems to be gone, however the server still thinks the desktop is out of sync (with different amount of items after every reboot), also from what it looks like in the GUI, index exchange seems to happen after every reboot (some MB of transfer and the amount of out of sync quickly drops over time).

New logs: desktop.log (97.8 KB) server.log (742.9 KB)

imsodin · November 18, 2020, 1:43pm

Thanks again!

I am not yet sure if this is the cause, but your logs show a bug introduced in the recent suture-v4 implementation. That’s fixed in https://github.com/syncthing/syncthing/pull/7126, along with one added debug info. Please test with that and share logs if still buggy.

One interesting thing that should however be irrelevant, is that your system times seem out of sync:

Server sends at:

[BORKF] 2020/11/18 08:51:46.090350 indexsender.go:127: DEBUG: indexSender@0xc000bd0a80 for ffxiv-config to AC5PIYS at 192.168.1.12:22000-192.168.1.100:22003/tcp-server/TLS1.3-TLS_AES_128_GCM_SHA256: Sending 81 files (<9037 bytes)

and desktop receives that earlier:

[AC5PI] 2020/11/18 08:51:08.651124 model.go:992: DEBUG: Index (in): BORKF6M / "ffxiv-config": 81 files

Alex · November 18, 2020, 2:40pm

Sadly the server still shows some the desktop as out of sync, got v1.12.0-rc.3.dev.13.g7f63afcb from the build server (I guess that includes your fix)

desktop.log (191.3 KB) server.log (306.2 KB)

About the time difference: I think my system clock on the desktop is a bit broken, some time after boot they seem to match (sync with the internet I guess), the logs here should have matching system times.

imsodin · November 18, 2020, 2:59pm

That now looks like a known, but unfortunately so far elusive problem with indexes/sequences. Starting Syncthing once with -reset-deltas on desktop should fix the problem. If the problem reliably occurs again, more detailed logs would be nice. However given this is a big production environment, this would easily be GBs of logs (with model,db facilities) - so lets first establish what exactly happens before going there.

Alex · November 18, 2020, 3:27pm

It is still somehow broken:

after --reset-deltas it first worked (all up to date)
then rebooting the desktop → server shows desktop as not in sync
restarting syncthing on desktop again → everything up to date as it should be

new logs: desktop.log (231.3 KB) server.log (754.1 KB)

logs contain everything since reboot including the restart of syncthing without reboot

Without much idea about how this works, but this looks weird to me (clocks do not seem to match again, similar 38s difference as before ):

[BORKF] 2020/11/18 16:19:38.680610 indexsender.go:285: DEBUG: Device AC5PIYS folder uni is not delta index compatible
[AC5PI] 2020/11/18 16:18:59.274599 indexsender.go:274: DEBUG: Device BORKF6M folder uni is delta index compatible (mlv=360277)

Desktop thinks server is delta index compatible (for this folder) while server thinks desktop is not, after the restart they agree that both are delta index compatible? (now with matching clocks)

[BORKF] 2020/11/18 16:23:10.400195 indexsender.go:274: DEBUG: Device AC5PIYS folder uni is delta index compatible (mlv=525121)
[AC5PI] 2020/11/18 16:23:10.428138 indexsender.go:274: DEBUG: Device BORKF6M folder uni is delta index compatible (mlv=0)

imsodin · November 18, 2020, 3:27pm

@Alex is testing nightlies, which includes a bugfix that is not in v1.12.0-rc.x. If you see 0% remotes and new changes not being synced, that’s likely what we already debugged and fixed.

Alex · November 19, 2020, 8:38am

The latest build lead to out of sync after starting the desktop this morning, restart did not fix it (only reduced the amount of items that are out of sync), only --reset-deltas did it

For now I’m back with v1.10.0 which seems to work. There is still something a bit broken, sometimes desktop resends index to server I guess the “(mlv=0)” indicates that in the line:

[AC5PI] 2020/11/19 09:33:18.461620 model.go:1090: DEBUG: Device BORKF6M folder archiv is delta index compatible (mlv=0)

but it always seems to get to “up to date” after that.

Will try to see if I can reproduce the issue with a separate instance of syncthing on the same devices (and maybe a copy of the same data), constantly having broken sync is too annoying (and possibly risky) with all my data.

Andy · November 19, 2020, 9:41am

For me it seems similar issue on my side

First all my Synology´s, now also all my Windows 10 computers back on v1.10.0 and it runs well and smoothly again.

Starting with v1.11.x all peers were green, but some of the devices were at a high percentage level (mostly 95%-99%) and got stuck. And that was random in every time different peer(s) and more or less directories in such peer(s).

I also try to use -reset-deltas (seems not work on Synology´s) but without change. I was only able to fix this by disconnecting the affected peer(s) and reconnecting them.

Only with v1.11.1, android (Syncthing-Fork / AArch64) up to now I have no problem.

Alex · November 19, 2020, 7:16pm

I could also reproduce it on a test setup with two devices, desktop and server, (not reliable, but it just happened at least once). The devices share 6 folders with the same content as real folders I sync (except some large files I deleted to reduce the size a bit). Both devices use syncthing v1.12.0-rc.3.dev.13.g7f63afcb.

After start of desktop, server shows out of sync for desktop again. Restarting syncthing on the desktop fixes it. I could provide logs for this case if you want but I guess it is similar to the case with the real setup.

Any other infos needed? Logs with more debugging options enabled?

imsodin · November 19, 2020, 8:42pm

I did not forget/abandon this. The logs you provided already do contain a good bit of info and I might also have found the cause. I’ll let you know when I have a PR to test. Thanks a lot for your help and patience in debugging this problem!

The log I added is actually not always correct. It should be “has no index ID for us” instead of “is not index compatible”, as while that can actually mean it’s not compatible, it can also mean it just doesn’t have any index data (-reset-deltas).

Alex · November 19, 2020, 9:07pm

good to know, not compatible would be weird in this case (but actually it should have the index at that time since the log was from the next start after -reset-deltas and not the one where it was reset)

Will run the build with the test setup as soon as it is available. There the issue does not seem to happen that often so it may need some time to be sure that it does not happen again.

Thanks for looking into this.

imsodin · November 19, 2020, 9:23pm

That’s exactly the issue (at least I hope it’s the, not just an issue) now: https://github.com/syncthing/syncthing/pull/7132

Alex · November 20, 2020, 7:32am

That description in the PR sounds exactly like what is happening.

“This problem likely existed for a long time” I can confirm that v1.10.0 also seems to drop indexes on one side, however it looks like they are resent so it does not matter that much.

Running the build from the PR now on the test setup, before the upgrade I had the issue once again there, so it also seems to happen quite often. Will see if it happens again, if not I will try it on the real setup in a few days.

Andy · November 20, 2020, 12:27pm

Maybe such processes are normally under circumstances, I don’t know how it works with the indexes. I couldn’t judge that. However, there must be a more substantial difference between v1.10.0 and the following. Quiet seems returned since I downgraded all devices.

imsodin · November 20, 2020, 1:05pm

Yes Andy, there have been significant changes related to this in v1.11.1 and v1.12.0 - as documented in the changelog. Thanks to all the debugging in this thread we have actually found problems and are fixing them right now. Downgrading may be a necessary and effective measure, but it’s not a permanent solution nor does it help fix problems.