Device stays out of sync until restart

That now looks like a known, but unfortunately so far elusive problem with indexes/sequences. Starting Syncthing once with -reset-deltas on desktop should fix the problem. If the problem reliably occurs again, more detailed logs would be nice. However given this is a big production environment, this would easily be GBs of logs (with model,db facilities) - so lets first establish what exactly happens before going there.

Just fyi, as I unfortunately wasn’t able to capture logs about this. I’ve encountered this problem too after upgrading devices A + B from v1.12.0-rc.2 to v1.12.0-rc.3 a few minutes ago.

  • dev A showed remote dev B as “up2date”.
  • dev B showed remote dev A as “syncing 0%” with thousands of files mentioned below that being out of sync.

Side notes:

  • Before I upgraded the RC, everything was fully in sync.
  • I did pull up the web UI’s log window and there weren’t any errors (but I did not enable DEBUG facilities, just with default log).
  • After waiting 10 minutes, the “syncing 0%” did not change on dev B.
  • After STRG+F5 and deleting browser caches, the “syncing 0%” did not change on dev B.
  • After restarting Syncthing on dev B, it showed remote dev A immediately and CORRECTLY as “up2date” again.

That’s why I guess I was also hit by the above described problem.

It is still somehow broken:

  • after --reset-deltas it first worked (all up to date)
  • then rebooting the desktop → server shows desktop as not in sync
  • restarting syncthing on desktop again → everything up to date as it should be

new logs: desktop.log (231.3 KB) server.log (754.1 KB)

logs contain everything since reboot including the restart of syncthing without reboot

Without much idea about how this works, but this looks weird to me (clocks do not seem to match again, similar 38s difference as before :smiley: ):

[BORKF] 2020/11/18 16:19:38.680610 indexsender.go:285: DEBUG: Device AC5PIYS folder uni is not delta index compatible
[AC5PI] 2020/11/18 16:18:59.274599 indexsender.go:274: DEBUG: Device BORKF6M folder uni is delta index compatible (mlv=360277)

Desktop thinks server is delta index compatible (for this folder) while server thinks desktop is not, after the restart they agree that both are delta index compatible? (now with matching clocks)

[BORKF] 2020/11/18 16:23:10.400195 indexsender.go:274: DEBUG: Device AC5PIYS folder uni is delta index compatible (mlv=525121)
[AC5PI] 2020/11/18 16:23:10.428138 indexsender.go:274: DEBUG: Device BORKF6M folder uni is delta index compatible (mlv=0)

@Alex is testing nightlies, which includes a bugfix that is not in v1.12.0-rc.x. If you see 0% remotes and new changes not being synced, that’s likely what we already debugged and fixed.

1 Like

The latest build lead to out of sync after starting the desktop this morning, restart did not fix it (only reduced the amount of items that are out of sync), only --reset-deltas did it :confused:

For now I’m back with v1.10.0 which seems to work. There is still something a bit broken, sometimes desktop resends index to server I guess the “(mlv=0)” indicates that in the line:

[AC5PI] 2020/11/19 09:33:18.461620 model.go:1090: DEBUG: Device BORKF6M folder archiv is delta index compatible (mlv=0)

but it always seems to get to “up to date” after that.

Will try to see if I can reproduce the issue with a separate instance of syncthing on the same devices (and maybe a copy of the same data), constantly having broken sync is too annoying (and possibly risky) with all my data.

For me it seems similar issue on my side

First all my Synology´s, now also all my Windows 10 computers back on v1.10.0 and it runs well and smoothly again.

Starting with v1.11.x all peers were green, but some of the devices were at a high percentage level (mostly 95%-99%) and got stuck. And that was random in every time different peer(s) and more or less directories in such peer(s).

I also try to use -reset-deltas (seems not work on Synology´s) but without change. I was only able to fix this by disconnecting the affected peer(s) and reconnecting them.

Only with v1.11.1, android (Syncthing-Fork / AArch64) up to now I have no problem.

I could also reproduce it on a test setup with two devices, desktop and server, (not reliable, but it just happened at least once). The devices share 6 folders with the same content as real folders I sync (except some large files I deleted to reduce the size a bit). Both devices use syncthing v1.12.0-rc.3.dev.13.g7f63afcb.

After start of desktop, server shows out of sync for desktop again. Restarting syncthing on the desktop fixes it. I could provide logs for this case if you want but I guess it is similar to the case with the real setup.

Any other infos needed? Logs with more debugging options enabled?

I did not forget/abandon this. The logs you provided already do contain a good bit of info and I might also have found the cause. I’ll let you know when I have a PR to test. Thanks a lot for your help and patience in debugging this problem!

The log I added is actually not always correct. It should be “has no index ID for us” instead of “is not index compatible”, as while that can actually mean it’s not compatible, it can also mean it just doesn’t have any index data (-reset-deltas).

good to know, not compatible would be weird in this case :smiley: (but actually it should have the index at that time since the log was from the next start after -reset-deltas and not the one where it was reset)

Will run the build with the test setup as soon as it is available. There the issue does not seem to happen that often so it may need some time to be sure that it does not happen again.

Thanks for looking into this.

1 Like

That’s exactly the issue (at least I hope it’s the, not just an issue) now: https://github.com/syncthing/syncthing/pull/7132

1 Like

That description in the PR sounds exactly like what is happening.

“This problem likely existed for a long time” I can confirm that v1.10.0 also seems to drop indexes on one side, however it looks like they are resent so it does not matter that much.

Running the build from the PR now on the test setup, before the upgrade I had the issue once again there, so it also seems to happen quite often. Will see if it happens again, if not I will try it on the real setup in a few days.

2 Likes

Maybe such processes are normally under circumstances, I don’t know how it works with the indexes. I couldn’t judge that. However, there must be a more substantial difference between v1.10.0 and the following. Quiet seems returned since I downgraded all devices.

Yes Andy, there have been significant changes related to this in v1.11.1 and v1.12.0 - as documented in the changelog. Thanks to all the debugging in this thread we have actually found problems and are fixing them right now. Downgrading may be a necessary and effective measure, but it’s not a permanent solution nor does it help fix problems.

1 Like

Tested latest fixes on the test setup for a few days without issues so I also tried it today on the real instances: also no out of sync or other issues even after multiple reboots :slight_smile: So i guess my problems are fixed.

2 Likes

Great news!

I know I sound like a broken record, but I can’t help it: Thank you so much for providing comprehensive and speedy info while debugging this issue (well, issues :slight_smile: ).

2 Likes

Thank you for the quick fixes :slight_smile:

1 Like

Which are meant?

Those should be the PRs fixing the issues found here (or even more), latest nightly has them, next release containing them will probably take a while:

2 Likes

Is the result maybe v1.12.0-rc.4 ?

New 0.12.0 RC would be for fixing things we specifically broke in 0.12.0, I don’t think these are that.

1 Like