Database migration 1.6.0~rc2-rc4

rustycanb · May 15, 2020, 4:22am

After upgrading to 1.6.0~rc2, I get this message in the yellow notice box: 2020-05-15 14:18:13: Non-increasing sequence detected: Checking and repairing the db…

Waited 15 minutes and no change…

Local device is showing 4 files out of sync for one folder, while remote device is showing up-to-date. Global state is 685 files on both sides. Rsyncing (dry-run) shows the files are synced.

Starting STRECHECKDBEVERY=1s syncthing makes no difference.

Local device is Ubuntu 20.04 laptop, remote is RPI3 with Raspbian Stretch. Both were running ST 1.5.0 successfully before upgrade.

Any ideas?

imsodin · May 15, 2020, 7:46am

Can you please post screenshots from both sides.

On a side note:

That does not necessarily mean the issue happened on v1.6.0: We resend the entire index on upgrade, during which sequences can be checked, and that is when the sequence inconsistency from the warning gets detected.
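To illustrate what a "non-increasing sequence" means, here is a minimal sketch of such a check. It is not Syncthing's actual code (Syncthing is written in Go); the function name and the plain list of integers are assumptions made purely for illustration.

```python
from typing import Iterable, List, Tuple


def find_non_increasing(sequences: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (position, previous, current) for every entry that does not
    strictly increase over the entry directly before it."""
    problems = []
    previous = None
    for position, current in enumerate(sequences):
        # A repeated or smaller number breaks the "strictly increasing" rule.
        if previous is not None and current <= previous:
            problems.append((position, previous, current))
        previous = current
    return problems
```

Calling `find_non_increasing` on the sequence numbers read while walking an index would list every spot where the order is broken; an empty result means the sequence is strictly increasing.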

rustycanb · May 15, 2020, 8:29am

This screenshot is from my laptop model (E7270) showing the out-of-sync files.

I didn’t take a screenshot of the RPI3 because it was all green, ie up-to-date.

I had to nuke the databases and revert to 1.5.0 in order to continue working (sorry) but I did keep a copy of the database version 1.6.0~rc2 on my laptop, E7270.

After reverting to 1.5.0, everything is working again as expected.

BTW, this is my second attempt to upgrade: 1.6.0~rc1 and ~rc2 both failed with the same message. Both times I had to reset the database because, after reverting to 1.5.0, ST would fail to start, presumably because of the database migration.

I’m happy to try again, if needed. There is nothing special about my setup, all peers are default, no ignores, all permissions set, etc.

imsodin · May 15, 2020, 8:45am

Thanks for preserving. Do you per chance also have the db from the pi? Could you please run stindex -mode idxck /path/to/the/copied/index-v0.14.0.db on it/them. You can get stindex from: https://build.syncthing.net/buildConfiguration/Syncthing_Tools/65011?buildTab=artifacts

When both are up-to-date but the laptop shows the pi as syncing, that usually means the pi didn’t send some information to the laptop (it could also mean the laptop dropped that info, but I am not aware that happened before).

Do I understand correctly: You have done a try going directly from 1.5.0 to 1.6.0-rc.2?

And yes, db downgrade from 1.6.0 to 1.5.0 is not possible.

rustycanb · May 15, 2020, 9:08am

@imsodin Thanks for your quick response.

No, I didn’t keep the PI side database (dumb, sorry).

The stindex result on the nuked laptop db is: 1 block list entries out of 1026 needs GC

Yes, correct. I have tried going from 1.5.0 to 1.6.0~rc1 and failed as described. And going from 1.5.0 to 1.6.0~rc2 and that resulted in my post.

imsodin · May 15, 2020, 9:11am

That’s ok, just informative, not an indication of a problem.

If you upgrade again and encounter the same problem, please run stindex on the PI side and report again.

rustycanb · May 15, 2020, 9:20am

I will. Maybe not tonight, I’m still cleaning up all the conflicted files from the reversion to 1.5.0.

Thanks again, you are all awesome.

Nummer378 · May 15, 2020, 11:59am

Sorry for thread hijacking, just wanted to chime in that I encountered similar things on upgrade to 1.6.0-RC.1 and RC.2.

Initially, after upgrading to 1.6.0-RC.1 (on a single Windows device) I also got the message “Non-increasing sequence detected: Checking and repairing the db…”. As everything looked okay, I did not take any further actions.

A while later I looked at the GUI of the Windows device and it showed a single remote device out of sync, with about 3 files apparently not synced. The remote device on the other hand claimed to be up to date, which seemed to be correct.

After upgrading to 1.6.0-RC.2 I got the message again (I think - at least I remember clicking the yellow warning box again), and again the Windows device showed 5 files out of sync for a remote device - which again, seems to be just a visual thing, I believe everything is synced just fine. The “apparently” not in sync files were different from those shown on RC.1, but from the same (syncthing-) folder.

However, after a couple of reboots, it seems that the out of sync items have changed again: Now the remote devices dialog claims that 2 items are out of sync, for a single remote device. When I click on the out of sync list, it just shows a single item - from the same folder as above.

Just for fun, I ran stindex -mode idxck on the Windows machine and it printed hundreds of lines like this (it only output such lines, followed by a final line concerning GC):

Missing need entry for needed file "Eclipse_Oxygen/features/org.eclipse.egit.mylyn_4.9.2.201712150930-r/META-INF/ECLIPSE_.RSA", folder "hxbhn-gsktf"

I did random samples on the mentioned files and every tested file was actually non-existent on the local device - the printed files all seem to be old files that have been deleted long ago. They’re from at least five different folders.

As stuff seems to work okay from my perspective I haven’t touched anything except running stindex just now. If you want to debug something out of this setup I’m fine with that, otherwise I will just leave it as-is and check if it blows up at some point.
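For anyone with a similarly long stindex output, a small script can tally the "Missing need entry" lines per folder. This is only a sketch: it assumes every such line has exactly the shape of the example quoted above, and the function name is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern modelled on the single example line quoted in this thread;
# other stindex versions may print a different format (assumption).
LINE_RE = re.compile(
    r'^Missing need entry for needed file "(?P<file>.*)", folder "(?P<folder>[^"]*)"$'
)


def count_missing_need_by_folder(lines: Iterable[str]) -> Counter:
    """Count 'Missing need entry' lines per folder ID, ignoring other lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("folder")] += 1
    return counts
```

Feeding it the saved output, e.g. `count_missing_need_by_folder(open("stindex.txt", encoding="utf-8"))` (the file name is hypothetical), gives a quick view of which folders are affected and how many entries each has.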

Catfriend1 · May 15, 2020, 12:12pm

Maybe not related ?! But: I’ve upgraded my Android phone to test v1.6.0-rc.2 together with my Windows ST v1.5.0. The first time I started my app to sync outstanding changes over, everything went fine. The second time I wanted to do the same, the phone reported no connection to the Windows device while the win st had a connection attempt of the phone in its log (but nothing more after that line). I suspect there was some error on the phone side. But due to missing root perms I couldn’t grab the log. Both devices are on the same local network, configured and always working with static IPs before that event occurred. I’ll report more details if it happens again. After closing and restarting ST Android the connection was fine again and syncing completed well.

imsodin · May 15, 2020, 12:33pm

Haven’t looked at your case closely, but chances are you are affected by https://github.com/syncthing/syncthing/issues/6650

rustycanb · May 17, 2020, 6:02am

An experiment: new installation on cloned bootable USB keys, but with separate installations of ST 1.6.0~rc4. About 900 files in one folder, synced initially after both sides needed to restart ST.

I deleted several files from both sides. One side shows the correct number of files and the other shows the device sync 0%, 502MB but the number of files in the folder is correct.

Downgrade one side to 1.5.0 and delete its database and all is quickly sync’ed on both sides. Both devices and folders are showing up-to-date.

Screenshots of the broken state, both sides 1.6.0~rc4:

Please correct the title to reflect latest rc.

imsodin · May 17, 2020, 3:14pm

Not sure I get entirely what happened. Is the following correct?
You started completely fresh on 1.6.0-rc.4 on both sides, they got in sync correctly. Then you restarted Syncthing. Then deleted a few files on both sides. Result is what you show in the screenshots.

What’s shown when you click on the 925 out-of-sync items?

rustycanb · May 17, 2020, 11:31pm

Yes, you are correct, that’s exactly what I did.

When I click on the out-of-sync items, the modal dialogue is empty.

I have the databases from both sides in the failed state.

rustycanb · May 19, 2020, 1:13am

If one side or the other is reverted to 1.5.0 and the database reset (necessary because of the db migration), ST functions as expected.

imsodin · May 19, 2020, 7:44am

That’s a clear indication the problem comes from the new “remote need accounting” in v1.6.0. Reverting to v1.5.0 thus clearly fixes it, but unfortunately doesn’t bring any clarity.

If it happens again, please collect logs and make them available.

rustycanb · May 19, 2020, 9:44am

This is an experimental installation, so screwing it up is ok.

I upgraded the 1.5.0 side to 1.6.0-rc4 and the results were as reported earlier. I then reset the databases on both sides, added and deleted files on both sides and everything seems OK, ie all green up-to-date, global and local counts matched and rsync confirms matching files (folder mod times don’t match, as expected).

Simon, I can roll back as I kept the DBs from both sides in their failed state. What logs might help?

On my production installation, I am reluctant to go to 1.6.0-* just yet, because after my earlier experience (first post) it took some time to clean up the conflicting files and get other nodes in sync again.

imsodin · May 19, 2020, 10:06am

Please spell stuff like the “as reported earlier” out: Maybe a link or at least a very short outline. Is it the remote device 0% syncing issue here?

As asked before, please start by providing the logs of the “latest experiment” (upgrade to 1.6.0) so I can potentially get an idea what to target in further debugging.

rustycanb · May 19, 2020, 11:07am

Simon, thanks for your response.

When both sides are on 1.6.0-*, having migrated from 1.5.0, there is the problem I have described in the earlier posts with screenshots (Database migration 1.6.0~rc2-rc4).

When one side is reverted to 1.5.0 and the other is on 1.6.0*, ST works as expected.

Both sides work successfully on 1.6.0-rc4 if I reset databases on both sides. (But you warn against this radical step in other posts; in my experimental setup, it doesn’t matter, the data is merely 500 MB of copied PDF files.)

I said earlier that I can roll back because I have kept the databases from the migration from 1.5.0 to 1.6.0-*. What logs do you need to help debug this issue? I have run stindex -mode idxck on both sides in the failed state and I get 11 block list entries out of 1031 needs GC.

I don’t know how to describe this in any other way, I’m a simple journalist, not a developer! ST has been a great tool for me, I am simply trying to help prevent a potential bug from going wild in release 1.6.0. If this is a widespread problem, surely others will also report it, @Nummer378 seemed to have a similar issue earlier in this thread.

imsodin · May 19, 2020, 11:16am

I clearly need to intersperse some statements saying the following more:
I very much appreciate you reporting problems, especially in RCs, and then being willing to debug it - thanks a lot!
That sentiment is definitely on my mind, but I guess it doesn’t get through in terse responses asking for clarification/additional info.

As to the previously reported problems: The thing is some of them were on earlier RCs (1-3), which had known problems, that’s why we released a new RC. So that earlier information is now important context, but no longer directly applicable to RC 4.

I am looking for the regular syncthing log. Where those are saved depends on how you run Syncthing. If I had to guess I’d say probably as systemd service? Then you can get logs with the journalctl command. If you start it manually, logs are printed to the console and not saved unless you redirect the output. There’s no need for stindex output at this point, it looks like the problem is somewhere else.
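If Syncthing does run as a systemd service, the log can be saved to a file with a few lines of Python wrapped around journalctl. This is a rough sketch under assumptions: the unit name (`syncthing.service` as a user service) depends on how Syncthing was installed, and the helper names are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def journalctl_command(unit: str, user_service: bool = True,
                       since: Optional[str] = None) -> List[str]:
    """Build a journalctl call for one unit (the unit name is an assumption)."""
    cmd = ["journalctl", "--no-pager"]
    if user_service:
        # Read the journal of the per-user service manager.
        cmd.append("--user")
    cmd += ["-u", unit]
    if since:
        # Limit output to entries on or after this date/time.
        cmd += ["--since", since]
    return cmd


def save_log(cmd: List[str], destination: Path) -> None:
    """Run the command and write everything it prints to `destination`."""
    with destination.open("w", encoding="utf-8") as out:
        subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, check=True)
```

For example, `save_log(journalctl_command("syncthing.service", since="2020-05-19"), Path("syncthing.log"))` would write that day's log to a file that can be attached to a post. When Syncthing is started manually instead, redirecting its console output to a file serves the same purpose.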

rustycanb · May 19, 2020, 11:41am

Thanks, Simon. I appreciate your clarification and patience.

It’s late on this side of the covid-infested planet, but I will start again tomorrow with a clean installation on both sides using rc4 and dummy data, see what happens and report back. (Although I tried to understand the changes between rc1-4, I missed/misunderstood the change you mentioned.)

BTW, I always update to the release candidate and over many years this is the first time doing so has caused me any problems. Once again, you, @calmh and @AudriusButkevicius and other contributors are amazing.

Thanks, R