v1.8.0-rc.3 panic during "upgrade restart"

Hi,

I upgraded Syncthing a few minutes ago because of the versioning bug that was fixed in rc.3. After I clicked the upgrade button in the Web UI, the upgrader restarted Syncthing automatically.

Syncthing came back up. According to the Web UI, it was scanning before the update and continued scanning a folder afterwards.

I noticed that three panic logs were created shortly after the upgrade started.

Link: https://pastebin.com/2HewtFfM

All three look similar, pointing at d.(*Snapshot).GetGlobal and “panic: device present in global list but missing as device/fileinfo entry”.

This might be related to https://github.com/syncthing/syncthing/issues/6855, but my instance did not stay offline. Looking only at the Web UI, I wouldn’t even have noticed there was a panic.

Kind regards, Catfriend1

Meaning it recovered after panicking three times? That’s weird.

And there’s no “Checking db due to upgrade” line in the pastebin. Did that appear after updating to rc.3?

Hmm, I thought I had uploaded the first log, but now I’m unsure whether I picked the wrong one of the three files and the expected line was actually in the first log after the upgrade. I would also expect that line to be there. Could the code skip this DB check during an upgrade for some reason? I’ll look more closely the next time an upgrade comes in.

My cluster is the one mentioned in the topic “[v1.6.1] Local and global state swapped between two nodes”. Yes, we’re talking about devB here, and it recovered by itself: it has been up and running for 54 minutes since the upgrade that produced the three crash logs.

The thing is, devC is the one that keeps crashing, as I wrote on the GitHub issue. devC was also the device with the mysterious “24 items out of sync” issue, where clicking it in the Web UI displayed no items. Maybe it has something to do with receiveOnly folders being out of sync before upgrading or restarting?

Don’t you have the logs (not panic files) anymore?

And in response to your message on the PR (since the restart behaviour discussed here is essential to it):

Can I run this straight off the PR’s TeamCity build to get more diagnostic info on what’s wrong with my cluster? I expect the panic to happen again as soon as Syncthing v1.8.0-rc.3 finishes the rescan on the node where it crashed constantly before.
https://github.com/syncthing/syncthing/pull/6861#issuecomment-665026753

That PR “enhances” the check/repair that runs on upgrade. So if it is relevant to your problem, it will fix it, at least temporarily. That would tell us your problem was related to blocks, but it would also remove the actual problem, so no more information about it would be available. If you do this, please first run without the PR with STRECHECKDBEVERY=1s, in case the db check really didn’t run, and only apply the PR if that didn’t help.
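For the first step suggested above (running without the PR but with STRECHECKDBEVERY=1s), here is a minimal Go sketch of a launcher. It assumes a `syncthing` binary on PATH and that Syncthing reads STRECHECKDBEVERY as a duration controlling how often the db check runs, so a very short value forces the check at startup. The helper name `withRecheckEnv` is hypothetical. Simply setting the variable in your shell or service definition works just as well.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withRecheckEnv (hypothetical helper) returns a copy of base with
// STRECHECKDBEVERY set to every, dropping any value already present.
func withRecheckEnv(base []string, every string) []string {
	const key = "STRECHECKDBEVERY="
	env := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if !strings.HasPrefix(kv, key) {
			env = append(env, kv)
		}
	}
	return append(env, key+every)
}

func main() {
	// Assumption: the syncthing binary is on PATH.
	cmd := exec.Command("syncthing")
	// Inherit the current environment, but force a short db check interval.
	cmd.Env = withRecheckEnv(os.Environ(), "1s")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "syncthing exited:", err)
		os.Exit(1)
	}
}
```

After starting it this way, look for the “Checking db due to upgrade” style lines in the log to confirm the check actually ran before applying the PR.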

As for device C and what you describe on the issue: the old device ID (short ID) showing up in the version is perfectly normal, even if that device no longer exists anywhere. That’s not a sign of a problem; it’s working as expected.

All in all I still don’t have an idea what might be causing these panics.

Edit: Actually, if you have the db of the device that started panicking while scanning after a full db reset, I’d be interested in that. I know I’ve asked for and received lots of dbs from you without delivering anything; I hope you still have some faith left :wink: