Repeated panics

Running Syncthing v1.30.0 on Linux Mint 20.3 (Ubuntu 20.04). The state DB is on an encrypted home directory hosted on an SSD, and the synchronized directories are on a physical HDD (both ext4), with a total size of around 3 TB.

I’m getting repeated panics on a host that knows of 5 remote devices (a sixth was recently removed, and 3 of the 5 are active). Usually this one comes first:

panic: device present in global list but missing as device/fileinfo entry [recovered]
        panic: device present in global list but missing as device/fileinfo entry

It auto-restarts successfully, but then there is this one, which is unrecoverable (it panics immediately upon every restart):

panic: filling Blocks: leveldb/table: corruption on data-block (pos=56567367): checksum mismatch, want=0x369539f6 got=0x28d09563 [file=020012.ldb]

The FAQ says to delete the database and start over after such a panic. I have done this at least twice (which takes a loooong time) and had panics again each time. Removing the suspect .ldb file also allows recovery (more quickly), but the errors recur either way.
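
For anyone else hitting this, the reset amounts to something like the following on v1 (an untested sketch; the user name is a placeholder, and the index path assumes the long-standing default location, so adjust to your setup):

    # Stop the service first (packaged systemd unit; adjust the user name)
    sudo systemctl stop syncthing@youruser.service

    # Option 1: have Syncthing delete its own index, forcing a full rescan
    syncthing --reset-database

    # Option 2: remove the LevelDB index by hand (default v1 location)
    rm -rf ~/.config/syncthing/index-v0.14.0.db

    sudo systemctl start syncthing@youruser.service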

The FAQ also says this might be due to hardware errors, but I have not been able to detect problems with the HDD or the SSD using fsck or smartctl, and I have not noticed any other signs of corruption.
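
For the record, the checks were along these lines (device names are placeholders):

    # SMART attributes and overall health for each drive
    sudo smartctl -a /dev/sda        # e.g. the SSD holding the database
    sudo smartctl -a /dev/sdb        # e.g. the HDD holding the folders
    sudo smartctl -t long /dev/sdb   # start an extended self-test

    # Read-only filesystem check; run with the filesystem unmounted
    sudo fsck -n /dev/sdb1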

Note also that this device has been running Syncthing since well before v1.0. As noted, the database has been regenerated, but it’s conceivable that some obsolete configuration has been inadvertently retained.

Any idea what might be going wrong? Is there any other troubleshooting I can do?

This may not be applicable, but is Syncthing running as the same user that owns the home directory?

I always try to go back to the “last point of human intervention” (since it’s usually a human that did something to mess things up, even an upgrade). Something changed somewhere.

In this case I would be tempted to edit the config on this machine to change every folder to “send only” and pause it, plus pause all the devices (save a copy of the config first).
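
Rather than hand-editing the XML, something like this through the REST API should also do it (an untested sketch; the API key comes from the GUI settings, and FOLDER_ID / DEVICE_ID are placeholders for your real IDs):

    APIKEY='your-api-key'

    # Make one folder send-only and paused
    curl -s -X PATCH -H "X-API-Key: $APIKEY" \
         -H 'Content-Type: application/json' \
         -d '{"type": "sendonly", "paused": true}' \
         http://127.0.0.1:8384/rest/config/folders/FOLDER_ID

    # Pause one device
    curl -s -X PATCH -H "X-API-Key: $APIKEY" \
         -H 'Content-Type: application/json' \
         -d '{"paused": true}' \
         http://127.0.0.1:8384/rest/config/devices/DEVICE_ID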

Let the database rebuild (there won’t be much with all the folders and devices paused), then unpause one folder at a time until you get the panic. Wait for all the scanning and hashing to settle before unpausing the next one; that is, wait several minutes after scanning is complete on a folder before unpausing the next. Maybe watch for Syncthing’s CPU usage to drop first.

If you successfully unpause all the folders, begin to unpause each device, giving it time to exchange and compare data; again, Syncthing’s CPU usage may be a helpful proxy.
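
Per folder, that would look roughly like this (same assumptions as above):

    # Unpause one folder...
    curl -s -X PATCH -H "X-API-Key: $APIKEY" \
         -H 'Content-Type: application/json' \
         -d '{"paused": false}' \
         http://127.0.0.1:8384/rest/config/folders/FOLDER_ID

    # ...then watch CPU usage settle before moving to the next (Ctrl-C to stop)
    top -b -d 5 -p "$(pgrep -x syncthing)"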

If you get a panic with everything paused, then the problem is possibly home-directory permissions or access.

Just my 2c; this may give you better ideas!

Nuke it, upgrade to 2.0, see if it helps.


Is the encrypted home directory on ecryptfs, by any chance? I think I’ve read some trouble reports about keeping the database on it, especially with the new SQLite implementation in v2. Do some research first and see if you can get rid of ecryptfs if it’s in use.

Taking all of your suggestions, I am rebuilding the database using v2, having moved it off ecryptfs, and taking it slow. So far the database has been rebuilt from fresh scans (seemingly a bit faster than under v1), but I haven’t synced with the other computers yet. Crossing my fingers. Thanks for the ideas.

It turns out the likely root of the problem was a failing memory chip. Syncthing seems to be a good hardware fault detector 🙂
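
For anyone wanting to check their own RAM: a userspace run of memtester can catch this kind of fault (the size and pass count below are just examples); a boot-time memtest86+ run is more thorough, since it can also test memory the running OS has reserved.

    # Test 1 GiB of RAM for 3 passes (package "memtester")
    sudo memtester 1024M 3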

Given the suggestions, I moved the Syncthing home directory to a non-default location outside ecryptfs. A related question, however: what is the correct way to do this permanently on Debian/Ubuntu? I had modified the unit file /lib/systemd/system/syncthing@.service to add the --home option, but this was overwritten when Syncthing was upgraded from 2.0.11 to 2.0.12.


This should be a good starting point:

sudo systemctl edit syncthing@YOUR_USER_NAME.service
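
That creates a drop-in override (typically /etc/systemd/system/syncthing@YOUR_USER_NAME.service.d/override.conf) which survives package upgrades, unlike direct edits under /lib/systemd/system. A sketch of the contents, assuming the stock ExecStart line and a placeholder path; copy the actual ExecStart from your packaged unit and append --home to it:

    [Service]
    # Clear the packaged ExecStart, then restate it with --home added
    ExecStart=
    ExecStart=/usr/bin/syncthing serve --no-browser --no-restart --logflags=0 --home=/var/lib/syncthing-home

After saving, restart the service with sudo systemctl restart syncthing@YOUR_USER_NAME.service.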
