Corrupt Database vs. Old Version

increa · July 21, 2019, 12:30pm

BLUF - Can a “panic crash” on a bad database file read be caused by a version incompatibility, or is a genuinely bad file read the only way for this to happen? The hard drive in question shows no other problems.

More info

I am forced to run Syncthing on two older systems that don’t have options to upgrade to new OS (and therefore new Syncthing). I hate to even mention the OS or the Syncthing version number, because I know everybody will slam me for running old versions. Suffice it to say they’re back in the 0.14 genre of software. If you can help me with one little piece of information I’ll go debug my own problem. Problem occurs when only my two old Syncthing nodes run, so I don’t think it has anything to do with the v1.2 upgrade.

On one of my two old Syncthing platforms, I started getting corrupt database panic crashes when trying to read .ldb files. I deleted the entire directory containing .ldb files. Rehashed everything. Problem returned.

AudriusButkevicius · July 21, 2019, 12:49pm

You haven’t provided the actual error message, so

increa · July 21, 2019, 1:51pm

Audrius,

I’m rehashing the entire node again this morning and will cut-n-paste the precise “bad CRC .ldb file read Syncthing crash” error message here when it shows up in the log again.

I read about the new version incompatibility, but my newest node is v1.1.4.

I was wondering if some other version incompatibility can cause Syncthing to ~perceive~ a bad CRC read on one of the .ldb files. In other words, I’m not sure whether the developers chose to reuse the “bad CRC read” message for other conditions as well.

calmh · July 21, 2019, 2:06pm

Read errors from the database layers are not caused by incompatibility issues. They are caused either by bugs in the old version, if that’s where the panic occurs, or by some sort of hardware issue (RAM, CPU, disk, …).

increa · July 21, 2019, 3:04pm

After deleting the db this morning, the node was happily hashing for about 2 hours off the net. I plugged in the network cable and it hashed away for about another hour. Then the checksum crash occurred again. Here’s what I see:

...
[BHYUW] 2019/07/21 10:50:34 INFO: Detected 0 NAT services
[BHYUW] 2019/07/21 10:51:10 INFO: Connection to C2PVFR6-privacy-redacted-PBZUEQL at 10.0.0.104:1402-10.0.0.115:22000/tcp-client closed: <nil>
panic: leveldb/table: corruption on data-block (pos=1612790): checksum mismatch, want=0x23a20014 got=0x63c6d20c [file=002190.ldb]
[monitor] 2019/07/21 10:51:10 WARNING: Panic detected, writing to "C:\Documents and Settings\User\Application Data\Syncthing\panic-20190721-105110.log"
[monitor] 2019/07/21 10:51:10 WARNING:
*********************************************************************************
* Crash due to corrupt database.                                                *
*                                                                               *
* This crash usually occurs due to one of the following reasons:                *
*  - Syncthing being stopped abruptly (killed/loss of power)                    *
*  - Bad hardware (memory/disk issues)                                          *
*  - Software that affects disk writes (SSD caching software and simillar)      *
*                                                                               *
* Please see the following URL for instructions on how to recover:              *
*   https://docs.syncthing.net/users/faq.html#my-syncthing-database-is-corrupt  *
*********************************************************************************

[monitor] 2019/07/21 10:51:11 INFO: Syncthing exited: exit status 2

If I open 002190.ldb manually (outside of Syncthing), it opens fine, so I’m not sure the hard drive really has a read problem. I guess we just chalk this up to a bug in the old version? It’s just that it has run stably for over a year and then all of a sudden this. One more test: I’m going to delete the database again and NOT plug in the network cable until all directories are scanned. Maybe the error happens only when it can “touch” v1.1.4 on node C2PV…
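
For anyone comparing several of these crashes, here is a minimal sketch that pulls the file name, block position, and the two checksums out of a panic line shaped like the one above. The regular expression and the field names are assumptions based only on the single line quoted in this post; other versions may word the message differently.

```python
import re

# Pattern modelled on the one panic line quoted in this thread.
# The group names (block, pos, want, got, file) are illustrative labels.
PANIC_RE = re.compile(
    r"leveldb/table: corruption on (?P<block>[\w-]+) \(pos=(?P<pos>\d+)\): "
    r"checksum mismatch, want=(?P<want>0x[0-9a-fA-F]+) got=(?P<got>0x[0-9a-fA-F]+) "
    r"\[file=(?P<file>[^\]]+)\]"
)


def parse_panic(line):
    """Return the fields of a checksum-mismatch panic line, or None if it does not match."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "block": m.group("block"),
        "pos": int(m.group("pos")),
        "want": int(m.group("want"), 16),
        "got": int(m.group("got"), 16),
        "file": m.group("file"),
    }


line = (
    "panic: leveldb/table: corruption on data-block (pos=1612790): "
    "checksum mismatch, want=0x23a20014 got=0x63c6d20c [file=002190.ldb]"
)
info = parse_panic(line)
print(info["file"], info["pos"])
```

Running this over each panic log makes it easy to see whether the failing file and position change from one crash to the next, which may help when comparing runs.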

increa · July 21, 2019, 3:21pm

BTW, is closing the debug window on Windows too abrupt of a shutdown, like “killing” the process, which could corrupt the database? Just wondering. That’s not likely the problem this time because after deleting the db, I got the error before ever shutting down Syncthing.

AudriusButkevicius · July 21, 2019, 5:15pm

This has nothing to do with the version.

We are sure you have bad hardware: it’s either the disk or the RAM. The fact that you can open/read/write the file yourself proves nothing, as these errors can be transient.
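
A rough way to look for transient read errors is to read the same file several times and compare a hash of each pass. This is only a sketch: the function names and the number of passes are made up for illustration, and it is not something Syncthing does itself.

```python
import hashlib


def read_digests(path, passes=5, chunk_size=1 << 20):
    """Read the file `passes` times and return the SHA-256 hex digest of each pass."""
    digests = []
    for _ in range(passes):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests.append(h.hexdigest())
    return digests


def reads_are_stable(path, passes=5):
    """True if every pass over the file produced the same digest."""
    return len(set(read_digests(path, passes))) == 1
```

Differing digests for a file that nothing is writing to would point strongly at hardware. Identical digests prove much less, because the operating system may serve the repeated reads from its cache rather than from the disk, and because the data may already have been damaged when it was written.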

calmh · July 21, 2019, 8:26pm

Also, that the file can be read does not mean you get the data that was written. That’s what “corruption” means here.
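
To illustrate the point, here is a small sketch of how a stored checksum catches data that reads back without any I/O error but is no longer what was written. It uses Python's `zlib.crc32` as a stand-in; LevelDB's real block format is different (as far as I know it uses a masked CRC-32C and a different trailer layout), and `write_block` / `read_block` are hypothetical names.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Return the payload followed by a 4-byte CRC-32 trailer (illustrative layout only)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def read_block(block: bytes) -> bytes:
    """Verify the trailer against the payload and return the payload, or raise on mismatch."""
    payload, trailer = block[:-4], block[-4:]
    want = int.from_bytes(trailer, "little")  # checksum stored at write time
    got = zlib.crc32(payload)                 # checksum of what was actually read
    if want != got:
        raise ValueError(f"checksum mismatch, want={want:#x} got={got:#x}")
    return payload


good = write_block(b"index entry for some file")

# Simulate silent corruption: flip a single bit somewhere in the payload.
damaged = bytearray(good)
damaged[3] ^= 0x01

print(read_block(good))        # reads back fine
try:
    read_block(bytes(damaged))
except ValueError as err:
    print(err)                 # the "read" worked, but the data is not what was written
```

The damaged block has the same length and can be opened and read like any other data. Only the comparison against the stored checksum reveals that something changed between write and read.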

increa · July 21, 2019, 9:53pm

Sounds like you guys are pretty sure my hard drive or memory is going bad. I will run some BIOS checks and scan the disk. In the meantime, results of more testing…

With the syncthing node isolated (network cable disconnected or all reachable syncthing peers turned off), there is no database error message. However, obviously no syncing goes on either!

If either peer node is brought on line (old syncthing on a Mac, or v1.1.4 on Windows 10) then the database is reported as corrupt after a delay of seconds or minutes.

Nummer378 · July 22, 2019, 12:22am

For hardware checks, don’t forget to have a look at your disk’s SMART data. It often reports dying hard disks nicely - I recently had an app that crashed reproducibly after a specific set of actions, and after I had blamed the application for a week, SMART revealed that one of my disks had silently lost a lot of sectors.
