Syncthing crashes with "corruption on data-block"

This has happened several times now. After it happened two days ago I deleted the index and let it rebuild everything (taking a couple of hours). All files finished syncing and everything seemed to be working fine for a day. Then it happened again today. It won’t restart without deleting the index.

The crash happens on a Windows Server 2012 machine that runs continuously. There are no hard drive problems.

I did have it running within SyncTrayzor. I don’t know if that makes a difference, but it doesn’t restart even when run separately.

Any ideas?

Here’s the error:

[HOD5B] 15:38:48 INFO: syncthing v0.11.9 (go1.4.2 windows-amd64 default) unknown-user@syncthing-builder 2015-06-14 11:52:00 UTC
[HOD5B] 15:38:48 INFO: My ID: HOD5BMZ
[HOD5B] 15:38:48 INFO: Database block cache capacity 65536 KiB
[HOD5B] 15:38:50 OK: Ready to synchronize Server2 (read-write)
panic: leveldb/table: corruption on data-block (pos=838243): checksum mismatch, want=0xd2dbcbda got=0xc72aff56 [file=012607.ldb]

goroutine 1 [running]:
, 0xc085ad41c8, 0x4, 0x8)
	/go/src/ +0xde7
, 0x4, 0xc0820aa580, 0xc082442420)
	/go/src/ +0x28c
*Model).AddFolder(0xc082216000, 0xc0820e2440, 0x4, 0xc0820fc260, 0x11, 0xc082112280, 0x2, 0x4, 0x0, 0x708, …)
	/go/src/ +0x1a8
main.syncthingMain()
	/go/src/ +0x2150
main.main()
	/go/src/ +0x2325

goroutine 5 [syscall]:
os/signal.loop()
	c:/go/src/os/signal/signal_unix.go:21 +0x26
created by os/signal.init·1
	c:/go/src/os/signal/signal_unix.go:27 +0x3c

goroutine 7 [chan receive]:
main.trackCPUUsage()
	/go/src/ +0x496
created by main.init·2
	/go/src/ +0x2c

goroutine 8 [select]:
*Supervisor).Serve(0xc0820d8240)
	/go/src/ +0xf6b
created by *Supervisor).ServeBackground
	/go/src/ +0x39

goroutine 10 [select]:
*BufferPool).drain(0xc08200a1c0)
	/go/src/ +0x225
created by
	/go/src/ +0x253

goroutine 11 [select]:
*DB).compactionError(0xc0820aa580)
	/go/src/ +0x2db
created by
	/go/src/ +0x850

goroutine 12 [select]:
*DB).mpoolDrain(0xc0820aa580)
	/go/src/ +0x151
created by
	/go/src/ +0x86a

goroutine 14 [select]:
*DB).mCompaction(0xc0820aa580)
	/go/src/ +0x28a
created by
	/go/src/ +0x8c8

goroutine 15 [select]:
*DB).jWriter(0xc0820aa580)
	/go/src/ +0x19e
created by
	/go/src/ +0x8e2

goroutine 21 [select]:
*ProgressEmitter).Serve(0xc0852faac0)
	/go/src/ +0x8f9
created by
	/go/src/ +0xb8c

goroutine 22 [select]:
*Supervisor).Serve(0xc0820f3440)
	/go/src/ +0xf6b
·007()
	/go/src/ +0xfd
created by *Supervisor).runService
	/go/src/ +0x125

goroutine 23 [select]:
*Matcher).clean(0xc086f994a0, 0x68c61714000)
	/go/src/ +0x1d0
created by
	/go/src/ +0x177

goroutine 24 [semacquire]:
sync.(*RWMutex).RLock(0xc0820ea0a0)
	c:/go/src/sync/rwmutex.go:36 +0x66
*Model).CurrentLocalVersion(0xc082216000, 0xc082009ff0, 0xc, 0xc0820ae000)
	/go/src/ +0x52
*Model).CheckFolderHealth(0xc082216000, 0xc082009ff0, 0xc, 0x0, 0x0)
	/go/src/ +0x2aa
*rwFolder).Serve(0xc086a61c00)
	/go/src/ +0x17e3
created by *Model).StartFolderRW
	/go/src/ +0x5e6

This is usually caused by abrupt shutdowns, a caching software layer, or failing hardware. Search the forum and the issue tracker for similar issues and suggestions.

I already did. I have no hardware issues and the server wasn’t shut down. It runs continually.

I don’t have any software that would do additional caching beyond what Windows does by default. It’s a pretty basic Windows server that just runs as a file server.

Are you sure there are no hardware problems? Are you using ReFS? If not, is there anything else on the machine doing any data integrity checking?

If not, I would do it manually as an experiment. You say the problem keeps recurring, so this should be easy. I would hash the data “by hand”, outside of Syncthing, and save the results on another machine (using something like FreeBSD’s mtree). Then I would let Syncthing rebuild its index and wait for the problem to recur. When it does, I’d redo the external hash and compare it with the earlier result to make sure that the data was unchanged.

Of course, if your dataset changes normally, this won’t work. You’d have to make a copy and work with that.

I’m not an expert on Syncthing internals but this seems reasonable. It’s what I would do if Syncthing or git (which also relies on hashing file contents) started complaining about checksum mismatches on any filesystem that doesn’t check data integrity.

You could also use rsync as described in the link below. The rsync based solution is usually a PITA because it requires an extra copy of the whole dataset but in this case, you already have one! And rsync is much more mature than Syncthing.

Using rsync may be easier than wrangling something like mtree to work on Windows. Sorry, my knowledge of Windows is very thin, and there may be an easy way to hash a directory tree in the base system of the OS.

Discussion of hashing on Windows:

I’m using NTFS. There’s no other software running on the server that would do integrity checking. It is running on a RAID 1 mirror, but it doesn’t seem like that would make a difference. The mirror doesn’t show any errors.

The server is syncing with a live web server that has new files written to it every few minutes, so the dataset has already changed. Even if I made a copy to test it, I don’t think it would help. I don’t think Syncthing crashes while nothing is happening; I think it’s crashing during the sync process as new files are added, and somehow the database is getting corrupted in the process.

Is Syncthing’s own database somehow being synced by accident? Also, perhaps the RAID controller is acting up?

No, it’s not syncing its own database. The RAID doesn’t show any errors and I don’t see any disk errors in the Windows log.

Perhaps running it within SyncTrayzor is causing problems for some reason. I’ll try deleting the database and running Syncthing separately to see if the problem happens again.

I don’t think that will help in any way, as SyncTrayzor just runs the binary and pokes at it through the REST interface.

SyncTrayzor will forcibly terminate the Syncthing process in 4 scenarios:

  1. You select Syncthing -> Kill from the menu
  2. You select Syncthing -> Kill all Syncthing processes from the menu
  3. You log off / shut down your computer, and Syncthing doesn’t terminate within 250ms after receiving a ‘Shutdown’ command over its REST interface
  4. SyncTrayzor crashes, and shows the ‘SyncTrayzor crashed, please open an issue on GitHub’ page

Aside from these, all shutdowns are graceful.

EDIT: To update this, SyncTrayzor now gives Syncthing 2 seconds to shut down instead of 250ms (which is the maximum a program can take before Windows warns the user). Syncthing also won’t be killed automatically in the event of a crash, and there’s no longer a menu option to kill Syncthing instead of shutting it down.

I did have SyncTrayzor crash on me a couple of times 3 days ago, but Syncthing continued running. I had to manually stop it in order to restart SyncTrayzor. But after restarting it, everything seemed to be working fine and all files were in sync.

Then yesterday, Syncthing itself crashed (but not SyncTrayzor) and I got the above error when trying to restart it.

Please report all SyncTrayzor crashes as an issue.