Syncthing crashes with "corruption on data-block"

TrinTragula · June 16, 2015, 11:33pm

This has happened several times now. After it happened two days ago I deleted the index and let it rebuild everything (taking a couple of hours). All files finished syncing and everything seemed to be working fine for a day. Then it happened again today. It won’t restart without deleting the index.

The crash happens on a Windows 2012 Server that is continually running. There are no hard drive problems.

I did have it running within SyncTrayzor. I don’t know if that makes a difference, but it doesn’t restart even when run separately.

Any ideas?

Here’s the error:

[HOD5B] 15:38:48 INFO: syncthing v0.11.9 (go1.4.2 windows-amd64 default) unknown-user@syncthing-builder 2015-06-14 11:52:00 UTC
[HOD5B] 15:38:48 INFO: My ID: HOD5BMZ
[HOD5B] 15:38:48 INFO: Database block cache capacity 65536 KiB
[HOD5B] 15:38:50 OK: Ready to synchronize Server2 (read-write)
panic: leveldb/table: corruption on data-block (pos=838243): checksum mismatch, want=0xd2dbcbda got=0xc72aff56 [file=012607.ldb]

goroutine 1 [running]:
github.com/syncthing/syncthing/internal/db.ldbCheckGlobals(0xc0820aa580, 0xc085ad41c8, 0x4, 0x8)
    /go/src/github.com/syncthing/syncthing/internal/db/leveldb.go:1053 +0xde7
github.com/syncthing/syncthing/internal/db.NewFileSet(0xc0820e2440, 0x4, 0xc0820aa580, 0xc082442420)
    /go/src/github.com/syncthing/syncthing/internal/db/set.go:55 +0x28c
github.com/syncthing/syncthing/internal/model.(*Model).AddFolder(0xc082216000, 0xc0820e2440, 0x4, 0xc0820fc260, 0x11, 0xc082112280, 0x2, 0x4, 0x0, 0x708, …)
    /go/src/github.com/syncthing/syncthing/internal/model/model.go:1166 +0x1a8
main.syncthingMain()
    /go/src/github.com/syncthing/syncthing/cmd/syncthing/main.go:607 +0x2150
main.main()
    /go/src/github.com/syncthing/syncthing/cmd/syncthing/main.go:379 +0x2325

goroutine 5 [syscall]:
os/signal.loop()
    c:/go/src/os/signal/signal_unix.go:21 +0x26
created by os/signal.init·1
    c:/go/src/os/signal/signal_unix.go:27 +0x3c

goroutine 7 [chan receive]:
main.trackCPUUsage()
    /go/src/github.com/syncthing/syncthing/cmd/syncthing/gui_windows.go:37 +0x496
created by main.init·2
    /go/src/github.com/syncthing/syncthing/cmd/syncthing/gui_windows.go:17 +0x2c

goroutine 8 [select]:
github.com/thejerf/suture.(*Supervisor).Serve(0xc0820d8240)
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/thejerf/suture/suture.go:411 +0xf6b
created by github.com/thejerf/suture.(*Supervisor).ServeBackground
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/thejerf/suture/suture.go:373 +0x39

goroutine 10 [select]:
github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).drain(0xc08200a1c0)
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/util/buffer_pool.go:205 +0x225
created by github.com/syndtr/goleveldb/leveldb/util.NewBufferPool
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/util/buffer_pool.go:236 +0x253

goroutine 11 [select]:
github.com/syndtr/goleveldb/leveldb.(*DB).compactionError(0xc0820aa580)
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/db_compaction.go:153 +0x2db
created by github.com/syndtr/goleveldb/leveldb.openDB
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/db.go:126 +0x850

goroutine 12 [select]:
github.com/syndtr/goleveldb/leveldb.(*DB).mpoolDrain(0xc0820aa580)
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/db_state.go:73 +0x151
created by github.com/syndtr/goleveldb/leveldb.openDB
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/db.go:127 +0x86a

goroutine 14 [select]:
github.com/syndtr/goleveldb/leveldb.(*DB).mCompaction(0xc0820aa580)
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/db_compaction.go:759 +0x28a
created by github.com/syndtr/goleveldb/leveldb.openDB
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/db.go:131 +0x8c8

goroutine 15 [select]:
github.com/syndtr/goleveldb/leveldb.(*DB).jWriter(0xc0820aa580)
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/db_write.go:37 +0x19e
created by github.com/syndtr/goleveldb/leveldb.openDB
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/syndtr/goleveldb/leveldb/db.go:132 +0x8e2

goroutine 21 [select]:
github.com/syncthing/syncthing/internal/model.(*ProgressEmitter).Serve(0xc0852faac0)
    /go/src/github.com/syncthing/syncthing/internal/model/progressemitter.go:52 +0x8f9
created by github.com/syncthing/syncthing/internal/model.NewModel
    /go/src/github.com/syncthing/syncthing/internal/model/model.go:144 +0xb8c

goroutine 22 [select]:
github.com/thejerf/suture.(*Supervisor).Serve(0xc0820f3440)
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/thejerf/suture/suture.go:411 +0xf6b
github.com/thejerf/suture.func·007()
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/thejerf/suture/suture.go:516 +0xfd
created by github.com/thejerf/suture.(*Supervisor).runService
    /go/src/github.com/syncthing/syncthing/Godeps/_workspace/src/github.com/thejerf/suture/suture.go:519 +0x125

goroutine 23 [select]:
github.com/syncthing/syncthing/internal/ignore.(*Matcher).clean(0xc086f994a0, 0x68c61714000)
    /go/src/github.com/syncthing/syncthing/internal/ignore/ignore.go:161 +0x1d0
created by github.com/syncthing/syncthing/internal/ignore.New
    /go/src/github.com/syncthing/syncthing/internal/ignore/ignore.go:53 +0x177

goroutine 24 [semacquire]:
sync.(*RWMutex).RLock(0xc0820ea0a0)
    c:/go/src/sync/rwmutex.go:36 +0x66
github.com/syncthing/syncthing/internal/model.(*Model).CurrentLocalVersion(0xc082216000, 0xc082009ff0, 0xc, 0xc0820ae000)
    /go/src/github.com/syncthing/syncthing/internal/model/model.go:1540 +0x52
github.com/syncthing/syncthing/internal/model.(*Model).CheckFolderHealth(0xc082216000, 0xc082009ff0, 0xc, 0x0, 0x0)
    /go/src/github.com/syncthing/syncthing/internal/model/model.go:1680 +0x2aa
github.com/syncthing/syncthing/internal/model.(*rwFolder).Serve(0xc086a61c00)
    /go/src/github.com/syncthing/syncthing/internal/model/rwfolder.go:258 +0x17e3
created by github.com/syncthing/syncthing/internal/model.(*Model).StartFolderRW
    /go/src/github.com/syncthing/syncthing/internal/model/model.go:192 +0x5e6

AudriusButkevicius · June 16, 2015, 11:57pm

This is usually due to abrupt shutdowns, some caching software or layer, or failing hardware. Search the forum and the issue tracker for similar issues and suggestions.
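For readers unfamiliar with the "checksum mismatch, want=... got=..." message, here is a rough illustration of the general idea: a checksum is stored next to each block when it is written and recomputed when it is read. This is only a sketch. The names write_block and read_block are hypothetical, and zlib.crc32 is a stand-in, not the checksum the database library actually computes.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Return payload followed by a 4-byte big-endian CRC-32 of it."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def read_block(block: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum differs."""
    payload = block[:-4]
    stored = int.from_bytes(block[-4:], "big")
    actual = zlib.crc32(payload)
    if actual != stored:
        # Same shape as the panic message: stored value vs. recomputed value.
        raise ValueError(
            f"checksum mismatch, want={stored:#010x} got={actual:#010x}"
        )
    return payload
```

A single flipped bit anywhere in the stored bytes, whether from an interrupted write, a caching layer, or bad hardware, is enough to make read_block fail, which is why the error alone does not say which of those causes is responsible.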

TrinTragula · June 17, 2015, 12:01am

I already did. I have no hardware issues and the server wasn’t shut down. It runs continually.

I don’t have any software that would do additional caching beyond what Windows does by default. It’s a pretty basic Windows server that just runs as a file server.

lfam · June 17, 2015, 1:44am

Are you sure there are no hardware problems? Are you using ReFS? If not, is there anything else on the machine doing any data integrity checking?

If not, I would do it manually as an experiment. You say the problem keeps recurring so this should be easy. I would hash the data “by hand”, outside of Syncthing, and save the results on another machine (using something like FreeBSD’s mtree). Then I would let Syncthing rebuild its index, and wait for the problem to recur. When it does, I’d redo the external hash and compare with the earlier result to make sure that the data was unchanged.

Of course, if your dataset changes normally, this won’t work. You’d have to make a copy and work with that.

I’m not an expert on Syncthing internals but this seems reasonable. It’s what I would do if Syncthing or git (which also relies on hashing file contents) started complaining about checksum mismatches on any filesystem that doesn’t check data integrity.
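The manual check described above can be sketched in Python, which also runs on Windows. This is a minimal sketch and not part of Syncthing: the function names (hash_file, build_manifest, compare_manifests) and the choice of SHA-256 are assumptions made for illustration. It records a digest for every file under a folder so that two snapshots taken at different times can be compared.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root, '/' separated) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = hash_file(full)
    return manifest


def compare_manifests(old, new):
    """Return sorted lists of (changed, missing, added) relative paths."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added
```

One way to use it: build a manifest, store it on another machine (for example as JSON), build a second one after the crash recurs, and pass both to compare_manifests. A path in the changed list for a file that nobody modified would suggest the data is being altered below Syncthing.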

lfam · June 17, 2015, 2:21am

You could also use rsync as described in the link below. The rsync based solution is usually a PITA because it requires an extra copy of the whole dataset but in this case, you already have one! And rsync is much more mature than Syncthing.

http://www.techrepublic.com/blog/it-security/use-rsync-for-filesystem-integrity-auditing/

Using rsync may be easier than wrangling something like mtree to work on Windows. Sorry, my knowledge of Windows is very thin, and there may be an easy way to hash a directory tree in the base system of the OS.

Discussion of hashing on Windows:

TrinTragula · June 17, 2015, 2:33am

I’m using NTFS. There’s no other software running on the server that would do integrity checking. It is running on a RAID 1 mirror, but it doesn’t seem like that would make a difference. The mirror doesn’t show any errors.

The server is syncing with a live web server that has new files written to it every few minutes. So the dataset has already changed. Even if I made a copy to test it I don’t think it would help. I don’t think Syncthing is crashing while nothing is happening. I think it’s crashing during the sync process as new files are added, and somehow the database is getting corrupted in the process.

AudriusButkevicius · June 17, 2015, 9:23am

Is Syncthing’s own database somehow being synced by accident? Also, perhaps the RAID controller is acting up?

TrinTragula · June 17, 2015, 9:51am

No, it’s not syncing its own database. The RAID doesn’t show any errors and I don’t see any disk errors in the Windows log.

Perhaps running it within SyncTrayzor is causing problems for some reason. I’ll try deleting the database and running Syncthing separately to see if the problem happens again.

AudriusButkevicius · June 17, 2015, 10:29am

I don’t think that will help in any way, as SyncTrayzor just runs the binary and pokes at it through the REST interface.

canton7 · June 17, 2015, 10:48am

SyncTrayzor will forcibly terminate the Syncthing process in 4 scenarios:

You select Syncthing → Kill from the menu
You select Syncthing → Kill all Syncthing processes from the menu
You log off / shut down your computer, and Syncthing does not terminate within 250ms after receiving a ‘Shutdown’ command over its REST interface
SyncTrayzor crashes, and shows the ‘SyncTrayzor crashed, please open an issue on GitHub’ page

Aside from these, all shutdowns are graceful.

EDIT: To update this, SyncTrayzor now gives Syncthing 2 seconds to shut down instead of 250ms (which is the maximum a program can take before Windows warns the user). Syncthing also won’t be killed automatically in the event of a crash, and there’s no longer a menu option to kill Syncthing instead of shutting it down.
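The shutdown policy described here (ask the process to stop, wait a bounded time, then kill it) can be sketched as follows. This is not SyncTrayzor's code. The names stop_process and request_shutdown are hypothetical, and request_shutdown stands in for whatever sends the shutdown command over the REST interface.

```python
import subprocess


def stop_process(proc, request_shutdown, grace_seconds=2.0):
    """Ask proc to exit, wait up to grace_seconds, then kill it.

    proc is a subprocess.Popen; request_shutdown is a caller-supplied
    callable (an assumption of this sketch) that asks the process to stop.
    Returns "graceful" if the process exited in time, "killed" otherwise.
    """
    request_shutdown()
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process ignored or outlived the request: terminate it forcibly.
        proc.kill()
        proc.wait()
        return "killed"
    return "graceful"
```

Only the "killed" path is an abrupt shutdown of the kind that could interrupt a database write, so a longer grace period makes that path less likely.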

TrinTragula · June 17, 2015, 10:55am

I did have SyncTrayzor crash on me a couple of times 3 days ago, but Syncthing continued running. I had to manually stop it in order to restart SyncTrayzor. But after restarting it, everything seemed to be working fine and all files were in sync.

Then yesterday, Syncthing itself crashed (but not SyncTrayzor) and I got the above error when trying to restart it.

canton7 · June 17, 2015, 10:57am

Please report all SyncTrayzor crashes as an issue.