Preventing Data Corruption using Syncthing

When I discovered data corruption in one of my files this afternoon, I realized that Syncthing might happily propagate the error to all my copies of the file! Luckily, I keep a manual backup for just this occasion. :smile:

Obviously, this is not Syncthing’s fault, but it did give me an idea. Would it be possible to (easily) identify and repair bit rot using Syncthing?

Simple case:

  • File X lives on Device A and Device B
  • Several months/years pass
  • A cosmic ray hits the hard drive of Device A and File X gets corrupted to File Y
  • Several months/years pass
  • Novice user tries to open File X on Device A, gets File Y instead. Saves File Y after trying to repair it.
  • Syncthing propagates the changes to Device B (File X → File Y)
  • Novice user loses File X unnecessarily

Now that I’ve written it down, this sounds like a very specific use case. Syncthing wouldn’t be able to solve all types of data corruption (e.g. write errors) and, by itself, may not be very effective at solving this problem. That said, maybe my post will prompt some interesting ideas.

If this sounds like a viable feature, I have been learning the Syncthing codebase and would be happy to work on implementing it.


More thoughts:

If I understand correctly (which I may not), Syncthing polls the ‘last modified’ time to detect file updates. Once a changed file is identified, its block hashes are used to find the changed blocks and transmit only those. To identify bit rot, Syncthing would have to rehash the entire folder and flag files whose content has changed but whose ‘last modified’ time predates the last rehash of the folder.

The ‘bit rot detection’ feature could run in the background (maybe throttled?) on a predetermined interval (~1 month) and identify any files that have been corrupted in that time. It would then request the correct version from another device.
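To make this concrete, here is a rough sketch in Go (Syncthing’s language) of the rule I’m picturing – not actual Syncthing code: compare each file against the hash and ‘last modified’ time recorded at the previous scan, and flag files whose content hash changed while the timestamp did not. The `Record` type and the map of previous results are hypothetical stand-ins for whatever Syncthing’s index actually stores.

```go
// Sketch of the proposed bit rot check (not Syncthing code). It assumes a
// stored hash and 'last modified' time from the previous scan of the folder.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// Record is what a previous scan stored for each file (hypothetical structure).
type Record struct {
	Hash    string
	ModTime time.Time
}

func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// scanForBitRot flags files whose content hash changed even though the
// 'last modified' time is unchanged since the previous scan.
func scanForBitRot(root string, previous map[string]Record) ([]string, error) {
	var suspects []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		prev, known := previous[path]
		if !known {
			return nil // new file: nothing to compare against
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if !info.ModTime().Equal(prev.ModTime) {
			return nil // mtime changed: treat as a legitimate edit
		}
		hash, err := hashFile(path)
		if err != nil {
			return err
		}
		if hash != prev.Hash {
			// Content changed but mtime did not: likely bit rot.
			suspects = append(suspects, path)
		}
		return nil
	})
	return suspects, err
}

func main() {
	// In the real feature, 'previous' would come from Syncthing's index;
	// here it is just an empty placeholder.
	suspects, err := scanForBitRot(".", map[string]Record{})
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, p := range suspects {
		fmt.Println("possible bit rot:", p)
	}
}
```

A file flagged this way would then be pulled from another device, like any other out-of-date file.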

Case with ‘bit rot detection’:

  • File X lives on Device A and Device B
  • Several months/years pass during which Syncthing rehashes the folder on both devices monthly
  • A cosmic ray hits the hard drive of Device A and File X gets corrupted to File Y
  • On its monthly rehash, Syncthing detects that File X has changed but its ‘last modified’ time is the same
  • Syncthing requests a fresh version of File X from Device B. File X is corrected.
  • Novice user remains happily oblivious

What kind of accidental errors could this cause? (e.g. a file is intentionally updated, but its ‘last modified’ time is not changed, and the changes are overwritten)

Thanks for reading!

So the novice user would not get File Y: we have hashes for each block, and given there was a bit flip, the hash would not match, so the block would be refused and retried from a different peer (given one is available), or it would just error out saying nobody has the block.
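To illustrate the mechanism (a toy example, not the actual Syncthing code): each received block is hashed and compared against the hash expected from the index, and anything that doesn’t match is refused.

```go
// Minimal illustration of rejecting a block whose content no longer matches
// the hash advertised in the index (not Syncthing's actual code).
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// verifyBlock returns an error if the received data does not match the
// expected hash, so the requester can retry from another device.
func verifyBlock(data []byte, expected []byte) error {
	sum := sha256.Sum256(data)
	if !bytes.Equal(sum[:], expected) {
		return errors.New("block hash mismatch: refuse block and retry from another peer")
	}
	return nil
}

func main() {
	good := []byte("original block contents")
	expected := sha256.Sum256(good)

	// Simulate a single bit flip on the sending device.
	flipped := append([]byte(nil), good...)
	flipped[0] ^= 0x01

	fmt.Println(verifyBlock(good, expected[:]))    // <nil>
	fmt.Println(verifyBlock(flipped, expected[:])) // block hash mismatch ...
}
```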

Also, versioning should protect you from this, sort of.

Sorry, I guess I didn’t clarify (or I don’t understand how Syncthing works): the novice user opens and saves the corrupted file, thus updating the ‘last modified’ time. To Syncthing, this looks like a genuine update to the file, and the corrupted file is propagated to the other device(s).

Good point - versioning would mostly protect against this.

I’ve been looking for a tool that has backup/parity data already (e.g. BTsync, CCC, Syncthing) to provide protection against bitrot, too.

I have a ZFS NAS, which protects against bitrot by comparing the current hash against a stored hash on every read and transparently repairing bitrotted files: it reads from the mirror/parity disks and, assuming that read passes the hash check, replaces the corrupted data.
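Conceptually it works something like this (just a sketch of the idea, not how ZFS is actually implemented): every read is verified against a stored checksum, and on a mismatch the data is re-read from the mirror and the bad copy rewritten.

```go
// Conceptual sketch of checksum-on-read with repair from a second copy.
// This only illustrates the idea described above; it is not ZFS internals.
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// readVerified returns the primary copy if it matches the stored checksum.
// Otherwise it falls back to the mirror copy; the bool reports whether the
// caller should rewrite the corrupted primary from the mirror.
func readVerified(primary, mirror []byte, stored [32]byte) ([]byte, bool, error) {
	if sha256.Sum256(primary) == stored {
		return primary, false, nil // primary is healthy
	}
	if sha256.Sum256(mirror) == stored {
		return mirror, true, nil // primary was corrupted; repair it from the mirror
	}
	return nil, false, errors.New("both copies fail the checksum")
}

func main() {
	data := []byte("some block of file data")
	stored := sha256.Sum256(data)

	corrupted := append([]byte(nil), data...)
	corrupted[3] ^= 0x08 // simulate a bit flip on the primary disk

	got, repair, err := readVerified(corrupted, data, stored)
	fmt.Printf("data=%q repairPrimary=%v err=%v\n", got, repair, err)
}
```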

I don’t want other devices which don’t have this protection to propagate bitrotted files in their less robust file systems to the ZFS NAS, which would negate the benefit of having it.

Having file versioning (provided by ZFS, BTSync, Syncthing, and backups like Time Machine and Arq) doesn’t do much to help. By the time you realise a file has become bitrotted, it may be too late.

It’s not always even obvious a file has been damaged when you do access it. Imagine a few bits are flipped in a long plain-text document. Over months or years you edit this document, not noticing it has some corruption. When you eventually discover it, all your incremental backups have it too by that time, and even if they didn’t, just reverting to an older version would lose any legitimate edits you’d made since the rot started.

Syncthing would have to constantly (or at least periodically) recalculate and verify the hash for already synced files, and when it finds a mismatch, replace the file with fresh data from other devices.

But it would need to be able to distinguish between bitrot and legitimate edits. Imagine Syncthing is shut down (or has crashed) for a period of time, during which legitimate edits are made. How would Syncthing reliably know that those changes should be propagated and are not bitrot?

The creator of CCC told me that there are often files which are legitimately changed but have the same size and mtime, making it difficult to know for sure.
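In other words, metadata alone can’t tell the two cases apart. A hypothetical classifier like the one below (just to show the ambiguity, not a proposal for real logic) would give the same verdict for a cosmic-ray flip and for a deliberate edit made by a tool that preserves size and mtime:

```go
// Hypothetical classifier showing the ambiguity: with identical size and
// mtime, a content-hash mismatch could be bitrot OR a deliberate edit that
// preserved the timestamp. Metadata alone cannot decide which.
package main

import "fmt"

type Meta struct {
	Size  int64
	Mtime int64 // seconds since epoch, for simplicity
	Hash  string
}

func classify(prev, cur Meta) string {
	switch {
	case prev.Hash == cur.Hash:
		return "unchanged"
	case prev.Size == cur.Size && prev.Mtime == cur.Mtime:
		return "bitrot? (or an edit that preserved size and mtime)"
	default:
		return "legitimate change"
	}
}

func main() {
	prev := Meta{Size: 4096, Mtime: 1700000000, Hash: "aaa"}
	rot := Meta{Size: 4096, Mtime: 1700000000, Hash: "bbb"}
	fmt.Println(classify(prev, rot)) // bitrot? (or an edit that preserved size and mtime)
}
```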

doylep, you might want to look into using a separate tool dedicated to detecting bitrot, and then manually restoring from other peers, from Syncthing’s version history, or from another incremental backup. I recently discovered https://github.com/ambv/bitrot and intend to give it a workout.

@mrmachine Thanks for the advice! I was looking into ZFS as a personal solution, but (as you point out) it only works if all the devices are using it.

I’m not familiar with CCC - can you share a link?

ambv/bitrot looks promising; I’ll give it a try. Unfortunately, it appears to rely on ‘last modified’ time, so it may incorrectly identify bitrot, but at least the user will have the option to choose which file(s) to update (instead of the modifications being corrected automatically).

See: Bit Rot Prevention Scheme

That’s my use case. I’ve posted over in the other thread, just because it is significantly newer; thanks for your contribution though. Bit Rot Prevention Scheme - #13 by all-the-data