When I discovered data corruption in one of my files this afternoon, I realized that Syncthing might happily propagate the error to all my copies of the file! Luckily, I keep a manual backup for just such an occasion.
Obviously, this is not Syncthing’s fault, but it did give me an idea. Would it be possible to (easily) identify and repair bit rot using Syncthing?
Simple case:
- File X lives on Device A and Device B
- Several months/years pass
- A cosmic ray hits the hard drive of Device A and File X gets corrupted to File Y
- Several months/years pass
- Novice user tries to open File X on Device A, gets File Y instead. Saves File Y after trying to repair it.
- Syncthing propagates the changes to Device B (File X → File Y)
- Novice user loses File X unnecessarily
Now that I’ve written it down, this sounds like a very specific use case. Syncthing wouldn’t be able to solve all types of data corruption (e.g. write errors) and, by itself, may not be very effective at solving this problem. That said, maybe my post will prompt some interesting ideas.
If this sounds like a viable feature, I have been learning the Syncthing codebase and would be happy to work on implementing it.
More thoughts:
If I understand correctly (which I may not), Syncthing detects updates by checking each file’s ‘last modified’ time. Once a changed file is identified, the file hash index is used to find the changed blocks and transmit only those. To identify bit rot, Syncthing would have to rehash the entire folder and flag files whose content has changed but whose ‘last modified’ time is earlier than the last rehash of the folder.
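To make the idea concrete, here is a rough Python sketch of that check. It is not Syncthing's code (Syncthing is written in Go), and every name in it (`hash_file`, `build_index`, `find_suspected_bit_rot`) is made up for illustration. It also assumes a simplified rule: a file is suspect when its SHA-256 differs from the recorded one while its ‘last modified’ time is exactly what was recorded at the last rehash.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(folder):
    """Map each relative path to (mtime_ns, sha256) for every file in folder."""
    index = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, folder)
            index[rel] = (os.stat(path).st_mtime_ns, hash_file(path))
    return index


def find_suspected_bit_rot(folder, index):
    """Return paths whose content changed while their mtime did not."""
    suspects = []
    for rel, (old_mtime, old_hash) in sorted(index.items()):
        path = os.path.join(folder, rel)
        if not os.path.isfile(path):
            continue  # a deleted file is an ordinary change, not bit rot
        if os.stat(path).st_mtime_ns != old_mtime:
            continue  # mtime moved: treat it as a normal edit
        if hash_file(path) != old_hash:
            suspects.append(rel)
    return suspects
```

The index here stands in for whatever was recorded during the previous full rehash. Only files that pass the mtime check get rehashed, so ordinary edits are skipped without being read in full.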
The ‘bit rot detection’ feature could run in the background (maybe throttled?) on a predetermined interval (~1 month) and identify any files that have been corrupted in that time. It would then request the correct version from another device.
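As a sketch of what "throttled, on an interval" might look like, the Python below hashes a file while pausing after each chunk so the read rate stays under a cap, and checks whether the next rehash is due. The 30-day interval, the rate cap, and the names `rehash_due` and `throttled_hash` are all assumptions of mine, not anything Syncthing provides.

```python
import hashlib
import time
from datetime import datetime, timedelta

REHASH_INTERVAL = timedelta(days=30)  # assumed stand-in for "~1 month"


def rehash_due(last_rehash, now, interval=REHASH_INTERVAL):
    """Return True once at least one full interval has passed."""
    return now - last_rehash >= interval


def throttled_hash(path, max_bytes_per_sec, chunk_size=64 * 1024, sleep=time.sleep):
    """SHA-256 a file, sleeping after each chunk to cap the read rate."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            # Crude throttle: pause for as long as this chunk "should" take.
            sleep(len(chunk) / max_bytes_per_sec)
    return digest.hexdigest()
```

The `sleep` parameter exists only so the pause can be swapped out when testing. A real implementation would probably want a smarter limiter that accounts for the time the read itself takes.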
Case with ‘bit rot detection’:
- File X lives on Device A and Device B
- Several months/years pass during which Syncthing rehashes the folder on both devices monthly
- A cosmic ray hits the hard drive of Device A and File X gets corrupted to File Y
- On its monthly rehash, Syncthing detects that the content of File X has changed but its ‘last modified’ time has not
- Syncthing requests a fresh version of File X from Device B. File X is corrected.
- Novice user remains happily oblivious
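The repair step in the case above could look something like the following Python sketch, with two local paths standing in for Device A and Device B. It assumes the hash recorded at the last good rehash is available as `known_good_hash`. The function names are hypothetical, and the sketch reads whole files into memory, which is only reasonable for small files.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a whole file in one read (fine for a small illustration)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def repair_from_peer(local_path, peer_path, known_good_hash):
    """Restore local_path from peer_path if the peer copy is still intact.

    Returns True if the local file matches known_good_hash afterwards,
    False if the peer copy does not match either and nothing was copied.
    """
    if sha256_of(local_path) == known_good_hash:
        return True  # local copy is fine, nothing to repair
    if sha256_of(peer_path) != known_good_hash:
        return False  # peer copy cannot be trusted either
    shutil.copy2(peer_path, local_path)  # copy2 also carries over the mtime
    return True
```

Checking the peer's copy against the recorded hash before copying matters: without it, a corrupted copy on Device B could overwrite a corrupted copy on Device A and still look like a successful repair.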
What kind of accidental errors could this cause? For example, a file might be updated intentionally without its ‘last modified’ time changing, in which case the update would be mistaken for corruption and overwritten.
Thanks for reading!