Hello,
I'm a happy user of Syncthing and have been for some years. I recently got some time for private projects and dug into the topic of bitrot. I'm familiar with the ideas behind ZFS and BTRFS. At home I have a QNAP NAS with 4 disks in RAID6 and regular scrubbing. I also had a look at cloud filesystems, which likewise use redundancy and scrubbing to detect and fix bitrot. I also came across the topic of ECC RAM.
I stumbled across the "garage" project.
I like their dedication to building a software solution that works without special hardware. Right on their landing page (you need to scroll down), one can read the hardware requirements:
Keeping requirements low
Build a cluster with whatever second-hand machines are available
And at the same time they claim to be
Highly resilient to network failures, network latency, disk failures, sysadmin failures
Of course, this project is not directly comparable to Syncthing, but I can see a few common things:
- Distributed, connected devices that automatically exchange static/rarely changing data which is protected from bitrot and accidents by redundancy.
Of course, the "data protection" part is only halfway true for Syncthing, and Syncthing targets a different set of use cases. Nevertheless, I was wondering whether a "simple" bitrot protection could be done with Syncthing as well. I stumbled across this discussion here and understood that the biggest obstacle to including it in Syncthing is the detection of bitrot.
Or, to be more precise: distinguishing whether a change was made intentionally or caused by bitrot.
But what if this could be solved externally? Let's just assume for a moment that this is possible with external tool support. Then Syncthing would need a small API through which an external tool could report file, directory or metadata corruption. Syncthing could then simply restore the corrupted parts by downloading them from other nodes.
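To make the idea of such an API a bit more concrete, here is a minimal Python sketch of what an external detector might hand over. Everything in it is an assumption for illustration only: as far as I know Syncthing has no such endpoint today, and the names `CorruptionReport`, `VALID_KINDS`, `encode_report` and the JSON layout are hypothetical.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical categories an external detector could flag.
VALID_KINDS = {"file", "directory", "metadata"}


@dataclass(frozen=True)
class CorruptionReport:
    """One 'this item is corrupted' message (hypothetical format)."""
    folder: str  # Syncthing folder ID the item belongs to
    path: str    # path relative to the folder root
    kind: str    # one of VALID_KINDS

    def __post_init__(self):
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")
        # Reject absolute paths and paths that escape the folder.
        if self.path.startswith("/") or ".." in self.path.split("/"):
            raise ValueError("path must be relative and stay inside the folder")


def encode_report(report: CorruptionReport) -> str:
    """Serialize the report as JSON, e.g. as the body of an HTTP request."""
    return json.dumps(asdict(report), sort_keys=True)
```

On receiving such a report, Syncthing would treat the named item as missing locally and fetch it again from other nodes. That part is not sketched here.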
If I wanted to implement this, where would I need to start within Syncthing?
I have an idea for getting the external detection running even on limited devices like a smartphone. It involves using the help of the filesystem directly. Sadly, I could not find anything in this direction so far, so it would require a small but important extension to existing filesystem implementations. It should be a small change for any filesystem that already supports extended attributes (xattr), and it seems to me that every major modern filesystem supports them directly or indirectly.
The idea is that Syncthing uses an xattr to additionally store the latest computed hash of a file within the filesystem itself. The xattr gets a name that tells the filesystem to delete it whenever the file is intentionally changed via the filesystem API. This way, Syncthing can recognize an intentional change by the absence of the xattr. If the xattr is still present but the content no longer matches the stored hash, the change did not go through the filesystem API and can be treated as bitrot.
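The decision logic described above could look roughly like the following Python sketch. It is only an illustration under assumptions: the attribute name `user.syncthing.sha256` is made up, and the `DictXattrs` class is an in-memory stand-in for real extended attributes so that the example runs anywhere. On Linux, the real calls would be `os.getxattr`, `os.setxattr` and `os.removexattr`.

```python
import hashlib

HASH_ATTR = "user.syncthing.sha256"  # hypothetical attribute name


class DictXattrs:
    """In-memory stand-in for filesystem xattrs (assumption for this sketch)."""

    def __init__(self):
        self._attrs = {}

    def get(self, path, name):
        return self._attrs.get((path, name))

    def set(self, path, name, value):
        self._attrs[(path, name)] = value

    def clear(self, path, name):
        self._attrs.pop((path, name), None)


def file_hash(path):
    """SHA-256 of the file content, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_hash(path, xattrs):
    """What Syncthing would do after hashing a file."""
    xattrs.set(path, HASH_ATTR, file_hash(path))


def classify(path, xattrs):
    """Return 'modified', 'clean' or 'corrupted' for a file."""
    stored = xattrs.get(path, HASH_ATTR)
    if stored is None:
        # The filesystem removed the attribute: an intentional change.
        return "modified"
    if stored == file_hash(path):
        return "clean"
    # Attribute survived but the content differs: no write went through the API.
    return "corrupted"
```

Only the "corrupted" result would lead to the file being fetched again from other nodes. A "modified" file would be rehashed and synced as usual.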
To me, this seems to be a rather small change to any existing filesystem implementation. Also the performance overhead should not be noticeable, when done correctly.
One could start with a simple decorator or overlay filesystem that passes all requests through to the mounted real filesystem and additionally clears the relevant xattrs whenever a modification is made.
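As a toy illustration of that decorator idea, here is a small Python wrapper class. It is not a real filesystem: an actual implementation would sit at the FUSE or kernel level, and the names `ClearingFS`, `mark_hashed` and the attribute name are hypothetical. The set of markers again stands in for real xattrs.

```python
import os

HASH_ATTR = "user.syncthing.sha256"  # hypothetical attribute name


class ClearingFS:
    """Pass-through wrapper that drops the hash marker on every write."""

    def __init__(self, root):
        self.root = root
        self._markers = set()  # stand-in for xattrs: (relative path, attr name)

    def _real(self, rel):
        return os.path.join(self.root, rel)

    def read(self, rel):
        # Reads are forwarded unchanged and never touch the marker.
        with open(self._real(rel), "rb") as f:
            return f.read()

    def write(self, rel, data):
        # Clear the marker first, so that an interrupted write can never
        # leave a stale hash next to half-written content.
        self._markers.discard((rel, HASH_ATTR))
        with open(self._real(rel), "wb") as f:
            f.write(data)

    def mark_hashed(self, rel):
        """Called by the sync tool after it has hashed the file."""
        self._markers.add((rel, HASH_ATTR))

    def has_marker(self, rel):
        return (rel, HASH_ATTR) in self._markers
```

A complete version would have to cover every modifying operation (truncate, rename, mmap writes and so on), not just `write`.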
Of course, getting this running on a regular smartphone without root access would take some time because of the filesystem extension needed. But hey, I could imagine that other tools that watch the filesystem for changes could also make use of this. Initially we would use the decorator/overlay filesystem. Eventually, we would use the filesystem directly if it supports this.
OK, this actually seems to be quite an ambitious goal. But what about just the general idea of providing an API that allows flagging files, directories or metadata as corrupted, so that they get replaced as soon as possible?
It would be a starting point for allowing any kind of external detector tools. What do you think?