Any thoughts on setting up versioning with deduplication?
I sometimes deal with huge media files where I edit metadata or make other small changes. I am thinking about running ZFS on my server with ZFS deduplication enabled. (No experience with ZFS at all; this would be a first.)
I would suggest starting by setting copyRangeMethod on the folder. ZFS dedup is quite memory intensive, and a one-way street. Look carefully before enabling it.
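For context, copyRangeMethod is an advanced per-folder option (also reachable via the folder's advanced settings in the GUI). A minimal sketch of setting it over the REST API; the folder ID "media" and the API key are placeholders:

```
# switch the folder's copy range method to copy_file_range
curl -X PATCH -H "X-API-Key: abc123" \
  http://localhost:8384/rest/config/folders/media \
  -d '{"copyRangeMethod": "copy_file_range"}'
```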
If you don't need versioning on every host, you might also look into using ZFS snapshots. This might be lighter on your storage space without the need to activate deduplication, because only new blocks are written.
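Snapshots are cheap to create and browse. A minimal sketch, with tank/sync as a placeholder dataset:

```
zfs snapshot tank/sync@before-edit          # instant; consumes no space at first
ls /tank/sync/.zfs/snapshot/before-edit/    # read-only view of the old state
cp /tank/sync/.zfs/snapshot/before-edit/file.txt /tank/sync/   # pull back a single file
```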
Interesting. As I understand it, this is a way to increase performance. copy_file_range looks like an obvious choice.
We are lucky to have one of the ZFS gurus in our hackerspace, and I asked him yesterday. He echoes your concern that deduplication is extremely heavy on RAM (even though I have 64 GiB in my server). He suggested using compression instead, then highlighted various other ZFS features such as background scrubbing (of course).
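For anyone reading along, both of those are one-liners; the dataset and pool names are placeholders:

```
zfs set compression=lz4 tank/media   # transparent compression, cheap on CPU
zpool scrub tank                     # verify all checksums in the background
zpool status tank                    # shows scrub progress and results
```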
Thanks, that’s an interesting suggestion! The disadvantage is that it won’t track every little change, unless there is an option to somehow run snapshots continuously.
I also have an automated Restic backup set up on my server, which runs once per day. So I have some versioning from that, but only at daily granularity.
Example use case for versioning: I edit a document and accidentally delete a section, and there is no undo option. I then just pull the version saved a few minutes before.
Other use case: I accidentally delete a subdirectory and need to get it back.
Currently, I have all my data inside Dropbox, and the versioning feature gives me peace of mind.
It does increase performance, but in this case it primarily does so by making CoW “copies” of data, i.e. essentially the snapshot mechanism in ZFS. So it's more or less deduplication for the data being versioned by Syncthing.
Oh, I see. CoW only copies once the data is modified. I presume this also works on EXT4, as the documentation says it has been tested with that. (I have EXT4 currently on my Syncthing SSD.)
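Worth noting: one quick way to check whether a given filesystem actually gives you CoW copies is coreutils' cp with forced reflinks. EXT4 has no reflink support, so while copy_file_range works there, it still writes a full copy rather than sharing blocks:

```
dd if=/dev/zero of=big.bin bs=1M count=100   # create a test file
cp --reflink=always big.bin clone.bin        # instant CoW clone on btrfs/XFS
                                             # (and ZFS with block cloning);
                                             # fails with "Operation not supported" on ext4
```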
Not continuously, but very often, as in multiple times per minute. There are tools to automatically create and rotate snapshots, like zfs-auto-snapshot or zfs_autobackup. The caveat is that a snapshot only helps if the sync to the ZFS host had completed by the time it was taken.
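As a sketch, zfs-auto-snapshot is usually driven from cron; the dataset is a placeholder, and the install path and flags may vary per distro, so check its --help:

```
# crontab entry: snapshot tank/sync every 5 minutes, keep the last 12
*/5 * * * * root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=frequent --keep=12 tank/sync
```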
Guess I’ll ask our local ZFS guru about it. I haven’t decided yet on which method to use for providing a path to undo accidental changes. It’s either Syncthing’s versioning or ZFS snapshots.
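If it helps the comparison, Syncthing's side is a per-folder setting too. A sketch via the REST API, where the folder ID, API key, and keep count are placeholders:

```
# enable simple versioning, keeping the last 10 versions of each file
curl -X PATCH -H "X-API-Key: abc123" \
  http://localhost:8384/rest/config/folders/media \
  -d '{"versioning": {"type": "simple", "params": {"keep": "10"}}}'
```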
I just did a quick test by renaming a directory Zephyr to Zephyr2:

- The directory with the old name gets backed up into .stversions.
- If the directory is big, and if there is no form of deduplication, then this can be a real issue. Copy-on-write is kind of mandatory to avoid filling up storage quickly.
- Directory names stay the same in .stversions, but file names get tagged with a timestamp (e.g. clip.mp4 becomes clip~20240101-120000.mp4).
- Restoring a directory looks painful. One would need to rename all the files inside (see the sketch below).
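A minimal sketch of that renaming step, assuming the default ~YYYYMMDD-HHMMSS version tag and GNU sed; try it on a copy first:

```
# strip Syncthing's version tag from every file under the restored tree
find .stversions/Zephyr -type f | while IFS= read -r f; do
  mv -- "$f" "$(printf '%s' "$f" | sed -E 's/~[0-9]{8}-[0-9]{6}//')"
done
```

Newer versions of the web GUI also offer a restore dialog under the folder's “Versions” button, which may be less painful for single files.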
IMO Syncthing’s versioning is a “poor man’s backup”, and your observations are examples of why I think that. I would provide snapshots instead, or look into what different real backup solutions offer, such as Restic.
ZFS snapshots can be run very frequently, but if you later need to list the snapshots, it can be a chore to sort through them to find the one you want to pull from.
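Sorting by creation time helps a bit; a sketch with a placeholder dataset:

```
# list snapshots oldest to newest, with creation time shown
zfs list -t snapshot -o name,creation -s creation tank/sync
```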
Another piece to consider is zrep, to copy ZFS snapshots from one ZFS server to another. zrep can run every minute, and the rolling zrep snapshots can be expired on a daily rotation. It’s no issue to have manually created snapshots intermingled with zrep’s.
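From zrep's docs, the flow is roughly as follows; hostnames and datasets are placeholders, so double-check against the manual:

```
zrep init tank/sync backuphost tank/sync   # one-time: pair source and destination
zrep sync tank/sync                        # then run this from cron, e.g. every minute
```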
I do agree this is kind of a serious issue. There is a lot of discussion around here, in a thread about more elegantly handling file and folder renames and moves.
As I understand it, the current solution is rather rudimentary: renamed files are detected only within the same directory, and only if those changes are transmitted at more or less the same time. (So this breaks if you rename a very large file that takes more than 60 seconds (?) to scan, as Syncthing may delete the file under its old name before it discovers the source has a new file with the same contents under a new name.)
Anyway, you really have to assume that Syncthing handles renames and moves as if the old names were deleted and the new names were created, and the files just happen to have the same contents.