Sanity check: adding .stversions and Samba recycle bin folders into Syncthing

tl;dr

I want to maximize block reuse and reduce network transfer as much as possible. I will do this by pulling files from all possible locations into Syncthing.

Showerthought that led to this

What if on one PC I had infinite CPU, infinite storage, and an algorithm that generated every permutation of every file known to man, and placed this in Syncthing as an unshared folder? Would this mean that any Syncthing partnerships to this device would (aside from metadata transfer) not use any network at all, pulling from an endless supply of blocks?

Syncthing

Each folder has a .stversions, which is ignored by default. calmh says this ignoring behaviour potentially prevents dodgy situations, and I’m guessing he’s right. Still, this folder is a good source of deleted files that can be harvested for blocks.

Samba

Each folder (mine correspond 1:1 with the Syncthing ones) can be set up, using the official vfs_recycle module, so that client file deletes are not carried out. Instead, the files are redirected to a folder, .recycle by default, in the root of that folder. Since it’s a regular folder, Syncthing does not ignore it by default, but I will ignore it (more on that later). Another good source of blocks.
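As a rough illustration of the Samba side, here is a small sketch that renders a share stanza with vfs_recycle enabled. The share name, the path and the helper name `render_recycle_share` are made up for the example. The `recycle:keeptree` and `recycle:versions` options are my own assumption about what is useful here, not something the setup above depends on.

```python
def render_recycle_share(name: str, path: str, repository: str = ".recycle") -> str:
    """Render one smb.conf share stanza that redirects deletes into a recycle folder.

    `name` and `path` are hypothetical; adjust them to your own shares.
    """
    lines = [
        f"[{name}]",
        f"    path = {path}",
        "    vfs objects = recycle",
        # Folder (relative to the share root) that receives deleted files.
        f"    recycle:repository = {repository}",
        # Assumption: keep the original directory tree inside the recycle folder.
        "    recycle:keeptree = yes",
        # Assumption: keep multiple copies when the same name is deleted twice.
        "    recycle:versions = yes",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_recycle_share("folder1", "/srv/sync/folder1"))
```

The output is meant to be pasted into smb.conf by hand and checked against the vfs_recycle man page before use.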

Setup

Now these folders need to be in Syncthing’s database. They need not be actually synced, just scanned and unshared.

Either we (1) unignore the above folders (risky), (2) add each of them to Syncthing as its own folder, creating 3 folders for every 1, or (3) do something smart.

Each folder has one .stversions and one .recycle inside. With many folders, adding each of these individually is unmanageable. Instead, we create a single staging folder and bind-mount each folder’s .stversions and .recycle into it.

Example contents:

folder1-stversions

folder1-smbversions (etc).
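To make the staging layout concrete, here is a minimal sketch that only plans the bind mounts and prints the commands, without running anything. The folder paths, the staging path and the function names are hypothetical. The naming scheme (`<folder>-stversions`, `<folder>-smbversions`) follows the example contents above.

```python
from pathlib import PurePosixPath

# Hidden source folder -> suffix used for its mount point in the staging folder.
SOURCES = {".stversions": "stversions", ".recycle": "smbversions"}


def plan_bind_mounts(folders, staging):
    """Return (source, target) pairs: one bind mount per hidden folder."""
    plan = []
    for folder in folders:
        folder = PurePosixPath(folder)
        for hidden, suffix in SOURCES.items():
            target = PurePosixPath(staging) / f"{folder.name}-{suffix}"
            plan.append((folder / hidden, target))
    return plan


def as_commands(plan):
    """Turn the plan into `mount --bind` argument lists (to be run as root)."""
    return [["mount", "--bind", str(src), str(dst)] for src, dst in plan]


if __name__ == "__main__":
    # Hypothetical paths; the target directories must exist before mounting.
    for cmd in as_commands(plan_bind_mounts(["/srv/sync/folder1"], "/srv/staging")):
        print(" ".join(cmd))
```

The staging folder itself would then be added to Syncthing once, unshared, instead of adding two extra folders per synced folder.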

Explanation

Syncthing does not follow symlinks into their targets, so a folder of symlinks would not work. By using bind mounts, we fool Syncthing into thinking these are ordinary files and directories. Each host’s Syncthing (and Samba) folders should be tracked in this fake folder. Note that the database will likely grow with these files.

Now, when I delete a file in Samba on one host, it is moved into .recycle there and into .stversions remotely. If that file is restored (via move, or recreation) on any host, other hosts will have .stversions or .recycle to look into, for block reuse.

Deletion outside Samba on one host would still result in the file being saved in .stversions on the remote hosts, so less data needs to be retransferred.
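The block-reuse idea can be sketched roughly as follows. This is a toy model, not Syncthing's actual code: it assumes fixed-offset blocks hashed with SHA-256, and treats "local sources" as plain byte strings standing in for files found in .stversions or .recycle. The function names are made up for the example.

```python
import hashlib

# Assumption: 128 KiB, Syncthing's smallest block size; larger files use larger blocks.
BLOCK_SIZE = 128 * 1024


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def missing_blocks(wanted: bytes, local_sources, block_size: int = BLOCK_SIZE):
    """Return indexes of blocks in `wanted` that no local source can provide."""
    known = set()
    for source in local_sources:
        known.update(block_hashes(source, block_size))
    return [
        index
        for index, digest in enumerate(block_hashes(wanted, block_size))
        if digest not in known
    ]


if __name__ == "__main__":
    # Tiny block size so the example is readable.
    restored = b"AAAABBBBCCCC"
    trash_copy = b"BBBBAAAA"  # stands in for an old version kept locally
    print(missing_blocks(restored, [trash_copy], block_size=4))
```

In this toy run only the third block would have to come over the network. The more old versions are known locally, the smaller that list gets.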

Aside from genuinely needing their contents after accidental deletions, these .* folders are meant to be set-and-forget. With normal filesystem use, if things go well, they should build up a nice number of files, and with them potential “cache hits”.

Scenario

For this situation I am thinking specifically of something I’ve observed. Let’s say we move or delete a huge number of files. Syncthing is scanning, etc. Then we undo (the move was accidental) or we make further huge amounts of changes. Bonus: we make many changes on multiple hosts at the same time.

As much as we all love Syncthing, and hardworking as it is, it has its limits, and now we’ve confused it: what should be a rename is, thanks to a large backlog, treated as a delete and create, etc. By making Syncthing aware of as many sources of deleted files as possible, it has more places to pull blocks from when re-transferring.

Clarifications

(1) The .recycle folder should remain unsynced. Suppose it were synced between two hosts, where one holds a superset of the files and the other a subset (via ignoring). Deleting an ignored file on the superset host would move it into .recycle, which would then transfer it to the subset host, where the file should not even exist. Leaving .recycle unsynced also aligns its behaviour with .stversions.

Keeping it synced would mean a centralized trash for all Samba servers. The bonus would be that, instead of block reuse, we would get wholesale file-copy operations (no files are ever deleted, only moved).

(2) There is nothing inherent about Samba that makes this work, and relying on .stversions-in-database alone should help a lot. However, we know that the local Syncthing does not (and cannot) intercept local deletes, as those happen at the filesystem level (feature request: syncthing-fs). By assuming that files are accessed via Samba, and not, say, FTP or NFS, we are essentially allowing Syncthing, in a certain sense, to be aware of “local deletes”.

(Technically, you could say that this is not unlike “remap rm command to mv ./[file] /.stversions-v2/[file]”)
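For illustration, the "rm becomes mv" idea could look something like the sketch below. The helper name `soft_delete` is hypothetical, and placing `.stversions-v2` in the folder root (rather than the filesystem root) is my own assumption.

```python
import shutil
from pathlib import Path


def soft_delete(file_path, folder_root, trash_name=".stversions-v2"):
    """Move a file into a trash folder under folder_root instead of deleting it.

    The file's path relative to folder_root is preserved inside the trash.
    No versioning is attempted: a file already at the target may be replaced.
    """
    file_path = Path(file_path).resolve()
    folder_root = Path(folder_root).resolve()
    # Raises ValueError if the file is not inside folder_root.
    relative = file_path.relative_to(folder_root)
    target = folder_root / trash_name / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(file_path), str(target))
    return target
```

Calling `soft_delete("/srv/sync/folder1/docs/a.txt", "/srv/sync/folder1")` would leave the data at `/srv/sync/folder1/.stversions-v2/docs/a.txt`, where a scan can still find its blocks.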

File pruning

To maximize “cache hits” we can set .stversions to be cleaned out only after as long a period as our storage space allows. I have no reservations about 3+ months for the time being. .recycle has no built-in pruning mechanism, so this needs to be thought out.
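One possible way to prune .recycle is a small script run from cron or a timer. The sketch below is an assumption about how that might look, not something Samba provides. It removes files whose modification time is older than a cutoff, and defaults to a dry run. Note that mtime reflects the last modification, which may be well before the file was actually deleted.

```python
import os
import time
from pathlib import Path


def prune_older_than(root, max_age_days, now=None, dry_run=True):
    """Return files under root older than max_age_days; delete them unless dry_run."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime < cutoff:
                matched.append(path)
                if not dry_run:
                    path.unlink()
    return matched
```

For example, `prune_older_than("/srv/sync/folder1/.recycle", 90)` (hypothetical path) lists what would be removed after roughly three months, and passing `dry_run=False` actually removes it. Empty directories are left in place.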

Your thoughts are appreciated!

This seems like a lot of effort for a use case that I’m not seeing as being widely applicable.

Well, you wouldn’t be wrong. Mainly this is meant to solve two problems:

  1. Some of the PCs running Syncthing are in a country where the internet is controlled and, practically speaking, very slow.
  2. I am undecided on folder structure, and in the meantime there’s a lot of moving things around, sometimes needlessly, which can result in suboptimal Syncthing behaviour.