Currently, Syncthing cannot handle moved or renamed files without transferring them over the network as if they had never existed on the other side.
In my case (and clearly for others, based on searching the forum and GitHub page), that makes it not a viable solution for syncing large amounts of data, when high-level folders containing GB or TB of data can be, and often are, renamed. (In my case it’s part of my workflow. Also in my case, I’m only concerned with one-way sync, aka mirror, but that’s not necessarily relevant to this request.)
Some utilities, like rclone with --track-renames, can do this based on file content checksums. Chunk-based backup utilities do it too, though in a very different problem domain, by largely not caring about any notion of “files” at all, at least beyond connecting paths to chunks in a database and restoring correctly.
Even trusty old rsync can be convincingly tricked into being fully tolerant of moves and renames, with surprising ease, via a wrapper script that updates “shadow” directories, such as ‘hrsync’.
What they have in common is caring more about checksums than metadata (i.e. filename, directory location, timestamps, security, etc.). --track-renames works with whole-file content checksums, the chunk-based backup utilities with chunk checksums.
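To make the distinction concrete, here is a minimal sketch of both kinds of checksum. It assumes SHA-256 and fixed-size chunks purely for illustration (real backup tools often use content-defined chunking), and the names `file_checksum` and `chunk_checksums` are made up for this example.

```python
import hashlib


def file_checksum(data: bytes) -> str:
    """Whole-file checksum: one digest for the entire content."""
    return hashlib.sha256(data).hexdigest()


def chunk_checksums(data: bytes, chunk_size: int = 4) -> list[str]:
    """Chunk checksums: one digest per fixed-size chunk (an assumption;
    the tiny chunk size is only to keep the example readable)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


# Hypothetical in-memory "filesystems": path -> content.
before = {"FileA.doc": b"hello world!"}
after = {"FileB.doc": b"hello world!"}  # same content, new name

# Neither kind of checksum depends on the path, so a rename changes nothing.
same_file = file_checksum(before["FileA.doc"]) == file_checksum(after["FileB.doc"])
same_chunks = chunk_checksums(before["FileA.doc"]) == chunk_checksums(after["FileB.doc"])
```

Either way, the path never enters the hash, which is what makes rename detection possible in the first place.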
Syncthing could be far more efficient (in this particular way at least), if it paid more attention to file content checksums.
For example, let’s say we have a one-way sync situation set up, for simplicity.
- FileA.doc exists on both source and target.
- A human renames source/FileA.doc to source/FileB.doc.
- Source: Notices (via periodic scan or inotify) that FileB.doc newly exists, so it scans content for checksum and stores it locally.
- Target: Notices that source/FileB.doc exists but target/FileB.doc doesn’t, so it checks whether the same content checksum as source/FileB.doc exists anywhere on target.
- Target: Finds target/FileA.doc with same checksum as source/FileB.doc.
- Target: Notices that source/FileA.doc no longer exists.
- Target: Renames target/FileA.doc to target/FileB.doc, and syncs other metadata to match.
No file content needed to be transferred, only a single checksum.
Notice that it doesn’t matter what, where, or why target/FileA.doc originally was. All we care about is that 1) FileA.doc no longer exists on source, 2) source/FileB.doc exists on source but not target, and has the same checksum as target/FileA.doc.
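The conditions above can be sketched in a few lines. This is only an illustration under stated assumptions: each side is represented as a plain dict mapping relative path to content checksum, and `find_renames` is a hypothetical helper, not anything that exists in Syncthing.

```python
def find_renames(source: dict[str, str], target: dict[str, str]) -> list[tuple[str, str]]:
    """Return (old_target_path, new_path) pairs that can be done as local renames.

    source and target map relative path -> content checksum (assumed layout).
    """
    renames = []
    claimed = set()  # target paths already used for a rename
    for new_path, checksum in sorted(source.items()):
        if new_path in target:
            continue  # condition 2 fails: the path already exists on target
        for old_path, old_checksum in sorted(target.items()):
            if old_path in source or old_path in claimed:
                continue  # condition 1 fails: old path still exists on source
            if old_checksum == checksum:
                renames.append((old_path, new_path))
                claimed.add(old_path)
                break
    return renames


# The FileA.doc -> FileB.doc example, with made-up checksum strings.
source = {"FileB.doc": "abc123", "notes.txt": "ffee00"}
target = {"FileA.doc": "abc123", "notes.txt": "ffee00"}
renames = find_renames(source, target)
```

Only the checksums cross the network here; the rename itself happens entirely on the target.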
In other words, imagine both source and target filesystems as buckets of not necessarily unique content and oh by the way, each with corresponding metadata.
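As a rough sketch of that “bucket” view, the path-to-checksum index can simply be inverted so that content comes first and paths are just attached metadata. The dict layout and the name `content_buckets` are assumptions made for this example.

```python
from collections import defaultdict


def content_buckets(index: dict[str, str]) -> dict[str, list[str]]:
    """Invert path -> checksum into checksum -> list of paths holding that content."""
    buckets = defaultdict(list)
    for path, checksum in sorted(index.items()):
        buckets[checksum].append(path)
    return dict(buckets)


# Made-up target index: two paths share the same content.
target = {"A.doc": "abc123", "backup/A.doc": "abc123", "notes.txt": "ffee00"}
buckets = content_buckets(target)
```

With this view, “does the target already have this content?” becomes a single lookup by checksum, regardless of where the file lives or what it is called.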
Then, similar logic can be applied to all kinds of operations, not just 1:1 renames. For example:
- Move: Same logic as rename, except the metadata being changed is the containing folder, not filename.
- User makes a copy on source side, file “A.doc” to “A (copy).doc”: Since the checksum of “source/A (copy).doc” already exists on target (because “target/A.doc” had already been synced before), the copy can be performed solely on target, without transferring content over the network. (Or even better, if cp --reflink=always is supported on target, then the copy can be near-zero-cost in terms of time and storage, while still maintaining all unique metadata and the ability for content to diverge later.)
- File is renamed on source, but it’s ambiguous which file that was on the target: It doesn’t matter. We’re focusing only on content checksum, and metadata like paths are ancillary things that need to be copied, changed, or created. So either just do a copy only on target (based on checksum) and then delete whatever needs deleting, or rename the first file encountered (or a random one) that has the same checksum and whose path no longer exists on source, then deal with the rest via copies only on target and/or deletes.
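The cases above (rename, move, copy, ambiguous match) can all fall out of one small planner. The following is a sketch under assumptions, not a description of how Syncthing works: both sides are dicts of relative path to content checksum, sync is one-way, and `plan_sync` and the operation names are invented for illustration.

```python
def plan_sync(source: dict[str, str], target: dict[str, str]) -> list[tuple[str, ...]]:
    """Plan target-side operations for a one-way mirror, preferring local work.

    Operations: ("rename", old, new), ("copy", existing, new),
    ("transfer", path) and ("delete", path).
    """
    # Target paths that no longer exist on source: rename donors, else deleted.
    stale = sorted(p for p in target if p not in source)
    # Content already in its final place on target, usable as a copy source.
    have = {c: p for p, c in sorted(target.items()) if source.get(p) == c}
    ops = []
    for path, checksum in sorted(source.items()):
        if target.get(path) == checksum:
            continue  # already in sync
        # Ambiguity is resolved by taking the first stale file with this content.
        donor = next((p for p in stale if target[p] == checksum), None)
        if donor is not None:
            stale.remove(donor)
            ops.append(("rename", donor, path))
        elif checksum in have:
            ops.append(("copy", have[checksum], path))  # could be a reflink copy
        else:
            ops.append(("transfer", path))  # content is genuinely new
        have.setdefault(checksum, path)
    # Whatever stale files were not reused get deleted last.
    ops.extend(("delete", p) for p in stale)
    return ops


source = {"B.doc": "c1", "C.doc": "c1", "new.bin": "c9"}
target = {"A.doc": "c1", "gone.txt": "c2"}
ops = plan_sync(source, target)
```

In this example only new.bin has to cross the network; B.doc comes from a rename, C.doc from a local copy, and gone.txt is deleted at the end. Deletes are deliberately planned last so that no file is removed while it could still serve as a rename donor.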