I understand that syncthing detects file changes by changes in modification date, size or permissions.
But what happens if I have a file whose contents change without any change in modification date, size or permissions?
For example, rsync allows me to use the “-c” option in such cases.
The use case is, for example, virtual encrypted disks (also called volumes) as created by VeraCrypt/TrueCrypt. These are encrypted files of fixed size, which can be mounted and used like a disk. The file modification time does not change when the file contents change (this is intentional, though it can be switched off).
That’s a very niche use case for which you already have a workaround (disable fudging mtime), so I think it’s suboptimal to spend effort on a feature that already has a workaround and no other use cases.
I guess from your point of view it’s important to hide the mtime of the encrypted container, but that’s a false assumption: the last modification time would still be tracked by Syncthing, so the information leaks even if you decide to hide it.
I’m not sure that having some things encrypted is really niche today. And actually it is not my decision to keep the modification time unmodified. That is the intentional default for VeraCrypt (and for TrueCrypt). Personally, I do not care; I just kept the default behavior. By the way, I do not quite get why information about the modification time should still leak.
I acknowledge that many file syncing and backup programs fail in this scenario. Yet I think it is at least worth a note in the documentation that in such a scenario file changes may go unnoticed (just to avoid a pitfall some people may stumble into).
And maybe in the future somebody will be interested in adding this feature to Syncthing.
It’s not that it would be hard to add, it’s that it would be painful to the point of uselessness to read and hash your possibly multi-gigabyte encrypted disk image on every scan just to see if it has changed.
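To make the trade-off in this exchange concrete, here is a minimal Python sketch. It is not Syncthing's implementation; `metadata_signature`, `content_hash`, `rewrite_keeping_mtime` and `demo` are hypothetical names made up for illustration. It compares the cheap metadata check (modification time, size, permissions) with a full content hash, and mimics a container whose contents change while the timestamps are restored afterwards.

```python
import hashlib
import os
import tempfile


def metadata_signature(path):
    """Cheap check: (mtime in ns, size, mode bits). No file data is read."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, st.st_mode)


def content_hash(path, chunk_size=1024 * 1024):
    """Expensive check: every byte of the file is read and hashed."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rewrite_keeping_mtime(path, offset, data):
    """Overwrite bytes in place, then restore the old timestamps.

    Hypothetical helper that imitates a fixed-size container whose
    modification time is deliberately left unchanged.
    """
    st = os.stat(path)
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


def demo():
    """Return (metadata unchanged?, content hash unchanged?)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "container.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * 4096)
        meta_before, hash_before = metadata_signature(path), content_hash(path)
        rewrite_keeping_mtime(path, 100, b"secret")
        meta_after, hash_after = metadata_signature(path), content_hash(path)
    return meta_before == meta_after, hash_before == hash_after
```

On a typical filesystem `demo()` should report that the metadata is unchanged while the content hash differs. The cost of `content_hash` grows with file size, which is the objection raised above for multi-gigabyte images.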
This is not a fringe case. There are lots of times where file contents change without metadata changes. Many databases have that as an option, for example.
A scan should not be necessary to detect changes, even on Windows. On every operating system except Windows, it should be rather trivial. You can tell inotify, for example, to notify you on file writes, and it will regardless of whether the modification timestamp on the file changes or not. In Windows you might have to resort to using change journal records. That means you have to sift through a whole volume’s worth of change notifications looking for the ones you are interested in, but this should be able to be done efficiently. I think the fsnotify peeps were working on change journal support at one point.
Databases don’t update mtimes because databases usually use memory mapped files and not real files. Memory mapped files don’t fire inotify events, as it’s effectively modifying memory and not files, so this would not work.
I don’t think either VeraCrypt or mmapped files are real use cases that need solving.
My files are not multi-gigabyte. But I acknowledge that other people may have files of sizes that render regular hashing prohibitive. Should you ever run out of ideas for what to add to Syncthing, please feel free to reconsider this subject.
inotify does indeed fire on mem mapped files when they are msynced to disk. Windows update journalling certainly captures database file writes. And both APIs are eminently usable for VeraCrypt containers, which should make file write detection without metadata changes pretty simple. Syncthing’s rsync-lite-ish protocol also seems eminently applicable for VeraCrypt containers. Honestly Syncthing seems like a perfect candidate technically and philosophically to embrace the usage case of sharing encrypted containers. I’m rather surprised it doesn’t already.
From my tests a few years ago, “inotify”, which I guess we are using as a term for many implementations, does not fire consistently across all platforms/implementations.
I still don’t see a use case that can’t get away without having this.
You already have a workaround for VeraCrypt, so it does not feel like a significant enough argument to spend effort on this.
Syncthing absolutely embraces syncing shared encrypted containers. Lots of people do precisely that. Syncthing just doesn’t embrace detecting changes in setups that intentionally hide the fact that a file changed.
knarf, I have the exact same use case as you (an encrypted container that does not change date upon mod). I have a workaround that I’ve used for years and it works fine. I have a shortcut on my Windows desktop that includes:
%windir%\system32\cmd.exe /k copy file.tc +,,
After I close the repository, this shortcut updates the date and then the file is caught in the sync. It essentially duplicates a Linux touch command. (I also have shortcuts that automate opening and closing the repository, but they are incidental to the point of this thread.)
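For anyone who prefers a script to the `copy` trick, a rough cross-platform equivalent of `touch` can be sketched in Python. This is only an illustration; the function name `touch` is my own, and it assumes the container has already been dismounted and that the file exists.

```python
import os


def touch(path):
    """Set the access and modification time of an existing file to now."""
    # Passing None as the times tells os.utime to use the current time.
    os.utime(path, None)
```

Calling `touch("file.tc")` after closing the container bumps the modification time, so the next scan picks the file up as changed.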
Wouldn’t it be possible to add a full rescan, with a file re-read interval? I could even see it being useful to have it fire once a month or so by default and throw a sync conflict if files have changed unexpectedly, with the possibility to suppress the warnings for people with encrypted containers and the like. That seems better than not doing anything at all and letting it bite people who might use Syncthing for backups and whatnot.
By default?! Hashing potentially a whole lot of data using tons of resources and causing potentially long timespan without synchronisation is a terrible idea (by default!) - just look around for topics about long scans.
Functionality to recheck files has been brought up, more in the context of detecting accidental data corruption (disk defects and the like). I think that would definitely have merit, not on my roadmap though.
I believe it is a good idea in principle. It could be a per-folder option (not a global one) and, of course, should be disabled by default. A separate counter for full rehashing could be added to the interface, together with a radio button to select the behavior when a difference is found: 1. Treat it as a regular change; 2. Raise a conflict; 3. Replace with the remote copy.
Another option that I would like to see is access to the hash value through the REST API, and potentially to the hash of each piece of a big file, to control partial downloading. For example, I have a huge zip archive and want to extract a few files. Then I could sync only the header and the part which contains the data.
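To illustrate what per-piece hashes would make possible, here is a small Python sketch. As far as I know Syncthing hashes files in blocks with SHA-256, but everything else here is an assumption for illustration: the fixed `BLOCK_SIZE`, the names `block_hashes` and `changed_blocks`, and working on in-memory bytes instead of streaming from disk. It is not Syncthing's REST API or protocol.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # illustrative block size, chosen for this sketch


def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_hashes, new_hashes):
    """Indexes of blocks that differ or exist on only one side."""
    longest = max(len(old_hashes), len(new_hashes))
    return [
        i for i in range(longest)
        if i >= len(old_hashes)
        or i >= len(new_hashes)
        or old_hashes[i] != new_hashes[i]
    ]
```

With such a list, a client that only needs a few pieces of a large file (or of a fixed-size container where one region changed) could ask for just the blocks whose indexes it cares about.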
It seems such options do not contradict the general concept of Syncthing.