This requires case-folding to happen exactly the same on all platforms and within Syncthing. This is something that following Unicode standards should allow, but considering the track record of the main proprietary OS vendors on that front, I wouldn’t dare take it as a given.
And why can’t these two be auto-detected based on the connected devices instead of slapping the default on everyone?
That data loss happens at very specific occurrences, why can’t this require manual action like 1c?
What you’ve said is correct only if the current system can’t be improved, but that’s really not the case.
There are three issues with this that make such kind of autodetection a huge endeavor in and of itself:
1. A given machine may not be connected to every other device in the swarm. Worse, it may not even be aware of all the devices and their properties. Consider the config where machines A and B share a folder, while A shares the same folder with C, D, and E (none of whom is aware of B), and B shares the same folder with F, G, and H (none of whom is aware of A). How is C supposed to even know about H?
2. A shared folder is on a Linux box with case-sensitive EXT4, but there is a subdirectory mounted inside it, and that mountpoint holds a case-insensitive FAT32 filesystem. Autodetecting these situations reliably is not something I’d aspire to implement.
3. A further complication of cases 1. and 2., where either a new case-sensitive machine is added to an otherwise case-insensitive swarm (or vice versa), or a new case-sensitive subdirectory is added to a case-insensitive folder (or vice versa). Good luck trying to implement reliable autodetection of these cases!
Because 1c without user interaction does not lead to data loss, while 2c does. Data loss is worse than inconvenience, period. We can have a user jump through extra hoops to make things work as he/she expects (or at all), but we can’t, shouldn’t and mustn’t have a situation where simply installing and running Syncthing without checking a very specific option somewhere in it results in losing user data.
It’s actually not as bad as it sounds. During my attempt to storm this issue last year we (I’m still awed by the notorious patience and cooperation of all the maintainers during those weeks) managed to implement a case-insensitive version of FakeFS (the filesystem mock parts of the Syncthing code are tested against) that reliably mimicked the behaviour of filesystems on all the OSes we compile Syncthing for. It’s part of the test suite now, so should something break or change, we’ll be aware of it.
That’s good to hear, because I’ve occasionally had issues with case folding that just seem bizarre. Although, come to think of it, the more recent weirdness may be partly because of OS X’s decision to use NFD rather than NFC.
That specific weirdness we already do handle, at least.
And, yes. Why, Apple.
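For anyone who hasn't run into the NFD/NFC issue mentioned above, here is a minimal sketch using Python's standard `unicodedata` module. It only illustrates the general Unicode behaviour; the helper name `same_name_after_nfc` is made up for this example and says nothing about how Syncthing handles it internally.

```python
import unicodedata

# "é" as a single precomposed code point (NFC form)
composed = "caf\u00e9.txt"
# "e" followed by a combining acute accent (NFD form)
decomposed = "cafe\u0301.txt"


def same_name_after_nfc(a: str, b: str) -> bool:
    """Compare two file names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ, even though they render identically.
names_differ_raw = composed != decomposed
# After normalization they compare equal.
names_match_normalized = same_name_after_nfc(composed, decomposed)
```

A byte-for-byte comparison treats these as two different names, which is why a sync tool has to pick one normalization form before comparing names across platforms.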
Except when someone relies on syncing being case-sensitive.
No. There is a real difference between saying “there is a conflict, we didn’t sync this file, you need to twiddle a config” and “you had two files that only differed in case, so we clobbered the data in one of them to match the other, sorry”.
It seems like you’re trying to argue that your convenience in not having to flip a default setting trumps other people’s risk of data loss. That’s not how this project works, and it’s a frankly inane discussion to even have. Even more so because none of this exists as code and the potential problems and solutions haven’t been fully explored yet.
It’s not inane to discuss whether it’s even necessary to create hassle for anyone. If none of this exists as code, it is the perfect time to discuss this. Especially if you’ve just said that “the potential problems and solutions haven’t been fully explored yet”.
You brought up two scenarios, let’s take those.
Why can’t this case:
you had two files that only differed in case, so we clobbered the data in one of them to match the other, sorry
Be turned into this instead:
there is a conflict where two files only differ in case, we could’ve synced these but this might result in data loss, do you want to change the config Y/n
That’s what I’m trying to understand and get an answer to (like nekr0z nicely did once already for one question), instead of getting dismissive non-technical “it’s a project goal”, “it’s a data loss risk” or a thread closed.
One more thing though, if it hasn’t been clear I have never said that I want people to lose their data or even risk losing their data, it’s an understandable concern and I’m grateful for the priority as a very long-time Syncthing user.
I try, but fail, to grasp how that is not exactly the solution @calmh is suggesting. Answering that question requires user interaction, so whatever is said in the message and whatever buttons it offers, is by definition not part of what happens without user interaction.
Well, this is exactly what we aim at when we say “Syncthing should be case-insensitive by default”.
When we say that Syncthing will default to case-insensitivity, we don’t imply that somehow all your "fOO"s, "Bar"s, and "BaZ"s will sync as "foo"s, "bar"s, and "baz"s, respectively, or will undergo some other case-folding. No, we shall be preserving the case, at least that’s what we aim for. But in case Syncthing sees “FOO” and “foo” in the same directory, it will stop working on this folder and display the error message, basically along the lines of what you’ve just suggested.
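To make the "sees “FOO” and “foo” in the same directory" check concrete, here is a rough sketch of how such a collision could be detected. It is only an illustration: the function name `find_case_collisions` is hypothetical, and Python's `str.casefold()` stands in for whatever folding rules a real implementation would have to agree on across platforms.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by case (hypothetical helper).

    Names are grouped by their case-folded form; any group with more than
    one member would collapse into a single file on a case-insensitive
    filesystem.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Case is preserved for everything; only the colliding pair is reported.
collisions = find_case_collisions(["FOO", "foo", "Bar", "baz"])
```

With no collisions the list is empty and nothing needs to stop; with one or more, the folder would be put into the error state described above until the user decides.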
This will add the extra step for users that have only case-sensitive systems and expect “foo”, “Foo”, and “FOO” to sync as three different files in one directory; this is a perfectly valid usecase that we will continue to support, but it will require the user to click on the (intentionally) very scary red button “Yes, I do want this folder case-sensitive, and I understand that as soon as I share it with at least one machine with a case-insensitive filesystem, my data will be screwed up beyond any restoration!”
We believe this one extra step for this one category of users is a perfectly acceptable price to pay for avoiding silently destroying data on machines of many other users. And when I say “many”, I’m basing my guesstimation on the fact that our stats show about half the users use Syncthing on Windows and OS X, and it is likely very common that these users at some point purchase a Linux-based NAS for their homes, at which point their data is in peril.
Huh. There’s a point I’m not sure has been raised before. Case folding and case sensitivity are not the same thing. SQL, for instance, is case folding but case sensitive. (Try quoting lowercase identifiers in SQL.)
Also, a point I just realised I should have made in the earlier reply:
It would create a conflict, yes, but how is that different from the usual case of inconsistency between file contents? Calling this ‘data clobbering’ only makes sense from a perspective where case sensitive behaviour is the only conceivably correct thing, which is flatly not the case – even if you prefer it. (As, frankly, I do, but that’s neither here nor there.)
The issue being discussed here is not that systems A and B have files “FOO” and “foo”, and they get synced (or conflicted) — this one is simple and is basically an ordinary conflict, nothing too scary.
The real issue is this: A is case-sensitive and has “Foo” and “foo”, B is case-insensitive and tries to sync both, syncs them into one file and simply loses the contents of one of them, propagating it further.
The even bigger problem is that trying to rename “BaR” to “bar” on B results in the file being wiped totally: B tells A it created “bar” and removed “BaR”, A does that and tells B that yes, it removed “BaR”. B looks for “BaR” on its filesystem, sees it (because “bar” is there, which is the same for B), removes it, the removal is propagated, and there you go: you tried to rename the file, and that file got removed completely.
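The rename sequence above can be replayed with a toy model. This is a deliberately simplified sketch: the two classes and the hard-coded message exchange are invented for illustration and are not Syncthing's actual protocol or code. They only follow the steps as described in this post.

```python
class CaseSensitiveFS:
    """Toy filesystem where "BaR" and "bar" are different files."""

    def __init__(self):
        self.files = {}

    def write(self, name, data):
        self.files[name] = data

    def exists(self, name):
        return name in self.files

    def remove(self, name):
        self.files.pop(name, None)

    def names(self):
        return sorted(self.files)


class CaseInsensitiveFS:
    """Toy case-preserving, case-insensitive filesystem."""

    def __init__(self):
        self.files = {}  # folded name -> (name as stored, data)

    def write(self, name, data):
        self.files[name.casefold()] = (name, data)

    def exists(self, name):
        return name.casefold() in self.files

    def remove(self, name):
        self.files.pop(name.casefold(), None)

    def rename(self, old, new):
        _, data = self.files.pop(old.casefold())
        self.files[new.casefold()] = (new, data)

    def names(self):
        return sorted(name for name, _ in self.files.values())


def simulate_case_only_rename():
    a, b = CaseSensitiveFS(), CaseInsensitiveFS()
    a.write("BaR", "payload")
    b.write("BaR", "payload")

    # The user renames the file on B; B announces "created bar, removed BaR".
    b.rename("BaR", "bar")
    # A applies both changes.
    a.write("bar", "payload")
    a.remove("BaR")
    # A confirms the removal of "BaR"; B checks its own filesystem,
    # where "BaR" still appears to exist because it matches "bar".
    if b.exists("BaR"):
        b.remove("BaR")
    # B now sees "bar" gone and propagates that removal to A.
    if not b.exists("bar"):
        a.remove("bar")
    return a.names(), b.names()
```

In this model both sides end up with no file at all, even though the user only asked for a rename.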
Hmm. It seems I missed where the quote came from, specifically. My misreading; sorry.
Let’s take this case again. What I’ve been trying to figure out is why can’t device B that does the data-destruction throw up a big scary prompt instead of the device A that just has the files?
It’d be safer than the current way and the way you’ve proposed because it’d eliminate the chance of someone adding a destructive case-insensitive device to a case-sensitive swarm, that is the actual problem as you’ve highlighted.
Is the distinction understandable or did I still phrase it confusingly?
First of all, no single device does data destruction; data destruction happens because of the combination of the devices in the swarm. Every device does its best to do the right job, but the combination of efforts happens to be destructive. In some cases one might argue that it is the case-sensitive device that is a data destroyer, but that’s not the point.
Filesystems don’t report whether they are case-sensitive or not, and from the software’s (i.e. Syncthing’s) perspective the only way of knowing that the filesystem is case-insensitive is to create a random file “foo”, fill it with some data, and then read the file “FOO” from the same directory and find all the same data there; then we may assume that either “foo” is a hardlink for “FOO” (we created it, so it hardly is), or the filesystem is case-insensitive. It doesn’t look like much trouble to do this check, but as I said earlier, every directory can be a mountpoint (or become one) and end up case-insensitive, and re-checking every subdirectory on every write is just not worth it.
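The probe described above could look roughly like the following. It is a sketch under assumptions: the function name `looks_case_insensitive` and the probe file naming are invented here, and the result only says something about the one directory being probed at that moment, which is exactly the limitation raised above.

```python
import os
import tempfile
import uuid


def looks_case_insensitive(directory):
    """Probe one directory by writing a lowercase file and reading it back
    under an uppercase name (hypothetical helper)."""
    token = uuid.uuid4().hex
    lower = os.path.join(directory, f"probe-{token}.tmp")
    upper = os.path.join(directory, f"PROBE-{token.upper()}.TMP")

    # Mode "x" fails if the file already exists, so nothing is overwritten.
    with open(lower, "x", encoding="utf-8") as f:
        f.write(token)
    try:
        with open(upper, encoding="utf-8") as f:
            # Same data under a differently-cased name: we just created the
            # file, so a hardlink is implausible and the filesystem is
            # presumably case-insensitive.
            return f.read() == token
    except FileNotFoundError:
        return False
    finally:
        os.remove(lower)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print("case-insensitive:", looks_case_insensitive(scratch))
```

Running this once per shared folder is cheap; running it for every subdirectory on every write is the part that does not scale.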
Hence, there’s no simple way to know for a case-insensitive system that it is one. However, we can detect danger when we see “foo” and “FOO” in one directory (and you should obviously be on a case-sensitive filesystem for this to happen) and put up the alert; this, at least, looks doable from the programming point of view. Of course on an all-case-sensitive swarm this can be normal, but since we have no way of knowing that the swarm is all-case-sensitive (for the reasons I have described earlier in this thread) we’ll need the user to explicitly enable this kind of behaviour.
Now, there may be better approaches to the whole thing, granted. As @calmh has already mentioned, implementing case-insensitivity is really hard, and we’re still not very close to getting there (not because of the lack of trying, mind you), and for all of this to even be of importance we must first get case-insensitivity sorted out. But at least this approach looks sane to the maintainers, and I have yet to see someone suggest a better one.
It’s also tricky for the case insensitive device that already has “Foo” and gets an update for “fOO”. Is it a problem that should be flagged? Is it an update to the same file just spelled differently because it comes from another also case insensitive system? Was the file just case-only renamed? I’m not 100% sure how to tell.
On a case sensitive system it’s easy to see when we have two such files and set a hypothetical “requires case sensitivity” flag on both, causing the warning/error to happen on the case insensitive side.
But what about two case sensitive systems with one insensitive in between? How would it understand what’s going on?
And my gut feeling is still that if someone has two computers, one with “Club meeting 20191230.txt” and the other with “Club Meeting 20191230.txt” the odds are greatly in favor of them actually intending these to be the same file and not two separate meeting notes beside each other. Regardless of what the file system would think. Case insensitive by default acknowledges this. I’m sure there are people who love to have files beside each other that just differ in case, but it’s not going to be the majority.
Majority or not, there simply seems to be no obvious way to solve the data-loss problems that we have as it is without subjecting these users to this little inconvenience we’re discussing.
Reading FS mounts sounds like a reasonable approach on Linux to detect case-insensitive filesystems being mounted and used somewhere?
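As a rough idea of what reading the mounts could involve, here is a sketch that parses text in the `/proc/mounts` format and lists mounts at or below a shared folder whose filesystem type is commonly case-insensitive. The function name and the set of filesystem types are assumptions made for this example. The set is not exhaustive, and the filesystem type is only a hint, since case behaviour can depend on mount options or per-directory settings.

```python
import posixpath

# Assumption for illustration: types that usually behave case-insensitively
# on Linux. Not exhaustive, and the type alone is not proof.
HINT_FS_TYPES = frozenset({"vfat", "msdos", "exfat"})


def suspicious_mounts_under(folder, mounts_text, fs_types=HINT_FS_TYPES):
    """List (mountpoint, fstype) pairs at or below `folder` (hypothetical helper).

    `mounts_text` is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    folder = posixpath.normpath(folder)
    prefix = folder.rstrip("/") + "/"
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Spaces in mountpoints are escaped as \040 in /proc/mounts.
        mountpoint = fields[1].replace("\\040", " ")
        fstype = fields[2]
        if fstype in fs_types and (mountpoint == folder or mountpoint.startswith(prefix)):
            hits.append((mountpoint, fstype))
    return hits


sample_mounts = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /home/u/Sync/usb vfat rw,relatime 0 0\n"
    "/dev/sdc1 /mnt/other vfat rw,relatime 0 0\n"
)
found = suspicious_mounts_under("/home/u/Sync", sample_mounts)
```

On a real machine the text would come from reading `/proc/mounts`. This only covers Linux, and only the mounts present at the moment of the check.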
It isn’t just an inconvenience though, it’s very likely data loss if someone isn’t being cautious. At the very minimum, I’m sure Syncthing could save someone a tremendous amount of problems if it did at least one case insensitivity check when adding a folder to be synced?