Implementing case insensitivity

But OTOH a gesture of thanks towards the developers who solve this pernicious problem would be nice, no? The compensation for working on this in general is not great.

You’re incorrect in that statement.

However, it is nice that the idea has now been buried. But case insensitivity by default is still terrible. There are better ways: for example, if all the clients added to the “swarm” are Linux, then there’s no need for a default that rudely ignores how the entire OS works on those machines. If a case-insensitive FS enters the swarm, the user can be warned and the default changed only for those folders where one destination is reduced-capability. The GitHub issue was abruptly closed before a reasonable solution, instead of a blanket-set default, was found.

Also, if the only OSs that benefit from this default are Windows (or OSX to some extent), then 100% it is related to “Windows or some lowest common denominator”.

Whether to activate case insensitivity by default, or under which circumstances, is pointless bikeshedding at this point. That’s not the hard and important part; what is, is getting a system in place that can handle case insensitivity. Pointing out that you don’t want that doesn’t do any good; it’s clear that it’s useful and needed in many cases. If we ever get there and you feel like your use case is threatened, that’s the moment to speak up. However, do so calmly and politely; chances are much better then that you are taken seriously. Most definitely don’t quote someone partly and out of context; that’s just super annoying.

I said that having it as a default is not a good decision, not that having case insensitivity support is bad. Talk about partial quotes: you took half of my sentence. (Whether it is the default or not is also what the linked GitHub issue I reacted to talks about.)

I don’t see anything non-calm about what I have said. Maybe not the most delicate but definitely not rude. Seeing how people in the GitHub thread were very polite but got nowhere and the issue got locked without a solution I had to bring up the topic here. I digress, discussing the tone of how I voice my concern about an anti-feature being the default is indeed bikeshedding.

I’m sorry, but it actually is. And not only a good decision, but the only viable decision at this point.

Syncthing’s first priority is (and was as long as I remember) not to lose any user data unexpectedly. Losing user data is absolutely the worst, and trumps any other concern such as performance, usability, etc. — says so in the Project’s Goals.

If (or, hopefully, when) we finally implement case-insensitive behaviour, we only have two options: have it enabled by default or have it disabled by default. Combined with the three kinds of setup a user can have, that gives six possible user stories:

1a: Insensitive by default, user only has sensitive filesystems. This is inconvenient; Syncthing will not lose data, but may not sync some of it, and needs to be told to be sensitive. Bad, but not critical.

1b: Insensitive by default, user only has insensitive systems. Everything just works.

1c: Insensitive by default, user has a mix of systems. Some of the data may not start syncing right away and may require user interaction, but nothing is lost unexpectedly. Not really inconvenient, because what else would you do.

2a: Sensitive by default, sensitive filesystems. Just works (the way it does now).

2b: Sensitive by default, insensitive filesystems. Works (the way it does now).

2c: Sensitive by default, mix of systems. Data loss is guaranteed, the way it is now. Very bad, should be avoided at any cost.

See? Having case-sensitive behaviour by default is basically incompatible with the project’s topmost-priority goal.

This requires case-folding to happen exactly the same on all platforms and within Syncthing. This is something that following Unicode standards should allow, but considering the track record of the main proprietary OS vendors on that front, I wouldn’t dare take it as a given.
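To make the concern concrete, here is a minimal Go sketch of how two perfectly reasonable "ignore case" rules can disagree about whether two names are the same. It only uses the Go standard library; `lowerEqual` and `foldEqual` are names made up for this illustration, and nothing here claims to reproduce what any particular OS or Syncthing itself does.

```go
package main

import (
	"fmt"
	"strings"
)

// lowerEqual treats two names as equal if their lowercased forms match.
// (Hypothetical helper, for illustration only.)
func lowerEqual(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

// foldEqual treats two names as equal under Unicode simple case folding,
// which is what strings.EqualFold implements.
// (Hypothetical helper, for illustration only.)
func foldEqual(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	pairs := [][2]string{
		{"Foo", "foo"},
		{"S.txt", "\u017f.txt"}, // U+017F is LATIN SMALL LETTER LONG S
		{"straße", "STRASSE"},
	}
	for _, p := range pairs {
		fmt.Printf("%q vs %q: lowerEqual=%v foldEqual=%v\n",
			p[0], p[1], lowerEqual(p[0], p[1]), foldEqual(p[0], p[1]))
	}
}
```

The second pair is where the two rules part ways: lowercasing leaves the long s alone, while simple folding puts it in the same class as "S" and "s". If the sync tool and the filesystem underneath it pick different rules, they will disagree about which names collide.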

And why can’t these two be auto-detected based on the connected devices instead of slapping the default on everyone?

That data loss happens at very specific occurrences, why can’t this require manual action like 1c?

What you’ve said is correct only if the current system can’t be improved, but that’s really not the case.

There are three issues with this that make such kind of autodetection a huge endeavor in and of itself:

  1. A given machine may not be connected to every other device in the swarm. Worse, it may not even be aware of all the devices and their properties. Consider the config where machines A and B share a folder, while A shares the same folder with C, D, and E (none of whom is aware of B), and B shares the same folder with F, G, and H (none of whom is aware of A). How is C supposed to even know about H?

  2. A shared folder is on a Linux box with case-sensitive EXT4, but there is a subdirectory mounted inside it, and that mountpoint holds a case-insensitive FAT32 filesystem. Autodetecting these situations reliably is not something I’d aspire to implement.

  3. A further complication of cases 1. and 2., where either a new case-sensitive machine is added to an otherwise case-insensitive swarm (or vice versa), or a new case-sensitive subdirectory is added to a case-insensitive folder (or vice versa). Good luck trying to implement reliable autodetection of these cases!
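The first point can be sketched in a few lines of Go. This is a toy model of the A..H layout described above, not anything from the Syncthing code base: the `topology` type and its methods are invented for the illustration, and real devices obviously do not have a global view like this one.

```go
package main

import (
	"fmt"
	"sort"
)

// topology maps each device to the devices it shares the folder with directly.
// (Hypothetical model, for illustration only.)
type topology map[string][]string

// link records that a and b share the folder with each other.
func (t topology) link(a, b string) {
	t[a] = append(t[a], b)
	t[b] = append(t[b], a)
}

// swarm returns every device that start's data can reach through any chain
// of shares, excluding start itself, in sorted order (breadth-first search).
func (t topology) swarm(start string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, peer := range t[cur] {
			if !seen[peer] {
				seen[peer] = true
				out = append(out, peer)
				queue = append(queue, peer)
			}
		}
	}
	sort.Strings(out)
	return out
}

// exampleTopology builds the layout from the list above:
// A and B share with each other, A with C, D, E, and B with F, G, H.
func exampleTopology() topology {
	t := topology{}
	t.link("A", "B")
	for _, d := range []string{"C", "D", "E"} {
		t.link("A", d)
	}
	for _, d := range []string{"F", "G", "H"} {
		t.link("B", d)
	}
	return t
}

func main() {
	t := exampleTopology()
	fmt.Println("C is configured with:", t["C"])
	fmt.Println("C's files can end up on:", t.swarm("C"))
}
```

C only ever talks to A, yet its files can travel all the way to H. Any autodetection running on C would have to learn the filesystem properties of devices it has no configuration for.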

Because 1c without user interaction does not lead to data loss, while 2c does. Data loss is worse than inconvenience, period. We can have a user jump through extra hoops to make things work as he/she expects (or at all), but we can’t, shouldn’t and mustn’t have a situation where simply installing and running Syncthing without checking a very specific option somewhere in it results in losing user data.

It’s actually not as bad as it sounds. During my attempt to storm this issue last year we (I’m still awed by the notorious patience and cooperation of all the maintainers during those weeks) managed to implement a case-insensitive version of FakeFS (the filesystem mock parts of the Syncthing code are tested against) that reliably mimicked the behaviour of filesystems on all the OSes we compile Syncthing for. It’s part of the test suite now, so should something break or change, we’ll be aware of it.

That’s good to hear, because I’ve occasionally had issues with case folding that just seem bizarre. Although, come to think of it, the more recent weirdness may be partly because of OS X’s decision to use NFD rather than NFC.

That specific weirdness we already do handle, at least.

And, yes. Why, Apple. :frowning:

Except when someone relies on syncing being case-sensitive.

No. There is a real difference between saying “there is a conflict, we didn’t sync this file, you need to twiddle a config” and “you had two files that only differed in case, so we clobbered the data in one of them to match the other, sorry”.

It seems like you’re trying to argue that your convenience in not having to flip a default setting trumps other people’s risk of data loss. That’s not how this project works, and it’s a frankly inane discussion to even have. Even more so because none of this exists as code and the potential problems and solutions haven’t been fully explored yet.

It’s not inane to discuss whether it’s even necessary to create hassle for anyone. If none of this exists as code, it is the perfect time to discuss this. Especially if you’ve just said that “the potential problems and solutions haven’t been fully explored yet”.

You brought up two scenarios, let’s take those.

Why can’t this case:

“you had two files that only differed in case, so we clobbered the data in one of them to match the other, sorry”

be turned into this instead:

“there is a conflict where two files only differ in case; we could’ve synced these, but this might result in data loss. Do you want to change the config? Y/n”

That’s what I’m trying to understand and get an answer to (like nekr0z nicely did once already for one question), instead of getting dismissive non-technical “it’s a project goal”, “it’s a data loss risk” or a thread closed.

One more thing though, if it hasn’t been clear I have never said that I want people to lose their data or even risk losing their data, it’s an understandable concern and I’m grateful for the priority as a very long-time Syncthing user.

I try, but fail, to grasp how that is not exactly the solution @calmh is suggesting. Answering that question requires user interaction, so whatever is said in the message and whatever buttons it offers, is by definition not part of what happens without user interaction.

Well, this is exactly what we aim at when we say “Syncthing should be case-insensitive by default”.

When we say that Syncthing will default to case-insensitivity, we don’t imply that somehow all your "fOO"s, "Bar"s, and "BaZ"s will sync as "foo"s, "bar"s, and "baz"s, respectively, or will undergo some other case-folding. No, we shall be preserving the case, at least that’s what we aim for. But in case Syncthing sees “FOO” and “foo” in the same directory, it will stop working on this folder and display the error message, basically along the lines of what you’ve just suggested.
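As a rough sketch of the check described here (preserve case, but stop when two names in one directory differ only by case), something along these lines would do. This is not Syncthing code: `caseCollisions` is a name invented for the illustration, and `strings.ToLower` is used as a simple stand-in for whatever folding rule a real implementation would settle on.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// caseCollisions groups the names of one directory by a case-insensitive key
// and returns every group with more than one member, i.e. names that differ
// only by case. Names themselves are never altered.
// Assumption: strings.ToLower stands in for the real folding rule.
func caseCollisions(names []string) [][]string {
	groups := make(map[string][]string)
	for _, n := range names {
		key := strings.ToLower(n)
		groups[key] = append(groups[key], n)
	}
	var out [][]string
	for _, g := range groups {
		if len(g) > 1 {
			sort.Strings(g)
			out = append(out, g)
		}
	}
	// Sort groups so the result does not depend on map iteration order.
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}

func main() {
	dir := []string{"FOO", "foo", "Bar", "baz"}
	if c := caseCollisions(dir); len(c) > 0 {
		fmt.Println("refusing to sync, names differ only by case:", c)
		return
	}
	fmt.Println("no case collisions, safe to sync")
}
```

The point is that the directory with "FOO" and "foo" produces an error to act on, while "Bar" and "baz" pass through with their original spelling untouched.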

This will add the extra step for users that have only case-sensitive systems and expect “foo”, “Foo”, and “FOO” to sync as three different files in one directory; this is a perfectly valid usecase that we will continue to support, but it will require the user to click on the (intentionally) very scary red button “Yes, I do want this folder case-sensitive, and I understand that as soon as I share it with at least one machine with a case-insensitive filesystem, my data will be screwed up beyond any restoration!”

We believe this one extra step for this one category of users is a perfectly acceptable price to pay for avoiding silently destroying data on machines of many other users. And when I say “many”, I’m basing my guesstimation on the fact that our stats show about half the users use Syncthing on Windows and OS X, and it is likely very common that these users at some point purchase a Linux-based NAS for their homes, at which point their data is in peril.

Huh. There’s a point I’m not sure has been raised before. Case folding and case sensitivity are not the same thing. SQL, for instance, is case folding but case sensitive. (Try quoting lowercase identifiers in SQL.)

Also, a point I just realised I should have made in the earlier reply:

It would create a conflict, yes, but how is that different from the usual case of inconsistency between file contents? Calling this ‘data clobbering’ only makes sense from a perspective where case sensitive behaviour is the only conceivably correct thing, which is flatly not the case – even if you prefer it. (As, frankly, I do, but that’s neither here nor there.)

The issue being discussed here is not that systems A and B have files “FOO” and “foo”, and they get synced (or conflicted) — this one is simple and is basically an ordinary conflict, nothing too scary.

The real issue is this: A is case-sensitive and has “Foo” and “foo”, B is case-insensitive and tries to sync both, syncs them into one file and simply loses the contents of one of them, propagating it further.

The even bigger problem is that trying to rename “BaR” to “bar” on B results in the file being wiped totally: B tells A it created “bar” and removed “BaR”, A does that and tells B that yes, it removed “BaR”. B looks for “BaR” on its filesystem, sees it (because “bar” is there, which is the same for B), removes it, the removal is propagated, and there you go: you tried to rename the file, and that file got removed completely.
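The rename scenario can be replayed with a toy model. Everything below is made up for the illustration: `insensitiveFS` is a map-based stand-in for a case-insensitive, case-preserving filesystem, and `applyDeleteExact` is just one conceivable guard, not a description of how Syncthing does or will handle it. The real protocol has index updates and versions that this sketch ignores.

```go
package main

import (
	"fmt"
	"strings"
)

// insensitiveFS is a toy case-insensitive, case-preserving filesystem:
// lookups ignore case, but the spelling last written is remembered.
type insensitiveFS struct {
	files map[string]string // lowercased name -> name as stored
}

func newInsensitiveFS(names ...string) *insensitiveFS {
	fs := &insensitiveFS{files: make(map[string]string)}
	for _, n := range names {
		fs.files[strings.ToLower(n)] = n
	}
	return fs
}

// rename changes the stored spelling of a file.
func (fs *insensitiveFS) rename(from, to string) {
	delete(fs.files, strings.ToLower(from))
	fs.files[strings.ToLower(to)] = to
}

// exists reports whether name resolves to a file, ignoring case.
func (fs *insensitiveFS) exists(name string) bool {
	_, ok := fs.files[strings.ToLower(name)]
	return ok
}

// applyDeleteNaive removes whatever the name resolves to, as B does in the
// scenario above. It reports whether a file was removed.
func applyDeleteNaive(fs *insensitiveFS, name string) bool {
	if !fs.exists(name) {
		return false
	}
	delete(fs.files, strings.ToLower(name))
	return true
}

// applyDeleteExact removes the file only if its stored spelling matches the
// requested name exactly (a hypothetical guard).
func applyDeleteExact(fs *insensitiveFS, name string) bool {
	stored, ok := fs.files[strings.ToLower(name)]
	if !ok || stored != name {
		return false
	}
	delete(fs.files, strings.ToLower(name))
	return true
}

func main() {
	for _, exact := range []bool{false, true} {
		b := newInsensitiveFS("BaR")
		b.rename("BaR", "bar") // the user renames the file on B
		// A has applied "create bar, remove BaR" and now reports the
		// removal of "BaR" back to B.
		var removed bool
		if exact {
			removed = applyDeleteExact(b, "BaR")
		} else {
			removed = applyDeleteNaive(b, "BaR")
		}
		fmt.Printf("exact check: %v, B removed a file: %v, bar still on B: %v\n",
			exact, removed, b.exists("bar"))
	}
}
```

With the naive delete, B removes the file it has just renamed, and that removal is what then travels back to A. With the exact-spelling check, the stale removal of "BaR" is ignored and "bar" survives.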

Hmm. It seems I missed where the quote came from, specifically. My misreading; sorry.

Let’s take this case again. What I’ve been trying to figure out is why device B, the one that does the data destruction, can’t throw up a big scary prompt, instead of device A, which just has the files.

It’d be safer than the current way and the way you’ve proposed because it’d eliminate the chance of someone adding a destructive case-insensitive device to a case-sensitive swarm, which is the actual problem, as you’ve highlighted.

Is the distinction understandable or did I still phrase it confusingly?