Implementing case insensitivity

I am very chill, just expressing my displeasure at a proposal that would be an annoying pain for me and approximately 48% of Syncthing’s users.

Failsafes and warnings can be implemented instead of asinine nonconfigurable feature-disabling. There’s no reasonable reason to pick the latter.

Dropbox did it very nicely and I used it a lot so it’s doable.

The GH issue is currently closed, with pretty much that as the reason. If that’s not the case, it would be nice to update it.

Asinine nonconfigurable feature-disabling has never been on the table. The issue you’re referring to begins with the subject “case insensitive by default” and ends a year and a half later with me saying literally “there will always be an option for case sensitivity”. So read either the first sentence or the last and take either on faith please.

As for symlinks, here’s my latest PR and the corresponding issue. As usual not everything is spelled out super clearly, but I see some main causes of concern:

  • symlinks within a folder should be forbidden?
  • versioning is incompatible with symlink targets as they’re outside the folder.
  • what happens when following is disabled (everything is deleted on the other side)? (a followed symlink would look like a normal directory to the other side)
  • “Only symlinks to directories are supported - not symlinks to files. We could support that on the scanner side, it’s not super difficult, but it’s a little bit more involved on the puller side as we don’t want to replace the symlink when pulling and we need to create the temp files on the other side of the symlink etc.”
  • having to name the symlinks to follow was a “usage disaster”, but …
  • … following all symlinks enables all kinds of shenanigans given that the other side can also add symlinks to the folder

and so on. It’s complex. None of the complexities are due to Windows or some lowest common denominator nonsense. The main problem with symlinks on Windows is that it’s difficult to not follow them under some circumstances.
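To make the “shenanigans” concern from the list concrete: if every symlink is followed, one basic safeguard is to check whether a link resolves to somewhere outside the shared folder. This is only a minimal sketch in Python, not Syncthing’s actual (Go) code; the function name `symlink_escapes_folder` and its parameters are made up for illustration.

```python
import os


def symlink_escapes_folder(folder_root: str, link_path: str) -> bool:
    """Return True if link_path resolves to a location outside folder_root.

    Hypothetical helper for illustration only.
    """
    root = os.path.realpath(folder_root)
    target = os.path.realpath(link_path)  # resolves every symlink on the way
    try:
        return os.path.commonpath([root, target]) != root
    except ValueError:
        # Raised e.g. on Windows when the two paths are on different drives.
        return True
```

A check like this only answers the “where does it point” question. It does nothing for the versioning and puller problems listed above.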

Dropbox appears to do precisely what we do: sync the symlink as a symlink, and if it happens to point to something that was already in your folder then yes that also gets synced (because it would have been synced anyway). Probably this behavior has changed since you used it (the article says mid-2019). Maybe they ran into complexities.

Oh, and getting back to the case insensitivity thing – so that no one is misled, it is much more difficult to implement a case insensitive system than a case sensitive one, whether at the filesystem level or at the application level. Case insensitivity is not a “reduced feature set”, it’s the OS & application bending over backwards for the user in order to present something closer to how humans normally work. In our case we will need to be both case sensitive and insensitive and translate between the two worlds, which is another level of annoyance. None of this is trivial. Some of it is made worse by having the file name be the unique identifier of a file, but we don’t always have something better to work with.

It is for me and many others.

Initially it synced everything in its folder; if something was a symlink, it followed it and synced either the folder or the file behind it. It never synced symlinks themselves. I don’t think anyone wants symlinks to be synced, only followed.

But OTOH a gesture of thanks towards the developers who solve this pernicious problem would be nice, no? The compensation for working on this in general is not great.

You’re incorrect in that statement.

However, it is nice that that idea has now been buried. But case insensitivity by default is still terrible. There are better ways: for example, if all the clients added to the “swarm” are Linux, then there’s no need for a default that rudely ignores how the entire OS works on those machines. If a case-insensitive FS enters the swarm, the user can be warned and the default changed only for those folders where one destination is reduced-capability. The GitHub issue was abruptly closed before a reasonable solution, instead of a blanket-set default, was found.

Also, if the only OSs that benefit from this default are Windows (and OSX to some extent), then 100% it is related to “Windows or some lowest common denominator”.

Whether to activate case insensitivity by default, or under which circumstances, is pointless bikeshedding at this point. That’s not the hard and important part; what is, is getting a system in place that can handle case insensitivity. Pointing out that you don’t want that doesn’t do any good, it’s clear that it’s useful and needed in many cases. If we ever get there and you feel like your use case is threatened, that’s the moment to speak up. However, do so calmly and politely; chances are much better then that you are taken seriously. Most definitely don’t quote someone partly and out of context, that’s just super annoying.

I said that having it as the default is not a good decision, not that having case insensitivity support is bad. Talk about partial quotes, you took half of my sentence. (Whether it is the default or not is also what the linked GitHub issue I reacted to talks about.)

I don’t see anything non-calm about what I have said. Maybe not the most delicate but definitely not rude. Seeing how people in the GitHub thread were very polite but got nowhere and the issue got locked without a solution I had to bring up the topic here. I digress, discussing the tone of how I voice my concern about an anti-feature being the default is indeed bikeshedding.

I’m sorry, but it actually is. And not only a good decision, but the only viable decision at this point.

Syncthing’s first priority is (and has been for as long as I remember) not to lose any user data unexpectedly. Losing user data is absolutely the worst, and trumps any other concern such as performance, usability, etc. It says so in the project’s goals.

If (or, hopefully, when) we finally implement case-insensitive behaviour, we only have two options: have it enabled by default or have it disabled by default. So there are six possible user stories:

1a: Insensitive by default, user only has sensitive filesystems. This is inconvenient; Syncthing will not lose data, but may not sync some of it, and needs to be told to be sensitive. Bad, but not critical.

1b: Insensitive by default, user only has insensitive systems. Everything just works.

1c: Insensitive by default, user has a mix of systems. Some of the data may not start syncing right away and require user interaction, but nothing is lost unexpectedly. Not really inconvenient, because what else would you do.

2a: Sensitive by default, sensitive filesystems. Just works (the way it does now).

2b: Sensitive by default, insensitive filesystems. Works (the way it does now).

2c: Sensitive by default, mix of systems. Data loss is guaranteed, the way it is now. Very bad, should be avoided at any cost.

See? Having case-sensitive behaviour by default is basically incompatible with the project’s topmost-priority goal.
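The same six stories can be written down as a small table to compare the worst case of each default. This is just a sketch restating the list above in Python; the `Severity` labels and the `worst_case` helper are invented for illustration and don’t correspond to anything in Syncthing.

```python
from enum import IntEnum


class Severity(IntEnum):
    WORKS = 0
    NEEDS_USER_ACTION = 1
    DATA_LOSS = 2


# (default behaviour, kind of filesystems in the swarm) -> outcome,
# transcribed from stories 1a-2c above.
STORIES = {
    ("insensitive", "sensitive"): Severity.NEEDS_USER_ACTION,  # 1a
    ("insensitive", "insensitive"): Severity.WORKS,            # 1b
    ("insensitive", "mixed"): Severity.NEEDS_USER_ACTION,      # 1c
    ("sensitive", "sensitive"): Severity.WORKS,                # 2a
    ("sensitive", "insensitive"): Severity.WORKS,              # 2b
    ("sensitive", "mixed"): Severity.DATA_LOSS,                # 2c
}


def worst_case(default: str) -> Severity:
    """Worst outcome a user can hit without touching any setting."""
    return max(sev for (d, _), sev in STORIES.items() if d == default)


for default in ("insensitive", "sensitive"):
    print(default, "->", worst_case(default).name)
# insensitive -> NEEDS_USER_ACTION
# sensitive -> DATA_LOSS
```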

This requires case-folding to happen exactly the same on all platforms and within Syncthing. This is something that following Unicode standards should allow, but considering the track record of the main proprietary OS vendors on that front, I wouldn’t dare take it as a given.
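A small illustration of why “exactly the same” matters: two folding rules that both look reasonable can disagree on whether two names collide. The sketch below uses Python’s built-in `str.lower()` and `str.casefold()` as stand-ins for two different implementations; which rule any particular OS or filesystem actually applies is not something this example claims to know.

```python
def simple_key(name: str) -> str:
    # Plain lowercasing.
    return name.lower()


def full_key(name: str) -> str:
    # Unicode full case folding, e.g. "ß" folds to "ss".
    return name.casefold()


a, b = "Straße.txt", "STRASSE.TXT"
print(simple_key(a) == simple_key(b))  # False: treated as two different files
print(full_key(a) == full_key(b))      # True: treated as the same file
```

If one side of a sync uses the first rule and the other side uses the second, they disagree about whether these two names conflict.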

And why can’t these two be auto-detected based on the connected devices instead of slapping the default on everyone?

That data loss happens only in very specific situations. Why can’t it require manual action, like 1c?

What you’ve said is correct only if the current system can’t be improved, but that’s really not the case.

There are three issues with this that make this kind of autodetection a huge endeavor in and of itself:

  1. A given machine may not be connected to every other device in the swarm. Worse, it may not even be aware of all the devices and their properties. Consider the config where machines A and B share a folder, while A shares the same folder with C, D, and E (none of whom is aware of B), and B shares the same folder with F, G, and H (none of whom is aware of A). How is C supposed to even know about H?

  2. A shared folder is on a Linux box with case-sensitive EXT4, but there is a subdirectory mounted inside it, and that mountpoint holds a case-insensitive FAT32 filesystem. Autodetecting these situations reliably is not something I’d aspire to implement.

  3. A further complication of cases 1. and 2., where either a new case-sensitive machine is added to an otherwise case-insensitive swarm (or vice versa), or a new case-sensitive subdirectory is added to a case-insensitive folder (or vice versa). Good luck trying to implement reliable autodetection of these cases!
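To illustrate point 2: case sensitivity is a property of a directory, not of a machine, so the only way to find out is to probe each directory. A minimal sketch of such a probe in Python follows; the function name `is_case_sensitive` is invented for this example, and this is not how Syncthing does (or would do) it.

```python
import os
import tempfile


def is_case_sensitive(directory: str) -> bool:
    """Probe one directory by creating a temp file and looking it up
    under a case-swapped name. Hypothetical helper for illustration."""
    with tempfile.NamedTemporaryFile(prefix="probe-", suffix=".tmp",
                                     dir=directory) as f:
        name = os.path.basename(f.name)
        # The fixed prefix and suffix guarantee the swapped name differs.
        swapped = os.path.join(directory, name.swapcase())
        return not os.path.exists(swapped)
```

The result only holds for the directory that was probed. With a FAT32 mount somewhere below an EXT4 folder root, the probe would have to be repeated for every mount point in the tree, and again whenever a mount appears or disappears, which is the part that makes reliable autodetection so unattractive.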

Because 1c without user interaction does not lead to data loss, while 2c does. Data loss is worse than inconvenience, period. We can have a user jump through extra hoops to make things work as he/she expects (or at all), but we can’t, shouldn’t and mustn’t have a situation where simply installing and running Syncthing without checking a very specific option somewhere in it results in losing user data.

It’s actually not as bad as it sounds. During my attempt to storm this issue last year we (I’m still awed by the notorious patience and cooperation of all the maintainers during those weeks) managed to implement a case-insensitive version of FakeFS (the filesystem mock that parts of the Syncthing code are tested against) that reliably mimicked the behaviour of filesystems on all the OSes we compile Syncthing for. It’s part of the test suite now, so should something break or change, we’ll be aware of it.
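For anyone wondering what a case-insensitive filesystem mock boils down to, here is a toy version in Python. It is emphatically not Syncthing’s FakeFS (which is written in Go and far more complete); the class `CaseInsensitiveFakeDir` and its methods are made up, and the “keep the first-created name” behaviour is an assumption chosen for the sketch.

```python
class CaseInsensitiveFakeDir:
    """Toy in-memory directory: lookups ignore case, names keep their case."""

    def __init__(self):
        self._entries = {}  # folded name -> (name as first created, data)

    @staticmethod
    def _key(name: str) -> str:
        return name.casefold()

    def write(self, name: str, data: bytes) -> None:
        key = self._key(name)
        # Assumption: an existing entry keeps the name it was created with.
        stored_name = self._entries[key][0] if key in self._entries else name
        self._entries[key] = (stored_name, data)

    def read(self, name: str) -> bytes:
        return self._entries[self._key(name)][1]

    def listdir(self) -> list:
        return sorted(stored for stored, _ in self._entries.values())


d = CaseInsensitiveFakeDir()
d.write("Report.txt", b"first")
d.write("REPORT.TXT", b"second")  # same file as far as this directory cares
print(d.listdir())                # ['Report.txt']
print(d.read("report.txt"))       # b'second'
```

The second write silently replacing the first one’s data is exactly the clobbering that the rest of this thread is about.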

That’s good to hear, because I’ve occasionally had issues with case folding that just seem bizarre. Although, come to think of it, the more recent weirdness may be partly because of OS X’s decision to use NFD rather than NFC.

That specific weirdness we already do handle, at least.

And, yes. Why, Apple. :frowning:
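For readers who haven’t run into the NFD/NFC issue: the same visible file name can be encoded as different code point sequences, so a naive string comparison says the names differ. A minimal Python sketch using the standard `unicodedata` module; the helper `same_name` is invented here and says nothing about how Syncthing actually handles it.

```python
import unicodedata

name = "caf\u00e9.txt"  # "café.txt" with a precomposed é

nfc = unicodedata.normalize("NFC", name)  # é stays one code point
nfd = unicodedata.normalize("NFD", name)  # é becomes "e" + combining accent

print(nfc == nfd)          # False
print(len(nfc), len(nfd))  # 8 9


def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same normal form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(nfc, nfd))  # True
```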

Except when someone relies on syncing being case-sensitive.

No. There is a real difference between saying “there is a conflict, we didn’t sync this file, you need to twiddle a config” and “you had two files that only differed in case, so we clobbered the data in one of them to match the other, sorry”.

It seems like you’re trying to argue that your convenience in not having to flip a default setting trumps other people’s risk of data loss. That’s not how this project works, and it’s a frankly inane discussion to even have. Even more so because none of this exists as code and the potential problems and solutions haven’t been fully explored yet.

It’s not inane to discuss whether it’s even necessary to create hassle for anyone. If none of this exists as code, it is the perfect time to discuss this. Especially if you’ve just said that “the potential problems and solutions haven’t been fully explored yet”.

You brought up two scenarios, let’s take those.

Why can’t this case:

you had two files that only differed in case, so we clobbered the data in one of them to match the other, sorry

Be turned into this instead:

there is a conflict where two files only differ in case, we could’ve synced these but this might result in data loss, do you want to change the config Y/n

That’s what I’m trying to understand and get an answer to (like nekr0z nicely did once already for one question), instead of getting dismissive non-technical “it’s a project goal”, “it’s a data loss risk” or a thread closed.

One more thing though, if it hasn’t been clear I have never said that I want people to lose their data or even risk losing their data, it’s an understandable concern and I’m grateful for the priority as a very long-time Syncthing user.

I try, but fail, to grasp how that is not exactly the solution @calmh is suggesting. Answering that question requires user interaction, so whatever is said in the message and whatever buttons it offers, is by definition not part of what happens without user interaction.

Well, this is exactly what we aim at when we say “Syncthing should be case-insensitive by default”.

When we say that Syncthing will default to case-insensitivity, we don’t imply that somehow all your "fOO"s, "Bar"s, and "BaZ"s will sync as "foo"s, "bar"s, and "baz"s, respectively, or will undergo some other case-folding. No, we shall be preserving the case, at least that’s what we aim for. But in case Syncthing sees “FOO” and “foo” in the same directory, it will stop working on this folder and display the error message, basically along the lines of what you’ve just suggested.
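The “sees FOO and foo in the same directory” check can be sketched in a few lines. This is only an illustration in Python, not the planned implementation; the function name `find_case_collisions` is made up, and `str.casefold()` is an assumed stand-in for whatever folding rule would actually be used.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group the names of one directory by folded form and return the
    groups that contain more than one distinct spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_case_collisions(["FOO", "foo", "Bar", "baz"]))
# [['FOO', 'foo']]
```

Note that the names themselves are never changed: the folded form is only used as a lookup key, which matches the “preserve the case, stop on collision” behaviour described above.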

This will add an extra step for users that have only case-sensitive systems and expect “foo”, “Foo”, and “FOO” to sync as three different files in one directory; this is a perfectly valid use case that we will continue to support, but it will require the user to click on the (intentionally) very scary red button “Yes, I do want this folder case-sensitive, and I understand that as soon as I share it with at least one machine with a case-insensitive filesystem, my data will be screwed up beyond any restoration!”

We believe this one extra step for this one category of users is a perfectly acceptable price to pay for avoiding silently destroying data on machines of many other users. And when I say “many”, I’m basing my guesstimation on the fact that our stats show about half the users use Syncthing on Windows and OS X, and it is likely very common that these users at some point purchase a Linux-based NAS for their homes, at which point their data is in peril.