Implementing case insensitivity

odin · January 19, 2020, 12:50pm

Huh. There’s a point I’m not sure has been raised before. Case folding and case sensitivity are not the same thing. SQL, for instance, is case folding but case sensitive. (Try quoting lowercase identifiers in SQL.)
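
A toy sketch of that distinction, in Python rather than SQL. It is not a real SQL parser, and `resolve_identifier` is a hypothetical helper made up for illustration. It assumes the standard-SQL rule that unquoted identifiers are folded to upper case (PostgreSQL folds to lower case instead), while quoted identifiers are kept exactly as written and compared case sensitively.

```python
def resolve_identifier(token):
    """Return the name a SQL-like engine would look up for this token.

    Hypothetical helper: quoted identifiers are taken verbatim,
    unquoted ones are folded to upper case before lookup.
    """
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]   # quoted: no folding, exact match required
    return token.upper()     # unquoted: folded, so foo and FOO coincide


# Unquoted spellings fold to the same name...
print(resolve_identifier("foo") == resolve_identifier("FOO"))
# ...but a quoted lowercase identifier is a different name entirely.
print(resolve_identifier('"foo"') == resolve_identifier("foo"))
```

The lookup itself is always an exact comparison; only the folding step makes unquoted names look case insensitive.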

Also, a point I just realised I should have made in the earlier reply:

It would create a conflict, yes, but how is that different from the usual case of inconsistency between file contents? Calling this ‘data clobbering’ only makes sense from a perspective where case sensitive behaviour is the only conceivably correct thing, which is flatly not the case – even if you prefer it. (As, frankly, I do, but that’s neither here nor there.)

nekr0z · January 19, 2020, 1:03pm

The issue being discussed here is not that systems A and B have files “FOO” and “foo”, and they get synced (or conflicted) — this one is simple and is basically an ordinary conflict, nothing too scary.

The real issue is this: A is case-sensitive and has “Foo” and “foo”, B is case-insensitive and tries to sync both, syncs them into one file and simply loses the contents of one of them, propagating it further.

The even bigger problem is that trying to rename “BaR” to “bar” on B results in the file being wiped completely: B tells A it created “bar” and removed “BaR”; A does that and tells B that yes, it removed “BaR”. B looks for “BaR” on its filesystem, sees it (because “bar” is there, which is the same name for B), and removes it. The removal is propagated, and there you go: you tried to rename the file, and that file got removed completely.
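
The rename sequence above can be replayed with a toy model. This is only a sketch of the steps as described in this post, not Syncthing's actual protocol or code: the two filesystem classes and `simulate_case_only_rename` are hypothetical, and the case-insensitive side is approximated by keying files on `str.casefold()`.

```python
class CaseSensitiveFS:
    """Toy filesystem where "BaR" and "bar" are different files (device A)."""
    def __init__(self):
        self.files = {}
    def write(self, name, data):
        self.files[name] = data
    def exists(self, name):
        return name in self.files
    def remove(self, name):
        self.files.pop(name, None)
    def names(self):
        return sorted(self.files)


class CaseInsensitiveFS(CaseSensitiveFS):
    """Toy filesystem where names differing only in case are one file (device B)."""
    def write(self, name, data):
        self.files[name.casefold()] = (name, data)
    def exists(self, name):
        return name.casefold() in self.files
    def remove(self, name):
        self.files.pop(name.casefold(), None)
    def names(self):
        return sorted(stored for stored, _ in self.files.values())


def simulate_case_only_rename():
    a, b = CaseSensitiveFS(), CaseInsensitiveFS()
    a.write("BaR", "data")
    b.write("BaR", "data")

    # The user renames "BaR" to "bar" on B.
    b.remove("BaR")
    b.write("bar", "data")

    # B announces: created "bar", removed "BaR". A applies both.
    a.write("bar", "data")
    a.remove("BaR")

    # A confirms "BaR" is deleted. B checks its disk for "BaR" and
    # finds "bar", which is the same file as far as B can tell.
    if b.exists("BaR"):
        b.remove("BaR")

    # B now notices "bar" is gone and announces that; A applies it.
    if not b.exists("bar"):
        a.remove("bar")

    return a.names(), b.names()


print(simulate_case_only_rename())
```

In this model both devices end up with no file at all, which is the outcome described above.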

odin · January 19, 2020, 1:35pm

Hmm. It seems I missed where the quote came from, specifically. My misreading; sorry.

Avamander · January 19, 2020, 4:37pm

Let’s take this case again. What I’ve been trying to figure out is why can’t device B that does the data-destruction throw up a big scary prompt instead of the device A that just has the files?

It’d be safer than the current way and the way you’ve proposed, because it’d eliminate the chance of someone adding a destructive case-insensitive device to a case-sensitive swarm, which is the actual problem, as you’ve highlighted.

Is the distinction understandable or did I still phrase it confusingly?

nekr0z · January 19, 2020, 6:45pm

First of all, no single device does data destruction; data destruction happens because of the combination of the devices in the swarm. Every device does its best to do the right job, but the combination of efforts happens to be destructive. In some cases one might argue that it is the case-sensitive device that is a data destroyer, but that’s not the point.

Filesystems don’t report whether they are case-sensitive or not, and from the software’s (i.e. Syncthing’s) perspective the only way of knowing that the filesystem is case-insensitive is to create a random file foo, fill it with some data, and then read the file FOO from the same directory and find all the same data there. If that happens, we may assume that either foo is a hardlink for FOO (we created it, so it hardly is), or the filesystem is case-insensitive. This check doesn’t look like much trouble, but as I said earlier, every directory can be a mountpoint (or become one) and end up case-insensitive, and re-checking every subdirectory on every write is just not worth it.
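
A minimal sketch of that probe, in Python for brevity (Syncthing itself is written in Go, and this is not its code). The function name `dir_is_case_insensitive` is made up for illustration. It writes a randomly named lowercase file, then tries to read it back under the uppercase spelling.

```python
import os
import tempfile
import uuid


def dir_is_case_insensitive(directory):
    """Probe a single directory by writing 'probe-<hex>' and reading 'PROBE-<HEX>'.

    Only answers for this one directory; a subdirectory may be a
    different mount with different behaviour.
    """
    token = uuid.uuid4().hex            # random, lowercase hex digits
    name = "probe-" + token
    lower = os.path.join(directory, name)
    upper = os.path.join(directory, name.upper())
    with open(lower, "w") as f:
        f.write(token)
    try:
        with open(upper) as f:
            # Same data under the other spelling: case-insensitive.
            return f.read() == token
    except FileNotFoundError:
        # The uppercase name does not resolve: case-sensitive.
        return False
    finally:
        os.remove(lower)                # clean up the probe file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(dir_is_case_insensitive(d))
```

The result depends on where it is run, and it only covers the probed directory, which is exactly the limitation mentioned above.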

Hence, there’s no simple way for a case-insensitive system to know that it is one. However, we can detect danger when we see foo and FOO in one directory (and you obviously have to be on a case-sensitive filesystem for this to happen) and put up the alert; this, at least, looks doable from the programming point of view. Of course, on an all-case-sensitive swarm this can be normal, but since we have no way of knowing that the swarm is all-case-sensitive (for the reasons I have described earlier in this thread), we’ll need the user to explicitly enable this kind of behaviour.
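
Spotting foo and FOO side by side could look roughly like this. It is a sketch only: `case_collisions` is a hypothetical helper, and `str.casefold()` is an approximation, since real case-insensitive filesystems each apply their own folding rules.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only in case; return groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# A listing as a case-sensitive device might see it.
listing = ["foo", "FOO", "notes.txt"]
for group in case_collisions(listing):
    print("would collide on a case-insensitive device:", group)
```

On a real device the listing would come from something like `os.listdir()` for each directory being scanned.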

Now, there may be better approaches to the whole thing, granted. As @calmh has already mentioned, implementing case-insensitivity is really hard, and we’re still not very close to getting there (not because of the lack of trying, mind you), and for all of this to even be of importance we must first get case-insensitivity sorted out. But at least this approach looks sane to the maintainers, and I have yet to see someone suggest a better one.

calmh · January 19, 2020, 6:55pm

It’s also tricky for the case insensitive device who already has Foo and gets an update for fOO. Is it a problem that should be flagged? Is it an update to the same file just spelled differently because it comes from another also case insensitive system? Was the file just case-only renamed? I’m not 100% sure how to tell.

On a case sensitive system it’s easy to see when we have two such files and set a hypothetical “requires case sensitivity” flag on both, causing the warning/error to happen on the case insensitive side.

But what about two case sensitive systems with one insensitive in between? How would it understand what’s going on?

And my gut feeling is still that if someone has two computers, one with “Club meeting 20191230.txt” and the other with “Club Meeting 20191230.txt” the odds are greatly in favor of them actually intending these to be the same file and not two separate meeting notes beside each other. Regardless of what the file system would think. Case insensitive by default acknowledges this. I’m sure there are people who love to have files beside each other that just differ in case, but it’s not going to be the majority.

nekr0z · January 19, 2020, 7:00pm

Majority or not, there simply seems to be no obvious way to solve the data-loss problems that we have as it is without subjecting these users to this little inconvenience we’re discussing.

calmh · January 19, 2020, 7:39pm

Indeed

Avamander · January 19, 2020, 10:11pm

Reading FS mounts sounds like a reasonable approach on Linux to detect case-insensitive filesystems being mounted and used somewhere?

It isn’t just an inconvenience though, it’s very likely data loss if someone isn’t being cautious. At the very minimum, I’m sure Syncthing could save someone a tremendous amount of problems if it did at least one case insensitivity check when adding a folder to be synced?

odin · January 19, 2020, 10:33pm

Where’s the data loss?

Avamander · January 24, 2020, 10:33pm

Here.

odin · January 24, 2020, 10:37pm

You seem to be confused. This situation results from not dealing with the case insensitive systems, which is what defaulting to case sensitivity inherently means. What is being discussed is specifically a way to prevent this from happening by default, and catching it out so that it can be warned about.

Edit: To be explicit, what is meant by “being case insensitive by default” is to stop doing that.

Avamander · January 24, 2020, 10:59pm

You seem to have missed half of the thread, including the part where some people might want to retain the current default and sync files as they are because it is actually important, and how that case-sensitive mode should still get a failsafe or two, because it’s very likely someone is going to forget and introduce a device that could cause data loss.

odin · January 25, 2020, 12:17am

The system operating case insensitively by default is precisely what enables the warning. If it does not maintain the case insensitive database it does not have the ability to warn you - that’s why it doesn’t warn you today. What you are asking for is “have the system operate case insensitively, but allow me to override its warnings” which as far as I can see nobody has categorically rejected as an option.

Alexdimarco · February 5, 2020, 6:17pm

Wild idea…

Maybe offering to turn on case sensitivity when installing the client on Windows might be helpful, now that Windows 10 has case sensitivity available as an option? (Get-ChildItem -Recurse -Directory).FullName | ForEach-Object { fsutil.exe file setCaseSensitiveInfo $_ enable; Write-Host $_ }

Not sure how to deal with this for new folders though… maybe if the option is checked an action could turn it on?

marco.trevisan · July 27, 2020, 10:44am

Hi all.

I’m new to Syncthing. As a macOS sysadmin, I can create a case-sensitive APFS volume in a few seconds and set it up as a shared storage folder for my colleagues who are using Syncthing.

Nonetheless, IMHO Syncthing should follow the KISS principle, as this is one of the main objectives if I got it correctly.

Therefore, IMHO it should be case insensitive (but case-preserving) by default. A case-sensitive shared folder among two or more systems is of more use to IT technicians than to normal users who only want to share their documents.

My 2 cents

calmh · July 27, 2020, 10:57am

We’re literally hours from merging a change that solves this nicely, letting case sensitive users continue to be case sensitive and avoiding the data loss for everyone else. It’ll have a small performance hit and can be turned off when you’re 100% sure everything is case sensitive and you want no safety nets.

marco.trevisan · July 27, 2020, 10:58am

Very good to know! Thanks. Can you tell me whether it will be available in the next release? Thanks so much.

calmh · July 27, 2020, 11:00am

It should land in 1.9.0, barring unforeseen problems.

usefulvid · July 28, 2020, 12:56pm

Don’t forget to claim your money: