Case insensitive renames - how to solve?

calmh · June 22, 2015, 7:14am

Problem

We do two-pass scanning to figure out our local contents:

1. Walk the folder. For each file check if it’s already in the database and is a match, otherwise hash it and add to the database. Database is indexed by file name.
2. Iterate over the database, issuing an os.Lstat() for each item in it. If we get back an error, note the file as deleted.

This breaks when a file is renamed case-only on a case insensitive file system. In step one we find it as a new file, since the database is case sensitive. In step two we don’t delete the old variant, because os.Lstat() on a case variant of an existing file name still works.
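
To make the failure concrete, here is a small simulation of the two passes. It is only a sketch: fakeFS, lstat and scan are made-up names, and the "filesystem" is just a slice of names whose lookups ignore case, standing in for a case insensitive but case preserving disk.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fakeFS is a hypothetical stand-in for a case insensitive, case preserving
// filesystem: it stores names with their case, but lookups ignore case.
type fakeFS []string

// lstat mimics os.Lstat succeeding for any case variant of an existing name.
func (fs fakeFS) lstat(name string) bool {
	for _, n := range fs {
		if strings.EqualFold(n, name) {
			return true
		}
	}
	return false
}

// scan runs both passes against db, a case sensitive set of known names,
// and returns the names left in db afterwards, sorted.
func scan(fs fakeFS, db map[string]bool) []string {
	for _, n := range fs { // pass one: walk, adding names we don't know
		db[n] = true
	}
	for n := range db { // pass two: drop names for which lstat fails
		if !fs.lstat(n) {
			delete(db, n)
		}
	}
	names := make([]string, 0, len(db))
	for n := range db {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	// The database knows "Makefile"; the user has renamed it to "makefile".
	db := map[string]bool{"Makefile": true}
	fmt.Println(scan(fakeFS{"makefile"}, db)) // prints [Makefile makefile]
}
```

Both names survive the scan: the new one is added in pass one, and the stale one is never removed because the case insensitive lookup still succeeds.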

Solution Ideas

Continue Pretending It’s Case Sensitive

We might be able to get away with just fixing step two in the scanning process. We’ll still hash the file unnecessarily (and lose version vector history) but this is not a huge deal… We could change the algorithm from the current Lstat()-based approach into something like:

When iterating over the database,
- If the current item is a directory, do a listdir on it
- If it’s a file, look for it in the listdir results from the previous step

Thus we would not find the file under the incorrect case variant, and would conclude it’s been deleted. (We’ll also save a bunch of Lstat calls so it may actually be more efficient.)
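
A rough sketch of that listdir-based check, assuming the database can be represented as a plain slice of relative paths. missingFromDisk, dbNames and demo are hypothetical names for illustration; only os.ReadDir and friends are real. Each directory is listed once, and a database name counts as present only if it matches a listed name exactly.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// missingFromDisk returns the database names (paths relative to root) that
// have no exact, case sensitive match in their parent directory's listing.
// dbNames is a stand-in for iterating over the real database.
func missingFromDisk(root string, dbNames []string) ([]string, error) {
	listings := make(map[string]map[string]bool) // dir -> exact names in it
	var missing []string
	for _, name := range dbNames {
		dir := filepath.Dir(name)
		entries, ok := listings[dir]
		if !ok {
			entries = make(map[string]bool)
			des, err := os.ReadDir(filepath.Join(root, dir))
			if err != nil && !os.IsNotExist(err) {
				return nil, err
			}
			for _, de := range des {
				entries[de.Name()] = true
			}
			listings[dir] = entries // one listdir per directory
		}
		if !entries[filepath.Base(name)] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

// demo builds a temporary folder containing only "makefile" and checks it
// against a database that knows both "Makefile" and "makefile".
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "scan-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	path := filepath.Join(root, "makefile")
	if err := os.WriteFile(path, []byte("all:\n"), 0o644); err != nil {
		return nil, err
	}
	return missingFromDisk(root, []string{"Makefile", "makefile"})
}

func main() {
	fmt.Println(demo())
}
```

Because the comparison is against the listing rather than an Lstat result, the old "Makefile" entry is reported as missing regardless of whether the filesystem is case sensitive. One caveat in this sketch: if a directory itself was renamed case-only, listing it under the old name may still succeed on a case insensitive filesystem, so only the directory entry is flagged and not its children.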

Correctly Handle Case Insensitivity

We could add a FlagCaseInsensitive at the protocol level and include it in FileInfos. This bit would be set by the scanner when it knows it’s operating on a case insensitive filesystem (by configuration, or we can auto detect it). When set, the file would be stored in the database under a canonicalized name (i.e. lower case) and all the set.FileSet methods would need to know about this and handle it correctly…

Thus lookups would find the file under any case variant and we wouldn’t see a case-only rename as a new file + delete. We’d probably need special handling to actually pick up the case change though.
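
As an illustration of the canonicalized-name idea, here is a minimal sketch of an index keyed by lower-cased name that keeps the on-disk name as the value. foldedIndex, canonical, Lookup and Update are hypothetical names, not the actual set.FileSet API, and strings.ToLower is only an approximation of whatever folding rules a given filesystem uses.

```go
package main

import (
	"fmt"
	"strings"
)

// foldedIndex is a hypothetical index keyed by canonicalized name. The value
// is the name as last seen on disk, so the real case is not lost.
type foldedIndex map[string]string

// canonical lower-cases the name. A real implementation would need case
// folding that matches the filesystem's own rules.
func canonical(name string) string { return strings.ToLower(name) }

// Lookup finds the stored name for any case variant of name.
func (ix foldedIndex) Lookup(name string) (string, bool) {
	stored, ok := ix[canonical(name)]
	return stored, ok
}

// Update records name as seen on disk. It reports whether the entry is new
// and whether an existing entry differed from name only by case.
func (ix foldedIndex) Update(name string) (isNew, caseChanged bool) {
	key := canonical(name)
	stored, ok := ix[key]
	ix[key] = name
	return !ok, ok && stored != name
}

func main() {
	ix := foldedIndex{}
	fmt.Println(ix.Update("Makefile")) // true false
	fmt.Println(ix.Update("makefile")) // false true
	fmt.Println(ix.Lookup("MAKEFILE")) // makefile true
}
```

The caseChanged result is the kind of special handling mentioned above: the rename is neither a new file nor a delete, but it still has to be noticed somewhere.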

When syncing files from case sensitive devices to case insensitive we must “taint” them with the bit in question. Or not? We need the flag internally, but there’s really no need to tell others about it…

Other ways?

canton7 · June 22, 2015, 7:37am

My gut says to do it the same way git does it:

- There’s a configuration variable, which is set by default on machines that need it
- Syncthing deals with files as if filenames are case-sensitive wherever possible
- When looking for the file-on-disk that matches a database entry, allow different casing

E.g., a Windows user adds ‘Makefile’. Syncthing indexes that as ‘Makefile’, and transfers ‘Makefile’ to other nodes. User renames that to ‘makefile’. Syncthing assumes that still refers to its ‘Makefile’ database entry, and does nothing.

If the user does actually want to change Syncthing’s record of a file’s case, then the usual ‘rename to something else, then rename back’ that Windows users are so used to should still work…

AudriusButkevicius · June 22, 2015, 10:00am

Just an idea, but perhaps os.FileInfo returned by os.Lstat has the actual capitalization of the file as it’s stored on the hard disk, hence we can detect that the capitalization is different?

Otherwise, @canton7’s solution makes sense as it’s minimal effort (given it deals with the issues we currently have, which I cannot recall).

calmh · June 22, 2015, 2:02pm

I was hoping that as well, but no, it just returns what it’s given.

It requires being able to find Makefile in the database based on seeing makefile on disk, which we currently cannot. One way of handling it would be something like what I’m talking about in option two above, or introducing a translation layer where we keep a mapping from lowercase(filename) to filename. But that makes everything a bit more expensive…
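
For what it’s worth, the translation layer could look roughly like this. It is a sketch only: caseMap and its methods are invented for illustration, and the database is faked as a plain map that stays keyed by the exact name.

```go
package main

import (
	"fmt"
	"strings"
)

// caseMap is a hypothetical translation layer: lowercase(filename) -> the
// filename as stored in the (still case sensitive) database.
type caseMap map[string]string

// Add registers a database name under its lower-cased form.
func (m caseMap) Add(dbName string) { m[strings.ToLower(dbName)] = dbName }

// Remove drops the mapping, but only if it still points at dbName.
func (m caseMap) Remove(dbName string) {
	key := strings.ToLower(dbName)
	if m[key] == dbName {
		delete(m, key)
	}
}

// Resolve translates a name seen on disk into the database name, if any.
func (m caseMap) Resolve(onDisk string) (string, bool) {
	dbName, ok := m[strings.ToLower(onDisk)]
	return dbName, ok
}

func main() {
	// name -> size; a stand-in for the real database.
	db := map[string]int64{"Makefile": 1234}
	names := caseMap{}
	for name := range db {
		names.Add(name)
	}
	// The scanner sees "makefile" on disk and finds the existing entry.
	if dbName, ok := names.Resolve("makefile"); ok {
		fmt.Println(dbName, db[dbName]) // Makefile 1234
	}
}
```

The extra cost is visible here: every lookup goes through a second map, and the mapping has to be kept in step with every database insert and delete.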

AudriusButkevicius · June 22, 2015, 2:13pm

So the taint thing seems very hard to maintain, and leaks to other devices.

Plus, if you store the file under the normalized name on the remote end (which is case-sensitive), how are you going to pick up the existing file from the database as you scan? I guess you just blindly stab at both non-normalized and normalized names?

To me it feels that if a system is case-insensitive, we should try and contain that within that system, hence the listdir approach seems reasonable.