Allow rules for file rename

CySlidfer · November 21, 2023, 12:06pm

Since Syncthing seems unable to handle characters that are invalid on the target system, and the devs are understandably concerned about chaos after a database loss, why not add a .syncthing file to the folder where renames happened, with a mapping of renamed name to original name?

Then I can even define my own rules how I want to rename certain characters.

Currently I have lots of : in my audiobook names and am trying to sync them to Android.

calmh · November 21, 2023, 12:15pm

Renames are tricky and error-prone regardless of where we save the state since it’s usually not a one-to-one mapping (i.e., there may be multiple files on one side all mapped to a single name on the other side, etc.).

There’s the eternal PR for doing character replacements but, IIRC, it still has conceptual issues that I think make it a tough sell.

It’s a lot easier, usually, to just do the rename you want once, on the source system.

CySlidfer · November 21, 2023, 12:42pm

Thanks for the prompt reply.

Even in the many-to-one case, you can simply append a _2 and record it in your mapping file. I understand this is a tricky one, but I fail to see the issue with this solution besides possibly some performance concerns.

Let’s assume this source situation with 2 files:

The dev: File 1.jpg
The dev_ File 1.jpg

: is invalid on the target, so the first file gets renamed to The dev_ File 1.jpg

and the mapping file is:

The dev_ File 1.jpg => The dev: File 1.jpg

Now the second file, The dev_ File 1.jpg, is synced.

A file with that name already exists, but there is a .syncthing mapping file.
From that it is clear that this is a different file. So we rename it to:

The dev_ File 1 (2).jpg

and the content of .syncthing now is:

The dev_ File 1.jpg => The dev: File 1.jpg
The dev_ File 1 (2).jpg => The dev_ File 1.jpg
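The scheme in this example can be sketched in a few lines of Python. This is only an illustration of the proposal, not anything Syncthing does: the function names, the set of invalid characters, and the " (2)" suffix format are all assumptions. Note that every name has to be recorded in the mapping, even ones that needed no renaming, since otherwise collisions cannot be detected.

```python
import os

# Assumed set of characters the target file system rejects (hypothetical).
INVALID = set(':*?"<>|\\')


def sanitize(name: str) -> str:
    """Replace every invalid character with an underscore."""
    return "".join("_" if ch in INVALID else ch for ch in name)


def add_mapping(mapping: dict[str, str], original: str) -> str:
    """Return the on-disk name for `original` and record it in `mapping`.

    `mapping` goes from on-disk (target) name to original (source) name,
    mirroring the "renamed => original" lines of the proposed file.
    """
    # If this original was already mapped, reuse its on-disk name.
    for target, source in mapping.items():
        if source == original:
            return target

    base = sanitize(original)
    stem, ext = os.path.splitext(base)
    candidate = base
    n = 2
    # A different original already owns this name: try " (2)", " (3)", ...
    while candidate in mapping:
        candidate = f"{stem} ({n}){ext}"
        n += 1

    mapping[candidate] = original
    return candidate


mapping: dict[str, str] = {}
first = add_mapping(mapping, "The dev: File 1.jpg")
second = add_mapping(mapping, "The dev_ File 1.jpg")
for target, source in mapping.items():
    print(f"{target} => {source}")
```

The result depends on the order in which files arrive: whichever name is processed first keeps the plain sanitized name.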

This could be optional, as well as limited to one-way folders (send only, so the .syncthing file would be on the receiving side).

I think this would already go a long way, making it easy to push your collection to various copy-only devices without having to rename everything to make it work.

PS: An even smarter solution would, of course, keep the existing valid name intact and only rename the : version to (2), but we are most likely talking about extremely fringe cases here.

calmh · November 21, 2023, 1:21pm

I agree that in principle this sounds like it could work. However, I think there are a lot of implementation details that would make it not great.

We’d probably need to keep the .syncthing.renames or whatever file in the same directory as the file in question, so that it would apply even if there are multiple Syncthing folders pointed at different levels of the same directory tree, and so that it would have a chance of coming along when a directory is renamed. We’d have to look at and possibly read that file for every file operation, which is costly – and we’d need to do it recursively upwards to the folder root (because we might have a path like foo:/bar#/baz? where each level got renamed into foo_/bar_ (2)/baz_ etc). This applies to a lot of places, like when there’s an incoming request “give me a data block for foo:/bar#/baz? at offset 42” where we currently can just open the file, but we’d now need to do a lot of traversal and reading to even figure out where that file might be on disk. We can’t cache too aggressively because there may be several Syncthing instances working on the same folders on disk.
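To make the traversal cost concrete, here is a rough sketch of what resolving a requested path could look like with one rename table per directory. The in-memory RENAMES dictionary stands in for the per-directory .syncthing.renames files, and both it and the resolve function are hypothetical. In a real implementation each table lookup would mean finding and possibly reading a file on disk.

```python
# Hypothetical per-directory rename tables, keyed by the directory's
# on-disk path relative to the folder root ("" is the root itself).
# Each table maps original name -> on-disk name.
RENAMES = {
    "": {"foo:": "foo_"},
    "foo_": {"bar#": "bar_ (2)"},
    "foo_/bar_ (2)": {"baz?": "baz_"},
}


def resolve(original_path: str, renames=RENAMES) -> tuple[str, int]:
    """Translate an original path to its on-disk path.

    Returns the on-disk path and the number of rename tables consulted,
    which is one per path component.
    """
    disk_path = ""
    lookups = 0
    for component in original_path.split("/"):
        # Stand-in for reading the renames file in the current directory.
        table = renames.get(disk_path, {})
        lookups += 1
        disk_name = table.get(component, component)
        disk_path = f"{disk_path}/{disk_name}" if disk_path else disk_name
    return disk_path, lookups


path, lookups = resolve("foo:/bar#/baz?")
print(path, lookups)
```

Each level can only be looked up once the level above it has been resolved, so the work grows with path depth and has to be repeated for every request unless the tables are cached.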

When committing a file, there’s an unavoidable interval between performing the file operation and saving the .syncthing.renames during which data loss might occur. Similarly when reading; if we look at a file name but the renames file hasn’t been written yet we’ll get bad information. We could perhaps invent some locking scheme, but that’s ugly and another layer of complexity we don’t have at the moment. In both cases we might be risking data loss or overwrites of the wrong file if we get the timing wrong, and that’s typically not acceptable.

I do try to avoid the trap of having perfect be the enemy of good enough. A solution shouldn’t have to be theoretically perfect if it delivers significant value. However, when it seems like it may cause a lot of issues on top of potential confusion and there is a different solution (rename those files), then I’m a bit sceptical that it’s worth it.

AudriusButkevicius · November 22, 2023, 12:50am

We should inscribe this answer on some wall in some cave. A lot of people come with good intentions and good suggestions; however, the complexity usually reaches much further than people realise, and this answer is great evidence of that.

Insert amen emoji here.

CySlidfer · November 25, 2023, 12:33am

I feel you. As a programmer myself, I totally get the KISS principle and the hidden complexity that arises from implementation details.

But

I also think this is a core problem that a tool like this should solve one way or the other. (And I already love it quite a lot, to the point that I sent a donation. Nice job!)

Having said that, here are my suggestions on your points.

We’d probably need to keep the .syncthing.renames or whatever file in the same directory as the file in question, so that it would apply even if there are multiple Syncthing folders pointed at different levels of the same directory tree, and so that it would have a chance of coming along when a directory is renamed. We’d have to look at and possibly read that file for every file operation, which is costly

I would really put a single file in the root of a share and cache its content to do quick mappings for what you mentioned.

I don’t really see the issue with having such files in sub-folders in the case of nested sync folders. Such files in sub-directories should simply be ignored.

and we’d need to do it recursively upwards to the folder root (because we might have a path like foo:/bar#/baz? where each level got renamed into foo_/bar_ (2)/baz_ etc).

Correct, but I think this shouldn’t be too complicated, given that the mapping is loaded from a single file and each name, be it folder or file, is first checked for an entry in the map.

When committing a file, there’s an unavoidable interval between performing the file operation and saving the .syncthing.renames during which data loss might occur. Similarly when reading; if we look at a file name but the renames file hasn’t been written yet we’ll get bad information. We could perhaps invent some locking scheme, but that’s ugly and another layer of complexity we don’t have at the moment. In both cases we might be risking data loss or overwrites of the wrong file if we get the timing wrong, and that’s typically not acceptable.

I still think it would be totally acceptable if this only worked in SEND ONLY mode, together with a warning: “The source has file and folder names that contain characters that the target file system cannot support. Should they get renamed on the target system, or excluded from sync?” I assume that in most cases where a two-way sync is necessary, the system on both ends is the same. And if we assume SEND ONLY, I think writing the .syncthing.renames file first would already be sufficient. Of course, using rename: write .syncthing.renames.new, then, when it is fully written, replace the old one, then send the file. The worst outcome would be dead entries in that file because the original file was deleted at exactly the wrong moment. But this could be corrected on the next full scan, or maybe even be ignored.
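The write-then-replace step described here is a common pattern and could look roughly like this in Python. The function name and the one-line-per-entry "renamed => original" format are assumptions taken from the examples earlier in the thread. This sketch only makes the update of the mapping file itself atomic; it does not address the window between updating the mapping and writing the synced file.

```python
import os


def write_renames(folder: str, mapping: dict[str, str]) -> None:
    """Atomically replace the folder's rename mapping file.

    `mapping` maps on-disk name -> original name. The assumed line format
    would be ambiguous for names that themselves contain " => ".
    """
    final = os.path.join(folder, ".syncthing.renames")
    tmp = final + ".new"

    # Write the complete new content to a temporary file first.
    with open(tmp, "w", encoding="utf-8") as f:
        for target, original in sorted(mapping.items()):
            f.write(f"{target} => {original}\n")
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before the swap

    # os.replace overwrites the destination in one step, so readers see
    # either the old file or the new one, never a half-written file.
    os.replace(tmp, final)
```

If the process dies before os.replace, the old mapping file is still intact and only a stale .new file is left behind.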

I do try to avoid the trap of having perfect be the enemy of good enough. A solution shouldn’t have to be theoretically perfect if it delivers significant value. However, when it seems like it may cause a lot of issues on top of potential confusion and there is a different solution (rename those files), then I’m a bit sceptical that it’s worth it.

Wise words, but just to give you some real data for your judgment: my audiobook collection has 2000 file errors, i.e. files that I would need to rename to be able to fully sync it to my Android device. (I will end up writing a script to achieve exactly that, but mainly because I can.)

calmh · November 25, 2023, 7:41am

The source doesn’t really know what’s supported or not on any given destination; it’s hard enough to figure out even on the destination itself… But assuming you mean this would be applied to all outgoing data, I’ll concede it seems theoretically possible. It would be similar to our encrypted folders – we change the filenames on the way out, and convert them back on the way in, and we could use the database to store the mapping between them. There would still be a lot of corner cases, like properly handling renames of files to and from “mapped names” etc, so a lot of effort and a lot of surface area for bugs to hide.

Since your files have colons in them you’re not on Windows, and since the other side is Android I’m going to guess it’s not a Mac either¹. So running rename s/:/_/ audiobooks/** in the closest terminal should be about all the required scripting. Even if I’m wrong about my assumptions I’m pretty sure every system out there has a bulk rename facility available for it given some googling.

So this still boils down to “you run one command in your terminal, once” vs “we spend a whole lot of development effort building and maintaining this forever” and I don’t see that getting weighed towards the latter.

1) If it is, brew install rename and then the rest applies, or select them all in Finder and right-click to bulk rename…
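For systems without a rename tool at hand, the same one-off cleanup can be sketched in Python. The function name and the choice to skip (rather than overwrite) when the new name already exists are assumptions. Unlike a substitution without a global flag, str.replace changes every colon in a name, not just the first.

```python
import os


def rename_colons(root: str, replacement: str = "_") -> list[tuple[str, str]]:
    """Replace ':' in all file and directory names below `root`.

    Returns the list of (old_path, new_path) renames that were performed.
    """
    done = []
    # topdown=False visits the contents of a directory before the directory
    # itself, so renaming a directory never breaks paths still to be visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if ":" not in name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, name.replace(":", replacement))
            if os.path.exists(dst):
                # Never overwrite: leave the collision for a human to resolve.
                print(f"skipped (target exists): {src}")
                continue
            os.rename(src, dst)
            done.append((src, dst))
    return done
```

Calling rename_colons("audiobooks") on the source side would rename the whole collection in place; running it on a copy first is a sensible precaution.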

CySlidfer · November 25, 2023, 9:29am

Even if it turns out to be so simple (thanks), I still have to rename my collection to something uglier.

Not even talking about UX. But I rest my case. I tried.

One more side note about something I noticed yesterday. It most likely has a better solution that I did not find, but it was an unexpected and bad experience. I deleted a root folder on the target and thought I could easily resync it there, but it complained about missing markers, and I had to recreate the share on the sharing side, getting a new folder ID, to solve this (which involves pressing a scary but harmless delete button on the share side). I missed a “start over” button of some sort for the receiving side in such a case.

acolomb · November 26, 2023, 1:28am

If you want to fight an uphill battle for others to implement a system that works well with your choice of file names, then I think it will be more worthwhile convincing Google and Microsoft to lift their stupid and arbitrary limitations regarding allowed filename characters. Good luck with that…

Starting over after deleting the whole shared folder on one side is easy. Instead of recreating the folder marker (DON’T, it will lead to data loss!) you simply remove the folder from the Syncthing config on that side. Then wait until the sharing request pops up again and accept it. Nothing needs to be done on the other side. If you want to speed it up, pause and resume the device in Syncthing on either end; that triggers the prompt immediately after reestablishing a connection.