UTF-8 Error Messages

Hello,

I upgraded to Syncthing 0.10.29 with my distro, and I’m now getting some strange error messages over and over:

WARNING: File name "amiga/amikit/AmiKit/Utilities/CodeAudio/Catalogs/fran\xe7ais" is not in UTF8 encoding; skipping.

I have a folder I’m sharing called “emu,” which contains files that I use with various emulators; this particular file (and there are lots of them) comes from an emulated Commodore Amiga. This never happened before with previous versions; should I submit a bug?

Thanks!

–Rich

Recent versions of Syncthing try to handle non-ASCII characters and unify them across different operating systems, but the file in question has an unsupported character in its name. You could just rename the file to solve this.

Hm; my concern here would be that the file would then not be able to be used by the emulator or the OS it’s running (Amiga OS). It does look like all the files have to do with language properties for languages that I don’t speak. :smile:

Here’s the full list:

Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/Scalos/Locale/Catalogs/espa\xf1ol" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/Scalos/Locale/Catalogs/fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/Scalos/Locale/Catalogs/\xc3e\xd3tina" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/ScalosBeta/Tools/Scalos_Comment/Catalogs/Fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/ScalosBeta/Tools/Scalos_GetHidden/catalogs/fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/ScalosBeta/catalogs/fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/ScalosBeta/env-archive/scalos.68K/Fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/ScalosBeta/env-archive/scalos.AROS/Fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/ScalosBeta/env-archive/scalos.MOS/Fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:08 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/a4000/Work/software/ScalosBeta/env-archive/scalos.OS4/Fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:09 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/amikit/AmiKit/Internet/YAM/Catalogs/espa\xf1ol" is not in UTF8 encoding; skipping.
Mar 25 11:20:09 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/amikit/AmiKit/Internet/YAM/Catalogs/fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:09 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/amikit/AmiKit/Internet/YAM/Catalogs/t\xfcrk\xe7e" is not in UTF8 encoding; skipping.
Mar 25 11:20:09 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/amikit/AmiKit/Locale/Catalogs/fran\xe7ais" is not in UTF8 encoding; skipping.
Mar 25 11:20:09 enterprise syncthing[742]: [TDSQ7] WARNING: File name "amiga/amikit/AmiKit/Utilities/CodeAudio/Catalogs/fran\xe7ais" is not in UTF8 encoding; skipping.

I’m hesitant to rename these files, but will if this is the way forward with Syncthing. I’m much more interested in having my emulator configuration synced to various machines than I am in language properties that I won’t use. I imagine, though, that there’s a possibility that you’ll run into this encoding issue with anyone using old files that contain non-ASCII characters.

Thanks!

–Rich

To be honest I don’t know enough about this, hence I summon @calmh

Thanks!

I just tried an experiment: I unzipped the latest AmiKit (which you can get here) into my synced Emu folder. Immediately I got a bunch of these UTF-8 errors. So that’s an easy test: AmiKit is free; you can download it and test the file names yourself.

I am not an Amiga guru by any stretch; I have these files because I sometimes teach an Intro to Computers class for kids, using old computers and emulators as a reference. I didn’t know what encoding they used until just now when I read a Wikipedia article. I could try convmv on the files and see what happens.

These files were most likely silently ignored previously; now you get a notice about it at least. The issue is most likely that the file names contain non-ASCII characters in an Amiga encoding. That’s fine, but we want to represent the filenames correctly on other operating systems and encodings, and we can’t do that without understanding the filenames to start with. Hence we skip attempting to sync them.
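For anyone who wants to find the affected files before Syncthing complains, here is a minimal Python sketch of the kind of check described above. It is not Syncthing's actual code; the function names are made up for illustration, and it simply treats a file name as raw bytes and asks whether those bytes decode as UTF-8.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_names(names: list[bytes]) -> list[bytes]:
    """Return the names that a UTF-8-only tool would have to skip."""
    return [n for n in names if not is_valid_utf8(n)]


if __name__ == "__main__":
    # Two names taken from the log above, plus one plain ASCII name.
    sample = [b"fran\xe7ais", b"espa\xf1ol", b"english"]
    for bad in find_non_utf8_names(sample):
        print("not UTF-8:", bad)
```

On Linux you can feed it real names by listing a directory with a bytes path, for example `os.listdir(b".")`, which returns the names as undecoded bytes.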

Okay, I’ve confirmed now that running convmv on the files to convert their names to UTF-8 seems to work. I can still run AmiKit in the emulator (FS-UAE), and Syncthing seems to be syncing everything well.
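For reference, the idea behind that conversion can be sketched in Python. This is not how convmv is implemented; it is a rough illustration that assumes the original names are ISO-8859-1 (Latin-1), which matches bytes like \xe7 and \xf1 in the log but may not fit every file in the list. The helper names are hypothetical, and the sketch only builds a list of proposed renames rather than touching anything on disk.

```python
import os


def to_utf8_name(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode a file name as UTF-8, leaving valid UTF-8 names untouched.

    source_encoding is an assumption; pick whatever the files really use.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


def plan_renames(root: bytes) -> list[tuple[bytes, bytes]]:
    """Dry run: list (old_path, new_path) pairs without renaming anything.

    Walking bottom-up means entries inside a directory are listed before
    the directory itself, so the pairs can be applied in order.
    """
    plan = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = to_utf8_name(name)
            if fixed != name:
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, fixed)))
    return plan


if __name__ == "__main__":
    for old, new in plan_renames(b"."):
        print(old, "->", new)
```

Review the printed plan before renaming anything, and keep a backup of the folder.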

P.S. This has been my most painful folder to sync, since it’s a folder structure with lots of tiny files for old computers that one might want to emulate, if you’re into that sort of thing. It takes the longest to sync, and it burns my CPU while it’s doing it. That’s probably the toughest use case for this software that I can think of.

We do a lot of things “per file” when syncing, which can get painful and cause significant overhead when the files themselves are tiny and trivial to sync. In particular, there’s a database update when we complete syncing a file (which is a sync write, hence slow)…

I wonder if it would be possible to detect all the tiny files that need to be synced, zip them up, send them over, unzip them, and then update the database? Maybe you could have a threshold like if the file is <100K, add it to an archive with all other files <100K, and send them all at once?
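To make the idea concrete, here is a rough Python sketch of the bundling step. It is purely an illustration of the suggestion, not something Syncthing does; the function name and the in-memory `files` mapping are made up for the example, and the 100K threshold is the one floated above.

```python
import io
import zipfile

SMALL_FILE_THRESHOLD = 100 * 1024  # "100K", as suggested above


def bundle_small_files(
    files: dict[str, bytes],
    threshold: int = SMALL_FILE_THRESHOLD,
) -> tuple[bytes, list[str]]:
    """Pack every file smaller than threshold into one zip archive.

    Returns the archive bytes and the names of the files that were too
    large to bundle and would be sent individually.
    """
    buffer = io.BytesIO()
    large = []
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            if len(data) < threshold:
                archive.writestr(name, data)
            else:
                large.append(name)
    return buffer.getvalue(), large


if __name__ == "__main__":
    demo = {"Catalogs/readme": b"tiny", "disk.adf": b"\0" * (880 * 1024)}
    archive_bytes, large = bundle_small_files(demo)
    print(len(archive_bytes), "bytes bundled; sent individually:", large)
```

This only covers the transfer side; it does nothing about the per-file bookkeeping on either end.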

You’d still have to add all the files to the index, which is the majority of the cost.

We could be smarter about the database updates for small files, I’m sure. Still, this is usually only an issue for the initial sync, so it’s a one-time cost…

Well, sort of. During my test of this, I decided I’d make it easier on Syncthing and delete two folders that were just dumps of data I have on a DVD anyway. Then I reinstalled AmiKit and did my test. Syncthing went into a tailspin after that, reporting that of a folder with ~67,000 files, ~170,000 of them were out of sync. It would count down from this number about 4 files every 10 seconds, all the while running all the machines syncing this folder (three) at 97% CPU. It was using hardly any bandwidth, so I assume this was all database activity, registering files that were deleted (I deleted them on all the machines simultaneously, thinking that syncing would go faster).

I’ve now removed this folder from Syncthing, because I didn’t want to run all those machines that hot for that long (two of them are laptops). I may try syncing it cleanly again from scratch at some future date. Clearly, I am operating with a limited understanding of the algorithms involved. How does delete work? Do you look at every file/folder entry individually, or do you assume that if a top-level folder is deleted, everything under it was deleted too?