RFC: Syncing extended attributes

I’m considering implementing support for syncing extended attributes, and I’ve written up a short proposal on roughly how that could work. It lives here:

I’d appreciate thoughts and comments, especially if you have insights about how this works that I’ve obviously missed, or use cases that wouldn’t be covered by the described mechanisms.

We aim to implement syncing of extended attributes in the cases where it makes sense – that is, between nodes sharing the same operating system. If a file has extended attributes originating on Linux we will not apply them when syncing the file on FreeBSD.

Thank you very much for not trying to shoehorn attributes between different operating systems :slightly_smiling_face:. Dropbox does that, i.e. it saves macOS attributes into NTFS streams on Windows, and I experienced a lot of problems because of that when I was still using it. In particular, Windows backup didn’t want to cooperate with those files and kept backing up the same files with those data streams over and over again.

How do you plan to detect attribute-only changes? Do they trigger inotify?

In general the “requiredAttributes” could also be a regexp of stuff you care about.
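
To make the regexp idea concrete, here is a minimal sketch. The pattern and the function name `select_attributes` are made up for illustration, and treating the proposal's “requiredAttributes” setting as a single regular expression is an assumption, not something the proposal specifies.

```python
import re

# Hypothetical pattern: keep freedesktop.org and Finder metadata attributes.
REQUIRED = re.compile(r"(user\.xdg\..*|com\.apple\.metadata:.*)")


def select_attributes(xattrs: dict, pattern: re.Pattern) -> dict:
    """Return only the attributes whose full name matches the pattern."""
    return {
        name: value
        for name, value in xattrs.items()
        if pattern.fullmatch(name)
    }


sample = {
    "user.xdg.tags": b"work,todo",
    "user.some.cache": b"\x00\x01",
    "com.apple.metadata:kMDItemFinderComment": b"...",
    "com.apple.quarantine": b"...",
}
print(sorted(select_attributes(sample, REQUIRED)))
```

Using `fullmatch` rather than `search` means a pattern has to describe the whole attribute name, which avoids accidentally matching a substring in some unrelated namespace.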

Yeah no idea how to do the detection efficiently or at all, I didn’t look at the APIs yet. I’m hoping it triggers some sort of notification, like a permission change does. But even then, that just makes us look at the file. If changing xattrs doesn’t alter something visible in the lstat() result we would need to load all attributes every time we scan a file, and that would suck.

I think you will have to do part of the scan anyway?

I really doubt changing xattrs changes anything in the lstat return value. Even if it does, £10 says it won’t be consistent across all platforms.

I guess if notifications work, it’s not a tragedy, as the scan is targeted and hence the xattr loading is targeted. If they don’t, or are inconsistent across platforms (another £10 here), you’ll be forced to always load xattrs during the scan, for everything.

I don’t think it’s hopeless. On macOS and Linux (where I can be bothered to run manual tests right now), writing an xattr updates the “inode change time”, which is visible in stat but not something we currently look at. Maybe we need to add an extra “metadata changed time” timestamp to our FileInfo to track it.

jb@unu:~ $ stat foo
  File: foo
  Size: 4         	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 9699849     Links: 1
Access: (0644/-rw-r--r--)  Uid: (  502/      jb)   Gid: (  100/   users)
Access: 2022-05-19 08:50:46.930472587 +0000
Modify: 2022-05-19 08:50:46.930472587 +0000
Change: 2022-05-19 08:51:30.610412220 +0000
 Birth: -
jb@unu:~ $ xattr -w user.syncthing.test test3 foo
jb@unu:~ $ stat foo
  File: foo
  Size: 4         	Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d	Inode: 9699849     Links: 1
Access: (0644/-rw-r--r--)  Uid: (  502/      jb)   Gid: (  100/   users)
Access: 2022-05-19 08:50:46.930472587 +0000
Modify: 2022-05-19 08:50:46.930472587 +0000
Change: 2022-05-19 08:51:38.826400862 +0000
 Birth: -

Certainly there will be platform-specific finickiness. I think that’s unavoidable and part of the game here.
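
As a rough sketch of how a scan could use that timestamp: the `FileRecord` type and the `scan_action` function below are hypothetical and are not Syncthing's actual FileInfo or scanner. The idea is simply that attributes only need to be reloaded when the inode change time differs from the stored one, assuming the platform updates it on xattr writes as in the test above.

```python
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical stand-in for what a scanner stores per file."""
    size: int
    mtime_ns: int
    ctime_ns: int  # the extra "metadata changed time" field


def record_from_stat(st: os.stat_result) -> FileRecord:
    """Build a record from an lstat() result."""
    return FileRecord(st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def scan_action(stored: FileRecord, current: FileRecord) -> str:
    """Decide how much work a scan has to do for one file."""
    if (stored.size, stored.mtime_ns) != (current.size, current.mtime_ns):
        return "rehash"           # content may have changed
    if stored.ctime_ns != current.ctime_ns:
        return "reload-metadata"  # permissions or xattrs may have changed
    return "unchanged"            # no need to read xattrs at all


# Typical use during a scan (path and stored record come from the caller):
#   action = scan_action(stored, record_from_stat(os.lstat(path)))
```

Note that this only narrows down which files to look at. A changed ctime does not say what changed, and on Windows `st_ctime` has a different meaning, so this check would be one of the platform-specific parts.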

I think filesystem notifications should work, at least on Linux and macOS. That doesn’t replace the need for efficient detection on scan, though, as we still need to do full scans as well.
The entire thing seems reasonable and useful. It involves platform peculiarities and filesystems, so also likely to be very, very finicky indeed :slight_smile:

At least NTFS supports Alternate Data Streams, a feature present since the beginning, yet few people are aware of it. tomasz86 already mentioned it in this thread. My question is: are “Alternate Data Streams” included in what you call “extended attributes”, or are they considered something different? Those streams can be arbitrarily large, so they might not count as attributes, right? On the other hand, one could say that size does not matter and that whatever is not the content of a file is an attribute. So what is the case here?

Size does matter, because I’m proposing to treat this as metadata, not file data. That means it gets passed around with the rest of the file metadata and saved to the database. Hence the proposal noting limits in the number of and size of these attributes. That said, I do think NTFS alternate streams fit in as “extended attributes”.
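
A minimal sketch of what such a limit check could look like. The numbers and the name `fits_in_metadata` are placeholders invented for this example, not the limits from the proposal.

```python
# Placeholder limits, chosen only for illustration.
MAX_ATTRS = 64
MAX_TOTAL_BYTES = 16 * 1024


def fits_in_metadata(xattrs: dict,
                     max_attrs: int = MAX_ATTRS,
                     max_total: int = MAX_TOTAL_BYTES) -> bool:
    """Return True if the attribute set is small enough to travel as metadata."""
    if len(xattrs) > max_attrs:
        return False
    # Count both names and values, since both end up in the database.
    total = sum(len(name.encode("utf-8")) + len(value)
                for name, value in xattrs.items())
    return total <= max_total


small = {"user.xdg.tags": b"work,todo"}
big_stream = {"some.large.stream": b"\x00" * (1024 * 1024)}
print(fits_in_metadata(small), fits_in_metadata(big_stream))
```

With a check like this, a handful of small attributes passes, while something like a megabyte-sized alternate stream would be rejected rather than stored with the file metadata.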

(The alternative is more complex, and wildly inefficient in the presumably most common case of multiple small attributes.)

I don’t know the situation on OSs other than Windows. NTFS supports not only arbitrarily large ‘alternate data streams’ but also ‘extended attributes’. The documentation on EA is very sparse (Backup API). As far as I know they were created for OS/2 compatibility. This lists a few additional use cases. ADS are used more frequently, for example for the zone-id of files downloaded from the internet. See also

In general I think it is very important in file handling applications (like archives) to preserve as many aspects of a file as possible (not only the main data stream but also metadata, security, ADS …), but when it comes to saving this information on another OS, it will be difficult. E.g. NTFS synced to Linux/Mac and then to another NTFS: what information can be preserved, and how?

Personally I think ADS and EA are not first priority. More file attributes like system/hidden or creation time would be more useful. Sparse files are another often overlooked topic. Compressed or (EFS-)encrypted attributes should be ignored by Syncthing.

Since this thing will essentially build support for random OS specific attributes, we might as well add things like these as “extended attributes” in a Syncthing-internal namespace.

For macOS, I’d love to see tags (aka color labels) synced. Every sync service I’ve used (OneDrive, Google, Creative Cloud) ignores the tags.

Appreciate the request for input.

This has the potential to be an idiosyncratic feature for a number of reasons, some of which are:

  • If operating systems themselves use attributes, and those attributes are updated asynchronously after a file has changed (e.g., a thumbnail for an image file which is generated and written by an async processor), then you have the potential to pick those changes up and roam them to another replica which may, in turn, be doing its own processing/writing of attributes. This might lead to loops or other weird issues.

  • In mixed OS environments, Syncthing comes very close to allowing a user to set up a fully connected topology and not worry about what paths changes take. I say “comes close” because ACLs are obviously an exception, but it’s easy to turn those off. If attribute syncing works differently when a file goes Mac->Mac than it does if it goes Mac->Windows->Mac, that will be unexpected. Your proposal would allow a savvy user, assuming no other constraints exist, to set up a combination of settings and topology that produces the expected result. I’m not claiming a smart person wouldn’t be able to get the right result. But a less smart person is going to get unexpected and even inconsistent results because the paths that changes take are variable.

I guess I’m wondering if attribute syncing is a general purpose solution to the problem of “attributes don’t sync” or if it’s a solution to a specific set of well-defined problems. If it’s the latter, then I wonder about whether it might make more sense to try to build a solution that both works cross platform and is driven by an allow list of specific attributes to sync. I’m not thrilled to even suggest an allow list because it’s obviously a lot more complex to sync a subset of attributes and to have to support the case where the allow list changes and you have to rescan/compute what to sync per file. But this at least would allow you to never, for example, sync things that are always meant to be computed locally.

In any case, I will likely want to not enable this feature and I would hope that it’s easy to do that and that having it off would mean that no performance penalty is incurred.

Sorry this is mostly a list of problems and isn’t particularly helpful - this is a tricky thing you’re considering.

If two devices are fighting over an attribute, overwriting each other, then yeah that would result in endless syncing. That’s not different from two devices each wanting to enforce specific (different) permissions for example, or even contents, so I guess I’d like to see where this is actually a problem before considering it too deeply.

Indeed. The proposal is that both topologies would work identically from the Macs’ point of view; the Mac attributes just wouldn’t be applied on the Windows device.

Nonetheless, the proposal includes an allow list. :slight_smile:

Could you provide some motivation for this feature? I’m not sure I understand why you’d want to sync extended attributes in the first place, especially if you are planning to just sync all of them.

For example, everything related to ACL probably doesn’t make sense on other computers if they have different users / groups, so I’m not sure I’d want to sync those.

Extended attributes are also used a lot for quarantine on macOS when you download files from the internet. I’m not sure it makes sense to sync quarantine attributes. On the contrary, if we follow Apple’s lead, we should set our own quarantine attributes to tell the user that the file came from Syncthing, just like AirDrop sets the quarantine attribute.

Also,

Quoting the wiki:

The “best effort” part means we do not make any attempt to make Linux extended attributes survive a rename on Windows, etc.

This sounds a bit inconvenient. I often move files around on my Windows machine to organise my stuff, if that would break something on the Mac side it would be really inconvenient.

Right now I don’t see the upside of extended attribute syncing, and just see potential problems and probably inconsistent behavior from the user’s perspective if we do.

A lot of navigation and personalization features live in xattrs. People want to sync their tags used for search, custom icons, etc.

What Audrius says, and

That’s an assumption. When syncing between two NASes serving the same user base, I expect it to be quite useful. Probably essential, even.

Linux has four default reverse-DNS-ish namespaces: security, trusted, system, and user. It only makes sense to sync the user.* namespaces by default. The other namespaces require elevated privileges or special system capabilities to read and write. (Syncthing doesn’t normally run as root, right?)

FreeBSD/NetBSD only have two standard namespaces, system and user, identified by an integer instead of a name. The proposal wiki already talks about how to map these. However, as with Linux’s system namespace, the *BSD system namespace is also restricted. Syncthing will again likely only have access to the user namespace.
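
For illustration, the integer-to-name translation could look roughly like this. The constant values are the ones FreeBSD defines in `sys/extattr.h`. The Linux-style `namespace.name` form and the function names `bsd_to_wire` and `wire_to_bsd` are assumptions made for this sketch, not necessarily what the proposal wiki describes.

```python
# Values as defined in FreeBSD's <sys/extattr.h>.
EXTATTR_NAMESPACE_USER = 1
EXTATTR_NAMESPACE_SYSTEM = 2

_NAMES = {EXTATTR_NAMESPACE_USER: "user", EXTATTR_NAMESPACE_SYSTEM: "system"}
_NUMBERS = {name: number for number, name in _NAMES.items()}


def bsd_to_wire(namespace: int, name: str) -> str:
    """Turn a (namespace id, attribute name) pair into 'namespace.name'."""
    return f"{_NAMES[namespace]}.{name}"


def wire_to_bsd(wire_name: str) -> tuple:
    """Split 'namespace.name' back into a (namespace id, attribute name) pair."""
    prefix, sep, rest = wire_name.partition(".")
    if not sep or not rest or prefix not in _NUMBERS:
        raise ValueError(f"no BSD namespace for {wire_name!r}")
    return _NUMBERS[prefix], rest


print(bsd_to_wire(EXTATTR_NAMESPACE_USER, "xdg.tags"))
print(wire_to_bsd("user.xdg.tags"))
```

Names in Linux-only namespaces such as `security` or `trusted` raise an error here, which matches the point that only the user namespace is realistically syncable.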

Luckily, all the data the user is likely to care about is stored in user.*. All application-specific and user-generated metadata will be stored there.

MacOS doesn’t care at all. Syncthing can read and write anything it wants (although writes to protected names silently fail). Linux and *BSD user.* namespace can be synced and stored on MacOS as-is.

MacOS stores everything under the named com.apple.* namespace. However, MacOS doesn’t enforce namespace conventions but encourages reverse organization domain-names (e.g. net.syncthing.

Any extended attribute originating from MacOS needs to be mapped into the user namespace when stored on Linux and FreeBSD. The simplest conversion is to user.com.apple when storing locally, and it can easily be reversed when syncing out. However, anything other than user and com.apple can’t be mapped as easily. Therefore, anything synced from Mac that isn’t com.apple should be stored in the user.net.syncthing.compat.earootprefix namespace on Linux and FreeBSD. This can be unambiguously reversed when syncing back to Mac. All of that being said, the user.com.apple namespace mapping should be a special-case exception for convenience.
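
The mapping described above can be sketched as a pair of functions. The prefixes are the ones named in this post. The function names are made up, and the handling of `user.*` names (passed through unchanged) follows my reading of the earlier paragraph about storing them on MacOS as-is.

```python
COMPAT_PREFIX = "user.net.syncthing.compat.earootprefix."


def mac_to_linux(name: str) -> str:
    """Map a MacOS attribute name to a name valid in the Linux/BSD user namespace."""
    if name.startswith("user."):
        return name                 # already a user.* name, keep as-is
    if name.startswith("com.apple."):
        return "user." + name       # convenience special case
    return COMPAT_PREFIX + name     # everything else goes under the compat prefix


def linux_to_mac(name: str) -> str:
    """Reverse mac_to_linux when syncing back to a Mac."""
    if name.startswith(COMPAT_PREFIX):
        return name[len(COMPAT_PREFIX):]
    if name.startswith("user.com.apple."):
        return name[len("user."):]
    return name                     # plain user.* names are stored as-is


for original in ("com.apple.quarantine", "org.example.custom", "user.xdg.tags"):
    stored = mac_to_linux(original)
    print(original, "->", stored, "->", linux_to_mac(stored))
```

One edge case this sketch does not handle: an attribute that is literally named `user.com.apple.something` on the Mac would come back as `com.apple.something`, so a real implementation would need to decide how to treat such names.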

By default, the allow list should therefore only contain the com.apple and user namespaces.


If there is any interest in going the extra mile for cross-device compatibility, file comments and tags could be handled as special cases by Syncthing. These two are probably the most common and user-facing uses of extended attributes, with prominent GUI in Finder on the Mac and Dolphin (KDE).

  • user.xdg.comment (UTF-8 string) == com.apple.metadata:kMDItemFinderComment (binary property list with one item, a UTF-16 string)
  • user.xdg.tags (UTF-8 comma-separated string) == com.apple.metadata:_kMDItemUserTags (binary property list, array of color int and UTF-16 string).
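
A rough sketch of the tags conversion using Python's standard `plistlib`. It assumes that each entry in the Finder plist is a string of the form `name\n<color index>`, with 0 meaning no color. That layout is my understanding of the format and should be verified against real Finder data, and the function names are made up.

```python
import plistlib


def xdg_tags_to_finder(xdg_value: bytes) -> bytes:
    """Convert a user.xdg.tags value into a binary plist for _kMDItemUserTags."""
    names = [t.strip() for t in xdg_value.decode("utf-8").split(",") if t.strip()]
    # Assumption: entries look like "name\n<color index>", 0 = no color.
    entries = [f"{name}\n0" for name in names]
    return plistlib.dumps(entries, fmt=plistlib.FMT_BINARY)


def finder_tags_to_xdg(plist_value: bytes) -> bytes:
    """Convert a _kMDItemUserTags binary plist into a user.xdg.tags value."""
    entries = plistlib.loads(plist_value)
    # Drop the color suffix, since user.xdg.tags has no place to store it.
    names = [str(entry).split("\n", 1)[0] for entry in entries]
    return ",".join(names).encode("utf-8")


finder_value = xdg_tags_to_finder(b"work, todo")
print(finder_tags_to_xdg(finder_value))
```

Colors are lost when going from Finder to XDG and back, and tag names that contain a comma would not survive the round trip, so this kind of translation is inherently lossy.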

Some Googling later, I discovered that both Resilio and Dropbox use a similar extended attribute mapping schema and only whitelist a handful of macOS-specific attributes.
