Best Practice for Circular Sync

I have multiple machines and would like to synchronize some folders between them. I’d like any change on any machine to sync to all the other machines, and, by extension, for all those other machines to sync their changes back.

Let’s say there are machines A, B, and C, and for the sake of simplicity just folder-A. I’ve set A, B, and C to share folder-A with each other, i.e. A shares folder-A with B and C, B shares folder-A with A and C, and so on:

┌──────┐                 ┌──────┐
│      ◄─────────────────┤      │
│   A  │                 │  B   │
│      ├─────────────────►      │
└▲───┬─┘                 └─▲──┬─┘
 │   │                     │  │
 │   │                     │  │
 │   │                     │  │
 │   │                     │  │
 │   │                     │  │
 │   │                     │  │
 │   │                     │  │
 │   │                     │  │
 │   │                     │  │
┌┴───▼─┐                   │  │
│      ├───────────────────┘  │
│  C   │                      │
│      │◄─────────────────────┘
└──────┘

This looked as if it was working, but I noticed that every so often one of the machines would decide that folder-A was out of sync and start to re-download all the files. This happens even when not a single file in folder-A has changed. Looking into the details, it showed, for example, that node B had a newer version of a file (again, the files haven’t actually changed).

Am I right in thinking that I’m going about this wrong, and that the best way to achieve what I want is to have a central, always-online node acting as a kind of server or single source of truth, to which all the other nodes sync?

┌────────┐    ┌─────────┐    ┌────────┐
│        │    │         │    │        │
│        │    │         │    │        │
│   A    │    │   B     │    │  C     │
│        │    │         │    │        │
└─┬──▲───┘    └─┬───▲───┘    └──┬──▲──┘
  │  │          │   │           │  │
  │  │          │   │           │  │
  │  │          │   │           │  │
  │  │          │   │           │  │
  │  │          │   │           │  │
  │  │        ┌─▼───┴───┐       │  │
  │  │        │         │       │  │
  │  └────────┤         ◄───────┘  │
  │           │   S     │          │
  └───────────►         ├──────────┘
              └─────────┘

Your “circular” (I’d call it fully connected) setup is a best practice and should work fine.

That certainly shouldn’t happen; I’d go so far as to say I disbelieve it does (at least the “re-download all files” part) and that you might have misunderstood something. Logs and screenshots and we’ll sort out what’s going on, though.

In any case, which file is considered changed and/or newer isn’t going to be affected by the topology, so changing that won’t solve your problem.

Is Android involved, and possibly an older installation?


Nope, and nope :slight_smile:

Sorry, yes! Android is in the mix, but it only shares two of the folders and they are never an issue.

That certainly shouldn’t happen; I’d go so far as to say I disbelieve it does (at least the “re-download all files” part) and that you might have misunderstood something. Logs and screenshots and we’ll sort out what’s going on, though.

I’ll wait until it happens again and provide more information and screenshots.

It’s entirely possible I’ve misunderstood something, but it seemed clear from the output. The UI would tell me that folder-A was out of sync. When I clicked for details, it would then, for example, say that machine-C (another machine) had the latest changed file set.

I don’t understand how this could happen either, since I thought that it all worked with hashes.

I mentioned Android because of its moving timestamps, but if that’s not it, then the culprit is probably something else. Are there any funky filesystems in use? You could also enable File Versioning and then compare how exactly the new and replaced versions of a specific file differ from one another.

enable File Versioning and then compare how exactly the new and replaced versions of a specific file differ from one another.

That’s a good idea. I have file versioning enabled already, so will check that when I get home.
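As a sketch of the kind of check I have in mind (the filenames are made up, and `touch -d` is GNU syntax; the second file stands in for the copy that versioning keeps):

```shell
# Simulate a current file and a replaced version with identical content
# but different modification times -- the kind of difference that could
# make a file look "changed" without its data actually changing.
printf 'hello\n' > current.txt
printf 'hello\n' > versioned.txt
touch -d '2024-01-01 12:00:00' versioned.txt

if cmp -s current.txt versioned.txt; then
    echo "content identical: only metadata (e.g. mtime) differs"
else
    echo "content differs"
fi
```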

Well, this is fun :stuck_out_tongue:

I couldn’t recreate the problem (TL;DR: I removed the folder, and it isn’t easy to get it back into the state it was in), but in looking at the backup of one of the folders I took before I removed it, I saw something odd.

The newly synced folder had a smaller size than the backup I took. Digging into the folder structure, I identified a single file responsible for the discrepancy. I then saw that ls -l showed the correct size, whilst du reported a different size. Running md5sum on the files showed that they were actually identical.

I did notice that the backup version had macOS extended attributes attached to it, but even removing those didn’t change the file-size difference.

I don’t know if any of this is related, but it is curious.

Does it contain a lot of nulls? That will result in sparse files (blocks of all zeroes not actually stored on disk) and hence a smaller size in du than in ls.
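On filesystems that support sparse files, this is easy to demonstrate (a sketch with made-up filenames; `truncate` and `md5sum` here are GNU coreutils tools):

```shell
# Create a 1 MiB file consisting entirely of a "hole" (no data blocks
# written), and a 1 MiB file of zeroes actually written to disk.
truncate -s 1M sparse.bin
dd if=/dev/zero of=dense.bin bs=1M count=1 status=none

# Both report the same apparent size...
wc -c sparse.bin dense.bin

# ...and identical content...
md5sum sparse.bin dense.bin

# ...but du shows far fewer blocks allocated for the sparse one.
du -k sparse.bin dense.bin
```

This is exactly the pattern described above: ls and the checksum agree, while du differs, because only one copy is stored sparsely.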

Does it contain a lot of nulls? That will result in sparse files (blocks of all zeroes not actually stored on disk) and hence a smaller size in du than in ls.

It does; it’s an mbox file. Though I did a count, and both files contain the same number of nulls.
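A quick way to do that count (demo.mbox is a made-up stand-in for the real file):

```shell
# Make a small file with embedded NUL bytes, then count them by keeping
# only NULs (delete the complement of '\0') and piping to wc -c.
printf 'From alice\0\0\0body text\0' > demo.mbox
tr -cd '\0' < demo.mbox | wc -c    # counts the NUL bytes (4 here)
```

Matching counts on both copies mean the zero runs are present in the data either way; one copy is just stored sparsely and the other isn’t.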

I think it’s a red herring, but interesting nonetheless :slight_smile:

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.