Syncthing re-syncing entire contents of identical video files

Hey all!

I’ll start in my traditional way, by thanking you graciously for an excellent product that has enabled a way of managing my life that would not otherwise be possible. I’ll also say (in an attempt to gently nudge others to do the same) that I’ve just donated once again to the project, as I will always do when I ask you for help (although I do not expect this donation to earn me any privilege or advantage in my request).

I’ve read through this, this, and this, and while they’ve given me a better understanding of how syncthing works, they haven’t helped me with my problem.

Setup

  • “Home”: Ubuntu Studio 19.10 on Asus VivoBook S15 running Syncthing 1.4.0 64-bit
  • “Hub”: Raspbian 9.11 on Raspberry Pi 3 running Syncthing 1.4.0 ARM

Problem (TL;DR)

  • Installed syncthing on “Home” computer detailed above
  • Rsync’d files/folders from “Hub” hard drive to new “Home” hard drive using -a option.
  • Copied syncthing config file from old “Home” setup to new one, changing identity strings as necessary.
  • All folders sync’d up fine, recognizing the equality of the files on both sides and rectifying metadata in a lightweight way, except my gigantic “Videos” folder. The Videos folder is doing a FULL SYNC of identical files from “Hub” to “Home”: around 100 GB, and it has been running for days.

I can’t figure out why it thinks it needs to do a full content sync, and I’ve verified that the content of the files is identical on both sides:

What I’ve Tried

  • Completely removing the folder from the “Home” configuration, rsync’ing again, and then re-adding the folder to syncthing. It just tries to copy all the content again.

Extra Screenshots

“Home” folder pane:

Screenshot_2020-04-04_14-07-24

“Hub” folder pane:

Screenshot_2020-04-04_14-07-46

Thanks again for everything, and let me know if you have any ideas as to how to mitigate this or would like more input.

Just hit the API for some file information and saw something potentially interesting, although I’m not sure if it’s meaningful. Here’s the “Home” api call and response:

curl -X GET -H "X-API-Key: [REDACTED]" 'http://localhost:8384/rest/db/file?folder=7afli-u9ucj&file=Misc%20Snippets%2FMexico%202018%2FVID_20180309_171732.mp4'
{
  "availability": [
    {
      "id": "YYAMBQ7-[REDACTED]",
      "fromTemporary": false
    }
  ],
  "global": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2018-03-09T17:17:32.401996454-06:00",
    "modifiedBy": "I6XESL5",
    "mustRescan": false,
    "name": "Misc Snippets/Mexico 2018/VID_20180309_171732.mp4",
    "noPermissions": false,
    "numBlocks": 2565,
    "permissions": "0660",
    "sequence": 768,
    "size": 336107179,
    "type": "FILE",
    "version": [
      "I6XESL5:1",
      "ZYBIYCM:1"
    ]
  },
  "local": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2018-03-09T17:17:32.401996454-06:00",
    "modifiedBy": "ZYBIYCM",
    "mustRescan": false,
    "name": "Misc Snippets/Mexico 2018/VID_20180309_171732.mp4",
    "noPermissions": false,
    "numBlocks": 1283,
    "permissions": "0660",
    "sequence": 251,
    "size": 336107179,
    "type": "FILE",
    "version": [
      "ZYBIYCM:1"
    ]
  }
}

And here’s the “Hub” response for the same call:

{
  "availability": [
    {
      "id": "ZYBIYCM-[REDACTED]",
      "fromTemporary": true
    }
  ],
  "global": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2018-03-09T17:17:32.401996454-06:00",
    "modifiedBy": "I6XESL5",
    "mustRescan": false,
    "name": "Misc Snippets/Mexico 2018/VID_20180309_171732.mp4",
    "noPermissions": false,
    "numBlocks": 2565,
    "permissions": "0660",
    "sequence": 768,
    "size": 336107179,
    "type": "FILE",
    "version": [
      "I6XESL5:1",
      "ZYBIYCM:1"
    ]
  },
  "local": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2018-03-09T17:17:32.401996454-06:00",
    "modifiedBy": "I6XESL5",
    "mustRescan": false,
    "name": "Misc Snippets/Mexico 2018/VID_20180309_171732.mp4",
    "noPermissions": false,
    "numBlocks": 2565,
    "permissions": "0660",
    "sequence": 768,
    "size": 336107179,
    "type": "FILE",
    "version": [
      "I6XESL5:1",
      "ZYBIYCM:1"
    ]
  }
}

I think the interesting thing I see is the presence of the I6XESL5 id, which is the computer that this new “Home” configuration replaced. That machine is no longer on the network anywhere and won’t ever be again.

Additionally, I see the numBlocks and sequence differ between the two.

Again, not sure if either is a meaningful observation…
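For anyone wanting to repeat this comparison, here is a minimal sketch that lists which fields differ between the "local" and "global" records of a /rest/db/file response. The helper name differing_fields is made up for illustration, and the dictionaries are trimmed copies of the “Home” response above.

```python
def differing_fields(local: dict, global_: dict) -> list:
    """Return the sorted keys whose values differ between two file records."""
    keys = sorted(set(local) | set(global_))
    return [k for k in keys if local.get(k) != global_.get(k)]


# Trimmed copies of the "local" and "global" records from the "Home" response.
home_local = {
    "modifiedBy": "ZYBIYCM",
    "numBlocks": 1283,
    "sequence": 251,
    "size": 336107179,
    "version": ["ZYBIYCM:1"],
}
home_global = {
    "modifiedBy": "I6XESL5",
    "numBlocks": 2565,
    "sequence": 768,
    "size": 336107179,
    "version": ["I6XESL5:1", "ZYBIYCM:1"],
}

print(differing_fields(home_local, home_global))
```

The size is the same in both records, while modifiedBy, numBlocks, sequence, and version are not.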

This looks like a device with large blocks enabled, and a device with large blocks disabled, because one of them is likely running an older version before large blocks were enabled by default.
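A quick arithmetic check supports this. The sketch below assumes the two block sizes involved are 128 KiB and 256 KiB (an assumption, not something the API response states) and computes how many blocks a file of the reported size would need with each.

```python
KIB = 1024
FILE_SIZE = 336107179  # "size" from the API responses above


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to cover file_size bytes (ceiling division)."""
    return (file_size + block_size - 1) // block_size


small = block_count(FILE_SIZE, 128 * KIB)  # assumed old fixed block size
large = block_count(FILE_SIZE, 256 * KIB)  # assumed adaptive block size
print(small, large)  # 2565 1283
```

These match the numBlocks values reported for the global record (2565) and the “Home” local record (1283).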

Also in your second screenshot, the majority of the top file is filled with green, which the legend explains is “Reused”. This means that percentage of the file was re-used, not downloaded. It will take time to perform this for large files on slow devices/storage as it requires the entire file to be read and hashed, and the similar parts copied.

Yes, I noticed that and went back and looked at others. I believe it’s actually green because I restarted Syncthing after most of the file had downloaded. All files since that file show either all or a vast majority of the file downloaded and not reused. But yes, thanks for pointing that out!

Aha, good catch! That does appear to be the case. Is there a way to force my “Hub” instance to re-index all its files using the “useLargeBlocks” configuration now that it does not appear to be exposed as a settable config?

Ah, found the POST /rest/system/reset endpoint. Trying that and will report progress.

Just touching the files would cause them to be rescanned.
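A minimal sketch of that suggestion, assuming you want to bump the modification time of every regular file under a folder. The function name touch_tree is hypothetical. Note that changed modification times are themselves changes that Syncthing will pick up, and that this walks everything under the root, including any Syncthing marker or versioning directories, so you may want to filter those out.

```python
from pathlib import Path


def touch_tree(root: Path) -> int:
    """Set the mtime of every regular file under root to now; return the count."""
    count = 0
    for path in root.rglob("*"):
        if path.is_file():
            path.touch()  # updates access and modification times
            count += 1
    return count


# Example (path is a placeholder):
# touch_tree(Path("/path/to/Videos"))
```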

Ok, confirmed, it was definitely the useLargeBlocks issue. Here’s a detailed explanation for anyone finding this via Google (all, please feel free to correct me if I’m wrong about any of this).

What Happened

When syncthing finds a new file, it indexes the file in chunks so it can economize on chunks that may have already been transferred from other hosts (i.e., it uses BitTorrent-style strategies for file transfer).

In the beginning, syncthing used a single standard block size. However, for very large files, that resulted in huge indices that were cumbersome and non-performant. To combat this, syncthing introduced an option, useLargeBlocks, that made the block size a function of the file size (within a certain range). At version 1.2.0, this option became hard-coded.
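To illustrate what “a function of the file size” could look like, here is a rough sketch. The constants (block sizes doubling from 128 KiB to 16 MiB, and a target of about 2000 blocks per file) are my assumptions; they are consistent with the numBlocks values seen in this thread, but check the syncthing source before relying on them.

```python
KIB = 1024

# Assumed candidate block sizes: 128 KiB, 256 KiB, ..., 16 MiB.
BLOCK_SIZES = [(128 * KIB) << i for i in range(8)]

# Assumed target: pick the smallest block size that keeps a file under
# roughly this many blocks.
DESIRED_BLOCKS_PER_FILE = 2000


def choose_block_size(file_size: int) -> int:
    """Pick the smallest candidate block size that keeps the block count low."""
    for block_size in BLOCK_SIZES:
        if file_size < DESIRED_BLOCKS_PER_FILE * block_size:
            return block_size
    return BLOCK_SIZES[-1]  # very large files use the biggest block size


print(choose_block_size(336107179))  # 262144, i.e. 256 KiB
```

Under these assumptions, my 336 MB video gets 256 KiB blocks, while small files stay at 128 KiB.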

However, for hosts (like mine) that had existed since before the change and whose indices contained files that hadn’t been changed for years (e.g., my old movie files), the files continued to be represented in the indices by blocks calculated during the small-standard-block epoch.

Thus, when I introduced a brand new host which indexed all my files using the new default adaptive block sizes, it triggered a full resync, which was highly undesirable.
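Here is a toy demonstration of why the two indices don’t line up. It hashes the same bytes with two different block sizes using SHA-256 and shows that the resulting block lists share no hashes. The tiny block sizes (128 and 256 bytes) and the helper name block_hashes are just for illustration.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list:
    """Hash data in consecutive fixed-size blocks and return the hex digests."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


data = bytes(range(256)) * 4          # 1024 bytes of identical "file content"
old_index = block_hashes(data, 128)   # stands in for the old small-block index
new_index = block_hashes(data, 256)   # stands in for the adaptive-block index

shared = set(old_index) & set(new_index)
print(len(old_index), len(new_index), len(shared))
```

The content is byte-for-byte identical, but comparing block lists alone gives no hint of that, because no block hash from one index appears in the other.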

Solution

In my case, the solution was the following:

  1. Upgrade all hosts to latest syncthing
  2. Pause all affected hosts
  3. Call curl -X POST -H "X-API-Key: [YOUR API KEY]" 'http://localhost:8384/rest/system/reset?folder=[FOLDER ID]' to re-index the affected folders (or without a folder to re-index the whole host).
  4. Unpause all affected hosts

Technically I don’t think you have to pause anything, but doing so helped me feel safe about the indices that were being generated.
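If you’d rather script the reset step than use curl, here is a minimal sketch using only the Python standard library. The function name build_reset_request and the default base URL are my own choices; the endpoint and the X-API-Key header are the ones used above. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_reset_request(api_key: str,
                        folder_id: Optional[str] = None,
                        base_url: str = "http://localhost:8384") -> urllib.request.Request:
    """Build a POST request for /rest/system/reset, optionally for one folder."""
    url = base_url + "/rest/system/reset"
    if folder_id:
        url += "?" + urllib.parse.urlencode({"folder": folder_id})
    # An empty body plus method="POST" makes this an explicit POST.
    return urllib.request.Request(
        url, data=b"", method="POST", headers={"X-API-Key": api_key}
    )


req = build_reset_request("YOUR API KEY", folder_id="7afli-u9ucj")
print(req.get_method(), req.full_url)

# To actually send it (this resets the folder's index!):
# urllib.request.urlopen(req)
```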

Thanks for the help!

Sounds good. It’s not really relevant in your case of really old files, but as a warning nonetheless:

Resetting a database is a process with potentially dangerous, or at least annoying, consequences. If the actual files on disk are not the same on all synced devices, this will result in conflicts.