Out of Sync with no progress, v1.3.3 cluster win+linux

Hi,

I’m out of sync and pretty clueless as to why. Here’s my state:

Device A: SendOnly - VBHJ3YR-WXRWUR4-GVZBFIL-4I2GE5T-TYHNPZY-NY3PKDS-Y2AFDBO-MZNP4AD
{
  "availability": [
    {
      "id": "7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH",
      "fromTemporary": false
    }
  ],
  "global": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2019-11-05T15:55:30.1028327+01:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Garmin\\GarminExpress.exe",
    "noPermissions": false,
    "numBlocks": 660,
    "permissions": "0755",
    "sequence": 10076,
    "size": 86506688,
    "type": "FILE",
    "version": [
      "VBHJ3YR:1"
    ]
  },
  "local": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2019-11-05T15:55:30.1028327+01:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Garmin\\GarminExpress.exe",
    "noPermissions": false,
    "numBlocks": 660,
    "permissions": "0755",
    "sequence": 10923,
    "size": 86506688,
    "type": "FILE",
    "version": [
      "VBHJ3YR:1"
    ]
  }
}


Device B: SendReceive - 7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH
{
  "availability": [
    {
      "id": "VPYAMUE-D44JYED-5DNCNOV-QNNFESH-QLVRLP4-525HK3X-GK5RXN4-QS5LMQY",
      "fromTemporary": false
    },
    {
      "id": "VBHJ3YR-WXRWUR4-GVZBFIL-4I2GE5T-TYHNPZY-NY3PKDS-Y2AFDBO-MZNP4AD",
      "fromTemporary": false
    }
  ],
  "global": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2019-11-05T15:55:30.1028327+01:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Garmin/GarminExpress.exe",
    "noPermissions": false,
    "numBlocks": 660,
    "permissions": "0755",
    "sequence": 10995,
    "size": 86506688,
    "type": "FILE",
    "version": [
      "VBHJ3YR:1"
    ]
  },
  "local": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2019-11-05T15:55:30.1028327+01:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Garmin/GarminExpress.exe",
    "noPermissions": false,
    "numBlocks": 660,
    "permissions": "0755",
    "sequence": 10076,
    "size": 86506688,
    "type": "FILE",
    "version": [
      "VBHJ3YR:1"
    ]
  }
}


Device C: RecvOnly - ZTMOUGV-HJGYXTG-FBBGH7V-6P53PU5-H2KADVS-M64T24H-IBOXGC5-DZ5SJAU
{
  "availability": null,
  "global": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2020-01-28T16:00:39.500843573+01:00",
    "modifiedBy": "ZTMOUGV",
    "mustRescan": false,
    "name": "Garmin/GarminExpress.exe",
    "noPermissions": true,
    "numBlocks": 660,
    "sequence": 3096,
    "size": 86505358,
    "type": "FILE",
    "version": [
      "ZTMOUGV:2"
    ]
  },
  "local": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2020-01-28T16:00:39.500843573+01:00",
    "modifiedBy": "ZTMOUGV",
    "mustRescan": false,
    "name": "Garmin/GarminExpress.exe",
    "noPermissions": true,
    "numBlocks": 660,
    "sequence": 3096,
    "size": 86505358,
    "type": "FILE",
    "version": [
      "ZTMOUGV:2"
    ]
  }
}


Device D: RecvOnly - VPYAMUE-D44JYED-5DNCNOV-QNNFESH-QLVRLP4-525HK3X-GK5RXN4-QS5LMQY
{
  "availability": [
    {
      "id": "7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH",
      "fromTemporary": false
    }
  ],
  "global": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2019-11-05T15:55:30.1028327+01:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Garmin/GarminExpress.exe",
    "noPermissions": false,
    "numBlocks": 660,
    "permissions": "0755",
    "sequence": 10076,
    "size": 86506688,
    "type": "FILE",
    "version": [
      "VBHJ3YR:1"
    ]
  },
  "local": {
    "deleted": false,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2019-11-05T15:55:30.1028327+01:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Garmin/GarminExpress.exe",
    "noPermissions": false,
    "numBlocks": 660,
    "permissions": "0755",
    "sequence": 10995,
    "size": 86506688,
    "type": "FILE",
    "version": [
      "VBHJ3YR:1"
    ]
  }
}
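
The four dumps above appear to have the shape returned by Syncthing’s `GET /rest/db/file` endpoint. As a rough sketch of how to read them, the snippet below compares the `global` and `local` records of one response and reports whether that device considers the file in sync. The helper names `version_of` and `describe_file_state` are made up for illustration, and the sample dicts are trimmed copies of the Device A and Device C output.

```python
def version_of(record):
    """Return the version vector of a file record as a sorted tuple of strings."""
    return tuple(sorted(record.get("version") or []))


def describe_file_state(response):
    """Summarize one db/file-style response: does local match global?"""
    glob, local = response["global"], response["local"]
    same_version = version_of(glob) == version_of(local)
    same_content = (glob["size"], glob["numBlocks"]) == (local["size"], local["numBlocks"])
    # Short IDs of the peers this device could fetch the global version from.
    peers = [entry["id"][:7] for entry in (response.get("availability") or [])]
    return {
        "name": glob["name"],
        "in_sync": same_version and same_content,
        "global_version": version_of(glob),
        "available_from": peers,
    }


# Trimmed copies of the output above (only the fields the sketch reads).
record_a = {"name": "Garmin\\GarminExpress.exe", "numBlocks": 660,
            "size": 86506688, "version": ["VBHJ3YR:1"]}
record_c = {"name": "Garmin/GarminExpress.exe", "numBlocks": 660,
            "size": 86505358, "version": ["ZTMOUGV:2"]}
device_a = {
    "availability": [{
        "id": "7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH",
        "fromTemporary": False,
    }],
    "global": record_a,
    "local": dict(record_a),
}
device_c = {"availability": None, "global": record_c, "local": dict(record_c)}

for label, response in (("A", device_a), ("C", device_c)):
    print(label, describe_file_state(response))
```

Read this way, every device reports local equal to global for the file; the disagreement is between devices, since C’s global version is `ZTMOUGV:2` while A, B and D report `VBHJ3YR:1`.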

Cluster connections are made “star-like”:

Device A <-> Device B <-> Device C
..................^-----> Device D

Device A is conceptually the “master” (send-only).

I was fully in sync with the cluster when I performed the following:

  • Device C: This is a test device on which I wanted to simulate bit rot. I was in sync, then set the folder to “sendOnly”. I also did this on device B, in order to get an out-of-sync notification in case Syncthing detected the (future) damage to GarminExpress.exe later.

C: Shutdown Syncthing.

C: Altered GarminExpress.exe to simulate bit rot. (changed 1 byte)

C: Nuked Syncthing’s database.

C: Started Syncthing.

C: Waited for scan to complete.

  • Device B: RESULT: Out of sync was correctly shown on B for device C, as I expected. The 1 item was GarminExpress.exe. No override button was shown (unexpected result). [Two screenshots were attached here.]

  • Device C: Put it to “recvOnly” again (like it was before on initial sync). No progress, still device B shows 1 item out of sync for C.

  • Device B: Tried changing folder type from “sendonly” to “sendreceive” (this was the point when taking above screenshot). Still no progress.

Other devices are in sync for that folder and don’t show override buttons.

Tried to switch folder type from “recvonly” to “sendreceive” - no progress.
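
For anyone who wants to repeat the “changed 1 byte” step on device C, here is a minimal sketch of in-place single-byte corruption. It is only an illustration: the helper name `flip_byte`, the `keep_mtime` option and the offset are arbitrary choices, not what was actually used in this test. A plain write like this updates the file’s modification time, so the sketch optionally restores the old timestamps, which is closer to what real bit rot would look like.

```python
import os
import tempfile


def flip_byte(path, offset, keep_mtime=True):
    """Invert all bits of the byte at `offset`, leaving the file size unchanged."""
    before = os.stat(path)
    with open(path, "r+b") as handle:
        handle.seek(offset)
        original = handle.read(1)
        if not original:
            raise ValueError("offset is past the end of the file")
        handle.seek(offset)
        handle.write(bytes([original[0] ^ 0xFF]))
    if keep_mtime:
        # Real bit rot does not touch timestamps, so put the old ones back.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))


# Demo on a throwaway file rather than a real installer.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"GARMIN")
    demo_path = tmp.name

flip_byte(demo_path, 0)
with open(demo_path, "rb") as handle:
    demo_bytes = handle.read()
os.remove(demo_path)
print(demo_bytes)
```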

I hope I’ve collected enough diagnostic data. What can I do to get in sync again?

  • Log of Device C:
2020-01-28 16:07:34 My ID: ZTMOUGV-HJGYXTG-FBBGH7V-6P53PU5-H2KADVS-M64T24H-IBOXGC5-DZ5SJAU
2020-01-28 16:07:35 Single thread SHA256 performance is 317 MB/s using minio/sha256-simd (196 MB/s using crypto/sha256).
2020-01-28 16:07:36 Hashing performance is 274.13 MB/s
2020-01-28 16:07:36 Overall send rate is unlimited, receive rate is unlimited
2020-01-28 16:07:36 Ready to synchronize "Installationsquellen" (n4q4z-ohpfz) (receiveonly)
2020-01-28 16:07:36 TCP listener (0.0.0.0:20081) starting
2020-01-28 16:07:36 GUI and API listening on [::]:20080
2020-01-28 16:07:36 Access the GUI via the following URL: http://127.0.0.1:20080/
2020-01-28 16:07:36 My name is "vm-fs01"
2020-01-28 16:07:36 Device 7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH is "HSTNAS01" at [tcp4://10.20.10.160:20081]
2020-01-28 16:07:36 ...
2020-01-28 16:07:36 Syncthing should not run as a privileged or system user. Please consider using a normal user account.
2020-01-28 16:07:36 Established secure connection to 7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH at 10.20.10.130:34854-10.20.10.160:20081/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256
2020-01-28 16:07:36 Device 7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH client is "syncthing v1.3.3" named "HSTNAS01" at 10.20.10.130:34854-10.20.10.160:20081/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256
2020-01-28 16:07:36 Completed initial scan of receiveonly folder "Installationsquellen" (n4q4z-ohpfz)
  • Log of other devices also does not show errors.

Thanks for your help. :)

Kind regards,
Catfriend1

Was it a simple mix-up that you wrote “device” instead of “folder” in connection with the override button? If so, please fix that to clarify your report; if not, there’s a misconception, and clearing it up might already solve your problem:

Override/revert buttons only exist on folders. They either override the entire cluster, i.e. all devices, with your current state, or they revert the local state to the global one. There’s no such operation on remote devices. Given you reset the db on C, the changed/bit-rotten file was a conflict. As it was “bit-rotten”, there was no clear winner based on modtime, so I think the lexicographically lower ID in the version vector wins, which is B. So I would expect device C to pull in the new file from B, or, if the folder there is send-only, to show the override button.
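
To make the conflict reasoning concrete, here is a small sketch of how two version vectors can be compared. It follows the general version-vector rule (one side wins outright only if it is at least as new for every device ID) and is not taken from Syncthing’s source; `parse_vector` and `compare_vectors` are illustrative names. The inputs are the `version` arrays from the REST output earlier in the thread.

```python
def parse_vector(entries):
    """Turn ["VBHJ3YR:1", ...] into {"VBHJ3YR": 1, ...}."""
    vector = {}
    for entry in entries:
        device, counter = entry.rsplit(":", 1)
        vector[device] = int(counter)
    return vector


def compare_vectors(a, b):
    """Return 'equal', 'newer', 'older' or 'concurrent' for a relative to b."""
    devices = set(a) | set(b)
    # A device missing from a vector counts as counter 0.
    a_ahead = any(a.get(d, 0) > b.get(d, 0) for d in devices)
    b_ahead = any(b.get(d, 0) > a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        return "concurrent"  # neither side dominates: this is a conflict
    if a_ahead:
        return "newer"
    if b_ahead:
        return "older"
    return "equal"


from_a = parse_vector(["VBHJ3YR:1"])  # version seen on A, B and D
from_c = parse_vector(["ZTMOUGV:2"])  # version seen on C after the db reset
print(compare_vectors(from_a, from_c))  # concurrent
```

C’s vector no longer contains A’s counter, so neither side dominates and a tie-break along the lines described above (modification time, then device ID) is needed.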

@imsodin

Was it a simple mix-up that you wrote “device” instead of “folder” in connection with the override button?

To clarify: I checked whether the button showed up on the folder I’m writing about, but it did not.

As it was “bit-rotten”, there was no clear winner based on modtime, so I think the lexicographically lower id in the version vector wins, which is B. So I would expect that device C pulls in the new file from B, or the folder there shows the override button if send-only.

I would expect that too, but it was not pulled. I also checked that the network is available and the nodes are connected. (This is a VM network with local connectivity to the host using statically defined IP addresses, and the initial sync ran at about 100 MByte/sec, so I don’t suspect the problem there…)

As you correctly understood from my post, I had also “cleared the way” by changing the folder types so that C could pull the “original correct” file from B. I’m in the dark as to why it didn’t sort itself out.

Now I tried the following on device C: ran “rm /…/GarminExpress.exe” and hit rescan.

And voilà: C showed the revert button for a brief moment. I didn’t do anything; it disappeared by itself and C started to pull GarminExpress.exe (the correct original version) from B, getting “in sync” again.

Now another fun fact, my file counters aren’t fine anymore. Also checked ignores, they are NOT the cause. All nodes show global state 2687/507. All except device C show 2687/507 as local state as well.

Device C:

As device C is a test-server still being in setup, I’ll do a full file compare with a third-party tool between B and C later.
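
If no third-party tool is at hand, a full file compare can be sketched in a few lines. This is only an illustration under assumptions: `hash_tree` would be run on each machine against the folder root, and the resulting mappings brought together and passed to `diff_trees`; both names are hypothetical, and the demo uses two temporary directories in place of B and C.

```python
import hashlib
import os
import tempfile


def hash_tree(root):
    """Map each file's path relative to `root` to its SHA-256 hex digest."""
    digests = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            sha = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in 1 MiB chunks so large installers don't fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    sha.update(chunk)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digests[rel] = sha.hexdigest()
    return digests


def diff_trees(left, right):
    """Return (only_left, only_right, differing) as sorted lists of relative paths."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    differing = sorted(p for p in set(left) & set(right) if left[p] != right[p])
    return only_left, only_right, differing


# Demo: two throwaway trees standing in for the folder on B and on C.
with tempfile.TemporaryDirectory() as b_dir, tempfile.TemporaryDirectory() as c_dir:
    for root, payload in ((b_dir, b"original"), (c_dir, b"0riginal")):
        os.makedirs(os.path.join(root, "Garmin"))
        with open(os.path.join(root, "Garmin", "GarminExpress.exe"), "wb") as handle:
            handle.write(payload)
    with open(os.path.join(b_dir, "readme.txt"), "wb") as handle:
        handle.write(b"only on B")
    result = diff_trees(hash_tree(b_dir), hash_tree(c_dir))

print(result)
```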

Just a note:

  • Device A: Win10x64
  • Device B: Linux amd64 (VM-Host 1)
  • Device C: Linux amd64 (VM @ VM-Host 1)
  • Device D: Linux amd64 (VM-Host 2)

Ok, I had another look at the REST API output at the top, and the modification time did change, i.e. the file on device C wins the conflict. At which point in the saga in the above post did you query the REST API? At that point either B hadn’t received the update from C, or it classified it as “older” than what it has (which it shouldn’t).
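
As a quick check of that comparison, the following sketch parses the two `modified` values from the REST output and reports which record is later. It only illustrates the modtime step of conflict resolution as described in this thread, not Syncthing’s actual code; `parse_modified` and `later_modtime` are illustrative helper names.

```python
import re
from datetime import datetime


def parse_modified(stamp):
    """Parse a timestamp like those above, trimming the fraction to microseconds.

    Assumes at least six fractional digits, as in the thread's data.
    """
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", stamp)
    return datetime.fromisoformat(trimmed)


def later_modtime(a, b):
    """Return the `modifiedBy` of the record with the later modtime, or None on a tie."""
    time_a, time_b = parse_modified(a["modified"]), parse_modified(b["modified"])
    if time_a == time_b:
        return None
    return a["modifiedBy"] if time_a > time_b else b["modifiedBy"]


on_b = {"modifiedBy": "VBHJ3YR", "modified": "2019-11-05T15:55:30.1028327+01:00"}
on_c = {"modifiedBy": "ZTMOUGV", "modified": "2020-01-28T16:00:39.500843573+01:00"}
print(later_modtime(on_b, on_c))  # ZTMOUGV
```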

Anyway, I don’t see any single clear pattern in what’s going on, and the whole thing has so many moving parts that I don’t see a clear angle from which to investigate.

At which point in the saga in the above post did you query the rest api?

I’ve queried it at the end of my described steps. This was when the sync made no progress and was before I removed GarminExpress.exe manually from device C to get in sync again (automatically).

Somehow, I had two fresh setups on different hardware/networks on the same day, and both failed. My feeling is that we’ll find the culprit more easily by debugging the two-device cluster in the topic “Fresh setup with 2 devices, sync stuck with no progress”.

This setup used NTFS (A), ext4 (B, D) and zfs (C) as file systems. The other topic referenced above used ext4 and zfs.

[[ Might zfs have something to do with it? I freshly set up both servers (independently of each other) and zfs ran fine from the first moment. zpool status -v and zpool scrub showed no errors after checking both appliances. ]]

I doubt this is filesystem related. Feels like there is some bug somewhere.

Not “wanting” to have discovered a bug, but yes, that’s my feeling too. Yesterday, when the out of sync occurred, I backed up the (Syncthing) database folder before nuking it. I ran “stindex --mode idxck /path/to/backed/up/index-14.0-folder” and it came back without output.
