[v1.8.0] Local and global state swapped between two nodes

Could you copy the dbs and then take the following steps, noting if anything changes about the global/local states:

  1. One after the other, run the RO nodes with -reset-deltas.

  2. Run with STRECHECKDBEVERY=1s to check/fix the db and recalculate the metadata (a rough command sketch follows below).
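For illustration, on a plain shell those two steps could look roughly like this (a sketch assuming the nodes are started manually and the syncthing binary is on the PATH; adapt it to however you normally launch Syncthing):

# step 1: reinitialize the delta index IDs on each RO node, one after the other
syncthing -reset-deltas

# step 2: on the next start, force a database check/repair and metadata recalculation
export STRECHECKDBEVERY=1s
syncthing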


Hi @imsodin,

I took the following steps:

  1. Shut down devA, devB, devC
  2. Copied databases
  3. Started devC with -reset-deltas

All folders show like this: [screenshot]

  4. Started devB with -reset-deltas

The affected folder showed 2.814 local and 2.814 global files for a few seconds and then fell back into the error state shown above (2.838 g, 2.814 l).

devC shows 2.814 g, 2.838 l (still the error state from the initial post).

  5. Started devA with -reset-deltas

It still shows 2.814 g, 2.814 l (like in the initial post).

  6. Stopped devA.

  7. Put “export STRECHECKDBEVERY=1s” in front of the Syncthing start on devB and devC.

  8. Restarted devC. No change in the g/l counts. The log showed “Stored folder metadata for “…” is 442376h9m21.825384227s old; recalculating” lines.

  9. Restarted devB. No change in the g/l counts. The log showed “Stored folder metadata for “…” is 442376h11m5.898206352s old; recalculating” lines.

All in all, the problem persists.

Do you need the database copies?

Footnotes:

  • On all devices, I saw the line “Reinitializing delta index IDs” coming up in the log after starting with the command line switch.

Hmm, I don’t know how to proceed from here, but the outcome of syncing still seems fine. It’s just that the report on the UI looks strange.

Oops, this somehow escaped my attention. The persisting state is definitely weird. Yes, I’d be interested in having a look at those dbs (though no promises…).


Hi,

It’s been a while and I’ve checked again after updating to v1.7.1. Counting via SSH with a “find all”, I now have:

  • dev B/install: 2174 files, 507 folders, 172.623.928.912 B
  • dev C/install: 2197 files, 506 folders, 173.922.801.015 B

Syncthing reports “up2date” on both ends. There are still no failed items and no “Revert Local Changes” buttons visible. Restarting the Syncthing instances doesn’t help either.

From a high-level view, I expect the SR -> RO pair between dev A and dev B is okay. Both show the same file/dir counts. The pair dev B RO -> dev C RO is still wrong (as shown in the first post).

I’ve now exported the file lists of dev B and dev C to compare them manually (a find | sort command, stripping the folder root path so that only the contents of the /install/… folder are compared), roughly as sketched below.
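A sketch of that comparison, assuming a POSIX shell and a hypothetical folder root of /data/install on both devices:

# on dev B (and equivalently on dev C, writing filelist-devC.txt): dump a root-relative, sorted file list
cd /data/install && find . | sort > /tmp/filelist-devB.txt

# copy both lists onto one machine, then compare them
diff /tmp/filelist-devB.txt /tmp/filelist-devC.txt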

It’s remarkable that dev C has too many files. Looking at the manual comparison on the OS/FS level, I clearly see it’s out of sync because dev C missed some “deletions”; I can’t say whether it dropped them internally or whether dev B “forgot” to announce them to dev C. I have never set any special options on that folder pair (so never used ignoreDelete, for example). This smells like a bug, but let’s wait and see what the experts say.

Here’s a screenshot, dev B on the left, dev C on the right.

To save you from reading text, here is the current Web UI output…

dev A:

dev B:

dev C:

Ok, next I’ve tried to update to v1.8.0-rc.2 on all nodes.

-> dev A + dev B are still properly “in sync”.
-> dev B sees dev C as “up2date” and is not transferring anything according to the bitrate.

dev C: now sees dev B as “out of sync - 24 elements, ~1.21 GiB”


When I click the “24 elements” label, the popup dialog doesn’t list anything (another bug?); the window is just empty.

I suspect the “24 elements” are those I found in the manual comparison between devices B and C via the SSH find output. dev C still shows no button or hint to resolve the issue. Shouldn’t the “Revert Local Changes” button pop up on dev C?

I’ll capture a copy of the DB in case we need it.

I presume it also shows the same (missing 24 files) state for device A?

I suspect that if you run an index recheck you’ll see problems on one of the devices. By the looks of it, C caused the trouble here and claims it sent stuff which it really didn’t, in which case I think running -reset-deltas on both A and B would fix it.

We know this happens sometimes, but we never got to the bottom of why or where this happens.

I guess this gets us closer to “where it happens”: it seems both A and B have the issue, pointing the finger at C as the cause.

OK, I’ll try this next time I’m on the cluster. Devices A + B don’t show anything out of sync, meaning everything shows “up2date”. It’s only device C causing trouble. I’ll come back when I’ve done the --reset-deltas.

What I was saying is that C should declare both A and B out of sync.

It would be interesting to get the output of https://docs.syncthing.net/rest/db-file-get.html from all 3 devices before you do -reset-deltas.
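For a single node, the raw call boils down to something like this (address, API key, folder ID and file path are placeholders):

curl -s -H "X-API-Key: $APIKEY" \
  "http://localhost:8384/rest/db/file?folder=default&file=some/relative/path"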


Hi,

before performing any action:

# query /rest/db/file for the same path on all three devices
$folder = "n4q4z-ohpfz"
$file = "Adobe/AdobeFlashPlayer/Adobe_Flash_Player_CleanUp.exe"
#
"*** devA ***"
c:\windows\curl.exe -s -X GET -H "X-API-Key: $apiKeyA" 'http://'$ipDevA'/rest/db/file?folder='$folder'&file='$file
#
""
"***  devB ***"
c:\windows\curl.exe -s -X GET -H "X-API-Key: $apiKeyB" 'http://'$ipDevB'/rest/db/file?folder='$folder'&file='$file
#
""
"*** devC*** "
c:\windows\curl.exe -s -X GET -H "X-API-Key: $apiKeyC" 'http://'$ipDevC'/rest/db/file?folder='$folder'&file='$file


*** devA ***
{
  "availability": [
    {
      "id": "7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH",
      "fromTemporary": false
    }
  ],
  "global": {
    "deleted": true,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2017-09-05T14:25:19.0109486+02:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Adobe\\AdobeFlashPlayer\\Adobe_Flash_Player_CleanUp.exe",
    "noPermissions": false,
    "numBlocks": 0,
    "permissions": "0",
    "sequence": 15661,
    "size": 0,
    "type": "FILE",
    "version": [
      "LRORKSI:1",
      "VBHJ3YR:2",
      "VPYAMUE:1",
      "ZTMOUGV:1",
      "7E4EKDA:1"
    ]
  },
  "local": {
    "deleted": true,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2017-09-05T14:25:19.0109486+02:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Adobe\\AdobeFlashPlayer\\Adobe_Flash_Player_CleanUp.exe",
    "noPermissions": false,
    "numBlocks": 0,
    "permissions": "0",
    "sequence": 15661,
    "size": 0,
    "type": "FILE",
    "version": [
      "LRORKSI:1",
      "VBHJ3YR:2",
      "VPYAMUE:1",
      "ZTMOUGV:1",
      "7E4EKDA:1"
    ]
  }
}

***  devB ***
{
  "availability": [
    {
      "id": "VBHJ3YR-WXRWUR4-GVZBFIL-4I2GE5T-TYHNPZY-NY3PKDS-Y2AFDBO-MZNP4AD",
      "fromTemporary": false
    },
    {
      "id": "VPYAMUE-D44JYED-5DNCNOV-QNNFESH-QLVRLP4-525HK3X-GK5RXN4-QS5LMQY",
      "fromTemporary": false
    }
  ],
  "global": {
    "deleted": true,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2017-09-05T14:25:19.0109486+02:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Adobe/AdobeFlashPlayer/Adobe_Flash_Player_CleanUp.exe",
    "noPermissions": false,
    "numBlocks": 0,
    "permissions": "0",
    "sequence": 15319,
    "size": 0,
    "type": "FILE",
    "version": [
      "LRORKSI:1",
      "VBHJ3YR:2",
      "VPYAMUE:1",
      "ZTMOUGV:1",
      "7E4EKDA:1"
    ]
  },
  "local": {
    "deleted": true,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2017-09-05T14:25:19.0109486+02:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Adobe/AdobeFlashPlayer/Adobe_Flash_Player_CleanUp.exe",
    "noPermissions": false,
    "numBlocks": 0,
    "permissions": "0",
    "sequence": 15319,
    "size": 0,
    "type": "FILE",
    "version": [
      "LRORKSI:1",
      "VBHJ3YR:2",
      "VPYAMUE:1",
      "ZTMOUGV:1",
      "7E4EKDA:1"
    ]
  }
}

*** devC*** 
{
  "availability": [
    {
      "id": "7E4EKDA-DEXQAEY-DIKXM2L-6XO7YWS-ARPGAW6-CRUSAI7-U2U5UMP-CCZKRAH",
      "fromTemporary": false
    }
  ],
  "global": {
    "deleted": true,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2017-09-05T14:25:19.0109486+02:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Adobe/AdobeFlashPlayer/Adobe_Flash_Player_CleanUp.exe",
    "noPermissions": false,
    "numBlocks": 0,
    "permissions": "0",
    "sequence": 15319,
    "size": 0,
    "type": "FILE",
    "version": [
      "LRORKSI:1",
      "VBHJ3YR:2",
      "VPYAMUE:1",
      "ZTMOUGV:1",
      "7E4EKDA:1"
    ]
  },
  "local": {
    "deleted": true,
    "ignored": false,
    "invalid": false,
    "localFlags": 0,
    "modified": "2017-09-05T14:25:19.0109486+02:00",
    "modifiedBy": "VBHJ3YR",
    "mustRescan": false,
    "name": "Adobe/AdobeFlashPlayer/Adobe_Flash_Player_CleanUp.exe",
    "noPermissions": false,
    "numBlocks": 0,
    "permissions": "0",
    "sequence": 31923,
    "size": 0,
    "type": "FILE",
    "version": [
      "LRORKSI:1",
      "VBHJ3YR:2",
      "VPYAMUE:1",
      "ZTMOUGV:1",
      "7E4EKDA:1"
    ]
  }
}
  • LRORKSI = ? I don’t know what this is; maybe an old device that was removed at some point.
  • VBHJ3YR = devA
  • VPYAMUE = devC
  • 7E4EKDA = devB

After performing the --reset-deltas on all 3 nodes, the log showed “reinitializing delta index”, but according to the Web UI the state is exactly as before - see the old screenshots in [v1.6.1] Local and global state swapped between two nodes.

To sum up:

  • devA, devB: No indication of a problem on the web UI. Global and local states match.
  • devC: G: 2195 files, L: 2171 files; it says that devB has 24 out-of-sync items but doesn’t list them in the popup dialog (the dialog is empty)

I’ve now got a strong feeling I suffer from something similar to https://github.com/syncthing/syncthing/issues/6855#issuecomment-664478913, too.

As I wrote above, I don’t know the LRORKSI device ID and guess it was a device I removed some time ago that wasn’t cleaned out properly in the cluster. I’m also using receiveOnly folders, like the user in the linked GitHub comment.

Hi,

good news. As I explained yesterday, I freshly started devC with v1.8.0-rc.2 and got the constant crashes (“panic: device present in global list but missing as device/fileinfo entry”), so I decided to take a shortcut and start yet another fresh scan after deleting the index-DB folder, this time with v1.8.0-rc.3. Since I didn’t change anything else, I had expected the constant crashing to reappear, but now everything is up2date and correctly accounted “in sync” on all three devices. I can’t explain this, because rc.3 was only about fixing the versioning config bug, but OK. My vacation starts shortly, so sorry for going so fast now - the customer needs this up and running, that’s priority one. I will provide more logs and debugging if this tricky bug reappears in the future.
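For completeness, the “delete the index DB and rescan” shortcut was essentially this (a sketch for a default Linux config location; the exact path differs on Windows installs):

# stop Syncthing first, then delete the on-disk index database;
# the synced folder data itself is untouched and the index is rebuilt on the next start
rm -rf ~/.config/syncthing/index-v0.14.0.db
syncthing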

Thanks for your help and insights so far. :slight_smile:

EDIT: The conclusion is that the error was on device C, since I didn’t touch A and B any further to solve this. The database reset on C did help, as long as you don’t restart manually while Syncthing is under heavy scanning load.

@imsodin I don’t know if this is of any general help, but I’ve upgraded the cluster to v1.9.0-rc.1 and observed “something positive”.

The cluster looked fully in sync and up2date before the upgrade, on v1.8.0. The file counts on the ST web UI were also OK.

After upgrading to v1.9.0-rc.1, all nodes suddenly presented a list of recent changes on the web UI. Every list includes a handful (~27) of deleted files that physically haven’t existed in the shared folders for weeks. The cluster somehow still kept them in the database by mistake.

Every Web UI shows a list of deleted items looking like this, but with different items. So whatever was wrong “in the past” or in “old versions”, it seems fixed now :slight_smile: . Comparing the file counts on the web UI again after the upgrade, they all match. Everything is reported as “up2date” now and I don’t see any more problems. The log also shows no errors, just the normal upgrade messages like “performing db upgrade - it will take while” and so on.

Good work!

Recent Changes
Device	Action	Type	Folder	Path	Time
device	deleted	file	Installationsquellen	Image-W10/SoftwarePackage/7	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/SoftwarePackage/2020-03-27_Win10_SysPrep_Software - Kopie.zip	2020-08-19 09:57:50
device	deleted	dir		Installationsquellen	Image-W10/DriverPackage/Neuer Ordner	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/Drv-Upd-1903/2	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/Drv-Patches-1903/Intel-Network-DrvUpd_v12.18.9.10.msi	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/Drv-Patches-1903/2	2020-08-19 09:57:50
device	deleted	dir		Installationsquellen	Image-W10/DriverPackage/Drv-Patches-1903	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/6	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/5	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/3	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/2020-07-08_Win10_SysPrep_Drivers_vhdx.rar	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/2020-07-06_Win10_SysPrep_Drivers.zip	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/2020-07-06_Win10_SysPrep_Drivers.rar	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/2019-08-19_Win10_SysPrep_Drivers.zip	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/2019-07-22_Win10_SysPrep_Drivers.zip	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/2018-10-26_Win10_SysPrep_Drivers.zip	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Image-W10/DriverPackage/2017-08-24_Win10_SysPrep_Drivers.zip	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Win10CriticalDDrv/6	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Win10CriticalDDrv/2017-04-25_Samsung_NVME_Legacy_W7-8_Driver_2.1.0.1611_w10_amd64_2016/secnvme.sys	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Win10CriticalDDrv/2017-04-25_Samsung_NVME_Legacy_W7-8_Driver_2.1.0.1611_w10_amd64_2016/secnvme.inf	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Win10CriticalDDrv/2017-04-25_Samsung_NVME_Legacy_W7-8_Driver_2.1.0.1611_w10_amd64_2016/secnvme.cat	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Win10CriticalDDrv/2017-04-25_Samsung_NVME_Legacy_W7-8_Driver_2.1.0.1611_w10_amd64_2016.zip	2020-08-19 09:57:50
device	deleted	dir		Installationsquellen	Win10CriticalDDrv/2017-04-25_Samsung_NVME_Legacy_W7-8_Driver_2.1.0.1611_w10_amd64_2016	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Win10CriticalDDrv/2017-04-25_Intel_iaNVMe_Win10_1.5.0.1002.zip	2020-08-19 09:57:50
device	deleted	file	Installationsquellen	Win10CriticalDDrv/2017-04-25_Intel_iaNVMe_Win10_1.5.0.1002.exe	2020-08-19 09:57:50

I’d attribute that to the fix for Case-only renames break stuff · Issue #1787 · syncthing/syncthing · GitHub. The GitHub issue count nowadays is approaching 7000 - this has been a long time coming :slight_smile:

@imsodin I didn’t expect that to be the problem, because all of the synced folders are “managed” by me, and since I was aware of this issue, I never did case-only renames from the very beginning.

But: Something strange is happening. I’m not experiencing a negative impact, but two of the three nodes emit “deleted file” events to the “recent changes” UI on every restart of Syncthing. It’s strange because those files weren’t physically touched on disk and there’s also no syncing activity. It seems like a database cleanup routine is producing those reports over and over again.

According to the view of the cluster presented above:

  • devA “the sender” doesn’t show any recent changes on the Syncthing UI. Restarting ST or hitting “Rescan All” doesn’t produce any recent changes either.
  • devB + devC “the receivers” show a constant list of “phantom” recent changes (deletions). This started after my first manual restart of ST 1.9.0-rc.1 (after the upgrade).

Should we maybe snapshot the DB and exchange it?

Here’s what the symptom looks like on devB:

Device	Action	Type	Folder	Path	Time
devA	deleted	file	Dokumentation	Witzig/Explorer_Oli_734_PB.jpg	2020-08-19 10:28:56

After restart of devB:

Device	Action	Type	Folder	Path	Time
devA	deleted	file	Dokumentation	Witzig/Explorer_Oli_734_PB.jpg	2020-08-19 10:39:37

So whenever you restart devA, those phantom deletions show up in the recent changes on the other devices, correct?

Anything noteworthy in the logs on devA? If not, could you restart again with STTRACE=model,db,scanner set, please? I suspect the db snapshot might not show that much here; the updated mod time points at a scanning/deletion-detection issue.
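Roughly, assuming Syncthing is launched from a shell on devA (the log file name is just a placeholder):

export STTRACE=model,db,scanner
syncthing 2>&1 | tee sttrace-devA.log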

@imsodin devA shows an empty “recent changes” list on its web UI. devB and devC show the deletions, and if I restart devB (or devC), they show the same deletions again. I’ll try the traces tomorrow and report back if I find anything useful in them.

@imsodin: I’ve observed more details.

a) It is sufficient to hit “Rescan All” on the affected devices to see the timestamps of the “recent changes / deleted” items jump to the current time. It’s not necessary to fully restart Syncthing to reproduce this.

b) The list of “recent changes / deleted” items does NOT change with regard to the affected files. I first thought it did, but I got that wrong because the list only shows the last X changes and a lot of files are affected by this.

c) I shut down Syncthing on devA and devB (during the “all up2date” state, of course). If I now restart devC (or hit “Rescan All”), I can still see the timestamps of the affected files jump in the recent changes dialog.

d) I’ve turned logging on with “export STTRACE=model,db,scanner” on devB and devC. I can’t find any errors in it. It lists all file names it encounters during the scan (sensitive information; I can provide it via PM if needed). It does NOT contain lines about the “deleted phantom files” (see the post above for an example). This fits the picture because the files are (correctly) not on disk; no node has them. The “deleted phantom” file names don’t appear in the log when I search it with CTRL+F (see the grep sketch at the end of this post).

How do we go on from here? Why doesn’t Syncthing finally remove those files from the database if they are relics of the past and were correctly deleted long ago?

UPDATE: e) Updated the cluster to v1.9.0-rc3. The situation didn’t change. The problem is still there.
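For the record, the CTRL+F check from point d) done on a shell would be roughly this (the file name is the one from the earlier example; the log file name is a placeholder):

# search the captured trace for one of the phantom file names (no hits expected)
grep -i "Explorer_Oli_734_PB.jpg" sttrace-devB.log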

If there’s nothing in the db log, it means those files are somehow sent unchanged as updates to the db. I made changes to how existing items in the db are checked for deleted and ignored ones, so that’s a suspect. Unfortunately, debug logging is (too) sparse there. I opened a PR with more debug logging; it would be great if you could run that and report back if something about those files turns up (the scanner debug facility isn’t actually needed): https://github.com/syncthing/syncthing/pull/6918
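For reference, checking out and building that PR locally could look roughly like this (a sketch assuming a Unix-like shell with git and a Go toolchain; the home path is just an example, point it at your usual config dir):

git clone https://github.com/syncthing/syncthing.git
cd syncthing
git fetch origin pull/6918/head:pr-6918 && git checkout pr-6918
go run build.go                                    # builds ./bin/syncthing
./bin/syncthing -home "$HOME/.config/syncthing"    # run the PR build against the existing config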

Deleted items don’t get removed from the db; that’s not a problem.