Many thanks for the helpful reply - much appreciated.
I don’t know of any such devices that would hold these files in this cluster… but I will continue to poke about on all the machines…
Not sure about that option yet - will investigate. For the record, I just checked the API on the machine with the screenshots above for info on one of the Failed Items - it returned "availability": null - presumably that means it doesn’t know of any machine in the cluster with the file?
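(For reference, the query was roughly along these lines - the folder ID, file path, and API key here are placeholders rather than my real values:)

```shell
# Ask the REST API for the database record of a single file.
# FOLDER-ID, the file path and the API key below are placeholders.
curl -s -H "X-API-Key: YOUR_API_KEY" \
  "http://localhost:8384/rest/db/file?folder=FOLDER-ID&file=Path/To/File" \
  | jq '.availability'
# A null availability appears to mean that no device currently known
# to this node advertises the required version of the file.
```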
I’ve tried doing this, but it’s a little complicated because it’s on a Synology NAS, and it appears you can no longer start a login session as the user account which runs Syncthing. I’ve tried just modifying the launch script temporarily - but I can’t see anything obvious to indicate that it’s worked…
Is there any indication in the log file that --reset-deltas has been applied?
Yes there is - [ABCDE] 2022/11/14 08:51:51 INFO: Reinitializing delta index IDs…
Instead, I ran the command as root, and then chowned the log file and database files back to sc-syncthing:syncthing afterwards.
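Roughly like this - the service account name is from my setup, and the config path is a stand-in for wherever the Synology package actually keeps it:

```shell
# Reset the delta indexes as root, pointing at Syncthing's config dir
# (the path here is a placeholder - adjust for your installation).
syncthing --reset-deltas --home=/path/to/syncthing/config

# Hand ownership of the config, log and database files back to the
# service account so the normal daemon can run again afterwards.
chown -R sc-syncthing:syncthing /path/to/syncthing/config
```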
So, now having reset-deltas and let the folder re-scan, I’m still faced with much the same situation:
The folder now spends ~10 seconds Preparing to Sync, then briefly flips to Syncing, then immediately back to Preparing to Sync
I still see that Local Items + Failed Items != Global Items
In the log, every ~10 seconds, I see:
[ABCDE] 2022/11/14 10:52:13 INFO: Puller (folder "Folder Name Here" (y9qce-abcde), item "Folder 1/Folder 2/Folder 3/Folder 4/Folder 5/File 1"): syncing: no connected device has the required version of this file
{...repeated many times for different files...}
[ABCDE] 2022/11/14 10:52:13 INFO: "Folder Name Here" (y9qce-abcde): Failed to sync 525 items
[ABCDE] 2022/11/14 10:52:13 INFO: Folder "Folder Name Here" (y9qce-abcde) isn't making sync progress - retrying in 1m11s.
Your screenshot by itself shows nothing strange, so I’m assuming you’re referring to the files failing with “no connected device has that version”, which is the problem you should troubleshoot. You could show the output (syncthing cli debug file, as mentioned above) for such a file if you want possible hints. I don’t think any database gymnastics will change that situation in the long term, since items that are needed but not available have, by definition, been announced by some other device.
Well, in the top output the (global) file exists, was last changed by ZLTQRJ2, and is available from DDR3EZU. In the bottom output, the file has been deleted, was never changed by ZLTQRJ2, and DDR3EZU is nowhere to be seen. These two variants of the file appear entirely disconnected from each other, sharing no history. Perhaps you are running a network with devices not talking to each other and files being modified in multiple places?
What Jakob said. And in reference to the earlier suggestion of removing (and, if appropriate, re-adding) devices to see if that clears the problem: you’d need to do that with DDR3EZU. However, that only works if you haven’t been connected to it in a long time, or if that device is otherwise “broken” (has a skewed view of the global state). If everything is connected in principle, as you say, it looks like you have a more complicated connectivity/sync problem, as Jakob suggested.
Hmmm - that’s certainly not the intention. All devices in the network are connected to many other devices in the network, with all of them connecting to both these nodes.
(With apologies for not following your suggestions earlier!) I’ve dug into this a bit further, and it’s possible that DDR3EZU has a skewed view of the world: this is an older machine which, in theory, has been replaced - but I see that it’s still been connecting to the network… I shall investigate!
Many thanks for your expert input - very much appreciated!
I’ve removed DDR3EZU from the system which wasn’t able to get into sync - and it’s now looking much much happier - thanks for the pointers!
I’ll remove at source as soon as I’m able to - looking at ways to thoroughly ensure it’s expunged from the network (and not lurking through some device which I can’t get access to immediately), I’m thinking:
Remove all Shared Folders from DDR3EZU, whilst it’s connected to the network;
Allow it some time to send its removal updates(?);
Then completely remove the Syncthing installation from this device;
Then delete it from all other devices in the network that I can get my hands on.
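For nodes where I can reach the GUI/API but can’t easily log in interactively, I’m hoping I can script that last step against the REST config API (the device ID and API key below are placeholders, and I believe this endpoint requires a reasonably recent Syncthing version):

```shell
# Remove the stale device from this node's configuration.
# DEVICE-ID must be the full device ID, not the short DDR3EZU form.
curl -s -X DELETE -H "X-API-Key: YOUR_API_KEY" \
  "http://localhost:8384/rest/config/devices/DEVICE-ID"
```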
The desired result is that DDR3EZU is gone from the point of view of Syncthing, correct? Then just remove that device from all other devices and uninstall Syncthing on DDR3EZU - the order doesn’t matter. I am probably missing something here.
That’s right - it’s just that I won’t be able to access all nodes quickly to remove it and get to a stable state - so I was thinking that if I can remove the Shared Folders from DDR3EZU itself, then it will effectively remove itself from all nodes - unless I’m misunderstanding…
Ah - just managed to get access to DDR3EZU and started deleting folders. It looks like nothing is transmitted to the network to indicate this device is no longer syncing these folders, so I guess there’s no point in pre-emptively deleting these folders… I’ll do so anyway, as I guess it can’t harm!
The other devices should show some indication that DDR3EZU is not sharing the folder anymore. Can’t remember if it will also drop all information from it or not (likely not, only once it’s removed/unshared locally).
Unfortunately I can’t get to any of those other devices at the moment - but they won’t be able to get to DDR3EZU any more - I’ve completely wiped its config.
So - and sorry to keep going on, but there are a few variations on a theme going on here: back to the zombie folders reappearing:
I’m on one of my nodes (BJ6O5J). I’ve paused all Remote Devices except one (7FC5MI) - though that node is itself connected to many other nodes in the network.
I delete a subfolder which is misbehaving, and then allow the Shared Folder to rescan - at which point the subfolder reappears.
The subfolder only contains other subfolders - there are no files inside it.
The Shared Folder status is Up to Date; the Ignore Patterns are just my standard set which I use across the entire cluster.
So I’ve been continuing the deletion of DDR3EZU from devices in the cluster - and on each device I remove it from, the Failed Items for one of my folders goes away. So far so good.
I haven’t yet managed to delete this device from all my nodes - so I’m guessing its presence is still causing these zombie subfolders to keep resurrecting themselves.
Hmmm - how would I see which device is the root cause of re-introducing the deletions? I can’t see anything in the UI to indicate this, and looking at the logs (with the model debug facility enabled), I see about 14 index updates come in after deleting the subfolder. I know that I’ve removed DDR3EZU from 95% of these devices - I’m guessing I’m going to need to do 100% of them before I achieve success?..
The same syncthing cli debug file ... command. Ideally run it when the item is deleted and then again once it has resurrected, to get a clean diff, but just afterwards should work too (the available devices will show it to some extent; the version vector more precisely identifies the original origin, but it isn’t always easy to decipher).
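Something along these lines, with placeholder folder ID and path:

```shell
# Capture the database entry just after deleting the item...
syncthing cli debug file FOLDER-ID "path/to/item" > before.json
# ...then again once it has resurrected, and compare the two.
syncthing cli debug file FOLDER-ID "path/to/item" > after.json
diff before.json after.json
```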
Well - I’m delirious to report that the zombie apocalypse seems to have abated now!
I found another node which had DDR3EZU still configured as a Remote Device - so I removed that - and another node which hadn’t been online for a while - which I brought online.
Since attending to these, my canaries have remained deleted for three days now - woohoo! I’m hopeful this is now resolved - but I’m continuing to keep an eye on it, just in case other nodes coming online reintroduce the problems.
Many thanks - as always - to @imsodin and @calmh for their invaluable advice and support.