I doubt it’s a SyncTrayzor thing, though it could be that it just takes a long time to load folder data when there are a lot of folders…?
It does sometimes take a bit to load folder data, but in that case the GUI was stalled like that for 1.5 hours or so, and Syncthing’s RAM usage also dropped to under 100 MB, with almost no visible activity. It was only after killing the process and restarting that it was able to start working normally again.
I don’t think I had ever seen the same behaviour before v2. Also, it’s definitely not related to SyncTrayzor, as this is just the bare syncthing.exe binary running on the device.
A goroutine trace from when that happens can narrow it down: from a support bundle if it’s alive enough, otherwise directly from the built-in profiler (which needs to be enabled beforehand with an environment variable in that case).
Is the goroutine dump attached to https://forum.syncthing.net/t/syncthing-2-0-0-rc-16/24375/25 not enough? As explained before, the profiler was enabled but it wasn’t responding. The URLs were just stuck loading with no results. In other words, http://localhost:8384/rest/debug/cpuprof and http://localhost:8384/rest/debug/heapprof were unresponsive, with only http://localhost:9090/debug/pprof working (which is where I got the dump from).
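For anyone else needing to grab such a dump while the GUI is stalled, here is a minimal sketch of pulling it from the profiler port. It assumes the profiler is listening on localhost:9090 as in the post above and that it exposes the standard Go net/http/pprof handlers, where `/debug/pprof/goroutine?debug=2` returns full stack traces for every goroutine. The function names are made up for illustration.

```python
from urllib.request import urlopen


def goroutine_dump_url(host="localhost", port=9090):
    """Build the URL of the full goroutine dump.

    Assumption: the standard Go net/http/pprof layout; debug=2 asks for
    complete stack traces of all goroutines.
    """
    return f"http://{host}:{port}/debug/pprof/goroutine?debug=2"


def fetch_goroutine_dump(host="localhost", port=9090, timeout=10.0):
    """Download the dump as text; raises on connection errors or timeouts."""
    with urlopen(goroutine_dump_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Save the dump to a file so it can be attached to a forum post.
    with open("goroutines.txt", "w", encoding="utf-8") as out:
        out.write(fetch_goroutine_dump())
```

The timeout matters here: if the profiler endpoint hangs as well, the script fails instead of waiting forever.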
Actually it’s probably fine, I had overlooked it entirely. Let me take a look and see if there’s something in there.
Yeah it’s a nice little deadlock… It’s not v2 as such though, it’s between opening and closing connections at the same time, presumably because something is resuming or connections are switching. I think I’ve seen this one before.
OK, I was premature in saying all is well with rc.17.
Screenshot of Laptop A and RPi4 showing all files synced, and Laptop B showing 98 items out-of-sync, along with the expanded modal listing the out-of-sync items. All are 0-byte deleted restic cache files.
The three devices are connected in mesh mode with only the RPi4 being receive encrypted.
I’m happy to provide more info if needed.
Expand the folder on both sides for the counts? (LenovoX1 & RPi4)
I have replaced the affected receive-encrypted folder on the RPi4 and the issue reappeared. The three devices are showing the same file and folder count but this time 20 items are showing out-of-sync 95% 0b on the LenovoX1. Both the RPi4 and LenovoX1 were showing as synced until I brought the Zbook (laptop A) back online and the issue appeared again immediately.
Fascinating; I’ve been trying to reproduce this in a similar triangle setup with one device encrypted, but it doesn’t happen for me. Given it’s a small number of files, and I think the file names are already of no interest to humans, would you mind sending over the database for the folder in question from LenovoX1?
If you run syncthing debug database-statistics on LenovoX1 it’ll give you an output of all the database files and some stats on each, but primarily it shows you the mapping from folder ID to database file:
jb@jbo2:~ % syncthing debug database-statistics
DATABASE                 FOLDER ID    TABLE                  SIZE         FILL
========                 ====== ==    =====                  ====         ====
main.db                  -            folders                4 KiB        8.4 %
main.db                  -            folders_database_name  4 KiB        6.0 %
...
folder.0001-7nqk2rsx.db  7ojpy-cfjhl  blocklists             8888 KiB     93.6 %
folder.0001-7nqk2rsx.db  7ojpy-cfjhl  blocks                 16176 KiB    96.1 %
folder.0001-7nqk2rsx.db  7ojpy-cfjhl  counts                 4 KiB        4.7 %
...
folder.0007-txpxsvyd.db  w3ejt-fn4dm  sqlite_stat4           4 KiB        0.2 %
folder.0007-txpxsvyd.db  w3ejt-fn4dm  (total)                1906020 KiB  92.8 %
main.db + children       -            (total)                2206420 KiB  92.0 %
I’m interested in the folder.whatever.db file corresponding to Techie on LenovoX1… In the index-v2 directory in the Syncthing data directory. (And maybe the same from Zbook for extra credit, but if it’s a hassle – primarily from LenovoX1 who is seeing the difference.)
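If the output is long, the folder-ID-to-file mapping can be pulled out with a few lines of script. This is only a sketch, assuming the output keeps the column layout shown above (every data row ending in size, unit, fill and a percent sign) and that folder IDs contain no spaces. The function name and the sample are illustrative only.

```python
def folder_database_map(stats_output):
    """Map folder ID -> database file name from database-statistics output.

    Assumption: each data row ends in "<size> <unit> <fill> %", preceded by
    the table name and the folder ID; everything before that is the database.
    """
    mapping = {}
    for line in stats_output.splitlines():
        tokens = line.split()
        # Skip the prompt, header, rule and "..." lines.
        if len(tokens) < 7 or tokens[-1] != "%":
            continue
        database = " ".join(tokens[:-6])
        folder_id = tokens[-6]
        if folder_id != "-":  # "-" marks rows that belong to no folder
            mapping[folder_id] = database
    return mapping


SAMPLE = """\
DATABASE FOLDER ID TABLE SIZE FILL
======== ====== == ===== ==== ====
main.db - folders 4 KiB 8.4 %
...
folder.0001-7nqk2rsx.db 7ojpy-cfjhl blocks 16176 KiB 96.1 %
folder.0007-txpxsvyd.db w3ejt-fn4dm (total) 1906020 KiB 92.8 %
main.db + children - (total) 2206420 KiB 92.0 %
"""

if __name__ == "__main__":
    for folder_id, database in folder_database_map(SAMPLE).items():
        print(folder_id, database)
```

Parsing from the right-hand side keeps rows such as "main.db + children", whose first column contains spaces, from confusing the split.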
Thanks, I tried but I get this error:
russ@lenovoX1:~$ syncthing debug database-statistics
syncthing: error: statistics: tablestats: no such table: dbstat
I’ll try Zbook later when I have access. Later: Zbook returns the same error.
Edit: What is also interesting is that with each two-hourly restic backup, files/folders in the restic-cache (a sub-folder of Techie) get created and destroyed, but the out-of-sync items on LenovoX1 stay the same as in my earlier screenshot (i.e. 20), while the actual files/folders in the cache folder are updated as expected.
Ah, I see, the Debian build is broken and doesn’t have the right sqlite extension. And yeah, the whole thing seems very odd indeed. You can also get the folder to database mapping by starting once with STTRACE=sqlite if you like.
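A small sketch of what "starting once with STTRACE=sqlite" could look like when launching from a script rather than a shell. The helper name is hypothetical, and running the bare `syncthing` binary from PATH is an assumption about the local setup.

```python
import os


def trace_env(*facilities, base=None):
    """Return a copy of the environment with STTRACE set.

    STTRACE takes a comma-separated list of facilities; here just "sqlite".
    """
    env = dict(os.environ if base is None else base)
    env["STTRACE"] = ",".join(facilities)
    return env


if __name__ == "__main__":
    import subprocess

    # Assumption: "syncthing" is on PATH. Stop it again once the folder to
    # database mapping has shown up in the log.
    subprocess.run(["syncthing"], env=trace_env("sqlite"))
```

Copying the environment first means the variable only applies to this one run and does not linger in the parent shell.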
Or upgrade to rc.18, which should fix the db stats on Debian.
Updated to rc.18 and syncthing debug database-statistics gives the results expected.
But the folder Techie also has secrets, such as rclone.config, as well as the restic-cache sub-folders. I’ll move the cache into a separate folder and re-establish the setup as before, hopefully the same issue will reappear after the first connection with LenovoX1 (as I mentioned, this seems to be the problematic time as subsequent changes don’t seem to increase the number of 0b out-of-sync errors).
Doing this might take me a while, I’m at UTC+10, I might not get this done today (Friday) but will post the results and database asap.
Should we change the default for numConnections in Syncthing v2?
Citing the docs:
A zero value means to use the Syncthing default. As of version 1.25.0 the default is to use one connection, like earlier versions of Syncthing. This may change in the future.
Maybe the future is now?
Maybe it is
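Until the default changes, the value can be set per device. A minimal sketch of doing that through the REST config API follows; it assumes the GUI/API listens on http://localhost:8384 as elsewhere in this thread, and the device ID and API key are placeholders. The function only builds the request, sending it is left to the caller.

```python
import json
from urllib.request import Request


def num_connections_request(device_id, num_connections, api_key,
                            base_url="http://localhost:8384"):
    """Build a PATCH request that changes numConnections for one device.

    device_id and api_key are placeholders supplied by the caller; a value
    of 0 means "use the Syncthing default", per the docs quoted above.
    """
    body = json.dumps({"numConnections": num_connections}).encode("utf-8")
    return Request(
        f"{base_url}/rest/config/devices/{device_id}",
        data=body,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    from urllib.request import urlopen

    req = num_connections_request("DEVICE-ID", 3, "YOUR-API-KEY")
    with urlopen(req, timeout=10) as resp:
        print(resp.status)
```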
I have moved the restic-cache folder out of its parent folder Techie to another location, simply called restic-cache. After deleting the actual files on LenovoX1 and RPi4, I deleted and re-established the connections to all three devices for Techie, and shared the relocated restic-cache folder with all three.
Bingo! Everything works as Syncthing expects, even after several cycles of two-hourly restic backups creating new files, deleting others, and leaving empty folders in their wake in the cache. All three devices are showing up-to-date with equal local and global states.
I’ll leave this static for a few days to see if syncing continues successfully before attempting to move restic-cache folder back under Techie.
I’m suspicious about how, earlier, the first sync appeared successful for two devices until the third device, LenovoX1, entered the mesh, which resulted in 95% 0b out-of-sync. This happened three times, as described above.
Ok, it has happened again (95% 0b out-of-sync on LenovoX1). All six files are from the same (now separate) folder, restic-cache. Files/folder count for local/global match, 65/222.
The database files for each device have been captured with syncthing debug database-statistics and are attached.
sqildb-folder-restic-cache_RPi4.tar.gz (1.6 MB)
sqldb-folder-restic-cache_Zbook.tar.gz (538.6 KB)
sqldb-folder-restic-cache_LenovoX1.tar.gz (679.5 KB)
And out-of-sync items from LenovoX1 attached.
All those files are in there, in sync and not deleted. Not sure what to make of it. It’s like the database is from a situation prior to the one in the screenshot.
Would it be easier to track down with a hub and spoke arrangement?
Zbook ↔ RPi4 ↔ LenovoX1
BTW, the cached files/folders change with each two-hourly restic backup cycle, and that is reflected on each device but the out-of-sync number remains the same.
Is there a way to purge the database (--reset-deltas, --reset-database)?