Folder Takes A Long Time To Stop Being 'Unknown'

Hello all:

Not a biggie here - but I’ve got an installation (on very fast, capable hardware) where one of the folders takes a long time to change from the initial Unknown state in the GUI after starting Syncthing.

By a long time, I’m talking ~20 mins or so; granted - it’s a large folder (1.5TB) with a lot of changed file history, but other folders (up to 300GB in size) on the same system load in just a couple of seconds.

I’ve run stindex over the database, and it reports Missing need entry for needed file… for 51000 items; 99.5% of the items are in my custom-defined Versions Path (also defined in the Ignore Patterns).

This is in Syncthing v1.6.1 for macOS.

Is this expected performance, or would additional diagnostics be helpful?

Many thanks as always for a wonderful tool.

I wouldn’t expect that long times anyway, but it would take longer if it recalculates metadata: Then you’d have an info log line mentioning “metadata” and “recalculating” and that folder ID. Once that is done, there should be a message “Ready to synchronize” with the folder ID. Please check for those just so we now if that’s the culprit or if we are looking at something else for the culprit.

Next thing I’d do is look at resource usage: Is there lots of storage IO (or cpu io wait) or cpu usage (or ram usage and swapping, though that would be unexpected)? A cpu profile might also shed light on what’s going on: https://docs.syncthing.net/users/profiling.html

I believe detecting needs in including ignores is a known limitation of the index checking tool.

Hi Simon:

Thanks for the quick thoughts. I’ve just restarted Syncthing and collected that information; so far Syncthing has been running for 10 minutes, and there are two folders still on Unknown.

Yes - I’m seeing exactly that:

2020-06-16 03:10:04 Stored folder metadata for "y9qce-uafde" is 442306h10m4.252789s old; recalculating

I don’t know if the ‘old’ time is a red herring, but I restarted this instance yesterday whilst looking at this issue; it certainly wasn’t 50 years ago…

CPU usage is about 60% of one core, with plenty of other cores free. RAM and I/O appears not to be heavily used.

Here you go:

syncthing-cpu-darwin-amd64-v1.6.1-031500.pprof (10.1 KB)

Ah - ok. Thanks for the insight!

That means an inconsistency has been detected before. Which obviously isn’t what it says, I’ve just been too lazy to fix it until now: https://github.com/syncthing/syncthing/pull/6751

The profile shows that it’s checking globals, spending all it’s time in the db layer. Maybe there’s something wrong with your db storage, but maybe it’s just that slow. As it’s not maxing out cpu (one core), I’d actually expect IO wait time being a culprit.
More technically: What is taking up the time is looking up all the entries in all global version lists individually. That could probably be sped up a lot by using one iterator per device instead of doing lots of Get calls. However that would make the code much more complicated, when instead using Audrius words:

Hi Simon:

Thanks for looking into it; if that’s the way it is, then that’s fine - but…

This system is a 2018 MacBook Pro, with the DB sitting on its crazyfast internal SSD. I doubt there are many DB storage devices much faster than this.

Lol! However, wouldn’t (in theory) an inconsistency have been detected and then corrected? (Not that I know when the metadata gets stored, or whether its timestamp is correctly applied - just wandering blindly around with a little knowledge… :wink: )

Actually I just checked and for a folder with 180k items (that should be the limiting factor here, not size) it took my not so beefy laptop a bit less than a minute to do metadata recalc. So if you have lots of files in your folders, 20min may just be what it takes currently (which admittedly isn’t very nice).

To correct Syncthing needs to do exactly what it does in those 20min. As it’s “only metadata” it doesn’t immediately do that and keeps working, just sets the creation timer to 0, with the effect that on next (folder) restart it recalculates metadata.

Hmmm - not such a different size from what I have here then:

I don’t know if it means anything, but there will have been a lot of data churn in this folder - would that explain the difference?

is all the data on a SSD? or just the database?

All the data is SSD based - the DB on the MacBook Pro’s internal one, and Shared Folder on an external USB3.0/3.1/3.1Gen2 (can’t remember which, but minimum 5Gbps) fast SSD.

ok, because I noticed on my (much slower) device, that scanning takes a lot longer (~3x) if there are transfers at the same time like in your example.

Thanks - but in this case, I’m talking about the initial startup time for the folder, before its even had a chance to scan or sync.

(And, incidentally, it looks like the system is blind to incoming index updates during this time - I see lots of Index for nonexistent folder "*****-*****" messages in the log before the folder has started.)

Uuuu, you might have hit the jackpot here for finding one of the bugs.

:partying_face: :partying_face: :partying_face:

The cpu profile says it’s 96% metadata checking. However I wouldn’t be surprised if all that other activity has an impact on the db layer, slowing things down (lock/channel contention or whatnot). That might also explain why you don’t max out the cpu. However this is all pure speculation.

Promising as that sounds, it looks like it correctly drops the connection when that happens so it shouldn’t result in lost index updates I think…

Are you guys talking about something concrete other than “something is wrong”? I don’t see it and I feel like I am missing the obvious here based on your statements :smiley:

I was missing the obvious, I am seeing it too now.

Sorry - the joys of having half an idea of what’s going on. You’re right - here’s one example from the logs:

2020-06-16 03:10:18 Index for nonexistent folder "abcde-fghij"

2020-06-16 03:10:18 Connection to M45F6VX-*******-*******-*******-*******-*******-*******-******* at 192.168.1.57:22000-37.157.aaa.bbb:46013/tcp-server/TLS1.3-TLS_AES_128_GCM_SHA256 closed: receiver error: abcde-fghij: no such folder

Hi

Not chasing a resolution, but just checking you peeps don’t need anything more from me at this stage?..

Thanks!

Unfortunately no from my side - as in I don’t know what to look for. The 20min are a bit weird, but not that outlandishly long compared to my 1min given you have tons of activity that might contend with the db check.

:+1: