Folder Takes A Long Time To Stop Being 'Unknown'

Moisie · June 16, 2020, 8:35am

Hello all:

Not a biggie here - but I’ve got an installation (on very fast, capable hardware) where one of the folders takes a long time to change from the initial Unknown state in the GUI after starting Syncthing.

By a long time, I’m talking ~20 mins or so; granted - it’s a large folder (1.5TB) with a lot of changed file history, but other folders (up to 300GB in size) on the same system load in just a couple of seconds.

I’ve run stindex over the database, and it reports Missing need entry for needed file… for 51000 items; 99.5% of the items are in my custom-defined Versions Path (also defined in the Ignore Patterns).

This is in Syncthing v1.6.1 for macOS.

Is this expected performance, or would additional diagnostics be helpful?

Many thanks as always for a wonderful tool.

imsodin · June 16, 2020, 9:51am

I wouldn’t expect that long times anyway, but it would take longer if it recalculates metadata: Then you’d have an info log line mentioning “metadata” and “recalculating” and that folder ID. Once that is done, there should be a message “Ready to synchronize” with the folder ID. Please check for those just so we now if that’s the culprit or if we are looking at something else for the culprit.

Next thing I’d do is look at resource usage: Is there lots of storage IO (or cpu io wait) or cpu usage (or ram usage and swapping, though that would be unexpected)? A cpu profile might also shed light on what’s going on: Profiling — Syncthing documentation

I believe detecting needs in including ignores is a known limitation of the index checking tool.

Moisie · June 16, 2020, 10:20am

Hi Simon:

Thanks for the quick thoughts. I’ve just restarted Syncthing and collected that information; so far Syncthing has been running for 10 minutes, and there are two folders still on Unknown.

Yes - I’m seeing exactly that:

2020-06-16 03:10:04 Stored folder metadata for "y9qce-uafde" is 442306h10m4.252789s old; recalculating

I don’t know if the ‘old’ time is a red herring, but I restarted this instance yesterday whilst looking at this issue; it certainly wasn’t 50 years ago…

CPU usage is about 60% of one core, with plenty of other cores free. RAM and I/O appears not to be heavily used.

Here you go:

syncthing-cpu-darwin-amd64-v1.6.1-031500.pprof (10.1 KB)

Ah - ok. Thanks for the insight!

imsodin · June 16, 2020, 10:53am

That means an inconsistency has been detected before. Which obviously isn’t what it says, I’ve just been too lazy to fix it until now: lib/db: Improve error message on meta inconsistency by imsodin · Pull Request #6751 · syncthing/syncthing · GitHub

The profile shows that it’s checking globals, spending all it’s time in the db layer. Maybe there’s something wrong with your db storage, but maybe it’s just that slow. As it’s not maxing out cpu (one core), I’d actually expect IO wait time being a culprit.
More technically: What is taking up the time is looking up all the entries in all global version lists individually. That could probably be sped up a lot by using one iterator per device instead of doing lots of Get calls. However that would make the code much more complicated, when instead using Audrius words:

Moisie · June 16, 2020, 11:05am

Hi Simon:

Thanks for looking into it; if that’s the way it is, then that’s fine - but…

This system is a 2018 MacBook Pro, with the DB sitting on its crazyfast internal SSD. I doubt there are many DB storage devices much faster than this.

Lol! However, wouldn’t (in theory) an inconsistency have been detected and then corrected? (Not that I know when the metadata gets stored, or whether its timestamp is correctly applied - just wandering blindly around with a little knowledge… )

imsodin · June 16, 2020, 11:18am

Actually I just checked and for a folder with 180k items (that should be the limiting factor here, not size) it took my not so beefy laptop a bit less than a minute to do metadata recalc. So if you have lots of files in your folders, 20min may just be what it takes currently (which admittedly isn’t very nice).

To correct Syncthing needs to do exactly what it does in those 20min. As it’s “only metadata” it doesn’t immediately do that and keeps working, just sets the creation timer to 0, with the effect that on next (folder) restart it recalculates metadata.

Moisie · June 16, 2020, 11:57am

Hmmm - not such a different size from what I have here then:

I don’t know if it means anything, but there will have been a lot of data churn in this folder - would that explain the difference?

uok · June 16, 2020, 11:59am

is all the data on a SSD? or just the database?

Moisie · June 16, 2020, 12:01pm

All the data is SSD based - the DB on the MacBook Pro’s internal one, and Shared Folder on an external USB3.0/3.1/3.1Gen2 (can’t remember which, but minimum 5Gbps) fast SSD.

uok · June 16, 2020, 12:03pm

ok, because I noticed on my (much slower) device, that scanning takes a lot longer (~3x) if there are transfers at the same time like in your example.

Moisie · June 16, 2020, 12:07pm

Thanks - but in this case, I’m talking about the initial startup time for the folder, before its even had a chance to scan or sync.

(And, incidentally, it looks like the system is blind to incoming index updates during this time - I see lots of Index for nonexistent folder "*****-*****" messages in the log before the folder has started.)

AudriusButkevicius · June 16, 2020, 12:20pm

Uuuu, you might have hit the jackpot here for finding one of the bugs.

Moisie · June 16, 2020, 12:21pm

imsodin · June 16, 2020, 12:27pm

The cpu profile says it’s 96% metadata checking. However I wouldn’t be surprised if all that other activity has an impact on the db layer, slowing things down (lock/channel contention or whatnot). That might also explain why you don’t max out the cpu. However this is all pure speculation.

calmh · June 16, 2020, 1:40pm

Promising as that sounds, it looks like it correctly drops the connection when that happens so it shouldn’t result in lost index updates I think…

imsodin · June 16, 2020, 1:42pm

~~Are you guys talking about something concrete other than “something is wrong”? I don’t see it and I feel like I am missing the obvious here based on your statements~~

I was missing the obvious, I am seeing it too now.

Moisie · June 16, 2020, 1:45pm

Sorry - the joys of having half an idea of what’s going on. You’re right - here’s one example from the logs:

2020-06-16 03:10:18 Index for nonexistent folder "abcde-fghij"

2020-06-16 03:10:18 Connection to M45F6VX-*******-*******-*******-*******-*******-*******-******* at 192.168.1.57:22000-37.157.aaa.bbb:46013/tcp-server/TLS1.3-TLS_AES_128_GCM_SHA256 closed: receiver error: abcde-fghij: no such folder

Moisie · June 18, 2020, 2:21pm

Hi

Not chasing a resolution, but just checking you peeps don’t need anything more from me at this stage?..

Thanks!

imsodin · June 18, 2020, 3:08pm

Unfortunately no from my side - as in I don’t know what to look for. The 20min are a bit weird, but not that outlandishly long compared to my 1min given you have tons of activity that might contend with the db check.

Moisie · June 18, 2020, 4:47pm