v1.21.0-rc.1 scanning is slow

Andy · August 11, 2022, 8:19am

Just a heads-up.

I am running v1.21.0-rc.1, and so far it works on all my devices. On one Windows computer, however, I noticed that scanning is slow, and on one folder the scan never completed. After I went back to v1.20.4, everything was fine again.

calmh · August 11, 2022, 8:37am

Could be the owner lookups which are slow on Windows for some reason. Perhaps some caching can be added. I’ll look into it…

tomasz86 · August 11, 2022, 9:47am

For the record, I’ve just done a simple benchmark, where I scanned the same folder for the first time.

Hardware: Ryzen 4350G, 32GB RAM, NVMe SSD
OS: Windows 10 (antivirus disabled)
Folder: 4,844 files, 551 dirs, 51.9 GB

Syncthing 1.20.4: 1 minute 30 seconds
Syncthing 1.21.0: 6 minutes 40 seconds

The performance impact does indeed seem massive.

imsodin · August 11, 2022, 10:23am

Given it’s an opt-in feature, shouldn’t we skip it on scanning unless opted in?

calmh · August 11, 2022, 12:39pm

Possibly… I was thinking to do that at first, but thought that reading and writing the information are separate steps. That is, one might want to send owner information without having to apply owner information from others. Perhaps it needs to be a separate toggle, then (sendOwnership and syncOwnership, where the latter implies the former but not the other way around…).

In either case it’s better if it isn’t dog slow. I added a cache to see what happens: lib/fs: Cache user lookups (https://github.com/syncthing/syncthing/pull/8496)

That makes scanning faster for me, but it’s not an enormous difference on my systems. It probably depends on how slow the user lookup is; perhaps this is different in a Windows setup with an actual domain etc. There’s also additional constant overhead in opening the file and reading the security descriptor, which isn’t cacheable. (But it would be turn-offable if we added a switch for it…)
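
As a rough illustration of what caching user lookups means here (this is not the code from the PR, just a minimal sketch): each distinct UID/SID is resolved to an account name once and then served from memory. The `userCache` type, `lookupFunc`, and `osLookup` are hypothetical names made up for this example; only `user.LookupId` and `user.Current` are real standard library calls.

```go
package main

import (
	"fmt"
	"os/user"
	"sync"
)

// lookupFunc resolves a UID (Unix) or SID (Windows) string to an account name.
type lookupFunc func(id string) (string, error)

// userCache memoizes lookups so that each distinct owner ID is resolved once.
// Hypothetical type for illustration only.
type userCache struct {
	mut    sync.Mutex
	lookup lookupFunc
	names  map[string]string
}

func newUserCache(lookup lookupFunc) *userCache {
	return &userCache{lookup: lookup, names: make(map[string]string)}
}

// name returns the cached account name for id, calling the underlying
// lookup only on a cache miss. Errors are not cached in this sketch.
func (c *userCache) name(id string) (string, error) {
	c.mut.Lock()
	defer c.mut.Unlock()
	if n, ok := c.names[id]; ok {
		return n, nil
	}
	n, err := c.lookup(id)
	if err != nil {
		return "", err
	}
	c.names[id] = n
	return n, nil
}

// osLookup wraps the standard library lookup, which may be slow when it
// has to go to a directory service.
func osLookup(id string) (string, error) {
	u, err := user.LookupId(id)
	if err != nil {
		return "", err
	}
	return u.Username, nil
}

func main() {
	cache := newUserCache(osLookup)
	cur, err := user.Current()
	if err != nil {
		fmt.Println("cannot determine current user:", err)
		return
	}
	// Only the first iteration reaches the operating system.
	for i := 0; i < 3; i++ {
		name, err := cache.name(cur.Uid)
		fmt.Println(name, err)
	}
}
```

Note the sketch holds the lock during the lookup and never expires entries; a real implementation would likely want some expiry so that renamed accounts are picked up eventually. It also does nothing about the per-file cost of reading the security descriptor on Windows.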

imsodin · August 11, 2022, 12:49pm

I’d prefer that behaviour by default: I don’t think a niche feature like this should have an impact on all other users. We already have enough users complaining that a simple scan takes a long time, adding a user/group lookup will make that even more common.

calmh · August 11, 2022, 12:50pm

Fair enough

imsodin · August 11, 2022, 12:51pm

However I just realized (well got informed by Jakob) that I didn’t really understand what this is about:

[…] it caches the account name that belongs to a UID/SID. […]

So with that out of the way the uid/gid bit may not be that relevant in practice after all.

calmh · August 11, 2022, 1:04pm

So what actually happens is that the scanner a) gets the owner info for the file, which is a numeric ID or a SID, and b) resolves that ID to an account name. Getting the ownership on Unix is free, since we get the UID and GID in the stat info we already have. Looking up the user is not free, and I can see it might be expensive if it’s over LDAP or something. Caching will help. Getting the ownership on Windows isn’t free, because it requires opening the file and reading the security descriptor, so roughly on the order of one more stat, I think. Then there is the corresponding lookup there, which also might be expensive (but can be cached).

But I will make a flag to disable it regardless.
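
A minimal sketch of how such a flag could gate the two scanner steps, reusing the sendOwnership / syncOwnership names floated earlier in the thread. Everything here is hypothetical (the `ownershipOptions` struct, `scanFile`, and the stub lookups are invented for illustration and are not Syncthing's actual configuration or scanner API):

```go
package main

import "fmt"

// ownershipOptions mirrors the two toggles discussed in the thread.
// Hypothetical struct, not actual Syncthing configuration.
type ownershipOptions struct {
	SendOwnership bool // read owner info while scanning and send it
	SyncOwnership bool // also apply owner info received from others
}

// scanOwnership reports whether the scanner has to read owner info at all.
// SyncOwnership implies SendOwnership, but not the other way around.
func (o ownershipOptions) scanOwnership() bool {
	return o.SendOwnership || o.SyncOwnership
}

// applyOwnership reports whether received owner info is applied locally.
func (o ownershipOptions) applyOwnership() bool {
	return o.SyncOwnership
}

type fileInfo struct {
	Path      string
	OwnerID   string
	OwnerName string
}

// scanFile performs the two steps only when ownership is enabled:
// a) get the owner ID for the file, b) resolve it to an account name.
func scanFile(path string, opts ownershipOptions,
	ownerID func(path string) (string, error),
	ownerName func(id string) (string, error)) (fileInfo, error) {

	fi := fileInfo{Path: path}
	if !opts.scanOwnership() {
		return fi, nil // default: no extra per-file cost
	}
	id, err := ownerID(path)
	if err != nil {
		return fi, err
	}
	name, err := ownerName(id)
	if err != nil {
		return fi, err
	}
	fi.OwnerID, fi.OwnerName = id, name
	return fi, nil
}

func main() {
	// Stub lookups stand in for the real stat / security descriptor calls.
	getID := func(path string) (string, error) { return "1000", nil }
	getName := func(id string) (string, error) { return "alice", nil }

	off, _ := scanFile("a.txt", ownershipOptions{}, getID, getName)
	on, _ := scanFile("a.txt", ownershipOptions{SendOwnership: true}, getID, getName)
	fmt.Printf("%+v\n%+v\n", off, on)
}
```

With both toggles off, neither the owner read nor the name lookup runs, so users who have not opted in pay nothing extra during a scan.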

tomasz86 · August 11, 2022, 1:27pm

I’ve tested the PR. It took 4 minutes 25 seconds to scan the folder this time.

In sum:

Syncthing 1.20.4: 1 minute 30 seconds
Syncthing 1.21.0: 6 minutes 40 seconds
Syncthing 1.21.0 (with https://github.com/syncthing/syncthing/pull/8496): 4 minutes 25 seconds

Andy · August 11, 2022, 1:45pm

On my other devices, such as my Synology units and my own Windows computers, I have not observed any abnormalities so far, but I will keep an eye on it now. The computer I mentioned above is joined to a company domain with a correspondingly more complex permissions structure, so, judging by the last few posts, there may be a connection.