380k files, 24GB in directory results in very slow sync = 8 days

gadget · January 20, 2024, 10:25pm

My laptop with the ~111,000-file Maildir directory has a dual-core CPU (circa mid-2013) – calling it “fast” would be generous.

(And the file server at work that had the 1.8 million file Maildir directory has a quad-core CPU.)

rdslw · January 20, 2024, 10:27pm

good idea, testing it right now as ‘phase 5’.

FYI phase 4 (cryptsetup && COW OFF for DB) shown no improvement in Scanning phase (105 min), and no or small improvement in Syncthing. It might be that Syncthing is 50% faster (manual observation on webgui), but it still being slow ( below 100KB/s for sure)

rdslw · January 20, 2024, 10:34pm

bingo @bt90

enabling case sensitivity, which on my case sensitive FS is fine solved problem.

Scanning: 3 minutes (was 105) Syncing: around 15MB/s and moving fast.

I stopped it manually to have some files for experiments tomorrow, with slowly moving back to defaults all settings from today.

thank you @bt90 for being persistent with my problem, I appreciate it!

bt90 · January 21, 2024, 7:32am

I never thought that the case-insensitive handling would add that much overhead

bt90 · January 21, 2024, 7:37am

Would you mind capturing a CPU profile with the case-sensitive option still disabled?

https://docs.syncthing.net/users/profiling.html#capture-a-cpu-profile

rdslw · January 21, 2024, 5:58pm

so, here are some results && initial conclusions, probably some of you can add more.

this machine was CPU constrained during Scanning&Synching, because of (random order):

slow caseinsesitive comparison calls
long filenames of 380k files
slow cpu
only one core used
multiple os.reaadir calls during Scanning

enabling caseSensitiveFS (which is not default in syncthing), reduced Scanning time from 105 minutes to 3 minutes, and Syncing was more/less very fast.
after it, syncthing is still cpu constrained during Scanning, but time needed to complet Scaning (and later Syncing) is not only acceptable, but I would say fast, in regards to amount of data (380k files with long filenames (sic!) i.e. maildir) being processed.
syncthing still is using only one core (per folder) during Scanning, and my guess is that with caseSensitiveFS=true, now the os.readdir is main culprit. It may be, or may not be, due to the logic further optimized in syncthing, nevertheless: 3min is fast so I would say: not worth it.
some of configuration changes were good in general (setLowPriority=false, databaseTuning=large, Changes, copyRangeMethod), which helped to speed up Scanning from initial 160min to 105min. I’m keeping them.
the biggest single winner of course is caseSensitiveFS=true
I reverted to default disableFsync (being false). While it would provide some speedup (during Syncing) it’s not worth the risk.
many settings I tried had no effect (or to small to notice) on my machine, due to constraint being on CPU, not IO or due to the fact that only 1 core was used or due to the specific makeup of my data (maildir). I reverted them to defaults: disableTempIndexes=false copiers=0 hashers=0 maxConcurrentWrites=2 SendReceive fsWatcherEnabled=true

Attaching pprof file for the caseSensitiveFS=false scenario, here is quick peek for the impatient ones syncthing-cpu-linux-arm64-v1.27.2-180504.pprof (23.7 KB)

(pprof) top
Showing nodes accounting for 24.80s, 69.66% of 35.60s total
Dropped 262 nodes (cum <= 0.18s)
Showing top 10 nodes out of 73
      flat  flat%   sum%        cum   cum%
     4.74s 13.31% 13.31%      4.74s 13.31%  runtime/internal/syscall.Syscall6
     3.41s  9.58% 22.89%      5.17s 14.52%  strings.(*Builder).WriteRune
     3.29s  9.24% 32.13%      4.29s 12.05%  runtime.mapassign_faststr
     2.95s  8.29% 40.42%      3.46s  9.72%  runtime.findObject
     2.15s  6.04% 46.46%     15.69s 44.07%  github.com/syncthing/syncthing/lib/fs.UnicodeLowercaseNormalized
     2.02s  5.67% 52.13%      2.02s  5.67%  golang.org/x/text/unicode/norm.(*input).skipASCII (inline)
     1.85s  5.20% 57.33%      1.85s  5.20%  unicode.ToLower
     1.79s  5.03% 62.36%      1.79s  5.03%  unicode.ToUpper
     1.30s  3.65% 66.01%     29.61s 83.17%  github.com/syncthing/syncthing/lib/fs.newCaseNode
     1.30s  3.65% 69.66%      8.03s 22.56%  os.(*File).readdir

bt90 · January 21, 2024, 9:32pm

@rdslw thanks for the pprof and your detailed summary. Feel free to open a Github issue as there might be room for improvement (105min vs 3min!).

bt90 · January 23, 2024, 8:45am

@rdslw one last thing: could you also gather a heap profile?

https://docs.syncthing.net/users/profiling.html#capture-a-heap-profile

calmh · January 23, 2024, 10:13am

I’m fairly sure that the problem with case insensitivity is not the CPU or RAM used for the checks, but the repeated recursive listdir:s to figure out proper casing. Especially for large directories listdir can take a really long time (in computer terms). We could potentially do more aggressive (=longer) caching of the results.

bt90 · January 23, 2024, 11:10am

However, we could parallelise Unicode normalisation for very large directories:

Speeding this up would also avoid that entries expire too fast.

bt90 · January 23, 2024, 12:02pm

We could also add a fast path for ASCII only filenames which are quite common.

nfc(lower(upper(filename))) is equivalent to lower(filename) for them.

Quick draft: lib/fs: Add ASCII fastpath for normalization by bt90 · Pull Request #9365 · syncthing/syncthing · GitHub

sercxanto · January 28, 2024, 9:12am

I am really glad having found this thread, because it saved my weekend after hours of debuging.

My setup is similiar: syncthing running in a docker container on an unraid server with all the performance overhead caused by btrfs on luks (plus unraid’s shfs on top). Also the CPU is rather old.

The Maildir is 4 GB with 142526 files in 234 directories. Synchronisation takes forever (“days”). With caseSensitiveFS=true speed is also slow compared to other folders, but synchronization finishes in about 30 minutes at least.

My pprof shows a similar behaviour as the one @rdslw provided. I never thought that it could be related to string comparison. Thanks for sharing the workaround. You do a great job with syncthing!

Here are the pprof files for CPU and heap in case that it is interesting:

pprof.zip (685.7 KB)

The files in the folder “scanning” is taken during scanning phase, the files in the folder “synchronization” in the syncing phase. folders with a “2” contain the pprof with the setting caseSensitiveFS=true.

Out of curiosity: It seems that it is not only related to the performance impact of the string comparison logic in syncthing. Does the case-insensitive string comparison also trigger additional I/O?

I guess with a faster setup (CPU, disks, etc.) I would have not noticed anything.

bt90 · January 28, 2024, 9:30am

Already pointed out by @calmh

imsodin · January 28, 2024, 10:32am

Your profile is actually quite a bit more in line with expectations: The majority of time is taken up by syscalls, the actual string comparison/mangling isn’t taking up much time. Incidentally most time is spent stat-ing files outside of the case-sensitivity machinery. Maybe the profile just got (un)lucky timing with lots of cache hits? @rdslw 's profile is quite astounding in that respect, as most time is spent in string manipulation. And a big amount of that with memory management, including GC.

What might help with everything is different caching as mentined by calmh. Would be nice to add some (optional/debugging) metrics to measure how often we hit the cache or not for the same path. Worst case we are already doing a good job, and there are no cache misses on the same paths, but maybe there are and improving the cache retention mechanism (or simply extending lifetime) could help.

rdslw · January 28, 2024, 11:24am

Few things to add.

My system is musl based, so this my be the reason for slow string comparison if golang is using some sort of non-optimal multiple calls to musl lib ctypes/isupper/islower etc, while musl is slower in this regards to glibc.
Also: maildir filenames are typical 78 chars long (longer than typical filename), and are quite similar which may create a lot of collisions if hashmap/hashing/caching of them is used while not prepared for high degree of similarity with huge number.
Nevertheless I wouldn’t dismiss os.readdir/listdir calls by syncthing. How loop/algo uses them now might be a place for some optimization (cache etc) as there were a lot of them (too many?).

This specific directory of mine, if scanned now (with proper settings) takes:

180 seconds - on this musl, slow machine
30 seconds - on beefy x86 monster (glibc)

yes, there is a difference, but not so big.

system · February 27, 2024, 11:25am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.