380k files, 24 GB in one directory results in very slow sync (8 days)

Not sure about copiers, but hashers are only relevant when actually hashing files, which is the case only when Syncthing meets new or modified files in the scanning process. Otherwise, if the files haven’t changed, there is nothing to hash. Scanning on its own doesn’t necessarily include any hashing.
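As a minimal sketch of that decision (purely illustrative; the type and field names below are made up, not Syncthing's actual scanner code), the scanner only needs to hash a file when its stat info no longer matches what was recorded last time:

package scansketch

import (
	"os"
	"time"
)

// fileRecord stands in for whatever metadata the database kept from
// the previous scan (illustrative only).
type fileRecord struct {
	size    int64
	modTime time.Time
}

// needsHash reports whether a file must be (re)hashed: only when it is
// new, or its size or modification time differs from the last scan.
func needsHash(info os.FileInfo, prev *fileRecord) bool {
	if prev == nil {
		return true // never seen before: hash it
	}
	return info.Size() != prev.size || !info.ModTime().Equal(prev.modTime)
}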


Although Syncthing is impacted, and some tuning might help, it looks like an external issue…

btrfs directly on an NVMe drive has a much higher potential throughput than what I've got (an mSATA SSD):

# hdparm -t /dev/sda

/dev/sda:
 Timing buffered disk reads: 1410 MB in  3.00 seconds = 469.32 MB/sec

One of my Maildir directories (3.6 GB) has only about 1/3 of the file count, but all else being equal, the results should be comparable (no btrfs inline compression or disk encryption):

time ls -1 ~/Mail/inbox | wc -l
111899

real	0m0.150s
user	0m0.106s
sys	0m0.047s

If it held 380,000 files, the listing would still take less than 1 second, so comparatively speaking, 7 seconds for a directory listing means there's significant overhead somewhere in your storage stack.
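For a comparison closer to what Syncthing itself does, you could time Go's os.ReadDir on the same directory; a quick throwaway program (the directory is whatever path you pass as the first argument):

// readdirbench.go: time a single directory listing via os.ReadDir.
// os.(*File).readdir, which this uses under the hood, shows up in the
// profiles later in this thread.
// Usage: go run readdirbench.go /path/to/dir
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	start := time.Now()
	entries, err := os.ReadDir(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d entries in %v\n", len(entries), time.Since(start))
}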

Because it's email and your system appears to have CPU cycles to spare, enable inline compression in btrfs if your Linux kernel supports it (use zstd and max it out at level 15, i.e. the compress=zstd:15 mount option). It'll reduce the on-disk storage requirements, possibly by up to 90% for emails without file attachments, and it'll also greatly reduce the number of disk writes and speed up reads.

Also, since btrfs on top of LUKS on top of a software RAID likely doesn't benefit from checksums, consider disabling them (e.g. the nodatasum mount option) to remove the extra disk writes.

(A few months ago I had a Maildir directory at work with ~1.8 million files totaling ~25 GB, a btrfs volume on an SSD, that was mirrored by Syncthing to a standby server. If it hadn't been moved to a long-term archive I'd run some benchmarks for you.)

I appreciate your help @gadget !!

I already use zstd compression on this volume. Still, the vmstat output shows where the bottleneck is: one core, and only one core, is maxed out, mostly by user-mode code (Syncthing) rather than kernel code (btrfs/LUKS/devices), and there is NO I/O constraint, given the low bi/bo numbers.

The problem is that Syncthing is not concurrent within a single folder during scanning (scanner.walk), and during (some parts of?) the syncing phase, while processing a list of 380k entries in one directory.

Of course, a faster CPU would help (and therein lies the reason why your configuration works fine), but since another 100k files get processed ultra fast on this very machine, Syncthing's logic is evidently also a bottleneck.

Having the ability to use all cores for these operations would speed it up by around 400%; other, more complicated improvements could probably gain even more.
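To illustrate the kind of fan-out I mean (a generic Go worker-pool pattern, not a claim about how Syncthing's scanner is actually structured; the root path and hashing details are placeholders):

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

func main() {
	paths := make(chan string, 256)
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ { // one hashing worker per core
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				f, err := os.Open(p)
				if err != nil {
					continue
				}
				h := sha256.New()
				io.Copy(h, f) // hash the file contents
				f.Close()
				fmt.Printf("%x  %s\n", h.Sum(nil), p)
			}
		}()
	}
	// The walk itself stays single-threaded; it only feeds paths to the pool.
	filepath.WalkDir(".", func(p string, d os.DirEntry, err error) error {
		if err == nil && !d.IsDir() {
			paths <- p
		}
		return nil
	})
	close(paths)
	wg.Wait()
}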

You could try enabling the case-sensitive FS flag (the caseSensitiveFS folder option) in Syncthing.


My laptop with the ~111,000-file Maildir directory has a dual-core CPU (circa mid-2013) – calling it “fast” would be generous. :grin:

(And the file server at work that had the 1.8 million file Maildir directory has a quad-core CPU.)

Good idea, testing it right now as 'phase 5'.

FYI, phase 4 (cryptsetup && COW off for the DB) showed no improvement in the Scanning phase (105 min), and little or no improvement in Syncing. Syncing might be 50% faster (manual observation in the web GUI), but it's still slow (below 100 KB/s for sure) :frowning:

bingo @bt90

Enabling case sensitivity, which is fine on my case-sensitive FS, solved the problem.

Scanning: 3 minutes (was 105). Syncing: around 15 MB/s and moving fast.

I stopped it manually to keep some files for experiments tomorrow, and will slowly move all of today's settings back to defaults.

Thank you @bt90 for being persistent with my problem, I appreciate it!


I never thought that the case-insensitive handling would add that much overhead :eyes:

Would you mind capturing a CPU profile with the case-sensitive option still disabled?

https://docs.syncthing.net/users/profiling.html#capture-a-cpu-profile
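If it's easier, the profile can also be grabbed over the REST API; here is a rough Go sketch. This assumes the /rest/debug/cpuprof debug endpoint (check the linked docs for the exact procedure; the debug endpoints may need to be enabled first), and the address, API key, and output filename are placeholders:

package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	req, err := http.NewRequest("GET", "http://localhost:8384/rest/debug/cpuprof", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("X-API-Key", "YOUR_API_KEY") // Actions > Settings in the GUI
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	out, err := os.Create("syncthing-cpu.pprof")
	if err != nil {
		panic(err)
	}
	defer out.Close()
	io.Copy(out, resp.Body) // write the raw pprof data to disk
}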

So, here are some results && initial conclusions; some of you can probably add more.

  1. This machine was CPU-constrained during Scanning & Syncing, because of (in random order):
  • slow case-insensitive comparison calls
  • long filenames of 380k files
  • a slow CPU
  • only one core used
  • multiple os.readdir calls during Scanning

  2. Enabling caseSensitiveFS (which is not the default in Syncthing) reduced Scanning time from 105 minutes to 3 minutes, and Syncing became more or less very fast.

  3. After that, Syncthing is still CPU-constrained during Scanning, but the time needed to complete Scanning (and later Syncing) is not only acceptable but, I would say, fast, given the amount of data being processed (380k files with long filenames (sic!), i.e. Maildir).

  4. Syncthing still uses only one core (per folder) during Scanning, and my guess is that with caseSensitiveFS=true, os.readdir is now the main culprit. That may or may not be worth optimizing further in Syncthing's logic; nevertheless, 3 min is fast, so I would say: not worth it.

  5. Some of the configuration changes were good in general (setLowPriority=false, databaseTuning=large, Changes, copyRangeMethod), which helped speed up Scanning from the initial 160 min to 105 min. I'm keeping them.

  6. The biggest single winner is of course caseSensitiveFS=true.

  7. I reverted disableFsync to its default (false). While it would provide some speedup (during Syncing), it's not worth the risk.

  8. Many settings I tried had no effect (or one too small to notice) on my machine, because the constraint was on CPU rather than I/O, because only one core was used, or because of the specific makeup of my data (Maildir). I reverted them to defaults: disableTempIndexes=false, copiers=0, hashers=0, maxConcurrentWrites=2, SendReceive, fsWatcherEnabled=true.

Attaching the pprof file for the caseSensitiveFS=false scenario; here is a quick peek for the impatient ones: syncthing-cpu-linux-arm64-v1.27.2-180504.pprof (23.7 KB) :slight_smile:

(pprof) top
Showing nodes accounting for 24.80s, 69.66% of 35.60s total
Dropped 262 nodes (cum <= 0.18s)
Showing top 10 nodes out of 73
      flat  flat%   sum%        cum   cum%
     4.74s 13.31% 13.31%      4.74s 13.31%  runtime/internal/syscall.Syscall6
     3.41s  9.58% 22.89%      5.17s 14.52%  strings.(*Builder).WriteRune
     3.29s  9.24% 32.13%      4.29s 12.05%  runtime.mapassign_faststr
     2.95s  8.29% 40.42%      3.46s  9.72%  runtime.findObject
     2.15s  6.04% 46.46%     15.69s 44.07%  github.com/syncthing/syncthing/lib/fs.UnicodeLowercaseNormalized
     2.02s  5.67% 52.13%      2.02s  5.67%  golang.org/x/text/unicode/norm.(*input).skipASCII (inline)
     1.85s  5.20% 57.33%      1.85s  5.20%  unicode.ToLower
     1.79s  5.03% 62.36%      1.79s  5.03%  unicode.ToUpper
     1.30s  3.65% 66.01%     29.61s 83.17%  github.com/syncthing/syncthing/lib/fs.newCaseNode
     1.30s  3.65% 69.66%      8.03s 22.56%  os.(*File).readdir

@rdslw thanks for the pprof and your detailed summary. Feel free to open a GitHub issue, as there might be room for improvement (105 min vs 3 min!).

@rdslw one last thing: could you also gather a heap profile?

https://docs.syncthing.net/users/profiling.html#capture-a-heap-profile

I'm fairly sure that the problem with case insensitivity is not the CPU or RAM used for the checks, but the repeated recursive listdir calls to figure out the proper casing. Especially for large directories, listdir can take a really long time (in computer terms). We could potentially do more aggressive (= longer) caching of the results.
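A minimal sketch of what longer-lived caching could look like (all names here are illustrative, not Syncthing's actual internals):

package dircache

import (
	"os"
	"sync"
	"time"
)

type cachedDir struct {
	entries []os.DirEntry
	when    time.Time
}

// dirCache memoizes os.ReadDir results for a fixed TTL, so repeated
// lookups into the same huge directory don't re-list it every time.
type dirCache struct {
	mu  sync.Mutex
	ttl time.Duration
	m   map[string]cachedDir
}

func newDirCache(ttl time.Duration) *dirCache {
	return &dirCache{ttl: ttl, m: make(map[string]cachedDir)}
}

func (c *dirCache) readDir(path string) ([]os.DirEntry, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.m[path]; ok && time.Since(e.when) < c.ttl {
		return e.entries, nil // hit: no syscalls at all
	}
	entries, err := os.ReadDir(path) // miss: one expensive listing
	if err != nil {
		return nil, err
	}
	c.m[path] = cachedDir{entries, time.Now()}
	return entries, nil
}

(Holding the lock across the ReadDir call serializes misses; that's a simplification for the sketch, not a recommendation.)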


However, we could parallelise Unicode normalisation for very large directories.

Speeding this up would also prevent cache entries from expiring too fast.
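A rough sketch of what that could look like (illustrative only; fs.UnicodeLowercaseNormalized is the real function from the profile, but its string-in/string-out signature and everything else here are assumptions):

package normpar

import (
	"runtime"
	"sync"

	"github.com/syncthing/syncthing/lib/fs"
)

// lowercaseAll normalizes a directory's worth of names on all cores by
// chunking the slice across a small worker pool.
func lowercaseAll(names []string) []string {
	out := make([]string, len(names))
	workers := runtime.NumCPU()
	chunk := (len(names) + workers - 1) / workers
	if chunk == 0 {
		return out // empty input
	}
	var wg sync.WaitGroup
	for start := 0; start < len(names); start += chunk {
		end := start + chunk
		if end > len(names) {
			end = len(names)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = fs.UnicodeLowercaseNormalized(names[i])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}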


We could also add a fast path for ASCII-only filenames, which are quite common.

nfc(lower(upper(filename))) is equivalent to lower(filename) for them.

Quick draft: lib/fs: Add ASCII fastpath for normalization by bt90 · Pull Request #9365 · syncthing/syncthing · GitHub
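The idea, as a rough sketch (the PR itself may differ in details, and the slow-path fallback below is a stand-in, not Syncthing's exact implementation):

package asciifast

import (
	"strings"
	"unicode/utf8"

	"golang.org/x/text/unicode/norm"
)

func lowercaseNormalized(s string) string {
	ascii := true
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			ascii = false
			break
		}
	}
	if ascii {
		// ASCII fast path: NFC is a no-op and case folding is byte-wise,
		// so the Unicode machinery can be skipped entirely.
		return strings.ToLower(s)
	}
	// Slow Unicode path for everything else.
	return norm.NFC.String(strings.ToLower(s))
}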

I am really glad to have found this thread, because it saved my weekend after hours of debugging. :slight_smile:

My setup is similar: Syncthing running in a Docker container on an Unraid server, with all the performance overhead caused by btrfs on LUKS (plus Unraid's shfs on top). Also, the CPU is rather old.

The Maildir is 4 GB with 142,526 files in 234 directories. Synchronisation takes forever ("days"). With caseSensitiveFS=true the speed is still slow compared to other folders, but synchronization at least finishes, in about 30 minutes.

My pprof shows similar behaviour to the one @rdslw provided. I never thought that it could be related to string comparison. Thanks for sharing the workaround. You're doing a great job with Syncthing!

Here are the pprof files for CPU and heap in case that it is interesting:

pprof.zip (685.7 KB)

The files in the folder "scanning" were taken during the scanning phase, the files in the folder "synchronization" during the syncing phase. Folders with a "2" contain the pprof captured with the setting caseSensitiveFS=true.

Out of curiosity: it seems it's not only the performance impact of the string comparison logic in Syncthing. Does the case-insensitive handling also trigger additional I/O?

I guess with a faster setup (CPU, disks, etc.) I would not have noticed anything.

Already pointed out by @calmh

Your profile is actually quite a bit more in line with expectations: the majority of time is taken up by syscalls; the actual string comparison/mangling isn't taking up much time. Incidentally, most time is spent stat-ing files outside of the case-sensitivity machinery. Maybe the profile just got (un)lucky timing with lots of cache hits? @rdslw 's profile is quite astounding in that respect, as most time is spent in string manipulation, and a big amount of that in memory management, including GC.

What might help with everything is different caching, as mentioned by calmh. It would be nice to add some (optional/debugging) metrics to measure how often we hit the cache for the same path. Worst case, we are already doing a good job and there are no cache misses on the same paths; but maybe there are, and improving the cache retention mechanism (or simply extending the lifetime) could help.
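Something as small as a pair of counters would already answer the hit-rate question (names made up; in practice this would more likely be wired into the existing Prometheus metrics):

package cachemetrics

import "sync/atomic"

var cacheHits, cacheMisses atomic.Int64

// recordLookup tallies whether a path lookup was served from cache.
func recordLookup(hit bool) {
	if hit {
		cacheHits.Add(1)
	} else {
		cacheMisses.Add(1)
	}
}

// hitRate returns the fraction of lookups served from cache.
func hitRate() float64 {
	h, m := cacheHits.Load(), cacheMisses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}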

A few things to add.

  1. My system is musl-based, so that may be the reason for the slow string comparison, if Go makes some sort of non-optimal multiple calls to musl's ctype functions (isupper/islower etc.), musl being slower than glibc in this regard.

  2. Also: Maildir filenames are typically ~78 characters long (longer than a typical filename) and quite similar to one another, which may create a lot of collisions if a hashmap/hash/cache of them is used that isn't prepared for such a high degree of similarity across a huge number of entries.

  3. Nevertheless, I wouldn't dismiss the os.readdir/listdir calls made by Syncthing. How the loop/algorithm uses them now might be a place for some optimization (caching etc.), as there were a lot of them (too many?).

This specific directory of mine, if scanned now (with the proper settings), takes:

  • 180 seconds - on this musl, slow machine
  • 30 seconds - on beefy x86 monster (glibc)

Yes, there is a difference, but not that big.
