Scan not finishing (after >10 days)

Hi,

Trying to sync a folder (1 TB) with subfolders (around 2 million files per folder).

After 10 days it's still scanning. If I activate the debug log (scanner), there is a log line every 20 seconds or so about a new file being hashed. Why is it so slow? Is there any way to speed it up?

Hardware: new MacBook Pro (M1)

Thanks for assistance.

Is the storage local (i.e. not something like a network drive or similar)? How many files are there in total?

Yes, it's the internal disk.

About 5 million files in total.

It seems to take way too long to hash each file. 20 seconds per file means 3 files per minute, 180 files per hour, and only 4,320 files per day. At that rate, 5 million files would take over three years; it's going to take forever to scan all the files.

Is the disk healthy?


@jerryjacobs Syncthing-macos is distributing ARM builds on M1s, right? Filesystem access through the compatibility layer is known to be ridiculously slow (OK, "is known": it is for Docker, and that's all I know).

@syncer What did the log at startup say about hashing speed?


2022-10-06 19:44:17 Single thread SHA256 performance is 1073 MB/s using crypto/sha256 (688 MB/s using minio/sha256-simd).
2022-10-06 19:44:18 Hashing performance is 347.74 MB/s
2022-10-06 19:44:18 Overall send rate is unlimited, receive rate is unlimited


Yes, the disk is healthy and everything is working fine in general.

Maybe it's related to those big folders? 1 million+ files per folder?

What's the CPU, memory and disk usage like? Grabbing two CPU profiles (and, if memory usage is significant, also a memory profile) would show where it spends time: Profiling — Syncthing v1.22.0 documentation. For big folders the metadata checking might take a lot of resources (and that could be disabled), but since you are already at the hashing stage, that doesn't apply here.

Activity Monitor looks good. CPU around 30%, lots of memory unused.

Unfortunately I can't share any log files, since I am working with highly confidential client files (I'm a lawyer).

That's a lot of files. I'm going to assume that you mean Syncthing folders, not that you have a single folder on disk with two million files in it. Regardless, even if you have folders with "just" thousands of files in them, directory access tends to be very slow when there are a lot of files in a folder. We do a lot of directory accesses.

Main folder (added to Syncthing) → 4 subfolders → each folder has 1 million+ files

Yeah, that’s not going to fly.

Sad world, but thank you. I will try some other software :slight_smile:

jerry@coconut ~ % file /Applications/Syncthing.app/Contents/Resources/syncthing/syncthing
/Applications/Syncthing.app/Contents/Resources/syncthing/syncthing: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
/Applications/Syncthing.app/Contents/Resources/syncthing/syncthing (for architecture x86_64):	Mach-O 64-bit executable x86_64
/Applications/Syncthing.app/Contents/Resources/syncthing/syncthing (for architecture arm64):	Mach-O 64-bit executable arm64

Maybe you can trick Syncthing by gradually adding more files? In my experience, once the initial scan is finished, Syncthing works OK(-ish) with a huge number of files.

Are those files located on a spinning disk? I've done a local test on Windows with a single folder that contained exactly 1,000,000 files. It took less than an hour to scan with a 4-core, 8-thread Ryzen 4350G CPU. However, the folder in my case was located on a RAM disk. Just in case yours is located on an HDD, I'd strongly suggest trying at least a decent SSD instead and then seeing how long the scan takes in Syncthing.

Just for the record, Scan Progress Interval was set to -1 (i.e. disabled).

It's not an I/O thing. They said they use the internal storage in the M1 MacBook, which is really speedy. It's that listdir on a directory with a million files takes ages, and we do a lot of those for case-insensitivity lookups.

Here’s an example from my M1 Ultra, internal storage:

jb@sep:~ % mkdir tmp/large
jb@sep:~ % for ((i=0; i<1000000; i++)); do echo tmp/large/file-$i ; done | xargs touch
jb@sep:~ % time ls tmp/large | wc -l
 1000000
ls -F tmp/large  7.11s user 5.18s system 64% cpu 19.044 total
wc -l  0.00s user 0.00s system 0% cpu 19.043 total

It takes 19 seconds just to list the names in the directory. We do this at minimum once per file we scan. Sure, we cache the result, but only for about 5 seconds, so it will have expired by the time we need it again…

Certainly some other software may do this better, or at least with different tradeoffs between accuracy and performance than ours, but it's a well-known issue, best avoided by a more reasonable directory structure.
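
To illustrate the mechanism, here's a minimal Go sketch of that kind of short-lived directory-listing cache. The names, structure and fixed 5-second TTL are made up for illustration; this is not Syncthing's actual code.

// Minimal sketch of a short-lived directory-listing cache, similar in
// spirit to the 5-second cache described above. All names here are made
// up for illustration; this is not Syncthing's actual code.
package main

import (
	"fmt"
	"os"
	"sync"
	"time"
)

type cachedListing struct {
	names   []string
	fetched time.Time
}

type dirCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cachedListing
}

func newDirCache(ttl time.Duration) *dirCache {
	return &dirCache{ttl: ttl, entries: make(map[string]cachedListing)}
}

// Names returns the file names in dir, reusing a previous listing if it
// is younger than the TTL. Listing a directory with a million entries is
// the expensive part, so a cache hit saves a full readdir.
func (c *dirCache) Names(dir string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if e, ok := c.entries[dir]; ok && time.Since(e.fetched) < c.ttl {
		return e.names, nil
	}

	des, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(des))
	for _, de := range des {
		names = append(names, de.Name())
	}
	c.entries[dir] = cachedListing{names: names, fetched: time.Now()}
	return names, nil
}

func main() {
	cache := newDirCache(5 * time.Second)
	if names, err := cache.Names("tmp/large"); err == nil {
		fmt.Println(len(names), "entries")
	}
}

In the example above, the listing alone takes 19 seconds, so a 5-second cache entry has usually expired by the time the next file in that directory comes up for scanning.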


Sounds like we should increase those timeouts for large folders, as long as memory usage doesn't completely explode? As a user I'd expect higher resource usage for this kind of workload.

Probably wouldn’t hurt. The cache time is short because the lookups prevent data loss, but maybe the cache time should be max(5 sec, 5 * lookup time) or something.
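
For what it's worth, here's a tiny Go sketch of that rule. The directory path and function name are made up; it just shows the max(5 sec, 5 * lookup time) idea.

// Sketch of the suggested adaptive cache time: keep a cached listing for
// at least 5 seconds, or 5x however long the listing itself took,
// whichever is larger. Purely illustrative, not Syncthing's actual behaviour.
package main

import (
	"fmt"
	"os"
	"time"
)

func ttlFor(lookupTime time.Duration) time.Duration {
	const minTTL = 5 * time.Second
	if adaptive := 5 * lookupTime; adaptive > minTTL {
		return adaptive
	}
	return minTTL
}

func main() {
	start := time.Now()
	if _, err := os.ReadDir("tmp/large"); err != nil {
		fmt.Println(err)
		return
	}
	lookup := time.Since(start)
	fmt.Printf("listing took %v, cache it for %v\n", lookup, ttlFor(lookup))
}

With the 19-second listing from the example above, that would keep the cached listing for about 95 seconds instead of 5.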

Would it help if the OP disabled case-insensitive handling? At least if their filesystem fits that criterion.