CPU Usage on large folder

I have been using syncthing for some time and it’s really nice. Recently I added a new folder with over 170000 items and ~5GB. syncthing CPU usage went up a lot, and I guessed it was due to every-60-seconds rescan of such a big folder. So I run syncthing-inotify now, which I think is making it sleep longer between scans. But, when I make a change, syncthing seems to still use tonnes of CPU for some time during the sync – is it doing a full re-scan instead of just syncing the files that inotify tells it changed?

It hashes the files that have changed, to figure out what actually changed inside them.

Right… but when only a few (< 10) files changed, I wouldn’t expect just the hashing to take 85%-130% of my 2Ghz GPU for many seconds (activating my CPU fan’s noise level) would I? Is the hashing process that intensive? Each file by itself is not big ~10-30K

I think it tells to check the directory that file was in, and not that exact file, hence if your file layout is flat it might peek at other files too, though of other files haven’t changed it should only issue a stat call, which should not consume CPU.

You can run with STTRACE=scanner env var set and check the logs of what it’s actually doing.

So, running with STTRACE=scanner I see:

[EOTUD] 2016/02/09 11:41:09.617273 blockqueue.go:45: DEBUG: open: open /home/si
ngpolyma/Mail/.syncthing.dovecot.index.log.tmp: no such file or directory
[EOTUD] 2016/02/09 11:41:09.617552 blockqueue.go:45: DEBUG: open: open /home/si
ngpolyma/Mail/.syncthing.dovecot-uidlist.tmp: no such file or directory
[EOTUD] 2016/02/09 11:41:09.627936 blockqueue.go:45: DEBUG: open: open /home/si
ngpolyma/Mail/cur/.syncthing.1455036044.21581_0.iweb:2,.tmp: no such file or di
rectory
[EOTUD] 2016/02/09 11:41:09.784051 blockqueue.go:45: DEBUG: open: open /home/si
ngpolyma/Mail/cur/.syncthing.1455036042.21588_0.iweb:2,.tmp: no such file or di
rectory
[EOTUD] 2016/02/09 11:41:11.462954 walk.go:100: DEBUG: Walk /home/singpolyma/Ma
il [] 131072 &{[{0x18dfe120 true} {0x18dfe2a0 true}] true 0x1a93f9d0 375e9597da
d3813a0744abafadbd1667 0x19335b40 0x1a93f460}
[EOTUD] 2016/02/09 11:41:11.463049 walk.go:488: DEBUG: checkDir /home/singpolym
a/Mail &{Mail 4096 2147484141 {63590632869 954848666 0x8a50f00} 0x1e256780}
[EOTUD] 2016/02/09 11:41:11.484454 walk.go:246: DEBUG: ignored: .git/.git-stash
.10526-index
[EOTUD] 2016/02/09 11:41:11.488728 walk.go:246: DEBUG: ignored: .git/.git-stash
.10526-index.lock
[EOTUD] 2016/02/09 11:41:11.491645 walk.go:246: DEBUG: ignored: .git/.git-stash
.10771-added
[EOTUD] 2016/02/09 11:41:11.493938 walk.go:246: DEBUG: ignored: .git/.git-stash
.10941-added
[EOTUD] 2016/02/09 11:41:11.496141 walk.go:246: DEBUG: ignored: .git/.git-stash
.12014-added
[EOTUD] 2016/02/09 11:41:11.497437 walk.go:246: DEBUG: ignored: .git/.git-stash
.13226-added
[EOTUD] 2016/02/09 11:41:11.498569 walk.go:246: DEBUG: ignored: .git/.git-stash
.14441-added
[EOTUD] 2016/02/09 11:41:11.498697 walk.go:246: DEBUG: ignored: .git/.git-stash
.16684-added
[EOTUD] 2016/02/09 11:41:11.498776 walk.go:246: DEBUG: ignored: .git/.git-stash
.22010-added
[EOTUD] 2016/02/09 11:41:11.498854 walk.go:246: DEBUG: ignored: .git/.git-stash
.25509-index
[EOTUD] 2016/02/09 11:41:11.498967 walk.go:246: DEBUG: ignored: .git/.git-stash
.25642-added
[EOTUD] 2016/02/09 11:41:11.499116 walk.go:246: DEBUG: ignored: .git/.git-stash
.8695-added
[EOTUD] 2016/02/09 11:41:11.500589 walk.go:246: DEBUG: ignored: .git/COMMIT_EDI
TMSG
[EOTUD] 2016/02/09 11:41:11.500716 walk.go:246: DEBUG: ignored: .git/FETCH_HEAD
[EOTUD] 2016/02/09 11:41:11.500823 walk.go:246: DEBUG: ignored: .git/HEAD
[EOTUD] 2016/02/09 11:41:11.501036 walk.go:246: DEBUG: ignored: .git/MERGE_RR
[EOTUD] 2016/02/09 11:41:11.501158 walk.go:246: DEBUG: ignored: .git/ORIG_HEAD
[EOTUD] 2016/02/09 11:41:11.502497 walk.go:246: DEBUG: ignored: .git/branches
[EOTUD] 2016/02/09 11:41:11.491645 walk.go:246: DEBUG: ignored: .git/.git-stash
.10771-added
[EOTUD] 2016/02/09 11:41:11.493938 walk.go:246: DEBUG: ignored: .git/.git-stash
.10941-added
[EOTUD] 2016/02/09 11:41:11.496141 walk.go:246: DEBUG: ignored: .git/.git-stash
.12014-added
[EOTUD] 2016/02/09 11:41:11.497437 walk.go:246: DEBUG: ignored: .git/.git-stash
.13226-added
[EOTUD] 2016/02/09 11:41:11.498569 walk.go:246: DEBUG: ignored: .git/.git-stash
.14441-added
[EOTUD] 2016/02/09 11:41:11.498697 walk.go:246: DEBUG: ignored: .git/.git-stash
.16684-added
[EOTUD] 2016/02/09 11:41:11.498776 walk.go:246: DEBUG: ignored: .git/.git-stash
.22010-added
[EOTUD] 2016/02/09 11:41:11.498854 walk.go:246: DEBUG: ignored: .git/.git-stash
.25509-index
[EOTUD] 2016/02/09 11:41:11.498967 walk.go:246: DEBUG: ignored: .git/.git-stash
.25642-added
[EOTUD] 2016/02/09 11:41:11.499116 walk.go:246: DEBUG: ignored: .git/.git-stash
.8695-added
[EOTUD] 2016/02/09 11:41:11.500589 walk.go:246: DEBUG: ignored: .git/COMMIT_EDI
TMSG
[EOTUD] 2016/02/09 11:41:11.500716 walk.go:246: DEBUG: ignored: .git/FETCH_HEAD
[EOTUD] 2016/02/09 11:41:11.500823 walk.go:246: DEBUG: ignored: .git/HEAD
[EOTUD] 2016/02/09 11:41:11.501036 walk.go:246: DEBUG: ignored: .git/MERGE_RR
[EOTUD] 2016/02/09 11:41:11.501158 walk.go:246: DEBUG: ignored: .git/ORIG_HEAD
[EOTUD] 2016/02/09 11:41:11.502497 walk.go:246: DEBUG: ignored: .git/branches
[EOTUD] 2016/02/09 11:41:11.502655 walk.go:246: DEBUG: ignored: .git/config
[EOTUD] 2016/02/09 11:41:11.502791 walk.go:246: DEBUG: ignored: .git/descriptio
n
[EOTUD] 2016/02/09 11:41:11.503662 walk.go:246: DEBUG: ignored: .git/gc.pid
[EOTUD] 2016/02/09 11:41:11.505029 walk.go:246: DEBUG: ignored: .git/hooks
[EOTUD] 2016/02/09 11:41:11.505126 walk.go:246: DEBUG: ignored: .git/index
[EOTUD] 2016/02/09 11:41:11.505202 walk.go:246: DEBUG: ignored: .git/info
[EOTUD] 2016/02/09 11:41:11.505278 walk.go:246: DEBUG: ignored: .git/logs
[EOTUD] 2016/02/09 11:41:11.505357 walk.go:246: DEBUG: ignored: .git/objects
[EOTUD] 2016/02/09 11:41:11.505435 walk.go:246: DEBUG: ignored: .git/packed-ref
s
[EOTUD] 2016/02/09 11:41:11.505513 walk.go:246: DEBUG: ignored: .git/refs
[EOTUD] 2016/02/09 11:41:11.507026 walk.go:246: DEBUG: ignored: .git/rr-cache
[EOTUD] 2016/02/09 11:41:11.517844 walk.go:246: DEBUG: ignored: .stfolder
[EOTUD] 2016/02/09 11:41:11.519414 walk.go:246: DEBUG: ignored: .stignore
[EOTUD] 2016/02/09 11:41:22.976984 walk.go:173: DEBUG: Walk /home/singpolyma/Ma
il [] current progress 0/1 at 0.0 MB/s (0%)
[EOTUD] 2016/02/09 11:41:22.979383 walk.go:167: DEBUG: Walk progress done /home
/singpolyma/Mail [] 131072 &{[{0x18dfe120 true} {0x18dfe2a0 true}] true 0x1a93f
9d0 375e9597dad3813a0744abafadbd1667 0x19335b40 0x1a93f460}

My file layout is very flat (pretty much all ~177000 files are in one directory) and I have a .git folder that is set to ignore (which it seems to re-look at in order to ignore it every time? My pattern is .git/**)

So maybe it’s just stat’ing 177000 items every time and that’s the cause?

It doesn’t look like its walking 177k files, nor do I think stating 177k files should be cpu intensive. Is that the whole of the log?

The actual walk is quite fast, but

means that we’re actually scanning the entire directory. Probably because it’s flat and inotify has merged multiple file changes into a change on the parent directory. So you’re doing more work by using inotify than just periodic scans, at least in your log above with a scan just ten seconds after the previous scan.

This is off-topic, but if you are syncing a Maildir, or some other directory that is actually a filesystem-based database meant to be used with special tools (.git directories are another example), you may want to think about whether or not you will break the “database” by letting Syncthing write to it without using its “API”.

Ok, so for now I’ve turned off inotify and set the scan interval to 10 minutes. That’s much better, at least

Yeah, it’s a Maildir. I’ve been syncing it using git for years, decided to try syncing using syncthing instead. so for it’s doing pretty well, though I don’t think syncthing knows about renames at all, which makes it a little less good at it than git was in some cases

It deals with renames perfectly fine.

Let us know how it goes!

Would be nice if the inotify service could prevent rescans instead of causing them… As it is I think touches on the remote and blowing away my local renames because I’ve set rescan locally to lower than remote. Or something.

The point of inotify is that it causes a scan when a file changes. You have lots of file changes, hence lots of rescans.

But ideally, the inotify daemon knows what has changed, so there’s no reason to rescan.

You need to rescan if something has changed, to workout the hashes of the blocks on the file, and to advertise it.

Filesystem notifications only tell you when a file has changed: they don’t tell you which bits of the file changed in what way.

But you should only have to re-hash the blocks on the file that changed. Not re-look-at every file in the folder.

syncthing-inotify is optimised for folders that aren’t excessively large: if you get a ton of updates for files in the same folder, (in general) it’s more efficient to wait a few seconds for the activity to die down, then tell Syncthing “all of the files in this folder changed” - the cost of stat-ing the unchanged files in that folder is very low, whereas the cost of starting many small scans is higher.

Sure, I get that that’s the current state of affairs. Ideally, I’d like to not scan at all, since a scan on this folder is expensive.