Slowly increasing CPU usage over time

olifre · February 13, 2024, 5:44pm

I observe a strange effect: Syncthing CPU usage seems to slowly increase over time on my embedded x86_64 Debian 12 system.

After a fresh start of the service, it stays well below 1 %; after 2 days, it uses over 2 %. That’s not heavy, but it keeps growing slowly and noticeably increases the power draw of the otherwise mostly idle system.

Running strace on the Syncthing process (following forks) reveals a high rate of:

[pid  1605] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4,  <unfinished ...>
[pid   962] <... nanosleep resumed>NULL) = 0
[pid   962] futex(0x1a2afa0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=2996919}) = -1 ETIMEDOUT (Connection timed out)
[pid  1605] <... epoll_pwait resumed>[], 128, 13, NULL, 0) = 0
[pid   962] nanosleep({tv_sec=0, tv_nsec=10000000},  <unfinished ...>
[pid  1605] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4,  <unfinished ...>
[pid   962] <... nanosleep resumed>NULL) = 0
[pid   962] futex(0x1a2afa0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=1697811} <unfinished ...>
[pid  1605] <... epoll_pwait resumed>[], 128, 12, NULL, 0) = 0
[pid   962] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid  1605] epoll_pwait(4,  <unfinished ...>
[pid   962] nanosleep({tv_sec=0, tv_nsec=10000000},  <unfinished ...>
[pid  1605] <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4,  <unfinished ...>
[pid   962] <... nanosleep resumed>NULL) = 0
[pid   962] futex(0x1a2afa0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=3958898} <unfinished ...>
[pid  1605] <... epoll_pwait resumed>[], 128, 14, NULL, 0) = 0
[pid   962] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)

These calls are much less frequent after a restart of the service.

Enabling debugging at runtime does not yield anything relevant at first glance.

Checking the FDs, file descriptor 4 of PID 1605 is:

anon_inode:[eventpoll]

I also tried disabling inotify watching for all folders (without restarting Syncthing) in case it was related, but CPU usage did not change. The only “special” configuration on the system is staggered versioning; multiple connections etc. are not used.

Since restarting the service makes CPU usage drop: Does anybody have a good idea on how to debug this?

olifre · February 13, 2024, 6:52pm

Good news! I managed to reproduce this without waiting for a prolonged time!

The “trick” is to press “Rescan all folders” about 80 times. CPU usage then increases from <1 % to 2.5 %, so the effect is nicely incremental. Since a scan is run automatically even with inotify watchers enabled, this likely explains my observation.

I’ll open a GitHub issue soon with all information I have at hand, now that I have a “reproducer”.

olifre · February 13, 2024, 6:59pm

For future readers, the story continues here:

github.com/syncthing/syncthing

CPU usage increases after folder scans (and hence, over time)

opened 06:58PM - 13 Feb 24 UTC

olifre

bug needs-triage

- Syncthing: v1.27.3, Linux (64-bit Intel/AMD)
- OS: Debian 12, Kernel 6.1.0-18…-amd64
- Size: 23,075 files, 1,738 dirs, ~45.6 GiB total size, 6 folders
- Staggered versioning enabled (not sure if relevant)
- `inotify` watches enabled

# Observation

CPU usage is low in general (< 1 % on a small Intel SoC), but keeps growing slowly over time (after >2 days) to 2 % and above, i.e. by over a factor of two (in idle). After prolonged runtime of some weeks, I also observed > 4 %.

# Reproduction steps

Either wait ;-), or run many folder rescans. After slowly hitting "Rescan all folders" about 80 times and then waiting until everything has settled down, CPU usage has increased, too. So the regular hourly scan is likely causing it.

# Further information

Initially reported as: https://forum.syncthing.net/t/slowly-increasing-cpu-usage-over-time/21639/2 (until I found a reproducer).

Note I also observe a higher rate of syscalls such as:

```
[pid  1605] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4,  <unfinished ...>
[pid   962] <... nanosleep resumed>NULL) = 0
[pid   962] futex(0x1a2afa0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=2996919}) = -1 ETIMEDOUT (Connection timed out)
[pid  1605] <... epoll_pwait resumed>[], 128, 13, NULL, 0) = 0
[pid   962] nanosleep({tv_sec=0, tv_nsec=10000000},  <unfinished ...>
[pid  1605] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4,  <unfinished ...>
[pid   962] <... nanosleep resumed>NULL) = 0
[pid   962] futex(0x1a2afa0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=1697811} <unfinished ...>
[pid  1605] <... epoll_pwait resumed>[], 128, 12, NULL, 0) = 0
[pid   962] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid  1605] epoll_pwait(4,  <unfinished ...>
[pid   962] nanosleep({tv_sec=0, tv_nsec=10000000},  <unfinished ...>
[pid  1605] <... epoll_pwait resumed>[], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4, [], 128, 0, NULL, 0) = 0
[pid  1605] epoll_pwait(4,  <unfinished ...>
[pid   962] <... nanosleep resumed>NULL) = 0
[pid   962] futex(0x1a2afa0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=3958898} <unfinished ...>
[pid  1605] <... epoll_pwait resumed>[], 128, 14, NULL, 0) = 0
[pid   962] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
```

in this state, but I am not sure whether this is really related.

Alex · February 20, 2024, 4:57pm

I also noticed constantly high CPU usage (>4%) a few days ago; it returned to more reasonable values after a restart.

With v1.12.2 I never observed this, and after downgrading, usage seems lower (0.0% in top most of the time, whereas v1.12.3 always shows at least 0.3%, even after a restart). With a few rescans I can quickly bring it above 1%.

355479 files, 39508 folders, ~1.65 TiB, 52 folders configured in Syncthing, OS: Debian 12

olifre · February 20, 2024, 10:21pm

Thanks! It's probably best to take this to the GitHub issue; @calmh was already looking into more consistent use of timers.

In my case, the change in behaviour was not clearly version-related (I only started more elaborate monitoring of my homeserver with 1.12.3), so this is additional information that may help.

system · March 21, 2024, 10:21pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.