Memory usage slowly rising after migration to 1.4.0

Thanks for the hints.

Let me point out that I raised the issue not because the allocation is unusually large, but because memory usage seems to increase over time for no evident reason. For example, it’s now fluctuating between 600 and 800 MiB according to the Syncthing web interface, and Syncthing has been idle the whole time - no file was added or touched anywhere.

There are 6 folders and 4 peers.

Peers A and B are always on; the remaining 2 peers are off 99% of the time.

Folders A and B are shared send-only with peers A and B respectively; the other 4 folders are basically empty.

Folder A contains 115,000 files, roughly 1 TB.

Folder B contains 25,000 files, roughly 300 GB.

Both trees are well balanced (i.e. no single directory in the tree contains more than, say, 100 files).

As I pointed out, I’m not saying that memory usage is large or small; I’m just saying that it seems to grow over time (something that never happened before the upgrade to 1.4.0).

I’m hearing more “fluctuate” (grows and then decreases) than “grows over time” (continuous increase until, presumably, a crash). I suspect this is the (new in 1.4) database GC, which traverses the whole database periodically and (with large-database tuning, at least) will likely result in a fair amount of in-memory caching.
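
For reference, that tuning mode comes from the databaseTuning option (normally left on auto, which picks small or large based on database size); a sketch of how it can appear in config.xml, assuming the option name introduced in the 1.3 series:

```xml
<!-- In config.xml, under <options>: valid values are "auto" (default),
     "small" and "large". -->
<options>
    <databaseTuning>large</databaseTuning>
</options>
```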

“Fluctuate” does not exclude “grow over time”. It’s not a flat sinusoid: the trend line seems to rise steadily.

The one thing in your profile that might generate continuous growth is the QUIC usage. We’ve seen some reports of leaks around that previously. You could try disabling QUIC (set the listen addresses to tcp://:22000, dynamic+https://relays.syncthing.net/endpoint instead of default, on both sides, as sketched below) and see what effect that has.
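
A minimal sketch of what that looks like in config.xml (the equivalent field in the GUI is the Sync Protocol Listen Addresses setting under the Connections tab; adjust to your setup):

```xml
<!-- In config.xml, under <options>: replace the single "default" listener
     with explicit TCP and relay listeners so no QUIC listener is started.
     Restart Syncthing for the change to take effect. -->
<options>
    <listenAddress>tcp://:22000</listenAddress>
    <listenAddress>dynamic+https://relays.syncthing.net/endpoint</listenAddress>
</options>
```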

Thanks for the hint, I’ll definitely try it here. As the entire country is in “soft lockdown”, I cannot currently reach the other two peer machines physically.

Is this a change in 1.4.0? I’m asking because I never had any issues since I started using Syncthing 1.2.1.

Anyway, if I see unusually high memory usage again, I’ll take another heap dump.

No, only the db GC thing is new in 1.4. But you might not have noticed the slight upward trend until the larger fluctuations started happening, causing you to investigate?

I don’t think so.

Memory usage climbs pretty quickly; today it’s at 1.01 GiB. So roughly every ~10 days it will need a restart. I would have noticed.

syncthing-heap-linux-arm64-v1.4.0-094858.pprof (108.6 KB)

Additionally, it looks like logging into the web interface does trigger something. A couple of seconds after login, memory usage spikes and CPU usage briefly goes up too, but eventually some memory is released (some is leaked, so the final usage is higher than the initial).

syncthing-heap-linux-arm64-v1.4.0-095019.pprof (110.9 KB)

Here’s memory usage, as reported by the NAS itself:

The GUI triggers a bunch of db operations, which drives up memory usage.

Both your profiles show about 55% of the memory in QUIC. Something is broken with QUIC, but I doubt it’s new in 1.4, because nothing QUIC-related changed there.
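
For anyone who wants to check that themselves: heap profiles like the ones attached above can be inspected with the standard Go pprof tool, roughly like this (exact output depends on your Go version; the QUIC packages showing up near the top of the list is what points at QUIC here):

```sh
# Summarize the largest in-use allocations in the heap profile.
go tool pprof -top syncthing-heap-linux-arm64-v1.4.0-094858.pprof

# Or interactively:
go tool pprof syncthing-heap-linux-arm64-v1.4.0-094858.pprof
# then at the (pprof) prompt: top
```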

Did you upgrade regularly or directly from 1.2.1 to 1.4.0? In 1.3.0 and 1.3.1 we updated the quic library versions, and the leak might have been introduced there.

Yes, I updated regularly. Auto-updates are on.

I’m not very familiar with QUIC: can I see whether it’s indeed being used, e.g. in the logs?

It is used; that’s evident from the heap profiles. You can disable it as described above.
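
If you want to confirm it on a running instance yourself, one way (a sketch, assuming the REST API is reachable on the default GUI address and you have the API key from the GUI settings) is the connections endpoint, which reports the transport type of each active connection:

```sh
# Lists active connections; the "type" field shows the transport in use,
# e.g. "tcp-client"/"tcp-server" vs "quic-client"/"quic-server".
# <your-api-key> is a placeholder for the key shown in the GUI settings.
curl -s -H "X-API-Key: <your-api-key>" http://localhost:8384/rest/system/connections
```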

I just applied the change here (memory was 1.3 GB) and restarted. I’ll see what happens now.

The only thing that changed recently on my side: one of the peers (which is just an offsite backup, actually) auto-upgraded to 1.4.0 and has been sort of “stuck” since then. The web GUI says:

“Syncing 95%”, download rate: a few hundred bytes (changes sometimes), out of sync items: 489 (never changes), 0 bytes.

The list of out-of-sync items pops up empty.

I cannot go and check the machine physically (because of the COVID-19 lockdown), but it’s not the first time this has happened. I thought it was due to the folder being send-only. Maybe this “permanent connection” is what is creating the disturbance in the Force…

The situation is a bit better, but still weird…

This graph shows memory usage over the last 24 hours. The leftmost “03” on the X axis means March 27, 15:00. Yesterday at 16:00 I disabled QUIC locally and restarted Syncthing (shown as the leftmost “04” in the graph above). Syncthing logs:

[start] 16:00:12 INFO: syncthing v1.4.0 "Fermium Flea" (go1.13.8 linux-arm64) teamcity@build.syncthing.net 2020-03-06 19:52:22 UTC
[start] 16:00:12 INFO: Using large-database tuning

Memory climbed after a while, then decreased, and at about midnight it suddenly dropped.

In the logs there’s literally nothing. The last line from yesterday is at 16:04; the next line is at 10 am today.

[B3TV5] 16:04:07 INFO: Device ... client blah blah
[B3TV5] 10:20:37 INFO: Connection to ... at ... closed: read timeout

What’s the y-axis unit?

I think it’s memory %. The NAS has 2 GB of RAM, and currently Syncthing is using more or less 200 MB (200 MB of 2048 MB is about 10%, so 10 on the graph).

My hunch says this is megabytes; I don’t think we start off with 200 MB usage straight off the bat?

Allow me to insist. When you say “I think” in English, it doesn’t necessarily mean you are not sure; sometimes you are just being polite. Yes, it’s really memory usage % (see attachment), and yes, Syncthing does take 200 MB on startup.

Right, so from 400 to 200 or so MiB. I think this is expected. I can’t explain the difference over time, but a) there are a lot of things going on internally, b) a lot depends on what the other devices are doing, whether the GUI is up, whether GC is running, etc., c) it’s all garbage collected and managed by the Go runtime, which releases memory back to the OS when it feels like it, and d) the OS might not reclaim that memory until it feels like it…

So yeah. YMMV. This looks reasonable to me, with large database / folder sizes and all.
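
On points c) and d): the Go version in this build (go1.13, per the startup log) returns freed memory to the kernel with MADV_FREE on Linux, so the resident size shown by the NAS can stay high until the kernel actually needs the pages back. If you want the RSS graph to track the Go runtime’s own idea of usage more closely, purely for observation, you can start Syncthing with the debug variable below (however you normally launch it); this changes how freed memory is reported, not how much is used:

```sh
# Ask the Go runtime to release freed pages with MADV_DONTNEED instead of
# MADV_FREE, so tools reading RSS (like the NAS graph) see drops sooner.
GODEBUG=madvdontneed=1 syncthing
```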