Memory usage slowly rising after migration to 1.4.0

This morning memory is fluctuating between 350 MB and 650 MB.

syncthing-heap-linux-arm64-v1.4.0-072530.pprof (74.2 KB)
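For reference, heap profiles with this naming pattern are what Syncthing writes when heap profiling is enabled via the STHEAPPROFILE environment variable (see the profiling documentation); I’m assuming that’s how this one was captured:

STHEAPPROFILE=1 syncthing
# writes syncthing-heap-<os>-<arch>-<version>-<timestamp>.pprof each time heap usage reaches a new high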

One more thing I didn’t mention: the NAS is dedicated to Syncthing; it’s not doing anything else (except sharing folders over the LAN, obviously).

It depends; this memory allocation does not have to be unusual, because every file and every directory, i.e. every element, has to be managed too. My guess is that each element occupies roughly 1 kB in RAM. Simon can say more about that.

Hence my question: how many peers and folders have you set up, and how many files and directories (top right under “This Device”) are managed?

What Andy says, to some extent. There will be some overhead in handling a large setup (> 1 TB). Additionally, when it detects such a setup, Syncthing makes a trade-off in favour of performance over memory usage:

[start] 18:22:04 INFO: Using large-database tuning

You can set the database tuning to small in the advanced config. This will reduce RAM usage somewhat. Your profile is busy scanning and running a database GC. You might want to increase the GC interval so it happens less frequently, as you have a large database and a slow computer:

export STGCINDIRECTEVERY=720h

(default is every 13h)
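If Syncthing runs under systemd on the NAS (an assumption on my part; adjust to however it is actually started), the variable can be made persistent with a drop-in instead of a one-off export:

# hypothetical drop-in, e.g. /etc/systemd/system/syncthing@.service.d/gc-interval.conf
[Service]
Environment=STGCINDIRECTEVERY=720h

Then run systemctl daemon-reload and restart the service. The small tuning mentioned above is the databaseTuning option (auto / small / large) in the advanced configuration.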

Thanks for the hints.

Let me point out that the reason I raised the issue is not that the allocation is unusually large; it’s that memory usage seems to increase over time for no evident reason (for example, it’s now fluctuating between 600 and 800 MiB according to the Syncthing web interface, and Syncthing has been idle the whole time - no new file was added or touched anywhere).

There are 6 folders and 4 peers.

Peers A and B are always on; the remaining 2 peers are off 99% of the time.

Folders A and B are shared send-only with peers A and B respectively. The other 4 folders are basically empty.

Folder A holds 115,000 files, roughly 1 TB.

Folder B holds 25,000 files, roughly 300 GB.

Both trees are well balanced (i.e. no single directory in the tree contains, say, more than 100 files).

As I pointed out, I’m not saying that memory usage is large or small; I’m just saying that it seems to grow over time (something that never happened before the upgrade to 1.4.0).

I’m hearing more “fluctuate” (grows and then decreases) than “grows over time” (continuous increase until, presumably, a crash). I suspect this is the (new in 1.4) database GC, which traverses the whole database periodically and (in large database tuning, at least) will likely result in a fair amount of in-memory caching.

“Fluctuate” does not exclude “grow over time”. It’s not a flat sinusoid: the trend line seems to rise steadily.

The one thing in your profile that might generate continuous growth is the QUIC usage. We’ve seen some reports of leaks on that previously. You could try disabling QUIC (set the listen address to tcp://:22000, dynamic+https://relays.syncthing.net/endpoint instead of the default, on both sides) and see what effect that has.
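For reference, disabling QUIC this way means replacing the default listen address with explicit TCP and relay entries, either under Actions > Settings > Connections > Sync Protocol Listen Addresses in the GUI, or directly in config.xml (a sketch; the rest of the options element is omitted):

<options>
    <listenAddress>tcp://:22000</listenAddress>
    <listenAddress>dynamic+https://relays.syncthing.net/endpoint</listenAddress>
</options>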

Thanks for the hint. I’ll definitely try it here. As the entire country is in “soft lockdown”, I currently cannot physically reach the other two peer machines.

Is this a change in 1.4.0? I’m asking because I never had any issue since I started using Syncthing 1.2.1.

Anyway, if I see unusually high memory usage again, I’ll take another heap dump.

No, only the db GC thing is new in 1.4. But you might not have noticed the slight upward trend until the larger fluctuations started happening, causing you to investigate?

I don’t think so.

Memory usage climbs pretty quickly; today it’s at 1.01 GiB. So roughly every ~10 days it will need a restart. I would have noticed that.

syncthing-heap-linux-arm64-v1.4.0-094858.pprof (108.6 KB)

Additionally, it looks like logging in to the web interface triggers something. A couple of seconds after login, memory usage spikes and CPU usage briefly goes up too; eventually some memory is released, but some appears to be leaked, so the final figure is higher than the initial one.

syncthing-heap-linux-arm64-v1.4.0-095019.pprof (110.9 KB)

Here’s memory usage as reported by the NAS itself:

The GUI does a bunch of database operations, which drives up memory usage.

Both your profiles are 55% QUIC memory. Something is broken with QUIC, but I doubt it’s new in 1.4 because nothing QUIC-related changed there.
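For anyone wanting to reproduce that breakdown, the attached .pprof files can be inspected with Go’s pprof tool (assuming a Go toolchain is available):

go tool pprof -top syncthing-heap-linux-arm64-v1.4.0-094858.pprof
# or interactively, to focus on QUIC-related allocations:
go tool pprof syncthing-heap-linux-arm64-v1.4.0-094858.pprof
(pprof) top 20
(pprof) peek quic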

Did you upgrade regularly or directly from 1.2.1 to 1.4.0? In 1.3.0 and 1.3.1 we updated the QUIC library versions, and the leak might have been introduced there.

Yes, I updated regularly. Auto-updates are on.

I’m not very familiar with QUIC: can I check whether it’s actually being used, e.g. in the logs?

It is used; that’s evident from the heap profiles. You can disable it as described above.
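If you want to confirm it from the running instance, the REST API also reports the transport per connection: each device entry returned by /rest/system/connections has a “type” field such as tcp-client, quic-server or relay-client (the API key is under Actions > Settings > GUI):

curl -s -H "X-API-Key: <your API key>" http://localhost:8384/rest/system/connections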

I just applied the change here (memory was 1.3 GB) and restarted. I’ll see what happens now.

The only thing that changed recently on my side: one of the peers (which is actually just an offsite backup) auto-upgraded to 1.4.0 and has been sort of “stuck” since then. The web GUI says:

“Syncing 95%”, download rate a few hundred bytes (changes sometimes), out-of-sync items: 489 (never changes), 0 bytes.

The list of items pops up empty.

I cannot go and check the machine physically (because of the COVID-19 lockdown), but it’s not the first time this has happened. I thought it was due to the folder being send-only. Maybe this “permanent connection” is what is creating the disturbance in the Force…

The situation is a bit better, but still weird…

This graph shows memory usage over the last 24 hours. The leftmost “03” on the X axis means March 27, 15:00. Yesterday at 16:00 I disabled QUIC locally and restarted Syncthing (shown as the leftmost “04” in the graph above). Syncthing logs:

[start] 16:00:12 INFO: syncthing v1.4.0 "Fermium Flea" (go1.13.8 linux-arm64) teamcity@build.syncthing.net 2020-03-06 19:52:22 UTC
[start] 16:00:12 INFO: Using large-database tuning

Memory climbed after a while, then decreased, and at about midnight it suddenly dropped.

In the logs there’s literally nothing: the last line from yesterday is at 16:04, and the next one is at around 10 this morning.

[B3TV5] 16:04:07 INFO: Device ... client blah blah
[B3TV5] 10:20:37 INFO: Connection to ... at ... closed: read timeout

What’s the Y-axis unit?