Large mem usage on sync regardless of db tuning

I made a few tests and thought I would share them, as quite a few discussions/reports are happening at the moment concerning resource usage. I did not get to the bottom of things, these are just some preliminary theories (thus might be totally wrong):

I am doing some mem intensive simulations at the moment and got an unexpected OOM - turns out Syncthing was using ~2GB memory (yes, it’s an old laptop). I noticed it was syncing a huge file then. So I did some testing syncing 1.1GB of random data, and the memory usage increased during the time of sync to ~1GB. Even weirded, after sync memory usage displayed in the UI didn’t drop down immediately, but very slowly. In htop it didn’t go back at all (RES, not VIRT). Suspecting the db, I set db tuning to small, but the same happened (starting from a lower base mem usage). And consecutive pprofs showed an increase im mem usage by protocol.BufferPool, but the full profile never accounted all the used memory. As we didn’t touch that code in quite some time, I am suspecting Go at the moment (I think there were significant changes in GC with 1.13). I’ll do some tests with older Go/Syncthing versions at some point (no time right now) to investigate this further.

3 Likes

It could be the go version 1.13 as we currently see the same user reports at Syncthing Android where users state the “almost same amount of data” syncing didn’t require so much RAM before the latest app (and native) updates.

Suspicion got some basis now:

I reproduced the issue (mem usage rising during sync) with go1.13.3 and Syncthing on master and v1.1.4. I cannot reproduce on master and go1.12.9. Below is a memory profile with master during the sync: syncthing-heap-linux-amd64-v1.3.1-rc.2 25-gcf420e135-164221.pprof

There have been changes to sync.Pool in go1.13:

sync

[…] Large Pool no longer increase stop-the-world pause times.

Pool no longer needs to be completely repopulated after every GC. It now retains some objects across GCs, as opposed to releasing all objects, reducing load spikes for heavy users of Pool .

However request limiting should make sure there’s no more than 32MiB requested at any time and after requesting the buffers should be given back to pool, so it shouldn’t grow above that in the first place, which makes GC an unlikely culprit. So next step is debugging what exactly is happening there - unless anyone has more specific ideas.

2 Likes

Could be we have a bug and are putting more stuff into the pool than we take out. Previously the pool would be emptied on GC but I guess now it isn’t.

Running the LargeFile4G benchmark on vanilla 1.3.0 (with Go 1.13) gets me this baseline:

2019/11/04 17:16:03 Result: Wall time: 2.77946ms / MiB
2019/11/04 17:16:03 Result: 3.68e+05 KiB/s synced
2019/11/04 17:16:03 Receiver: Utime: 4.580026ms / MiB
2019/11/04 17:16:03 Receiver: Stime: 1.283715ms / MiB
2019/11/04 17:16:03 Receiver: MaxRSS: 644452 KiB
2019/11/04 17:16:03 Sender: Utime: 4.62121ms / MiB
2019/11/04 17:16:03 Sender: Stime: 1.075979ms / MiB
2019/11/04 17:16:03 Sender: MaxRSS: 1790704 KiB

Applying this patch:

diff --git a/lib/protocol/bufferpool.go b/lib/protocol/bufferpool.go
index 17ad2f38..a2ea686c 100644
--- a/lib/protocol/bufferpool.go
+++ b/lib/protocol/bufferpool.go
@@ -41,6 +41,7 @@ func (p *bufferPool) Get(size int) []byte {
 
 // Put makes the given byte slice availabe again in the global pool
 func (p *bufferPool) Put(bs []byte) {
+       return
        c := cap(bs)
        // Don't buffer huge byte slices
        if c > 2*MaxBlockSize {

Results in this:

2019/11/04 17:21:43 Result: Wall time: 3.198473ms / MiB
2019/11/04 17:21:43 Result: 3.2e+05 KiB/s synced
2019/11/04 17:21:43 Receiver: Utime: 4.638406ms / MiB
2019/11/04 17:21:43 Receiver: Stime: 1.209779ms / MiB
2019/11/04 17:21:43 Receiver: MaxRSS: 284044 KiB
2019/11/04 17:21:43 Sender: Utime: 4.869509ms / MiB
2019/11/04 17:21:43 Sender: Stime: 1.197102ms / MiB
2019/11/04 17:21:43 Sender: MaxRSS: 174912 KiB

I think it’s safe to say that while we may possibly be saving some allocations using the buffer pool, it’s not a great tradeoff. Lacking proof that it does other good things I suggest simply git rm -f bufferpool.go.

(Don’t read too much into the increased wall time / MiB; there’s a lot of variation, and the Utime/MiB hasn’t increased…)

It’s a 20% difference, which is a lot. I’d rather we fixed the pool.

I just added a comment and a slightly different benchmark. I’m not convinced the wall time difference is statistically significant.

If the memory usage is due to broken pooling, that benchmarks doesn’t say much (it would just be broken differently).

Well, the pool is broken, that’s for sure. I’m fixing it. Although this is puzzling:

 func (p *bufferPool) Put(bs []byte) {
     c := cap(bs)
...
     for i := range BlockSizes {
         if c >= BlockSizes[i] {
+            fmt.Printf("put size %d into %d (block size %d)\n", c, i, BlockSizes[i])
             p.pools[i].Put(&bs)
             return
         }
...
put size 4194304 into 0 (block size 131072)
put size 8388608 into 0 (block size 131072)
put size 4194304 into 0 (block size 131072)
put size 8388608 into 0 (block size 131072)
put size 4194304 into 0 (block size 131072)
...

Or, well, not puzzling but wrong. :slight_smile:

Was it integrated in v1.3.1 ? I’m still getting the issue and it’s really making my server unstable every day so really looking forward to a fix :slight_smile:

1 Like

No, 1.3.2(-rc). (1.3.1 was finalized about a month ago.)

1 Like