OpenBSD out of memory

Continuing the discussion from Folders are "up to date" but devices show "syncing" with no change:

The issue is that Syncthing on OpenBSD 6.6 keeps crashing with an out-of-memory panic. I have raised the RAM from 4GB to 40GB (VMware 6.7), but it made no difference…

Here is the panic log file

Any ideas how to resolve this?

There are two reasons I can think of that memory consumption has increased significantly lately:

Ideally you could post a heap profile before the OOM to see what’s causing it: https://docs.syncthing.net/users/profiling.html

Otherwise one can disable the above, but I’d rather get the profile first :slight_smile:

The panic looks like it’s allocating while reading the database. This may mean nothing: it may just be the straw that broke the camel’s back, and it says nothing about what was already allocated. But it might indicate some sort of corruption in the database on disk, especially if this just suddenly happened.

I tried disabling QUIC, but that doesn’t help…

tcp://0.0.0.0:22000, dynamic+https://relays.syncthing.net/endpoint

[EAGTA] 20:21:15 INFO: Completed initial scan of sendreceive folder public
[EAGTA] 20:21:44 INFO: QUIC listener (0.0.0.0:22000) shutting down
fatal error: runtime: out of memory
[monitor] 20:22:12 WARNING: Panic detected, writing to "/home/st/.config/syncthing/panic-20200117-202212.log"
[monitor] 20:22:12 WARNING: Please check for existing issues with similar panic message at https://github.com/syncthing/syncthing/issues/
[monitor] 20:22:12 WARNING: If no issue with similar panic message exists, please create a new issue with the panic log attached
[monitor] 20:22:12 INFO: Reporting crash found in panic-20200117-202212.log (report ID bc1c834e) ...
[monitor] 20:22:13 INFO: Syncthing exited: exit status 2
[monitor] 20:22:14 INFO: Starting syncthing

So it keeps running out of memory every few minutes…

@calmh I already did -reset-database. But I remembered that I had this issue on a clean OS install with no DB: when it started syncing files from other hosts, it ran out of memory. I just increased the RAM from 4GB to 16GB, and later to 40GB. That only “resolved” the issue in the sense that it allowed almost all of my folders to synchronize.

Here are the heap files. I took several, about 10 seconds apart, as memory grew from ~200MB at start to ~600MB, where it hung with OOM.

If the damage was already caused by QUIC, you would only see improvements after a restart.

Panic shows oom within seconds of startup with no QUIC. Makes no sense to me, didn’t look at the profile yet though because mobile. Also don’t understand how things go from working on <1G to oom on 40 gigs with fresh setup and apparently 600M usage… Hunch says 32bit binary, but it’s not. :man_shrugging:

It’s definitely the db that uses the memory based on the heaps, but the last one only accounts for ~300MB, so not that much. That’s nowhere near 4GiB. It really seems like something more is going on here - no idea what though.

OK. I think it might be because of QUIC. I still have it disabled. I did a database reset and started the server with everything paused, then unpaused the folders and allowed a full scan, then unpaused the devices one by one. It took all day, but with the OS RAM reduced back to 4GB, it’s working!!! :slight_smile: I don’t know, though, why it’s downloading ~60GB of files that haven’t been changed. (I know some unchanged photos, movies etc. have been copied and have different dates, but the photos from 2008, 2009 … 2019 haven’t changed, and it’s redownloading part of that stuff.) good syncthing-heap-openbsd-amd64-v1.3.3-205235.pprof (316.9 KB) I also added a new heap file; maybe the difference from the old ones gives some clue.

P.S. I also noticed on the VMware console that the monitor graph of OpenBSD memory usage is a steady 2.33GB (total RAM 4GB). Previously, with 40GB total RAM, it just climbed from the bottom to ~14GB (if I remember correctly) and then Syncthing crashed with out of memory. I think it’s some bug causing a memory leak?

f*** I can definitely say it’s QUIC. Everything had been resynced after 20h. I don’t know why Syncthing redownloaded ~60GB of files, but I have a nightly rsnapshot, and its log lists plenty of directories yet detects no changed files (it only picked up the /.stversions copies of all the files Syncthing redownloaded)…

But regarding QUIC: I re-enabled it through the settings (set to default) and babam…

fatal error: runtime: out of memory
[monitor] 06:46:15 WARNING: Panic detected, writing to "/home/st/.config/syncthing/panic-20200119-064615.log"
[monitor] 06:46:15 WARNING: Please check for existing issues with similar panic message at https://github.com/syncthing/syncthing/issues/
[monitor] 06:46:15 WARNING: If no issue with similar panic message exists, please create a new issue with the panic log attached
[monitor] 06:46:15 INFO: Reporting crash found in panic-20200119-064615.log (report ID f637d286) ...
[monitor] 06:46:15 INFO: Syncthing exited: exit status 2
[monitor] 06:46:16 WARNING: 4 restarts in 50.582437716s; not retrying further

Now disabling QUIC doesn’t help either, even with all devices paused, because on startup it tries to scan the folders and just keeps crashing with the same panic as above…

So it now seems that to resolve the crashing I have to disable QUIC and then reset the database, so Syncthing can build it again…

Unfortunately I don’t have any HEAP logs, as it happened so fast I couldn’t react to it…

Attached are the panic logs, in case they give some clue: panic-20200119.7z (72.4 KB)

While writing this, I started Syncthing again to see what happens. It started, I enabled QUIC, and it still crashed; then it scanned everything and brought all folders up to date. panic-20200119-071402.reported.log (100.9 KB) heap19.7z (733.1 KB) These are from a run with multiple crash restarts; the latest two are from a reboot I did myself.

Although it now shows everything as up to date, any ideas how to resolve this issue?

I dunno man. The QUIC thing appears to be a fairly slow leak of some kind under some circumstances; I can’t see it causing out of memory more or less directly at startup. If that were the case it would be crashing all over the place for people, and you seem to be one of quite few affected by this.

What was the last version that worked for you? Are you seeing this on some non-OpenBSD machine?

Not that OpenBSD stands out as an outlier either, looking at average/max memory usage for 1.3.x versions per platform…

ur=> select platform, avg(memoryusagemib)::int, max(memoryusagemib), count(*) from reports where date='20200118' and version like 'v1.3._' group by platform having count(*) > 10 order by avg desc;
   platform    | avg | max  | count
---------------+-----+------+-------
 freebsd-amd64 | 181 | 2008 |   600
 linux-amd64   | 128 | 8886 | 12757
 darwin-amd64  | 114 | 7940 |  1704
 openbsd-amd64 |  96 |  320 |    14
 linux-arm64   |  92 |  998 |   418
 linux-386     |  87 | 1437 |   298
 windows-amd64 |  87 | 8582 | 15351
 linux-arm     |  86 | 1298 |  2437
 android-arm64 |  56 |  767 |  3779
 android-arm   |  49 |  175 |   806
 windows-386   |  49 | 1560 |  1314
 linux-mipsle  |  46 |  118 |    48
 android-386   |  42 |   65 |    20
(13 rows)

Oh, what you can do is run Syncthing with the STHEAPPROFILE variable set:

STHEAPPROFILE=1 syncthing ...

This will make Syncthing check memory usage every 250ms and write a heap profile whenever it’s increased. You should get a series of profiles, with the largest one being just before the crash. The profiles above are unremarkable.

% for f in * ; do go tool pprof -top $f | grep total ; done
Showing nodes accounting for 128.44MB, 99.61% of 128.94MB total
Showing nodes accounting for 132.89MB, 99.63% of 133.39MB total
Showing nodes accounting for 132.39MB, 99.62% of 132.89MB total
Showing nodes accounting for 127.75MB, 98.84% of 129.26MB total
Showing nodes accounting for 126.93MB, 99.61% of 127.43MB total
Showing nodes accounting for 124.86MB, 99.60% of 125.36MB total
Showing nodes accounting for 131.97MB, 98.86% of 133.49MB total
Showing nodes accounting for 140.44MB, 99.28% of 141.45MB total
Showing nodes accounting for 137.75MB, 98.91% of 139.27MB total
Showing nodes accounting for 135.47MB, 98.15% of 138.02MB total
Showing nodes accounting for 139.94MB, 97.88% of 142.98MB total
Showing nodes accounting for 133.01MB, 98.86% of 134.54MB total
Showing nodes accounting for 133.58MB, 98.87% of 135.11MB total
Showing nodes accounting for 133.58MB, 98.50% of 135.61MB total
Showing nodes accounting for 133.58MB, 98.50% of 135.61MB total
Showing nodes accounting for 130.94MB, 98.10% of 133.48MB total
Showing nodes accounting for 130.37MB, 97.72% of 133.41MB total
Showing nodes accounting for 128.12MB, 99.61% of 128.62MB total
Showing nodes accounting for 126.92MB, 99.61% of 127.42MB total
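
To pick out the largest profile from a summary like the one above, something along these lines could work. It is only a sketch: `max_total_mb` is a made-up helper name, and it assumes every total is reported in MB, as in the output shown here.

```shell
# Hypothetical helper: read `go tool pprof -top` summary lines on stdin
# and print the largest "of N MB total" figure.
# Assumes every total is reported in MB, as in the output above.
max_total_mb() {
    awk '/Showing nodes accounting for/ {
        v = $(NF - 1)          # second-to-last field, e.g. "128.94MB"
        sub(/MB$/, "", v)      # strip the unit
        if (v + 0 > max) max = v + 0
    } END { print max + 0 }'
}

# Usage, from the directory holding the profiles:
#   for f in * ; do go tool pprof -top "$f" | grep total ; done | max_total_mb
```

For the profiles above this comes out at 142.98, so none of them gets anywhere near the gigabytes of RAM on the box.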

I had some OpenBSD 5.x version with, I think, v.131 or 132 at the time. Then I installed a new box with OpenBSD 6.6 and version 1.32. It started producing panic logs right from the beginning. Now I remember there was also an issue with the listener, reporting too many open files, so I just disabled the listener and set the folders to rescan every 3600 seconds (this is a “server” box, so no files are changed on it anyway). The new box crashed a few times with out of memory, so I just increased the OS RAM from 4GB to 16GB and then 40GB, and it synced all the files. After it had transferred everything, I deleted the old OpenBSD machine. But later I noticed that the other Windows PCs show the remote OpenBSD box as syncing, although the OpenBSD box shows everything up to date. The Windows 10 machines themselves work without problems. This is the only OpenBSD box I have.

I started Syncthing with STHEAPPROFILE=1 syncthing …

but the problem is, now it’s not crashing … :slight_smile:

OK. I did a database reset again and got it to crash, but this time only twice. Hope it gives some clue… :slight_smile: heap20.7z (1.4 MB)

But I still don’t get why, after every database reset, Syncthing redownloads part of the files (less this time than previously, but still) although they were not changed on the other devices. Rsnapshot doesn’t see a difference and does not back them up.

It’s not redownloading, it’s reconciling changes. Files show up as out of sync until it has done the comparison and figured out that nothing has changed.

The heap profiles are the same as https://github.com/syncthing/syncthing/issues/6226 - that QUIC stuff needs some investigation. I looked at the quic_listen code last week and, as far as I could see, every stream/conn/… should be closed properly, i.e. either I didn’t find the leak or it is upstream (well, that doesn’t help that much, does it :slight_smile: ).

Thanks, I know about the comparing of files. But in reality the Download Rate showed 60GB the previous time, and 7GB now. I also see that rsnapshot picks up the .stversions folder of the “changed” files. To be sure, I made MD5 hashes of the files (the new files and the .stversions copies of the “changes”):

MD5 (MVI_4289.MOV) = e1bc3f420a9e43b5a9d7a5f4b094e024
MD5 (MVI_4289~20200118-155223.MOV) = e1bc3f420a9e43b5a9d7a5f4b094e024
MD5 (MVI_4289~20200120-191111.MOV) = e1bc3f420a9e43b5a9d7a5f4b094e024

MD5 (MVI_4291.MOV) = 61aeb02d161021ca78fa76ced2b2c499
MD5 (MVI_4291~20200118-163548.MOV) = 61aeb02d161021ca78fa76ced2b2c499
MD5 (MVI_4291~20200120-210445.MOV) = 61aeb02d161021ca78fa76ced2b2c499

MD5 (MVI_4292.MOV) = 690db5ab074b734b18564b5de9759e7a
MD5 (MVI_4292~20200118-170656.MOV) = 690db5ab074b734b18564b5de9759e7a
MD5 (MVI_4292~20200120-192022.MOV) = 690db5ab074b734b18564b5de9759e7a

MD5 (MVI_4293.MOV) = 57971bdeb4602323dbce437d311785f3
MD5 (MVI_4293~20200118-155748.MOV) = 57971bdeb4602323dbce437d311785f3
MD5 (MVI_4293~20200120-194422.MOV) = 57971bdeb4602323dbce437d311785f3

MD5 (MVI_4294.MOV) = 918a0606a0ad8d6b1ca7b3710889692a
MD5 (MVI_4294~20200118-165938.MOV) = 918a0606a0ad8d6b1ca7b3710889692a
MD5 (MVI_4294~20200120-195841.MOV) = 918a0606a0ad8d6b1ca7b3710889692a


-rw-r--r--  1 st  syncthing  522768488 Nov 14  2013 MVI_4289~20200118-155223.MOV
-rw-r--r--  1 st  syncthing  522768488 Nov 14  2013 MVI_4289~20200120-191111.MOV
-rw-r--r--  1 st  syncthing  509726648 Nov 14  2013 MVI_4291~20200118-163548.MOV
-rw-r--r--  1 st  syncthing  509726648 Nov 14  2013 MVI_4291~20200120-210445.MOV
-rw-r--r--  1 st  syncthing  696222744 Nov 14  2013 MVI_4292~20200118-170656.MOV
-rw-r--r--  1 st  syncthing  696222744 Nov 14  2013 MVI_4292~20200120-192022.MOV
-rw-r--r--  1 st  syncthing  777975880 Nov 14  2013 MVI_4293~20200118-155748.MOV
-rw-r--r--  1 st  syncthing  777975880 Nov 14  2013 MVI_4293~20200120-194422.MOV
-rw-r--r--  1 st  syncthing  520868528 Nov 14  2013 MVI_4294~20200118-165938.MOV
-rw-r--r--  1 st  syncthing  520868528 Nov 14  2013 MVI_4294~20200120-195841.MOV

-rw-r--r--  1 st  syncthing  522768488 Nov 14  2013 MVI_4289.MOV
-rw-r--r--  1 st  syncthing  509726648 Nov 14  2013 MVI_4291.MOV
-rw-r--r--  1 st  syncthing  696222744 Nov 14  2013 MVI_4292.MOV
-rw-r--r--  1 st  syncthing  777975880 Nov 14  2013 MVI_4293.MOV
-rw-r--r--  1 st  syncthing  520868528 Nov 14  2013 MVI_4294.MOV

As you can see, nothing has changed… But it might be caused by the out-of-memory crashes, as I now again see that the other PCs show this OpenBSD ST3 device as “syncing” some files. So maybe it can’t provide correct data about them, which makes it look as if the files have changed.
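
The same check can be done without eyeballing hashes, by comparing each file byte for byte against its versioned copies. A minimal sketch: `all_identical` is a made-up helper name, and the `.stversions` path in the usage line is an assumption about where the copies live.

```shell
# Hypothetical helper: succeed only if every listed file has exactly
# the same content as the first one.
all_identical() {
    first=$1
    shift
    for f in "$@"; do
        # cmp -s is silent and returns non-zero on the first difference
        cmp -s "$first" "$f" || return 1
    done
    return 0
}

# Usage (paths are assumptions):
#   all_identical MVI_4289.MOV .stversions/MVI_4289~*.MOV && echo "content unchanged"
```

Unlike MD5, `cmp` stops at the first differing byte, so for files that really did change it returns quickly even on half-gigabyte movies.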

I got another one from this morning. I don’t know if it’s the cause, but the crash happened 2 minutes after one of the PCs connected. It had been working fine all night (10+ hours with multiple other devices connected). heap21.7z (972.5 KB)

Heap profile shows nothing. I’m starting to feel you have some limit that is not physical memory. ulimit? No swap and a bunch of other stuff using RAM? There’s no reason it should fail with OOM with a 200M heap on a system with gigs of RAM.

ST3c$ ulimit -a
time(cpu-seconds)    unlimited
file(blocks)         unlimited
coredump(blocks)     unlimited
data(kbytes)         786432
stack(kbytes)        4096
lockedmem(kbytes)    1346232
memory(kbytes)       4033548
nofiles(descriptors) 512
processes            128

ST3c$ pkg_info
bzip2-1.0.8         block-sorting file compressor, unencumbered
gdiff-3.7p0         GNU versions of the diff utilities
gettext-runtime-0.20.1p0 GNU gettext runtime libraries and programs
glib2-2.60.7        general-purpose utility library
libffi-3.2.1p5      Foreign Function Interface
libiconv-1.16p0     character set conversion library
libidn2-2.3.0       implementation of IDNA2008 internationalized domain names
libpsl-0.20.2p0     public suffix list library
libsigsegv-2.12     library for handling page faults in user mode
libslang-2.2.4p5    stack-based interpreter for terminal applications
libssh2-1.9.0       library implementing the SSH2 protocol
libunistring-0.9.7  manipulate Unicode strings
lynx-2.8.9rel1p0    text web browser
mc-4.8.23p0         free Norton Commander clone with many useful features
nano-4.4            simple editor, inspired by Pico
oniguruma-6.9.4     regular expressions library
partial-intel-firmware-20191115v0
pcre-8.41p2         perl-compatible regular expression library
pcre2-10.33         perl-compatible regular expression library, version 2
png-1.6.37          library for manipulating PNG images
python-3.7.4        interpreted object-oriented programming language
quirks-3.182        exceptions to pkg_add rules
rsync-3.1.3         mirroring/synchronization over low bandwidth links
screen-4.6.2        multi-screen window manager
sqlite3-3.29.0      embedded SQL implementation
unzip-6.0p12        extract, list & test files in a ZIP archive
wget-1.20.3p1       retrieve files from the web via HTTP, HTTPS and FTP
xz-5.2.4            LZMA compression and decompression tools
zip-3.0p1           create/update ZIP files compatible with PKZip(tm)


load averages:  0.03,  0.01,  0.00                                                                                                ST3c.hl 14:03:33
59 processes: 58 idle, 1 on processor                                                                                             up 3 days, 17:21
CPU0 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU1 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU2 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU3 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU4 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU5 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU6 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
CPU7 states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
Memory: Real: 305M/1712M act/tot Free: 2240M Cache: 1305M Swap: 0K/2055M

  PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
 4078 st        10    9  702M  261M idle      thrslee  31:53  0.00% syncthing
31162 st        10    0  110M   14M idle      thrslee   0:36  0.00% syncthing
78373 _pflogd    4    0  892K  564K sleep/0   bpf       0:12  0.00% pflogd
91081 st         2    0 1156K 2352K idle      select    0:07  0.00% screen

I don’t know, it’s a default install, nothing specific; I have only added a few things like mc, nano etc. Nothing else is running on that box except Syncthing. Syncthing was downloaded from the Syncthing web site, not installed from packages (but I doubt that’s the cause?). RAM is 4GB, swap 2GB.
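
For reference, the kbytes figures in the `ulimit -a` output above are easier to compare with the `top` numbers once converted to MiB. A small sketch: `ulimit_kb_to_mib` is a made-up helper name, and it assumes the two-column format shown above (other shells print `ulimit -a` differently).

```shell
# Hypothetical helper: read `ulimit -a` output on stdin and print every
# numeric "(kbytes)" limit in whole MiB. "unlimited" entries are skipped.
ulimit_kb_to_mib() {
    awk '$1 ~ /\(kbytes\)$/ && $2 ~ /^[0-9]+$/ {
        printf "%s %d MiB\n", $1, $2 / 1024
    }'
}

# Usage:
#   ulimit -a | ulimit_kb_to_mib
```

With the values posted above, `data(kbytes)` comes out at 768 MiB and `memory(kbytes)` at 3939 MiB. The data limit is not far above the 702M SIZE that `top` shows for syncthing, which may or may not be relevant; nothing in the thread so far establishes that this is the limit being hit.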

I have local discovery servers running a pretty old version, syncthing-discosrv-0.12.2. Does QUIC use them somehow?

And the clients are located in two different time zones (as they are in two different countries). Half of them are in one place, the other half in the second.

The QUIC thing might indeed be a red herring (as @calmh was suggesting but I was ignoring all along?). I was incorrectly assuming that the profile accounts for just a fraction of the memory actually used, and as the major and growing part is QUIC, I blamed that. Given it’s <300MB and that’s all, that alone doesn’t point to QUIC being to blame for the OOMs.