v1.3.0 crash on ARM (possible resource exhaustion?)

I’m running v1.3.0 on an Arch Linux ARM Raspberry Pi in a LAN setup. Syncthing on the Pi crashes regularly when syncing a folder of about 60 GB that contains many WAV audio files and other data. The data is stored on a powered USB hard drive attached to the Pi.

In the systemd journal I get a lengthy Go stack trace with the error message “runtime/cgo: pthread_create failed: Resource temporarily unavailable”.

I have been observing this error ever since I installed syncthing and had it running with all my data on the device.

The error message and stack trace suggest that it happens inside the Go runtime when it creates a new thread.

I tried raising the maximum number of allowed threads and lowering the default thread stack size via ulimit to 4096 KB, because some of what I read about pthread_create suggested that this might be the underlying cause. However, this did not change or solve the issue.

Any hints on how to track down why pthread_create fails with EAGAIN (“Insufficient resources to create another thread”) would be highly appreciated.

You seem to have done all the troubleshooting I could suggest. Syncthing will probably use a couple of hundred threads while syncing data, so I guess the devil is in the details of what the actual thread limit was?

Edit: Oh, also: on 32-bit systems, and possibly with a lot of files, you should set the database tuning to small, as there is a bug in 1.3.0. Alternatively, upgrade to 1.3.1-rc.2, which will become 1.3.1 any day now.
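To see what the actual limits were, note that the `ulimit -a` of a login shell is not necessarily what a daemon started by systemd gets; the effective values for a running process are in /proc/<pid>/limits. Below is a minimal Python sketch, assuming Linux /proc and that you pass the PID of the syncthing process as the first argument (it falls back to its own PID). On Linux, “Max processes” (ulimit -u) counts threads across all processes of that user, and a systemd `TasksMax=` cap is enforced through the cgroup, so it will not show up in this file.

```python
import os
import sys


def read_limits(pid):
    """Parse /proc/<pid>/limits into {limit name: (soft, hard)}."""
    limits = {}
    with open(f"/proc/{pid}/limits") as f:
        next(f)  # skip the "Limit  Soft Limit  Hard Limit  Units" header
        for line in f:
            # The kernel pads the limit name to 25 characters plus one space.
            name = line[:26].strip()
            fields = line[26:].split()
            limits[name] = (fields[0], fields[1])
    return limits


def thread_count(pid):
    """Number of threads the process currently has (from /proc/<pid>/status)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise ValueError(f"no Threads: line for pid {pid}")


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    limits = read_limits(pid)
    print("threads in this process:", thread_count(pid))
    # RLIMIT_NPROC: on Linux this counts threads of *all* processes of the user.
    print("max user processes (soft, hard):", limits["Max processes"])
    print("max open files (soft, hard):", limits["Max open files"])
    print("max stack size in bytes (soft, hard):", limits["Max stack size"])
    with open("/proc/sys/kernel/threads-max") as f:
        print("system-wide threads-max:", f.read().strip())
```

Running it while syncthing is busy and again just before a crash would show whether the thread count is approaching one of these limits.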

I probably wasn’t specific enough on that point: I have seen this issue since at least v1.2.x. I’m not sure which version I initially installed.

The current “ulimit -a” output for the user running the syncthing daemon is:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 7346
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 2048
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 4096
cpu time               (seconds, -t) unlimited
max user processes              (-u) 7346
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I have already doubled the open files limit (1024 to 2048) and halved the stack size (8192 KB to 4096 KB). Maybe 2048 open files is not enough when scanning or transferring large directories with many files of various sizes?

Where do you get your binary from?

You can probably watch the open fds in /proc while the process runs and see if anything looks fishy.
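As a rough Python sketch of that (Linux only): each entry in /proc/<pid>/fd is a symlink to whatever the descriptor refers to, e.g. `socket:[12345]` for a socket. The category names are made up for illustration, and treating `*.ldb` targets as index database files is an assumption based on the lsof observation later in this thread. It has to run as the same user as syncthing (or as root).

```python
import collections
import os
import sys


def classify_fds(pid):
    """Count the open descriptors of `pid` by rough category."""
    counts = collections.Counter()
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except FileNotFoundError:
            continue  # descriptor was closed between listdir() and readlink()
        if target.startswith("socket:"):
            counts["socket"] += 1
        elif target.endswith(".ldb"):
            counts["leveldb table (.ldb)"] += 1
        elif target.startswith(("pipe:", "anon_inode:")):
            counts["pipe/anon"] += 1
        else:
            counts["other file"] += 1
    return counts


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    counts = classify_fds(pid)
    print("total:", sum(counts.values()))
    for kind, n in counts.most_common():
        print(f"  {kind}: {n}")
```

Saved under a name of your choice (say, a hypothetical fds.py), it can be re-run every few seconds with `watch -n 5 python3 fds.py <pid>` to see which category grows before a crash.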

Limits look sane to me :man_shrugging:

Seems to be the same thing I was hitting: GitHub issue. My binary is this build.

My case came with the added bonus that the crashes caused corrupted versions of some of the files being synced to be transferred from the crashing device to all other devices.

Perhaps a broken release from Arch?

Could be. I’ll try and repro with your build.

My ulimit -a output is the same as @trurli’s would have been, before they tweaked it.

Changing databaseTuning to small didn’t help with the crash.

Did you try a vanilla build?

I’ve just been doing that, and it appears to be stable. So I’m blaming Arch Linux ARM (which is odd, because their build script doesn’t look like it contains anything suspicious).

I’ve posted on their forum.

Thanks for your help!

I just checked open files via lsof. When syncthing is running without issues, there are almost 2k files open, around 1k of them index database files (.ldb?); the rest are mostly network sockets.

I guess this number of open files might be expected with 10+ shared directories and a couple of clients?

The Arch package build is as vanilla as it gets. There’s one patch applied, for a race condition in the tests on amd64, and it shouldn’t have any impact on the armv7h build, if I remember correctly. The issue did not begin with release 1.3.0 but has been observable since at least 1.2.

I will try increasing the user’s open files limit and see if the issue persists. Otherwise I will try to build syncthing for ARM myself, but for now my guess is that either the defaults are too low or a Pi 3 doesn’t have enough juice for my setup.

@trurli I did find that the vanilla build fixed the issue. Do bear in mind that the alarm (Arch Linux ARM) build seems to be built with PIE and gccgo?

Try installing syncthing-bin from the AUR - that’s the vanilla build, all packaged up. Worked for me.

Ok, thanks, I will try that next, then.

Yes, syncthing-bin from the AUR seems to behave differently with regard to open files. I will monitor it for a bit, but currently, instead of ~2k open files, syncthing only needs ~800.

My current working hypothesis is that this is a statically linked binary and therefore does not need any file handles for accessing system libraries (which accounted for almost half of the ~2k open files observed with the other version).

If that’s true, the ARM syncthing community package is fine; it just needs a file descriptor limit above 2048 for this workload.

I will try it out and report back.
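One way to test that hypothesis, again as a Python sketch assuming Linux /proc: lsof lists not only real file descriptors but also memory-mapped files such as shared libraries (shown with FD type `mem`, and the program itself as `txt`). Those entries do not hold a descriptor and do not count toward the “open files” limit. Comparing the entries in /proc/<pid>/fd with the distinct files in /proc/<pid>/maps shows how much of an lsof total is actually descriptors. Matching shared libraries by “.so” in the file name is a heuristic.

```python
import os
import sys


def open_fd_count(pid):
    """Real descriptors; these are what count toward the 'open files' limit."""
    # When pid is this script itself, the count includes the descriptor
    # os.listdir() holds open on the /proc/<pid>/fd directory.
    return len(os.listdir(f"/proc/{pid}/fd"))


def mapped_files(pid):
    """Distinct files mapped into the process (executable, shared libraries, ...)."""
    paths = set()
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            # Fields: address perms offset dev inode [pathname]
            parts = line.split(maxsplit=5)
            if len(parts) == 6 and parts[5].startswith("/"):
                paths.add(parts[5].rstrip("\n"))
    return paths


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    mapped = mapped_files(pid)
    shared_libs = [p for p in mapped if ".so" in os.path.basename(p)]
    print("open descriptors:", open_fd_count(pid))
    print("mapped files (no descriptor held):", len(mapped))
    print("  of which look like shared libraries:", len(shared_libs))
```

For a statically linked build the shared-library count should be close to zero. If the dynamically linked package also maps only a handful of libraries, they would not explain ~1k extra lsof entries on their own.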

1000 fds for libraries? That seems excessive, or at least surprising, to me, but it’s your system… :slight_smile: We set the database layer to use at most 200 fds. On top of that there can be quite a few for files being synced while that happens, plus a few for connections, of course. 800 sounds high but reasonable when there is a lot going on with a large database.

Yeah, 1k for libs sounds crazy to me, too. Time to do some more research, I guess.

Possibly related:

As of Syncthing 1.1.x, the test suite for PIE builds was broken in Chakra Linux (an Arch-like distribution, x64 only). I did not use gccgo. I don’t know whether some tests are meant to be disabled for PIE builds, whether it’s different on ARM, or whether PIE handling has changed since 1.1.x.