Syncthing killed repeatedly - no errors


#1

Version: syncthing v0.14.46 “Dysprosium Dragonfly” (go1.10.1 linux-arm) deb@build.syncthing.net - on a Raspi

I have now seen this for the second or third time:
Syncthing is suddenly just not running anymore.
This time I checked: sudo service syncthing@sync status
The output was:

● syncthing@sync.service - Syncthing - Open Source Continuous File Synchronization for sync
   Loaded: loaded (/lib/systemd/system/syncthing@.service; enabled)
   Active: inactive (dead) since Mo 2018-04-09 00:14:26 UTC; 10h ago
     Docs: man:syncthing(1)
  Process: 4304 ExecStart=/usr/bin/syncthing -no-browser -no-restart -logflags=0 (code=killed, signal=PIPE)
 Main PID: 4304 (code=killed, signal=PIPE)

In the journal there was no error message at the end. What could be causing this?

I have one suspect:
I have one folder which is VERY big (more than a million files, >100GB).
I have seen this before: while scanning this folder, Syncthing stopped working. My guess is that the system runs some kind of is-alive check to verify that the service is still running, and that when the scan takes too long, the service fails to respond or generate a heartbeat (or whatever triggers the check) in time. Thus the service is killed.

When I paused this particular folder, the service survived.
When I first paused it until all other folders were scanned (after a restart) and then reactivated it, Syncthing would also survive.

Anyway … it seems related. (But of course this could be a coincidence as well …)


Syncthing scalability / no. of files, folders?
(Jakob Borg) #2

32 bit arm build, most likely running out of memory while scanning that folder.


#3

Sounds reasonable.

I actually had a similar problem with the 32-bit dev build (from another ticket) on my Windows machine this morning. There was a popup from Windows saying Syncthing had a memory problem. I didn’t think much of it at the time, but Syncthing was using almost 1 GB of RAM.

Two questions:

  • Could you add a proper error message in that case, so it’s easy to understand?
  • Is there any way to work around this? The machine is 32-bit with low RAM, and I can’t change that. Could you perhaps optimize / reduce the memory usage?

(Jakob Borg) #4

If it dies due to a failure to allocate, there is a very verbose panic message; it just wasn’t captured by systemd, or you didn’t see it. If it gets killed by Linux due to OOM, there is no chance to emit a message, but the dmesg log will contain a message from the kernel about it.
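A quick way to confirm the OOM-kill theory is to search the kernel log directly. A minimal sketch (the exact wording of the kernel’s message varies between kernel versions, and reading dmesg may require root):

```shell
# Look for OOM-killer activity in the kernel ring buffer; a hit looks like:
#   Out of memory: Kill process 21850 (syncthing) score 826 or sacrifice child
dmesg | grep -iE 'out of memory|killed process' || echo "no OOM events logged"
```

The `|| echo` fallback just makes the command report cleanly when nothing matched.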

Generally speaking it’s fairly optimized as is. The known exceptions are very large files (which will soon be better with variable block size) and large folders where a lot has changed (because all of the changed files get queued).


#5

Yep, you were right (from the dmesg log):

[998745.207835] Out of memory: Kill process 21850 (syncthing) score 826 or sacrifice child
[998745.207975] Killed process 21850 (syncthing) total-vm:1340936kB, anon-rss:856940kB, file-rss:0kB, shmem-rss:0k

#6

When is soon? Anyway, I think in my case it’s a lot of small files, a few medium-sized ones, and no huge files.

So the problem must be related to the second part:

Could this be done with the help of a temp file, maybe only when memory runs low?

Databases do this all the time: when sorting data that does not fit in memory, they use temp files on disk.


(Audrius Butkevicius) #7

Set scanProgressInterval to a negative value in the config and see if that helps, or enable swap.
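For reference, a sketch of what that could look like inside the affected folder element in config.xml. The option name scanProgressIntervalS and the placeholder id/path are assumptions on my part; check the exact spelling against your own config:

```xml
<folder id="big-folder" path="/home/pi/sync/big">
    <!-- A negative value disables scan-progress reporting (and the cache
         behind it); 0 means "use the default interval". -->
    <scanProgressIntervalS>-1</scanProgressIntervalS>
</folder>
```

The folder’s other child elements stay as they are; Syncthing needs a restart to pick up manual config edits.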


#8

I tried enabling swap and increased it twice. My final setup is this:

  • RAM: 1G
  • SWAP: 2G

Result:

  • Syncthing crashed again.

I have the feeling that the last update increased the memory needed for a scan.

Before the update, it was at least able to download what it received.

(I’m not sure, though. It could also be that it never had to do a full scan yet while it was just downloading stuff …)

I will try again with more swap, but not sure if the system supports that.
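In case the Pi is running Raspbian: swap there is usually managed by dphys-swapfile, which enforces an upper limit on the swap size in addition to the configured size. A sketch of /etc/dphys-swapfile under that assumption (the 2048 MB cap is the default I know of and may differ between releases):

```shell
# /etc/dphys-swapfile (Raspbian) -- sketch, values are examples
CONF_SWAPSIZE=2048   # swap file size to create, in MB
CONF_MAXSWAP=2048    # hard cap; CONF_SWAPSIZE above this is silently clamped
```

After editing, `sudo dphys-swapfile setup` followed by `sudo dphys-swapfile swapon` (or a reboot) applies the change.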

But maybe you can take a look again on your side, too.

Greetings, Fred


#9

What does this do?


(Audrius Butkevicius) #10

It disables the cache that is used to calculate scan progress, so your scans won’t report progress, but they also won’t consume as much memory.

There are a few other settings that could reduce memory usage, but I suspect it shouldn’t be this high to begin with. Can you try redownloading the binary to make sure it’s not corrupt?


#11

I cannot increase swap any further; it just won’t work. I’ll try the scanProgressInterval setting next.


(Audrius Butkevicius) #12

Did you try redownloading the binary? Can you post your config?


#13

I don’t think that could or should be a problem. Has that ever happened?
I use apt-get to install it; how could it be corrupted without being totally broken?

Here it is (anonymized): config.xml (41.4 KB)


#14

OK, I did this and restarted the scan just now (21:35 German time).
So let’s see what happens.

Currently at: 118 MB RAM

Last scan: 2018-04-10 20:33:25 (which did crash)

Update: 22:19 German time
So far the scan is still running, and the RAM usage has only increased slightly, to 129 MiB.

Disk I/O for reads and writes varies between 0.2 and 4 MB/s.

I’ll let you know tomorrow if it was successful.


(Audrius Butkevicius) #15

Yes, multiple times.


#16

The scan was yet again not successful.

But looking at dmesg, it does not seem like a memory-related problem this time:

[Mi Apr 11 18:05:54 2018] systemd[1]: Starting Syncthing - Open Source Continuous File Synchronization for sync...
[Mi Apr 11 18:05:54 2018] systemd[1]: Started Syncthing - Open Source Continuous File Synchronization for sync.
[Mi Apr 11 18:08:10 2018] systemd[1]: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 7551 (du)
[Mi Apr 11 18:08:10 2018] systemd[1]: Mounting Arbitrary Executable File Formats File System...
[Mi Apr 11 18:08:10 2018] systemd[1]: Mounted Arbitrary Executable File Formats File System.
[Mi Apr 11 19:10:26 2018] systemd[1]: Starting Session c11 of user pi.
[Mi Apr 11 19:10:26 2018] systemd[1]: Started Session c11 of user pi.
[Mi Apr 11 19:10:39 2018] systemd[1]: Starting Session c12 of user pi.
[Mi Apr 11 19:10:39 2018] systemd[1]: Started Session c12 of user pi.
[Mi Apr 11 19:13:09 2018] systemd[1]: Starting Session c13 of user pi.
[Mi Apr 11 19:13:09 2018] systemd[1]: Started Session c13 of user pi.
[Mi Apr 11 21:11:22 2018] systemd[1]: syncthing@sync.service: main process exited, code=exited, status=2/INVALIDARGUMENT
[Mi Apr 11 21:11:22 2018] systemd[1]: Unit syncthing@sync.service entered failed state.
[Mi Apr 11 21:11:22 2018] systemd[1]: syncthing@sync.service holdoff time over, scheduling restart.
[Mi Apr 11 21:11:22 2018] systemd[1]: Cannot add dependency job for unit syncthing-inotify@sync.service, ignoring: Unit syncthing-inotify@sync.service failed to load: No such file or directory.
[Mi Apr 11 21:11:22 2018] systemd[1]: Stopping Syncthing - Open Source Continuous File Synchronization for sync...
[Mi Apr 11 21:11:22 2018] systemd[1]: Starting Syncthing - Open Source Continuous File Synchronization for sync...
[Mi Apr 11 21:11:22 2018] systemd[1]: Started Syncthing - Open Source Continuous File Synchronization for sync.
[Mi Apr 11 22:43:54 2018] systemd[1]: Stopping User Manager for UID 1000...
[Mi Apr 11 22:43:57 2018] systemd[1]: Stopped User Manager for UID 1000.
[Mi Apr 11 22:43:57 2018] systemd[1]: Stopping user-1000.slice.
[Mi Apr 11 22:43:57 2018] systemd[1]: Removed slice user-1000.slice.
[Do Apr 12 00:38:42 2018] systemd[1]: Starting Cleanup of Temporary Directories...
[Do Apr 12 00:38:48 2018] systemd[1]: Started Cleanup of Temporary Directories.
[Do Apr 12 01:12:51 2018] systemd[1]: syncthing@sync.service: main process exited, code=exited, status=2/INVALIDARGUMENT
[Do Apr 12 01:12:51 2018] systemd[1]: Unit syncthing@sync.service entered failed state.
[Do Apr 12 01:12:51 2018] systemd[1]: syncthing@sync.service holdoff time over, scheduling restart.
[Do Apr 12 01:12:51 2018] systemd[1]: Cannot add dependency job for unit syncthing-inotify@sync.service, ignoring: Unit syncthing-inotify@sync.service failed to load: No such file or directory.
[Do Apr 12 01:12:51 2018] systemd[1]: Stopping Syncthing - Open Source Continuous File Synchronization for sync...
[Do Apr 12 01:12:51 2018] systemd[1]: Starting Syncthing - Open Source Continuous File Synchronization for sync...
[Do Apr 12 01:12:51 2018] systemd[1]: Started Syncthing - Open Source Continuous File Synchronization for sync.
[Do Apr 12 05:26:36 2018] systemd[1]: syncthing@sync.service: main process exited, code=exited, status=2/INVALIDARGUMENT
[Do Apr 12 05:26:36 2018] systemd[1]: Unit syncthing@sync.service entered failed state.
[Do Apr 12 05:26:36 2018] systemd[1]: syncthing@sync.service holdoff time over, scheduling restart.
[Do Apr 12 05:26:36 2018] systemd[1]: Cannot add dependency job for unit syncthing-inotify@sync.service, ignoring: Unit syncthing-inotify@sync.service failed to load: No such file or directory.
[Do Apr 12 05:26:36 2018] systemd[1]: Stopping Syncthing - Open Source Continuous File Synchronization for sync...
[Do Apr 12 05:26:36 2018] systemd[1]: Starting Syncthing - Open Source Continuous File Synchronization for sync...
[Do Apr 12 05:26:36 2018] systemd[1]: Started Syncthing - Open Source Continuous File Synchronization for sync.

#17

Apparently the scan has now completed, and the folder is syncing (after I rebooted the whole device).

Thanks for the help!


(Simon) #18

Would it make sense to use a disk-spilling queue in both the scanner and the puller? The scanner is a bit more problematic, as we store a file info (without blocks, though), not just the file name. It seems pretty doable to use an adjusted version of the index sorter for this. It’s not the nicest solution considering the ongoing disk-I/O discussions, but it’s better than OOM crashes, and short of walking the folders twice I don’t see a way to have progress updates without intermediately storing the files to be processed. I’d definitely propose using some heuristic criterion for spilling rather than a fixed max size, to prevent spilling on systems that can absorb the memory spike.


OutOfMemory crashes while scanning or syncing
#19

Sorry, I’m back again …

Now I have OOM problems again, but this time while syncing (the scan completes OK with the new settings):

SyncthingCrashedAgain.txt (193.2 KB)

This has happened several times now, but this time I noticed it in the logs.

The memory usage rises to at least 1.9 GB (as displayed in the UI).

I will pause the big folder now. It’s a little over 1 million files and about 78 GB in size, with about 30k folders.


(Andrej) #20

+1 for “soon” :slight_smile:

It is sorely needed…

I had to move Syncthing from my NAS box (4 GB) to the main PC (16 GB) to get it working. And now, of course, watching for changes does not work (on CIFS mounts).

Thanks!
