Excessive RAM Usage

Hello:

I’m having an issue with Syncthing using up all the available RAM on a Synology NAS and then being killed by the OS.

The NAS has 16 GB of RAM installed. Alongside some smaller folders, I have one particularly large folder set up in Syncthing: ~7 million files at ~46 TB. This folder takes several days to complete a scan, and I think Syncthing is being killed before the scan completes.

The earliest record I can find of Syncthing being killed is 2018-04-02, during the 0.14.46rc2 release phase. There are no records of Syncthing being killed during the months I had it installed before that date, and I’ve had this large folder set up since my first Syncthing install.

Here are some relevant lines from the kernel log:

2018-05-29T13:46:19+01:00 NAS02 kernel: [462531.460727] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
2018-05-29T13:46:19+01:00 NAS02 kernel: [462532.253377] [27947]   100 27947  5466370  2423492   10678  2951170             0 syncthing
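
If I’m reading the oom-killer output right, the rss and swapents columns are counted in 4 KiB pages, so that row works out to roughly 2423492 × 4 KiB ≈ 9.2 GiB resident plus about 2951170 × 4 KiB ≈ 11.3 GiB swapped out at the point it was killed.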

There doesn’t appear to be a panic.log being written; I’ve got logging enabled (scanner and verbose), but I can’t see anything pertinent in there.

In case any of this is relevant:

  • I’m running v0.14.47;
  • I’m running with GOMAXPROCS=2 to limit the CPU usage;
  • I’ve got 8 smaller folders, each with the FS Watcher enabled; this large folder is not yet running the FS Watcher (issue with max_user_watches noted in this thread);
  • Progress Update Interval (seconds) is -1;
  • For the large folder:
      • Copiers = 0;
      • Hashers = 0;
      • Scan Progress Interval (seconds) is -1;

Here’s a screenshot of the RAM usage since I restarted Syncthing yesterday.

Any ideas where I should look next to track this down?

Thanks,

Pants.

It’s syncing your large folder, which means that at the moment it has the metadata for those 31 TB of files in RAM. That’s unfortunate for a huge folder where lots of it is out of sync, but that’s how it is at the moment. Once the initial sync of that folder has completed, the RAM usage will be much lower.
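
To put rough numbers on it: if a few million of those ~7 million files are out of sync at once, and each queued file costs on the order of a kilobyte or two of in-memory metadata and queue overhead, that alone is several gigabytes, which is in the same ballpark as the figures in your kernel log.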

Hi Jakob:

Thanks for the feedback on this - much appreciated.

Was something changed around 0.14.46rc2 in this respect? I had Syncthing running for months on this folder beforehand - but I only saw the OS killing it from 2018-04-02 onwards.

Unfortunately that gets tricky - scanning the folder can take a day or more, and then Syncthing gets killed before (or just after) it’s finished scanning. So it starts the scan again…

In this case, I’ve just noticed that the Ignore Patterns had been replaced by ‘Loading…’ - I’ll reset that, and things will hopefully settle down soon.

For the future: I don’t know the decisions that led to the metadata being stored in RAM, but can I lodge a feature request for it to be cached to disk instead?

As far as I’m concerned (:wink:), the ability to run Syncthing on a low-power, high-capacity NAS is a good use case to design for.

Thanks,

Pants.

It’s not scanning though, it’s syncing. The two are different, and they have entirely different memory profiles: the amount of memory required while syncing is proportional to the number of files that are out of sync. Unless you’re saying that the other side is the one doing the scanning and getting killed?

You can certainly file a request for it, but folders in the tens of terabytes are, to say the least, a somewhat niche use case. It’s also probably not going to be trivial, as the reason for keeping it in RAM is to be able to sort the queue according to the configured criteria. Potentially we could add it as a sorting variant: “I don’t care, keep the metadata out of RAM”.

Or we could just set a limit on the size of the queue. It’ll screw up the sorting criteria when more files than the limit are out of sync, but it’s an easy fix for this issue.
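
To make the queue-limit idea concrete, here’s a minimal sketch (not Syncthing’s actual code; the type and field names are invented) of a pull queue that only keeps a bounded number of out-of-sync entries in RAM, leaving the rest in the on-disk database for a later pass:

// Sketch only, not Syncthing's actual code: a pull queue that caps how many
// out-of-sync file entries it keeps (and sorts) in RAM at once.
package main

import (
    "fmt"
    "sort"
)

// queuedFile stands in for the per-file metadata a real queue would hold.
type queuedFile struct {
    Name string
    Size int64
}

type boundedQueue struct {
    limit int
    items []queuedFile
}

// push adds a file if there is room. Files that don't fit are simply not
// queued now; a real implementation would pick them up again from the
// database on a later iteration.
func (q *boundedQueue) push(f queuedFile) bool {
    if len(q.items) >= q.limit {
        return false
    }
    q.items = append(q.items, f)
    return true
}

// sortBySmallestFirst applies one sorting criterion. It only ever sees the
// queued subset, which is why a cap degrades the ordering when more files
// than the limit are out of sync.
func (q *boundedQueue) sortBySmallestFirst() {
    sort.Slice(q.items, func(i, j int) bool { return q.items[i].Size < q.items[j].Size })
}

func main() {
    q := &boundedQueue{limit: 2}
    for _, f := range []queuedFile{{"a", 300}, {"b", 100}, {"c", 200}} {
        if !q.push(f) {
            fmt.Println("deferred:", f.Name)
        }
    }
    q.sortBySmallestFirst()
    fmt.Println("queued:", q.items)
}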

Hi Jakob:

Thanks for the feedback.

If I understand your point correctly, then I think you’re misunderstanding the issue as it applies in the real world for large folders. :wink:

In this situation (as I’m experiencing it):

  • I set a large folder running. It starts scanning and takes 1+ days to complete;
  • Syncthing finds a large number of files to sync, and the RAM usage rises quickly;
  • The OS then kills Syncthing;
  • Syncthing respawns, and starts scanning the large folder. This takes 1+ days to complete;
  • Syncthing finds a large number of files to sync, and the RAM usage rises quickly…
  • And so on.

You can see how it then takes forever to complete the sync.

Agreed - for now. But I’m sure this use case will only account for a larger percentage of installations as time goes on.

In the case where there are a large number of out-of-sync files, I’d argue that the order of syncing is less important than having the sync make progress at all, in whatever order.

Would it be sensible to automatically revert to the “keep metadata out of RAM” mode when the RAM usage exceeds some defined limit?

Best wishes,

Pants.

=> So there are a lot of things that have changed, on this device, compared to what we have in the database. Or, the files are new (not in the database already). These files should be synced to other devices.

Or, things haven’t changed locally, but just reading the file metadata (listing directories, checking modification times, etc.) and the database takes a long time, because there’s a lot of it and disk access is slow and/or the data isn’t cached in RAM.

=> In addition to the changes detected above (if any), there are a lot of files that are older than the files on other devices. These need to be synced to this device from someone else.

Scanning does not result in things to sync. Scanning results in things for others to sync.
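
To give a picture of where the time goes while scanning, this is roughly the shape of a scan pass (a simplified sketch, not Syncthing’s real scanner): walk the tree, stat each entry, compare size and modification time against the database record, and hash only the files that look changed.

// Simplified sketch of a scan pass, not Syncthing's real scanner: when little
// has changed, the cost is dominated by stat calls and database lookups.
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "time"
)

// dbRecord stands in for what the database remembers about a file.
type dbRecord struct {
    Size    int64
    ModTime time.Time
}

func scan(root string, db map[string]dbRecord) (changed []string, err error) {
    err = filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
        if err != nil || info.IsDir() {
            return err
        }
        rec, ok := db[path]
        // Unchanged size and mtime: no hashing needed, but we still paid for
        // the directory listing, the stat and the database lookup.
        if ok && rec.Size == info.Size() && rec.ModTime.Equal(info.ModTime()) {
            return nil
        }
        // New or modified: this is what gets hashed and announced to other
        // devices as something for *them* to sync.
        changed = append(changed, path)
        return nil
    })
    return changed, err
}

func main() {
    changed, err := scan(".", map[string]dbRecord{})
    fmt.Println(len(changed), "files to hash; err:", err)
}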

Out of memory, yes. To be sure of the cause (since there is confusion above), you can capture a heap profile.
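
If memory serves, setting the STHEAPPROFILE environment variable before starting Syncthing makes it write heap profile files on its own, so there’s nothing to code on your side. For the curious, what that amounts to inside a Go program is roughly the following sketch, using the standard runtime/pprof package:

// For illustration only: how a Go program dumps its own heap profile with the
// standard runtime/pprof package. Syncthing has built-in ways to trigger this,
// so you don't need any of this yourself.
package main

import (
    "log"
    "os"
    "runtime"
    "runtime/pprof"
)

func main() {
    f, err := os.Create("heap.pprof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    runtime.GC() // get up-to-date allocation statistics
    if err := pprof.Lookup("heap").WriteTo(f, 0); err != nil {
        log.Fatal(err)
    }
    // The resulting file can then be inspected with `go tool pprof heap.pprof`.
}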

Agreed. I think making it just a limit on queue length is a good enough proxy for this and easier to understand though. I filed it.

Hi Jakob:

Thanks for this.

In my case (on this particular device), this is definitely why scanning takes a long time - not because the content has changed a lot.

Sorry - I’d just initiated a Syncthing restart: after correcting the Ignore Patterns, I saw that the RAM usage hadn’t dropped and, rather than wait for the OS to kill it, I jumped straight in and restarted.

From previous experience, I expect the RAM usage will be up in the 4-5GB range in an hour or two whilst it’s scanning. Not high enough to be killed on this machine - but is that to be expected just during the scan phase? I’ll grab a heap profile if that helps.

Brilliant - thanks!

16 GB of RAM is very little for a NAS with tens of TB of files, if you expect to actually access a large number of those files; 16 GB isn’t enough for my little desktop machine. The operating system always accesses files through RAM (the page cache), so that repeated access only hits RAM. For that to be efficient you’ll want a lot of RAM if your server accesses many files or huge files. For that kind of server I would want more like 256 GB of RAM, although that’s expensive.

In any case, what you could do for the initial scan is to set up an awful lot of swap. There’ll be a lot of thrashing and it won’t be fast, but it’ll get the job done, and you can get rid of all that swap later.

Hi Tor:

Thanks for your input.

We only have a handful of client machines accessing the NAS - but it’s holding a lot of archived media files, hence the usage.

Best wishes,

Pants.

If the files are > 256 MiB you’ll gain some general memory efficiency by enabling large blocks, too.

Thanks Jakob. Yes - I’ve got my eye on that, and am currently evaluating it on another installation I have.

Will that affect RAM usage in both scanning and syncing?

It makes the block list smaller so the metadata overhead is smaller everywhere. The block list isn’t kept in RAM for more than a couple of files at a time, but if the files are large this can still be significant.
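
As a rough worked example (the exact per-block overhead varies a bit): with the standard 128 KiB blocks, a 1 GiB file has 8192 blocks, and at a few tens of bytes per block entry (a 32-byte SHA-256 hash plus offset and size) that’s a few hundred kilobytes of block list per gigabyte of file. With large blocks the block size scales up to 16 MiB for big files, so the same gigabyte can be covered by as few as 64 entries, i.e. a few kilobytes.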
