Limits of Syncthing?

Are there limits? Too many files, or files that are too large? Can I use it for my gigabytes of files?

from the webpage: “Powerful. Synchronize as many folders as you need with different people.”

I think there are no artificial restrictions, only the theoretical limitations of the file systems and, possibly, the programming language.

There is a limit of a million files, see https://github.com/calmh/syncthing/commit/0d3caa218307b9eb441da22385a68ea77ec70e4d

Yeah, there are a few limits. These are mostly so that when there are issues with the protocol (which have happened, but are quite rare now) we catch the problem when one node says “I have one hundred billion files!” instead of trying to allocate memory for it.

These limits should not be hit by users doing non-broken things. If/when they are, they should be fixed.
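To illustrate the kind of sanity check meant here, a minimal sketch with hypothetical names and limits, not Syncthing’s actual protocol code:

```go
package main

import "fmt"

// Hypothetical cap; the real limit and message layout differ.
const maxFilesPerIndex = 1000000

// checkAnnouncedFileCount rejects an obviously bogus announcement
// ("I have one hundred billion files!") before any memory is
// allocated for the file list itself.
func checkAnnouncedFileCount(repo string, announced uint64) error {
	if announced > maxFilesPerIndex {
		return fmt.Errorf("index for %q announces %d files, limit is %d",
			repo, announced, maxFilesPerIndex)
	}
	return nil
}

func main() {
	if err := checkAnnouncedFileCount("default", 100e9); err != nil {
		fmt.Println("rejected:", err)
	}
}
```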

IMHO BTSync needs about 100 bytes of RAM for every tracked file. Is there something similar in Syncthing?

A list of “what usage Syncthing is good for and what it is not” would be good.

e.g.:

  • Can I use it for ~200 GB of MP3 files (~80,000 files in 16,000 directories)?
  • Can I use it for big files, e.g. videos or ISOs larger than 4 GB?
  • Can I use it for quickly changing files, e.g. a browser cache (bad example)?

By the way, Syncthing is not in the list here: https://en.wikipedia.org/wiki/Comparison_of_file_synchronization_software

I don’t have exact numbers for memory usage; it depends on many factors, but it will be a bit more than 100 bytes. There’s a bunch of metadata (timestamps, mode, version counter, …) adding up to about 80 bytes, the actual file name of course, and a list of block checksums (about 48 bytes per 128 KB file data). Then there’s a bunch of indexes and things to track which nodes have which version of what file and so on.
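As a rough back-of-the-envelope helper based on those figures (about 80 bytes of metadata, the name itself, and roughly 48 bytes per 128 KiB block), something like the sketch below; the constants are approximations from this post, not exact values:

```go
package main

import "fmt"

const (
	blockSize     = 128 << 10 // 128 KiB protocol block size
	metaBytes     = 80        // approx. timestamps, mode, version counter, ...
	perBlockBytes = 48        // approx. block checksum bookkeeping
)

// estimateFileIndexBytes gives a rough per-file index size in bytes,
// ignoring per-node bookkeeping and runtime overhead.
func estimateFileIndexBytes(nameLen int, fileSize int64) int64 {
	blocks := (fileSize + blockSize - 1) / blockSize
	return int64(metaBytes+nameLen) + blocks*perBlockBytes
}

func main() {
	// Example: a 4 MiB MP3 with a 40-character path.
	fmt.Println(estimateFileIndexBytes(40, 4<<20), "bytes") // about 1.6 KB
}
```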

I think all of your use cases make sense, except the one about very quick changes. The change detection is by scanning, by default every 60 seconds, so that sets an obvious limit. There’s also the factor that the other nodes need to actually have time to synchronize to the new version of a file before it changes again.
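Conceptually, scan-based change detection looks something like this simplified sketch (not Syncthing’s actual scanner); anything that changes and changes back between two passes is simply not seen:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// scanOnce walks the folder and reports files modified since the last pass.
// Real change detection also looks at size, permissions, deletions, etc.
func scanOnce(root string, lastScan time.Time) []string {
	var changed []string
	filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return nil
		}
		if info.ModTime().After(lastScan) {
			changed = append(changed, path)
		}
		return nil
	})
	return changed
}

func main() {
	last := time.Now()
	for range time.Tick(60 * time.Second) { // default rescan interval
		for _, f := range scanOnce("/path/to/folder", last) {
			fmt.Println("changed:", f)
		}
		last = time.Now()
	}
}
```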

At some point, when the amount of data being synchronized goes toward infinity, it will be better not to keep this info in RAM but to use some disk-based key/value store.

As a remark:

BTSync differs here because it uses an SQLite database for every synced folder. I have not analyzed this in depth, but there seems to be a mapping from path to a blob that looks very much like torrent-encoded (bencoded), JSON-ish data.

As a result of the thread below, the index is one of Syncthing’s bottlenecks:

Is there also a file size limit?

Because I see this:

https://github.com/calmh/syncthing/blob/master/protocol/PROTOCOL.md says that one block is 128 KiB (131072 bytes).

So the biggest file can be ~12 GiB (100000 × 131072 bytes)?

EDIT:

Is this per repo or per node?

Per repo. And yes, correct on the other limit. It’s a bit arbitrary though; there wouldn’t be any harm in bumping it two orders of magnitude or so.
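For reference, the arithmetic behind those figures, using the per-repo file limit from the linked commit and the per-file block cap as discussed above (both taken from this thread rather than from authoritative constants):

```go
package main

import "fmt"

const (
	blockSize        = 131072  // 128 KiB per block, per PROTOCOL.md
	maxBlocksPerFile = 100000  // per-file limit discussed above
	maxFilesPerRepo  = 1000000 // per-repo limit from the linked commit
)

func main() {
	maxFileBytes := int64(maxBlocksPerFile) * blockSize
	fmt.Printf("largest file: %d bytes (~%.1f GiB)\n",
		maxFileBytes, float64(maxFileBytes)/(1<<30))
	fmt.Println("files per repo:", maxFilesPerRepo)
}
```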

I would like to add more information to the FAQ around the question “can I use Syncthing for my needs?”…

So I like the idea of including information on how much RAM Syncthing needs and how big the index would be. Approximate values, of course…

e.g.: 1 file needs X bytes of RAM and results in X bytes in the index file.

Maybe add a table like this:

  • 1,000 files -> X KB RAM, X bytes index
  • 10,000 files -> X MB RAM, X KB index
  • 100,000 files -> X MB RAM, X KB index

It’s kinda tricky to predict. The in-RAM size is easier, but it’s still a whole bunch of fairly dynamic data structures and I haven’t done any real optimization on it. Probably shouldn’t, and instead look into something database backed. Experimentally, it looks like something like 100 bytes per file and 150-200 bytes per 128 KiB block of data. The latter is about four times as much as I would expect from the size of the structs, so there might be something going on there.

Then there’s added bookkeeping to keep track of the files other nodes have, not just our own.

The index being sent on connect is pretty much the stuff in memory, without overhead, compressed, which is what is stored in the .idx.gz files.
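As an illustration of that idea (the in-memory entries serialized and gzip-compressed to a cache file), here is a sketch using encoding/gob rather than Syncthing’s actual on-disk format:

```go
package main

import (
	"compress/gzip"
	"encoding/gob"
	"os"
)

type fileEntry struct {
	Name    string
	Size    int64
	Version uint64
	Blocks  [][]byte // per-block hashes
}

// writeIndex serializes the in-memory entries and compresses them,
// roughly what ends up in an .idx.gz-style cache file.
func writeIndex(path string, entries []fileEntry) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	gz := gzip.NewWriter(f)
	defer gz.Close()
	return gob.NewEncoder(gz).Encode(entries)
}

func main() {
	_ = writeIndex("default.idx.gz", []fileEntry{
		{Name: "music/song.mp3", Size: 4 << 20, Version: 1},
	})
}
```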

My idea was to just calculate the effective sizes by simple arithmetic, e.g.:

per-file size = idx.gz size / number of files in the repo

Would that be ok?

So a repo of 10,000 files and 5 GB of data would result in:

100 bytes × 30,000 = ~3 MB

and

5 GB / 128 KB × 150-200 bytes = ~6-8 MB

So RAM usage is only about 9-11 MB?
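A quick check of that arithmetic, using the 30,000 tracked entries from the calculation above and the 150-200 bytes-per-block range (purely an approximation):

```go
package main

import "fmt"

const blockSize = 128 << 10 // 128 KiB

func main() {
	// Tracked file entries and total data size, as in the post above.
	const entries = 30000
	const totalBytes = int64(5) << 30 // 5 GiB

	blocks := (totalBytes + blockSize - 1) / blockSize
	low := int64(entries)*100 + blocks*150
	high := int64(entries)*100 + blocks*200

	fmt.Printf("~%.1f to %.1f MB\n", float64(low)/1e6, float64(high)/1e6)
}
```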

Actually that’s probably on the low side. Thing is, we keep a list of all files and their blocks etc. Then we have a bunch of structures tracking which of those files we have, which of them every other node has, which is the latest version we should strive to get, etc. So the number of files and their individual sizes is one factor. Another is how many nodes are connected. Another is what files those nodes have. Then there are temporary buffers for file data, compression, encryption, various old copies of objects that have not yet been garbage collected, …

Some information.

I currently have 45,444 files in three repos totalling 331 GB. It’s using 304 MB of resident memory (and 798 MB of virtual) according to top. That’s on i386 Linux.

On another node I currently have 18,790 files in two repos totalling 300 GB. It’s using 129 MB resident memory and 778 MB virtual. That’s ARMv7 Linux.

This is unfortunately too much for my poor machine (it’s a NAS), so it’s swapping, and I haven’t added my photos yet… So yeah, either I need another machine or different software.

I’m working on a different index structure that uses an on-disk database instead. SQLite is awesome as always, but doesn’t cross-compile as easily as Go, and setting up build servers for every architecture is a pain (and won’t be possible for ARM). There are other key-value stores written in Go that I’m trying as well. It’s a significant change so it might take a little while, but it’ll solve the memory usage issue once I get it done.
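As a sketch of what a disk-backed index could look like with a pure-Go key-value store, here is an example using goleveldb; this is just one candidate, not necessarily what will end up in Syncthing:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/syndtr/goleveldb/leveldb"
)

type fileEntry struct {
	Size    int64
	Version uint64
}

func main() {
	// Open (or create) the on-disk index; entries live on disk, not in RAM.
	db, err := leveldb.OpenFile("index.db", nil)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Key by repo + path, value is the serialized file metadata.
	val, _ := json.Marshal(fileEntry{Size: 4 << 20, Version: 1})
	if err := db.Put([]byte("default/music/song.mp3"), val, nil); err != nil {
		panic(err)
	}

	got, err := db.Get([]byte("default/music/song.mp3"), nil)
	if err != nil {
		panic(err)
	}
	var e fileEntry
	json.Unmarshal(got, &e)
	fmt.Printf("%+v\n", e)
}
```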

I’ll add status info on this at:

https://github.com/calmh/syncthing/issues/295