Just managed to find some stats from that slow NAS I was referring to earlier (Synology RS422, 2 GB of RAM). This node holds about 3.8 TiB (~500k files). First picture shows swap, second shows memory usage (system-wide). You can clearly see when the update to v2 happened. The node is now running v2.0.6 (with that memory pragma change). Note that after the update I also re-synced a bunch of folders, which is now finished, so it will be interesting to see what the next few days look like.
From this I think it would be worthwhile to investigate other means of reducing memory usage (Syncthing on a NAS seems like a common use case, and these devices are often underpowered and old). Reducing the database size is also likely to reduce memory pressure. (Has compression of database pages been considered, for example? I also see that file names seem to be stored twice in the database, both in the row itself and inside FileInfo; that’s not going to add up to a lot of data, but it does mean fewer rows fit in a single SQLite page.)
I can confirm that enabling disk compression (e.g. NTFS compression on Windows) still roughly halves the database size (e.g. from 5 GB to 2.5 GB), which is a lot, especially considering that standard NTFS compression is tuned for performance rather than for the best compression ratio.
On the other hand, SQLite’s built-in tools like VACUUM don’t really decrease the database size by much for me (maybe by a few hundred MB, much less than the disk compression achieves).
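For what it’s worth, here is a quick sketch of how to check up front how much VACUUM can even reclaim, by comparing the freelist against the total page count. It assumes the modernc.org/sqlite driver and a placeholder database file name; any SQLite driver would do.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "modernc.org/sqlite" // assumption: any Go SQLite driver works, this is just one choice
)

func main() {
	db, err := sql.Open("sqlite", "index-v2.db") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var pageCount, freePages, pageSize int64
	// Errors ignored for brevity; these pragmas always return a single row.
	db.QueryRow("PRAGMA page_count").Scan(&pageCount)
	db.QueryRow("PRAGMA freelist_count").Scan(&freePages)
	db.QueryRow("PRAGMA page_size").Scan(&pageSize)

	// VACUUM mainly hands back unused (free) pages, so this is roughly the
	// upper bound on what it will reclaim.
	fmt.Printf("total: %d MiB, reclaimable by VACUUM: ~%d MiB\n",
		pageCount*pageSize>>20, freePages*pageSize>>20)

	if _, err := db.Exec("VACUUM"); err != nil {
		log.Fatal(err)
	}
}
```

In my case the freelist is small, which matches the observation that VACUUM barely shrinks the file.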
For comparison, the old database barely compressed.
I did a quick search earlier and found a few options, but none of them look ideal to me:
SQLite offers an official extension for compression. Something like this would be the cleanest solution, as it’s mostly transparent to the application and compresses at the page level (so the compression can fully exploit redundancy in e.g. file paths, device IDs, etc.). Unfortunately that extension is commercial.
There is an open source (LGPL…) extension that provides functions for discretionary compression of columns based on zstd (basically you call compress(data, dictionary) in your SQL, plus another function to compute the dictionaries). That seems like a lot of work to adopt, and I’m not sure it’s still maintained.
EDIT: it also appears to offer a ‘transparent’ mode that enables row-level compression for specific tables.
We could compress e.g. FileInfo and other large objects one by one on the Go side (basically doing in application code what the above extension does; see the sketch at the end of this post), though I’m not sure that would lead to much gain.
The final option would be to optimize the structures we write to the database (like FileInfo) and strip out stuff that isn’t absolutely necessary (for instance the redundant file path).
Of course a proper way to approach this would be to first analyse why the v2 database is larger than v1 (you might have some metrics?) and then see whether those elements are compressible or could be made smaller.
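To make the Go-side option a bit more concrete, here is a rough sketch of what I mean. It assumes the github.com/klauspost/compress/zstd package and the modernc.org/sqlite driver; the table and column names are made up for illustration and are not the actual v2 schema.

```go
package main

import (
	"database/sql"
	"log"

	"github.com/klauspost/compress/zstd" // assumption: this particular zstd package
	_ "modernc.org/sqlite"               // assumption: any Go SQLite driver works
)

// putFileInfo stores an already-marshalled FileInfo blob, compressed with zstd.
// Table and column names are made up, not the real v2 schema.
func putFileInfo(db *sql.DB, enc *zstd.Encoder, seq int64, blob []byte) error {
	_, err := db.Exec(
		"INSERT OR REPLACE INTO fileinfos (sequence, fiprotobuf) VALUES (?, ?)",
		seq, enc.EncodeAll(blob, nil))
	return err
}

// getFileInfo reads the compressed blob back and decompresses it.
func getFileInfo(db *sql.DB, dec *zstd.Decoder, seq int64) ([]byte, error) {
	var compressed []byte
	if err := db.QueryRow(
		"SELECT fiprotobuf FROM fileinfos WHERE sequence = ?", seq,
	).Scan(&compressed); err != nil {
		return nil, err
	}
	return dec.DecodeAll(compressed, nil)
}

func main() {
	db, err := sql.Open("sqlite", "index-v2.db") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	enc, _ := zstd.NewWriter(nil) // reusable encoder, used only via EncodeAll
	dec, _ := zstd.NewReader(nil) // reusable decoder, used only via DecodeAll

	// Round-trip a dummy blob; in reality this would be a protobuf-marshalled FileInfo.
	if err := putFileInfo(db, enc, 1, []byte("marshalled FileInfo goes here")); err != nil {
		log.Fatal(err)
	}
	blob, err := getFileInfo(db, dec, 1)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("round-tripped %d bytes", len(blob))
}
```

Whether this gains much depends on how compressible the individual blobs are without a shared dictionary, which is exactly the open question.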
Yeah, none of those alternatives seem like fantastic ways forward to me… I’m sure you can shave off a few percent here and there by compressing some things and manually removing attributes that could be rehydrated from a column, etc., but I don’t think it’s worth the effort and complexity.
One is a simple, stupid key-value database with an enormous amount of custom logic on top, the other is a SQL database with several indexes and cross references. I don’t know what any analysis would yield here, beyond the obvious – it is what it is.
That doesn’t mean we can’t optimise the structure of course, just that I don’t think comparing it to the v1 leveldb is useful.
It’s mostly a list of block hashes, which is almost mathematically perfect random noise. You’re welcome to try and benchmark it, but I suspect it’s just cpu cycles heating the world for no good reason.
(The exceptions being files with lots of repeated blocks, like zero-filled ones. I don’t think these make up a relevant amount of the data out there, though. Maybe for VM images?)
If (after VACUUM) there is still a lot of compressibility, and it is not due to the largest table’s contents (which are mostly mathematical noise), then I’m very curious what makes the database still so compressible. It could be something in the storage format (padding or wasted space in the pages themselves, though not due to fragmentation; maybe look at dbstat), the indexes, or redundancy in the contents. For the latter we might try the following: for each column, make a copy of the database, set that column to NULL, VACUUM, then compare the compressed size against the original; there is a sketch of that below.
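Something like the following could automate that experiment. It is only a rough sketch: the database path and target table are placeholders, it should be run against an offline copy, and columns that are NOT NULL or part of a key will simply fail the UPDATE and get skipped.

```go
package main

import (
	"compress/gzip"
	"database/sql"
	"fmt"
	"io"
	"log"
	"os"

	_ "modernc.org/sqlite" // assumption: any Go SQLite driver would do
)

const (
	dbPath = "index-v2.db" // placeholder: path to an offline copy of the database
	table  = "files"       // placeholder: table to probe column by column
)

func check(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

type countingWriter struct{ n int64 }

func (w *countingWriter) Write(p []byte) (int, error) { w.n += int64(len(p)); return len(p), nil }

// gzippedSize returns the file's size after gzip compression, as a rough
// measure of how much redundancy is left in it.
func gzippedSize(path string) int64 {
	in, err := os.Open(path)
	check(err)
	defer in.Close()
	cw := &countingWriter{}
	gz := gzip.NewWriter(cw)
	_, err = io.Copy(gz, in)
	check(err)
	check(gz.Close())
	return cw.n
}

func main() {
	baseline := gzippedSize(dbPath)
	fmt.Printf("baseline gzipped size: %d bytes\n", baseline)

	// Enumerate the table's columns straight from SQLite.
	db, err := sql.Open("sqlite", dbPath)
	check(err)
	rows, err := db.Query(fmt.Sprintf("PRAGMA table_info(%s)", table))
	check(err)
	var cols []string
	for rows.Next() {
		var cid, notnull, pk int
		var name, typ string
		var dflt sql.NullString
		check(rows.Scan(&cid, &name, &typ, &notnull, &dflt, &pk))
		cols = append(cols, name)
	}
	check(rows.Close())
	check(db.Close())

	orig, err := os.ReadFile(dbPath) // keep the original pristine, work on copies
	check(err)

	for _, col := range cols {
		const work = "probe.db"
		check(os.WriteFile(work, orig, 0o644))
		wdb, err := sql.Open("sqlite", work)
		check(err)
		// Null out the column, then VACUUM so the pages are rewritten without it.
		if _, err := wdb.Exec(fmt.Sprintf("UPDATE %s SET %s = NULL", table, col)); err != nil {
			log.Printf("%s: %v, skipping (probably NOT NULL or a key column)", col, err)
			wdb.Close()
			continue
		}
		_, err = wdb.Exec("VACUUM")
		check(err)
		check(wdb.Close())
		size := gzippedSize(work)
		fmt.Printf("%-20s gzipped: %d bytes (saves %d vs baseline)\n", col, size, baseline-size)
	}
}
```

The column whose removal shrinks the compressed copy the most is the one carrying the redundancy.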
I had longer replies, but I’ll do the short version. I think y’all should try out the suggestions you have and do the analysis you suggest, instead of suggesting I do it. Be the change you want to see.
My suggestions should not be read as ‘please go do this’. When I find the time I will have a deeper look myself, of course. I do think a discussion is worthwhile for gathering data points, and also to see what has already been considered and what hasn’t.
Yeah, and something like page-level compression would be nice, but, as you also found when you went looking, it just isn’t a thing that exists as far as we’re concerned. If this is something you feel should exist, I guess the proper place to implement it is in SQLite…
My database compresses from 1.4G to 1.0G with tar+gzip. That doesn’t surprise me at all.
Let’s look at it the other way round. If a sizeable amount of space can be saved here through compression, then it must be due to either duplicate or empty data.
*** Page counts for all tables with their indices *****************************
FILES............................................. 211451 69.2%
FILEINFOS......................................... 87005 28.5%
BLOCKS............................................ 3494 1.1%
BLOCKLISTS........................................ 3326 1.1%
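(For reference, roughly the same per-table page counts can be pulled out via the dbstat virtual table, assuming the SQLite build has SQLITE_ENABLE_DBSTAT_VTAB enabled. A sketch, with a placeholder database file name:)

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "modernc.org/sqlite" // assumption: the build must include the dbstat virtual table
)

func main() {
	db, err := sql.Open("sqlite", "index-v2.db") // placeholder file name
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Each dbstat row describes one page of one btree; join against
	// sqlite_master so indexes are counted together with their table.
	rows, err := db.Query(`
		SELECT coalesce(m.tbl_name, d.name) AS tbl, count(*) AS pages
		FROM dbstat d
		LEFT JOIN sqlite_master m ON m.name = d.name
		GROUP BY tbl
		ORDER BY pages DESC`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var tbl string
		var pages int64
		if err := rows.Scan(&tbl, &pages); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-20s %8d pages\n", tbl, pages)
	}
}
```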
I’m going to go out on a limb here and guess that you’re carrying around millions of entries for deleted files that have no blocks associated with them.
(Which is fine, I know that’s how you roll, but it does change the equation on compressibility.)
(syncthing debug database-counts $folderID for the counts of that)
Yeah, although I’ve just got rid of my workaround to prevent removing deleted files from the database (as the default period of 1 year 3 months should be enough for my use case). However, this specific Syncthing installation/database is only 3 weeks old, so the deleted entries would still be kept in the database regardless.
More seriously, directories get a synthetic size of 128 bytes for reasons, apparently also in the deleted case. That’s not necessary; deleted files have a size of zero.
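To illustrate, here is a sketch of the kind of accessor change I mean (not the actual Syncthing code; all names here are made up):

```go
package main

// Sketch only, not the actual Syncthing code; names here are made up.
const syntheticDirectorySize = 128

type record struct {
	Size      int64
	Deleted   bool
	Directory bool
}

// accountedSize is the size a record contributes to totals. Checking Deleted
// first means deleted directories no longer carry the synthetic 128 bytes.
func accountedSize(r record) int64 {
	switch {
	case r.Deleted:
		return 0
	case r.Directory:
		return syntheticDirectorySize
	default:
		return r.Size
	}
}

func main() {
	println(accountedSize(record{Directory: true, Deleted: true})) // prints 0, not 128
}
```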
I doubt it actually, because I’ve now checked which folders the two databases that compress so much are associated with.
The first one is the system’s Recycle Bin (which I add to Syncthing without sharing, so that it can re-use blocks from it), and the second is a parent path for all other Syncthing folders that ignores everything but .stversions directories (also to re-use blocks from them). The content of those is going to change constantly, with many deletions happening (e.g. in the case of .stversions, they are cleaned out after 1 day).