Is Syncthing v2 viable with very large folders?

TL;DR: I have huge folders that synced fine with Syncthing v1, but I'm having trouble migrating to v2 (after a whole month, actual syncing hasn't even started).

(Originally posted on reddit /r/Syncthing)

Here’s the context.

The largest folder has approximately 30 million files, 2.5 million directories and 15 TiB of data. Folders are send-only on a source server and receive-only on two separate servers, to keep a fresh copy available at all times; the receive-only servers take BTRFS snapshots for backup purposes.

Currently the source and one backup server are still running 1.29.5. I'm migrating only one backup server (to 2.0.10) first, to validate that it works properly in my environment.

Note: I initially tried the database migration, but it failed. The database was reported migrated in the logs, but Syncthing kept presenting the migration UI even after a restart. That's a different subject, though.

I removed the old database to force Syncthing v2 to recreate the indexes from the folder contents. After 2 weeks of scanning without transferring any actual data, I reconfigured it to perform IO-intensive operations on only one folder at a time. That was 2 weeks ago; it is still working on the largest folder and has been in the “Preparing to Sync” state for 10 days.

The on-disk database for this folder is easy to spot:

  • 52636483584 Jan 16 18:46 folder.0002-xxx.db
  • 369754112 Jan 26 03:50 folder.0002-xxx.db-shm
  • 190415597392 Jan 26 03:50 folder.0002-xxx.db-wal

That’s nearly 200 GiB…
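For anyone wanting to check the WAL-related settings on such a database, these are the PRAGMAs I'd look at. A sketch using a scratch database (the sqlite3 CLI is assumed; on a real server, run the queries against a *copy* of the folder DB, not the live file):

```shell
# Create a throwaway WAL-mode DB and query its journal settings
# (paths are illustrative; use a copy of folder.0002-xxx.db instead).
db=/tmp/wal-inspect.db
rm -f "$db" "$db"-wal "$db"-shm
sqlite3 "$db" "PRAGMA journal_mode=WAL; CREATE TABLE t(x);" >/dev/null
sqlite3 "$db" "PRAGMA journal_mode; PRAGMA wal_autocheckpoint; PRAGMA page_size;"
# Typically prints: wal / 1000 / 4096 on a default SQLite build
```

The journal mode is persistent in the file header, so the second invocation still reports `wal` even though it's a fresh connection.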

For reference, the total DB size on the other servers running 1.29.5 is <50 GiB for a total of ~43 million files and ~19 TiB (it was the same on the server now migrating).

Syncthing is almost never waiting on disk (right now, for example, it is reading at ~2 MiB/s with zero writes while using 200% CPU) and doesn't swap (128 GiB of RAM, almost all of it available to Syncthing). It sometimes uses ~600% CPU and can occasionally use all 12 CPU threads. I see the global state of the folder follow the state of the source (with varying delays, usually measured in minutes).

Is there any way to estimate the progress made so far and an ETA? Are there performance fixes in later versions (2.0.11+)? I didn't find any by reading the changelogs.

I hesitate to make changes that require restarts, as I believe this huge WAL means there is a transaction in progress that would simply be aborted, losing 10 days of work. The other servers have faster disks and CPUs, but I'm not sure that would make a big difference (the process seems CPU-limited, and most of the time it can't use many CPU threads; I'd say the average is around 3, oscillating between 1 and 12).

If v2 is not suited to my environment, is v1 a viable option for the future? I could reinstall v1 and recreate the indexes with it (from memory, that took 2 to 3 days with 90+% of the data currently stored). But if v1 is no longer maintained, this isn't a viable option either.

Is the database located on an SSD? If it’s all spinning drives, then honestly, I don’t think the setup is going to work, especially since the new database seems much heavier on the I/O side of things compared to the old one.

No, it isn't. That said, it is on a dedicated ext4 filesystem on a RAID10 of nine 7200 rpm drives. And the Syncthing process:

  • is almost never in the D state,
  • is constantly using CPU oscillating between 100% and 1200% (using all 12 threads)

So clearly it isn’t limited by the storage or waiting for data from other peers.
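For the record, this is roughly how I'm checking that, using /proc on Linux. A sketch; it uses the shell's own PID (`$$`) as a stand-in, replace that with `$(pidof syncthing)` on the real server:

```shell
# Check whether a process is CPU-bound or IO-bound via /proc (Linux).
# $$ is a stand-in here; use $(pidof syncthing) for the real process.
pid=$$
state=$(awk '{print $3}' "/proc/$pid/stat")
echo "state: $state"                      # "D" = uninterruptible IO wait
grep -E '^(read|write)_bytes' "/proc/$pid/io"   # cumulative disk IO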

Have you tried applying any of the tweaks listed at https://docs.syncthing.net/users/tuning? If your systems are case-sensitive, then specifically enabling https://docs.syncthing.net/users/config.html#config-option-folder.casesensitivefs may have a big impact on the scanning performance.
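For reference, that option lives in the folder element of config.xml. A sketch; the folder id and path are placeholders:

```xml
<folder id="xxx" path="/srv/data">
    <!-- On a genuinely case-sensitive filesystem, this skips Syncthing's
         case-insensitivity workarounds and can speed up scanning -->
    <caseSensitiveFS>true</caseSensitiveFS>
</folder>
```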

If those tweaks don’t help though, then I think the amount of data may simply be too much for the SQLite database and the hardware in question…

I've tried most of them. The most promising was casesensitivefs, but I hadn't changed it because I assumed it would trigger a whole new scan.

I’ve just changed it to true and the Web UI is reporting that it is “Saving changes”.

Other promising settings that I didn’t change are:

  • copiers
  • hashers

I refrained from changing them from the defaults, as the docs say:

“These are low-level performance options for advanced users only; do not change unless requested to or you’ve actually read and understood the code yourself. :slight_smile:”

I don’t know what values Syncthing chose by default for this system, so I’m not sure what to try. The system is using:

  • Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
  • 128GiB of RAM
  • 10× 10 TB 7200 rpm disks nearly dedicated to Syncthing (with a BTRFS filesystem for storage and an ext4 filesystem for the DB, using separate partitions on 9 of the 10 disks to build the RAID arrays)
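For what it's worth, copiers and hashers are per-folder settings in config.xml, where 0 means "choose automatically". A sketch, with placeholder folder id and path:

```xml
<folder id="xxx" path="/srv/data">
    <copiers>0</copiers>  <!-- 0 = let Syncthing pick; the docs warn against tuning -->
    <hashers>0</hashers>  <!-- 0 = let Syncthing pick -->
</folder>
```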

I just restarted Syncthing and it logged :

2026-01-26 18:17:56 INF syncthing v2.0.10 “Hafnium Hornet” (go1.25.3 X:nodwarf5 linux-amd64) portage@localhost 2025-12-16 13:00:50 UTC [libsqlite3, noupgrade] (log.pkg=main)
2026-01-26 18:18:01 INF Starting temporary GUI/API during migration (address=xxxxx:8384 log.pkg=main)

It is reading from disk at ~200MiB/s and writing at ~500KiB/s.

Probably triggered by the casesensitivefs change? The file constantly changing size in the index directory is the shm file for the folder.

I will add my 2 cents here: when working with folders containing lots of small files (millions), the WAL can indeed grow to several times the size of the data it corresponds to, really, really huge, and stop the show. This is not an anomaly of this particular issue; I face it every single time I work with 1M+ file folders, no matter what: the WAL ends up a lot larger than the DB it corresponds to.

The same happened even when deleting a 1M-file folder: the first step of the deletion processing exploded a 4 GB DB into a ~4 GB DB plus a 12 GB WAL.

Maybe commits should be made more often; that wouldn't harm small WALs but would improve large ones. (On the other hand, I have the impression that some single long-running operation causes the WAL to explode like this, so more frequent commits would not help it anyway.)
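The effect is easy to reproduce with the sqlite3 CLI alone (scratch database, illustrative sizes): a single large transaction keeps every written page in the WAL until it commits, and checkpointing afterwards rewinds the WAL but does not shrink the file.

```shell
# Repro sketch: one big transaction inflates the -wal file to roughly
# the size of everything written (scratch DB; requires the sqlite3 CLI).
rm -f /tmp/wal-demo.db /tmp/wal-demo.db-wal /tmp/wal-demo.db-shm
sqlite3 /tmp/wal-demo.db <<'EOF'
PRAGMA journal_mode=WAL;
CREATE TABLE t(x);
BEGIN;
WITH RECURSIVE c(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM c LIMIT 500000)
INSERT INTO t SELECT i FROM c;
COMMIT;
.shell cp /tmp/wal-demo.db-wal /tmp/wal-demo.snapshot
.shell ls -l /tmp/wal-demo.db-wal
EOF
```

The `.shell` lines run before the connection closes, because on clean shutdown SQLite checkpoints and deletes the -wal file, which is also why the WAL only gets truly huge while a long operation is still in flight.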

(The rest has been discussed many times in the scalability topics around here: sadly, on large installations v2 can hit these showstoppers, while v1 is considered bad by design, and abandoned.)

UPD: in my spare time I am working on a patch, promised in a few other threads, to reduce DB size and IO footprint, but I am uncomfortable and sorry to say that my spare time is not enough right now to make good progress. I am doing this to pay back to this great project, just slowly, unfortunately :frowning:

also,

maintenance on DBs of this size is just not going to finish, ever, in any reasonable time.

You will need this patch to be able to disable it:

(Before the patch, I just set the interval to “forever” in all my internal builds; it's a no-go otherwise, as even with just a few million files there is annoying constant disk thrashing.)

Or you could help test actual fixes, like wip: use sharding for block database by calmh · Pull Request #10454 · syncthing/syncthing · GitHub

I’m willing to test a development branch, but is this ready to be tested?

It is a WIP with no documentation and no indication that it even builds, or that it can run as-is in place of an existing v2 binary (does it need to start from scratch, or can it convert an existing v2 DB?).

Mind pointing me to the WIP branch? I’d like to try pushing this forward.

@calmh found a few lab rats :hand_with_fingers_splayed:

Nothing public yet; I have some unresolved internal questions. The internal implementation we use for our 10M+ file installation removes all the code supporting hierarchy fetches, which is fine for us because most users at that scale never need it anyway; but for a general public release this is a no-go, because GlobalDirectoryTree is used by many smaller installations.

Implementing it by destroying GlobalDirectoryTree is very straightforward and leads to a clear, obvious win: you just delete all the (currently very fat) indexes and replace them with a single 4-byte hash index, and index IO pressure drops about 10×. But this makes efficient tree traversals impossible. I cannot publish a WIP that amounts to “let's delete half of the thing for a while”. This all needs further thinking.
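To illustrate the trade-off in the abstract, here is a purely hypothetical schema sketch (NOT Syncthing's actual schema; table and column names are invented):

```sql
-- Hypothetical sketch of the trade-off, not Syncthing's real schema.
CREATE TABLE files (
    folder_id INTEGER,
    name      TEXT,     -- full relative path
    name_hash INTEGER,  -- e.g. a 4-byte hash of name
    meta      BLOB
);
-- Fat index: repeats every path string, but ordered prefix scans make
-- directory-tree traversal cheap (what GlobalDirectoryTree relies on):
CREATE INDEX files_by_name ON files(folder_id, name);
-- Thin index: ~4 bytes per entry, far less index IO, but it only
-- supports equality lookups; listing a directory becomes a table scan:
CREATE INDEX files_by_hash ON files(name_hash);
```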

When I have something, it will be discussed in this dedicated thread: https://forum.syncthing.net/t/reducing-v2-db-context-and-footprint-yes-or-no

UPD: or at least it also needs a parent_id implementation right away; that is a desirable thing too, but much more complex.

@gyver @bxff would you mind sharing the output of sqlite3_analyzer for the largest of your folder DBs?

It is running…

The db file is ~60 GB and there's a 10+ GB WAL. Not sure how long it will take; the process is reading at 2-3 MB/s, writing in the tens of KiB/s, and seems to be mostly waiting for IO.

Syncthing itself is constantly reading and writing at the moment (still in the “Preparing to Sync” phase on the same folder).

Note: I’ve reacted to “Files are locally modified when logs says they are unchanged” because my receive-only folders seem affected too (it could be part of the problem if Syncthing wastes time trying to sync already-identical files).

Here is the result of the analysis; it took about 4 hours.

folder-analysis.txt (56.0 KB)

I’ve stopped using Syncthing for the past few weeks, and my folder isn’t particularly large either, so I’m doubtful my data would be helpful.

The files table is pretty huge compared to the blocks. How deep is the folder structure?

I don’t have hard numbers as the content is very diverse and not under my control but as I wrote in the topic there are 2.5 million directories. From a quick look I’d say at least 6 levels deep seems common. It could probably go up to around 10 levels as some contents are stored in multi-level trees to avoid too many files per directories and these subtrees are themselves somewhat deep in the main tree.