Is Syncthing v2 with very large folders possible ?

@gyver would you like to test my PR?

I’m willing to try but keep in mind :

  • I won’t be able to report anything for about 48h : the initial scan after restart takes more than a day.
  • If the database gets corrupted, I won’t keep trying to migrate to v2; I’ll revert to v1. I’ve already waited more than 7 weeks. Currently I think it may take another 3 to 5 weeks, but if your PR breaks something I won’t have the patience for another try. So please triple-check there’s no risk of that happening.

If you can find someone with a fairly large folder that is still smaller than ours and can be scanned relatively quickly, that would be ideal. If not, I think I can check out your code and build it.

To get a better idea of the risks I looked at your commit and found this, which you didn’t change.

_, _ = conn.ExecContext(ctx, `ANALYZE`)
_, _ = conn.ExecContext(ctx, `PRAGMA optimize`)

This is BONKERS ! By default ANALYZE reads the whole database… I’ll let you imagine what it does on my system…

This is even less needed given that apparently on opening the database context there is already PRAGMA optimize = 0x10002 which is the recommended way of calling ANALYZE automatically when using long lived connections. Anything else is not only superfluous but in the case of large databases downright harmful.

I’m not even sure ANALYZE or optimize (which itself runs ANALYZE on the tables it thinks need it) is even needed. It is used when the queries are complex enough that it isn’t obvious whether an index will speed them up. From what I know Syncthing should only make simple lookups that should always benefit from the indexes, so if SQLite uses them by default no ANALYZE is even needed.
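A minimal sketch of what would be left once the unconditional ANALYZE is gone, assuming the optimize mask quoted above is applied once when the connection is opened. The helper names (tuneOnOpen, periodicOptimize) and the recorder type are hypothetical and not taken from db_service.go; the recorder only stands in for a real connection so the sketch runs without a SQLite driver.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB and *sql.Conn used below, so the sketch
// can be exercised without a real SQLite driver.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// tuneOnOpen is a hypothetical helper run once per long-lived connection.
// 0x10002 is the mask quoted above; no unconditional ANALYZE is issued.
func tuneOnOpen(ctx context.Context, db execer) error {
	_, err := db.ExecContext(ctx, `PRAGMA optimize=0x10002`)
	return err
}

// periodicOptimize is a hypothetical helper for the maintenance loop. It lets
// SQLite decide whether any table needs fresh statistics.
func periodicOptimize(ctx context.Context, db execer) error {
	_, err := db.ExecContext(ctx, `PRAGMA optimize`)
	return err
}

// recorder stands in for a database handle and records what it is asked to run.
type recorder struct{ stmts []string }

func (r *recorder) ExecContext(_ context.Context, q string, _ ...any) (sql.Result, error) {
	r.stmts = append(r.stmts, q)
	return nil, nil
}

// demo runs both helpers against the recorder and returns the statements issued.
func demo() []string {
	r := &recorder{}
	ctx := context.Background()
	if err := tuneOnOpen(ctx, r); err != nil {
		panic(err)
	}
	if err := periodicOptimize(ctx, r); err != nil {
		panic(err)
	}
	return r.stmts
}

func main() {
	for _, s := range demo() {
		fmt.Println(s)
	}
}
```

Unlike the two lines quoted from the commit, the errors are returned instead of being discarded with `_, _ =`, so a failing PRAGMA would at least show up in the logs.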

Premature optimization is the root of all evil.

Can you get rid of this in your branch ? If not I will remove them myself before building.

Good point. Optimize should be more than enough.

If someone else likes to test the PR, feel free. I’m lacking the data and hardware to properly test this. Having everything on NVMe drives apparently has also some kind of downside :sweat_smile:

I’m looking at the rest of db_service.go and there are some really bad things for large databases in there.

All the garbageCollect* methods basically make full table scans to do their work.

So each time “periodic” is called, the result is a nearly full DB read (most if not all large tables are scanned), and the whole DB is locked while it runs (the call to fdb.updateLock.Lock()).

There are two ways I know how to address this kind of problem :

  • on DELETE (or on UPDATE if the “parent” changes), check whether the parent object you reference still has a child and delete it if not. This can be done in the Go code or in a TRIGGER, and it will properly use the indexes,
  • don’t find all the objects on each pass, but limit the scan to a segment of the table you want to clean and rotate this segment on each periodic call.

I tend to favor the first one as it is the most efficient, but it is not self-healing in case of bugs. The second one is more robust but needs a tunable in the hands of the users so that they can make the trade-off between the speed of the cleanup and the impact on performance.
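To make the first option concrete, here is a rough sketch of the TRIGGER variant. The schema is entirely hypothetical (a parents table with an id column and a children table with a parent_id column); it is not Syncthing's actual schema, and the statements have not been run against the real database. The NOT EXISTS lookup is the part that can use an index on children(parent_id) instead of a table scan.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB and *sql.Conn needed to install the triggers.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// Hypothetical schema: parents(id), children(parent_id).
// When a child row goes away, drop its parent if no other child references it.
const deleteTrigger = `
CREATE TRIGGER IF NOT EXISTS children_delete_gc
AFTER DELETE ON children
BEGIN
    DELETE FROM parents
    WHERE id = OLD.parent_id
      AND NOT EXISTS (SELECT 1 FROM children WHERE parent_id = OLD.parent_id);
END`

// Same check when a child is re-pointed at another parent.
const updateTrigger = `
CREATE TRIGGER IF NOT EXISTS children_update_gc
AFTER UPDATE OF parent_id ON children
WHEN OLD.parent_id IS NOT NEW.parent_id
BEGIN
    DELETE FROM parents
    WHERE id = OLD.parent_id
      AND NOT EXISTS (SELECT 1 FROM children WHERE parent_id = OLD.parent_id);
END`

// installTriggers creates both triggers and reports how many statements ran.
func installTriggers(ctx context.Context, db execer) (int, error) {
	n := 0
	for _, stmt := range []string{deleteTrigger, updateTrigger} {
		if _, err := db.ExecContext(ctx, stmt); err != nil {
			return n, err
		}
		n++
	}
	return n, nil
}

// recorder stands in for a real connection so the sketch runs without a driver.
type recorder struct{ stmts []string }

func (r *recorder) ExecContext(_ context.Context, q string, _ ...any) (sql.Result, error) {
	r.stmts = append(r.stmts, q)
	return nil, nil
}

// demo installs the triggers on the recorder and returns the statement count.
func demo() int {
	n, err := installTriggers(context.Background(), &recorder{})
	if err != nil {
		panic(err)
	}
	return n
}

func main() {
	fmt.Println("trigger statements issued:", demo())
}
```

With something like this in place the cleanup cost is paid per deleted row, at the moment of the delete, rather than in a periodic pass over the whole table.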

I’ve seen something very suspicious about disabling foreign keys temporarily that I didn’t try to understand but the need for this hack might go away naturally by using one of the methods above.

The more I dig the more problems I find.

I saw that garbageCollectBlocklistsAndBlocksLocked attempted to split the table in ranges to process and being hopeful I tried to understand the whole process…

One of the first things it does is count the rows in the table to guess how to split the whole hash range into subranges… This means a whole table scan right from the beginning, before there is even anything to process.

Still in garbageCollectBlocklistsAndBlocksLocked. The function aborts after 5 minutes so I wondered how it continued on the next call to periodic. It doesn’t try to remember where it was and simply randomizes the ranges so that there’s a chance to process another range next time… So for large databases this means that to process ALL ranges you’ll have to roll the dice many, many, many times…
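To put a number on "many times", here is a back-of-the-envelope sketch. It assumes, as a simplification of what the function does, that each call to periodic manages to process exactly one range picked uniformly at random. Under that assumption this is the classic coupon collector problem, and the expected number of calls needed to see every one of n ranges at least once is n × (1 + 1/2 + … + 1/n).

```go
package main

import "fmt"

// expectedPasses returns the expected number of GC passes needed to visit
// every one of n ranges at least once, assuming each pass handles exactly one
// range chosen uniformly at random (coupon collector: n * H(n)).
func expectedPasses(n int) float64 {
	sum := 0.0
	for k := 1; k <= n; k++ {
		sum += 1.0 / float64(k)
	}
	return float64(n) * sum
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("%4d ranges: about %.0f passes\n", n, expectedPasses(n))
	}
}
```

For 100 ranges this comes to roughly 519 passes, against 100 passes for a process that simply remembers where it stopped.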

In fact the function doesn’t abort after 5 minutes exactly. It aborts after the processing of the current range exceeds the 5 minutes window : if a range of gcChunkSize takes 1 hour to process for each table, the function will abort after two hours (there are two tables being processed). In the meantime Syncthing is basically dead waiting for cleanups even if there are none to be done.

Even leaving aside that the process is itself wasteful and could probably be handled efficiently by a TRIGGER that would allow SQLite to use its indexes, if there is no other means than processing the whole table this should use a restartable process that adapts to the load :

  • select an initial very short range of hash values based on an estimation of the row count (and not the exact value), it should bring something like 10 rows statistically
  • fetch the hashes without reference in a simple select, (you might have to count the rows covered by the range to adapt more precisely)
  • remember how many there are and the time spent to get them
  • delete them
  • adapt the width of the next range to reduce or increase based on the load and the amount of work done.

Ideally this process should run with very short ranges targeting a total process time < 1 second (and not 5 minutes) and being called at regular intervals (every minute should be a nice compromise between time allocated to this and responsiveness).
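The steps above can be sketched as a small piece of state that survives between passes. This is only an illustration under stated assumptions: the hash space is modelled as 64-bit prefixes, the names (rangeGC, newRangeGC, next, adapt) and the halve/double rule are hypothetical, and the width is adapted from elapsed time only, where a fuller version would also feed in the number of rows found. The actual select and delete on each range are left out.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	targetRows     = 10          // rows the first range should cover, statistically
	targetDuration = time.Second // time budget for one pass
)

// rangeGC remembers where the previous pass stopped and how wide the next
// range should be, so the work resumes instead of restarting at random.
type rangeGC struct {
	cursor uint64 // first hash prefix of the next range
	width  uint64 // number of prefixes covered by the next range
}

// newRangeGC sizes the first range from an estimated (not exact) row count.
func newRangeGC(estimatedRows uint64) *rangeGC {
	if estimatedRows < targetRows {
		estimatedRows = targetRows
	}
	return &rangeGC{width: math.MaxUint64 / estimatedRows * targetRows}
}

// next returns the inclusive range to scan and advances the cursor,
// wrapping around to 0 once the end of the hash space is reached.
func (g *rangeGC) next() (lo, hi uint64) {
	lo = g.cursor
	hi = lo + g.width - 1
	if hi < lo { // overflow: clamp to the end of the hash space
		hi = math.MaxUint64
	}
	g.cursor = hi + 1 // wraps to 0 after the last range
	return lo, hi
}

// adapt halves the width after a pass that blew the budget and doubles it
// after a pass that used less than half of it.
func (g *rangeGC) adapt(elapsed time.Duration) {
	switch {
	case elapsed > targetDuration && g.width > 1:
		g.width /= 2
	case elapsed < targetDuration/2 && g.width <= math.MaxUint64/2:
		g.width *= 2
	}
}

func main() {
	gc := newRangeGC(1_000_000)
	// Pretend the first pass was slow and the next two were fast.
	passes := []time.Duration{3 * time.Second, 100 * time.Millisecond, 100 * time.Millisecond}
	for _, elapsed := range passes {
		lo, hi := gc.next()
		// A real pass would select and delete unreferenced hashes in [lo, hi] here.
		gc.adapt(elapsed)
		fmt.Printf("scanned [%#x, %#x], next width %d\n", lo, hi, gc.width)
	}
}
```

Because the cursor is kept between calls, every range gets visited once per sweep, and a slow disk only shrinks the next range instead of stretching the pass.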

You don’t want a program designed to replicate changes from machine to machine in the order of seconds/minutes being blocked for several hours at a time.

I decided to bite the bullet and learn Go :fearful:

I forked the main repository to work on this. My first commits build so at least I make some progress. I’ll have to test my builds on a local small instance before restarting my huge one.

You’ll find the work in progress at :

I have to learn as I go so I welcome any feedback on Go best practices, mistakes, SQLite best usage patterns, …

2026-02-08 14:27:29 INF Invalid version string …

I’ve looked up the error, made a local tag and checked out the tag, but I’m not sure it rebuilds correctly.

Why not a --dev-mode to bypass this annoying check ?

Looking around, I found a -version parameter to go run build.go that forces the version string…

I’m not sure that is the way to go, I followed the build guide and there was nothing about this.

I read this but I failed to see how it actually works. I created a local tag with the correct format but it wasn’t used when calling go run build.go.

Should I have fetched the original tags like described in the doc ? I don’t see how it would help as I’m not building one tag but my own version.

Fetching the tags works but I’m not sure why. Probably some magic makes them different from a local tag.

I’d advise adding a reference to the note in the “Building” paragraphs.

On my first try I stopped reading at “Building (Unix)” as that was my goal and I couldn’t imagine that you could build something that would later refuse to run.

Initially I didn’t even go back to the documentation and started to google for the error as I thought it was not something “normal” but maybe a problem in my build environment that would be common knowledge for a Go developer.

If I had paid attention up to the end I wouldn’t have missed this, but I currently have half a dozen problems to solve in my head and many things to learn at the same time, so I have to read fast if I don’t want to forget half of what I want to do. Sorry.

v2 adds a bit to that as you also need a C compiler if you want to use our default sqlite driver. A bit of restructuring would go a long way IMHO. The important bits are a bit late in the docs.

It would be nice if the build without additional options would yield a fully functional dev binary. e.g no-upgrade should be the default. Apart from the final release artifacts, it doesn’t make sense.

I’ve seen this while looking at the docs. I have the default build environment for ArchLinux AUR so I should be covered. Is there a test or build log to check where I can verify that the C driver is used ?

Still coding. My code will probably make Go developers’ eyes bleed (I know there must be some way to write more elegant code but I’m forging ahead).

I found a bug that I’m fixing right now : the blocks and blocklists table GC is paused as long as the device sequence doesn’t move. For all other GC this is fine but these tables are cleaned incrementally so even when the sequence is stationary there can be work to be done.
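A small sketch of the scheduling rule this implies, with hypothetical names (folderGC, plan, sweepDone) that are not taken from the actual code: the other GC steps stay gated on the device sequence, while the incremental blocks/blocklists sweep keeps running until it reports that it has covered everything.

```go
package main

import "fmt"

// folderGC is a hypothetical per-folder record of GC progress.
type folderGC struct {
	lastSeq      int64 // device sequence seen by the previous pass
	sweepPending bool  // true while the incremental sweep still has ranges left
}

// plan decides what a periodic pass should do for the given device sequence.
// others covers the GC steps that only matter after a change; blocks covers
// the incremental blocks/blocklists sweep.
func (f *folderGC) plan(seq int64) (others, blocks bool) {
	if seq != f.lastSeq {
		f.lastSeq = seq
		f.sweepPending = true // new changes: a full sweep is owed again
		return true, true
	}
	// Sequence is stationary: skip the other steps, but keep sweeping
	// until the incremental GC has finished its round.
	return false, f.sweepPending
}

// sweepDone is called once the incremental sweep has covered every range.
func (f *folderGC) sweepDone() { f.sweepPending = false }

func main() {
	f := &folderGC{}
	for _, seq := range []int64{5, 5, 5} {
		others, blocks := f.plan(seq)
		fmt.Printf("seq=%d others=%v blocks=%v\n", seq, others, blocks)
	}
	f.sweepDone()
	others, blocks := f.plan(5)
	fmt.Printf("after sweep: others=%v blocks=%v\n", others, blocks)
}
```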

Check the notes on go run build.go build.

I’m not cross compiling so the note doesn’t apply to my case ?

I implemented the low-hanging fruit I found. This works on my laptop with 13 folders and not much data. I validated at least that syncing works, and that my debug messages reflect the work I expected Syncthing to do after my changes when I create new files and delete existing ones.

This has to be :

  • cleaned up : I’ve coded in many languages but never in Go and I didn’t take the time to learn its capabilities yet.
  • battle tested : I’ll let my laptop run as is for the night and probably use the binary tomorrow on the v2 server, which is still stuck at “Preparing to Sync”.
  • reviewed : I made a huge mistake when fixing the bug I found (blocks/blocklists GC stopped before finishing when the device sequence freezes) by putting the necessary data to track the GC in the Service struct instead of the Folder… so I obviously don’t have a clear understanding of the code structure yet (and I’m a bit tired).

Don’t try to use this unless you’re familiar with the code and understand my changes…

I’m not in a position to test it but I appreciate you taking a crack at this…