Is Syncthing v2 with very large folders possible ?

Can you create a draft PR? This makes it easier to comment and improve the code together.

Done :

I have a small bug to fix before trying it out on our largest folder (the blocks/blocklists cleanup is needlessly launched although there isn’t work to be done and it aborts immediately when detecting it).

I’ve pushed more changes in our db_maintenance_performance branch.

  • more debug logs, especially of potentially costly operations’ runtimes,
  • removes the row count for blocks/blocklists, the adaptative GC doesn’t need it
  • I suspect the checkpoint(TRUNCATE) to be a problem so I added runtime debug logs and forced an interval of 24 hours for it (the rest runs every 5 minutes).

The current code is running on our server right now (but the startup is still in progress, as said earlier, I’ll have more to report in ~48h).

I wanted to update or replace my draft pull request with the latest commits and merge with main but I couldn’t find how…

git rebase main
git push --force-with-lease

Should do the job

Sorry wasn’t clear, I already merged with main. The problem is the draft PR, Github redirects me to the existing one when I want to create a new one and I can’t modify the existing one…

PRs are always tied to a specific branch. Either you update the branch or create a new one, but that is seldomly the right thing to do.

OK, I’ve only created 2 Github PRs including this one so I’m not familiar with the process, sorry.

Everything is fine then.

Just discovered that the cleanup of deleted files takes 25 minutes the first time it is called on our big folder. There’s no index on deleted… so again that’s a full scan.

Added an INDEX on deleted. An alternative would have been a slow walk of the files table instead of a single DELETE over the whole table but with the current indexes this was not something I found a way to implement efficiently. With the INDEX on delete if the single DELETE is still too costly I’ll try the slow walk.

And in addition just found that tidy in db_service.go is not the only place checkpoints are manually triggered. Calls to folder.Update can trigger a checkpoint too…

Having two points with independent logic able to trigger a checkpoint which is a very expensive and potentially locking operation seems like a source of headaches. I just disable the checkpoint logic linked to Update as currently checkpoints just don’t work so I revert to a simpler base.

The GC for file_names and file_versions suffers too.

For each table the delete makes sure that for each idx in the file_names and file_versions there is 0 entry in files. Fortunately there is an index in files for both version_idx and name_idx but the total work is :

  • a full scan of file_names/file_versions
  • a lookup for each entry of file_names/file_versions in the indexes

This is not something you can expect to be fast on tables with millions of rows. This is batch in the background query territory.

I’ll implement a sequential incremental processing similar to the one I coded for the blocks/blocklists, a “slow walk”.

The whole checkpoint handling should be more centralized. Apart from passive checkpoints it’s more or less a hail maria without explicit locking.

Sometimes I wonder if people working on it were simply fed up with the idiosyncrasies of SQLite and started to throw things on the wall to see what stuck.

That said you should not have to lock anything to do a checkpoint in SQLite. In fact it is quite the opposite : all readers and writers should go about their business and just focus on not leaving transactions open too long. The automatic checkpoint should in theory be enough then. If you lock something it means you are potentially blocking code in a transaction which might block you from checkpointing. I haven’t seen a clear example of that but even if it is safe in the current code state this is not robust to code change, that’s for sure.

I’ve seen the sdb/fdb.updateLock.Lock/Unlock littered in the code and I’m not sure if they serve any purpose. At least they should not protect against corruption as SQLite is perfectly capable of doing so. I don’t have the time yet to make a listing of these calls to verify how isolating the code they protect might be useful though.

You can’t truncate the WAL as long as a single reader is active. Short transactions certainly help as the pages in the WAL can be checkpointed quickly, but the WAL file will grow until you hit radio silence.

Not only radio silence : all transactions must be closed.

For each transaction SQLite maintains a pointer to the position in the WAL that corresponds to the state of the whole database when the transaction opened. Until all transactions are closed these positions must be valid so the WAL can’t be truncated.

Isolated quick queries aren’t a problem, SQLite should treat them like transactions that happen only for the duration of the query.

Long queries and more generally long-lived transaction are the problem. Even if Syncthing is doing a boatload of concurrent processes each making small queries by themselves there should be frequent periods where there is zero activity on the database which should let SQLite properly checkpoint. When the last process with activity finish its query, SQLite can detect there aren’t any other access and trigger the checkpoint.

If I understand the current design, automatic checkpoints are not relied upon and maybe even deactivated to be triggered manually. But I’ve not seen any indication that an effort is made so that they can actually complete as these manual triggers don’t check on concurrent activity (maybe on writers by forbidding them if this is the goal of updateLock but readers will block checkpoints as well and I’ve seen the maintenance code open new connexions that may circumvent this mutex).

It’s in a weird spot atm. We partly switched to a dedicated checkpointer but didn’t turn off the auto checkpointer of sqlite. On top of that it’s missing locking to ultimately enforce a full/restart/truncate checkpoint.

IMHO we need a RWMutex at the connection pool level in order to allow the checkpointer exclusive access once needed.

Maybe, but there are two reasons why I’m not eager to go this way right now :

  • in our current state this RWMutex could be a huge performance problem, when the checkpointer wants to work it must take the Mutex which will wait for each and every reader to stop. On our system I just found a 35 minute query after the previous 25 minutes one in the GC* funcs : this means that this checkpointer will effectively block Syncthing from starting any new query until this 35 minute query (and any other already being processed) finishes. This is not acceptable.
  • frankly the codebase is already quite complex : adding an additional synchronization layer right now seems to be premature. I’m slowly learning about the codebase, maybe in a week I’ll have a different opinion.

There are other ways to go too. In fact one of my ideas given the nature of SQLite was to completely rework the DB access by putting one component in charge of doing the DB access, making it expose an asynchronous API to the rest of the code. This is not even an half-baked idea but the potential is there to simplify the concurrency management by making one component aware of every current DB accesses. This component could even dynamically manage the concurrency to levels that the DB can handle and split the queries in chunks by itself as it would have the knowledge of all the current load on the DB. I don’t think this kind of idea is even appropriate within the 2.0.x versions.

Anyway there are still low-hanging fruits around : each and every SQL query that takes more than 250ms on my system is a potential target and as I continue to find some I have enough work as it is.

Getting the transaction durations down is the best bet :+1:

Ouch : I think I finally found the source of the bug I was looking for in my code and once again this was caused by an early confusion between Service and Folder.

The file_names and file_versions slow walks are implemented. The code is starting on our server with the large folder after being tested on my laptop.

Let’s see if it manages to avoid blocking the progress of “Prepare to Sync” in the next 48 hours or if I find a new slow part to speed up or split up.

I have a minor bug in the slow walk cleaning of file_names and file_versions which makes the GC a little less costly that it should be as it doesn’t actually do its full job and only very partially clean the tables.

That said, the worst in term of load is the “blocks” table cleaning and it is fully exercised. periodic is called at 1 minute intervals instead of the 5mn in the code published and even then it doesn’t seem to prevent progress of the Scan and Preparing to Sync phase at all.

In fact I was surprised to find the folder already in the “Preparing to Sync” state this morning and after looking at Zabbix it took a bit more than 8 hours for the full scan instead of the full day I saw the last few times with mainline code.

I suspect that making the GC very short (typically <1s with the Mutex locked for even shorter periods) doesn’t impact general performance because :

  • Syncthing is probably able to work on other stuff (code not blocked by the Mutex). I’m not familiar with the rest of the processes so it is just an educated guess,
  • IO and network buffer processing aren’t blocked until the buffers are empty/full so having short enough freezes of the code doing IO and communication doesn’t have a measurable impact on them.

There’s of course a compromise : the blocks table is far from completely cleaned right now. The GC processed 70k+ rows but there are approximately 120M rows. At the current speed a complete GC on blocks would take… 3 years :frowning:

So I’ll have to make some progress on this table to have something really adequate for large installations. Fortunately I have a few ideas. Having the blocks table only partially cleaned is not a problem in our case, the current code already made the Scan 3x faster and I expect about the same speedup for the “Preparing to Sync” phase.

One possible and simple solution would make the GC effort increase when the folder is in the “UpToDate” state.

For now I’ll fix my bugs and add more logs to better follow what happens. And at the first chance I get (next Syncthing restart probably), I’ll make a copy of the 60Gib+ SQLite database to be able to test queries locally instead of running the whole Syncthing process in (semi) production.

Edit : the current code works very well with “reasonable” folders. The slow walk on my laptop rapidly adjusts to process every GC in a single pass over the whole table.

The bug is fixed but had the opposite effect of what I thought. The two tables were processed entirely on the first pass.

The main DB maintenance is run at a different interval (1/day) and the incremental_vacuum is now really incremental (expected speed : a bit under 300MiB/day) not just in name (the SQLite PRAGMA name is confusing…).

I’ve begun to try to make the code more readable but I have yet to read anything about Go syntax convention so it might not be more readable to others…

I’ll probably have time to read more instead of coding now that our server seems to progress faster.

I have no idea if the quality of the code is good enough to be included and if not what should be done about it (I may find this in Syncthing’s documentation though).

At one point I used the kv table to store the slow walk parameters adapted to the DB, but I believe this was not the right approach. The slow walk algorithm is designed to converge in <30 minutes so restarting from scratch instead of polluting the DB seems more appropriate (unless Syncthing is restarted very often on the device, can it be the case for mobile users ?).

This means that if I’m not mistaken the only change that can’t be reverted just by replacing my version with main is the creation of a partial index on the deleted files.