Completely untested and the code quality is probably poor as I’m not a go dev:
@gyver does your WAL ever shrink to a reasonable size ( < 1 GB) ?
Not while in the “Prepare to Sync” state : it only keeps growing. Last time without the case sensitive fs option activated it nearly reached 200GiB (see my original post).
As I write this it is 33 873 762 472 bytes.
This is 700MiB more than 10 hours ago but it didn’t change significantly for the last 7h though (which is reflected on the stable used space on the filesystem according to Zabbix measurements).
For now the rise in WAL size (as seen by the FS used space) is correlated with the decrease in needFiles (but I only have 12h of data to confirm this).
A 200GB WAL will absolutely shatter any kind of performance.
The fact that the WAL never shrinks is an indicator for checkpoint starvation.
But contradictory to that the checkpoint logs appear to be normal?
It is checkpoint starvation. The returned result is 1 which means the checkpoint was blocked from completing by readers.
And after 2 days I can confirm that the WAL size rises at the same time the needFilles changes, this seems consistent with the checkpoint starvation.
The WAL size went past 41GiB today.
Since your WAL was larger before, it must have shrunk at some point. Does that coincide with the maintenance window?
No, it only shrunk when I restarted Synching to enable case sensitive fs and it was almost empty just after that.
I suppose that during startup SQLite is in the conditions needed to apply the content of the WAL to the db file and then truncate the WAL.
Is there a correlation between WAL size and the speed of needFiles going down? My assumption is that the performances degrades as the WAL grows.
Not that I can see. Syncthing is now running for nearly 12 days, the WAL is now nearly 45GiB and the number of files in needFiles decreases roughly at the same rate now compared to 10 days ago when the WAL was empty or almost empty.
I’m not sure why you think WAL size should impact the speed of needFiles decrease. There are 2 “states” I can see :
- needFiles is blocked while Syncthing tries to checkpoint, Syncthing is more or less frozen so the speed is not affected in this state,
- needFiles decreases and Syncthing simply adds modifications to the end of the WAL. Write speeds (INSERT/UPDATE/DELETE) should not be slowed down.
The only thing that could be slowed down are reads (SELECT) that need an up to date state of the database (ie, content of the db file with modifications in the WAL applied). But :
- I’ve not seen anyone pointing at some reads being slow so there might not be any during the Preparing to Sync phase (Syncthing might have everything it needs in memory),
- SQLite itself could maintain the necessary information in memory,
- The system has 128GiB of RAM and there is almost no memory pressure outside of Syncthing (very few accesses to local files that could evict the SQLite DB and WAL from the memory cache).
In think you’re referring to vacuum?
I’m not sure of the right term. VACUUM is usually a manual one-shot operation (PostgreSQL does VACUUM automatically if you set it up but calls the process autovacuum).
Checkpoints normally describe events where the modifications in the WAL are applied to the databases. They are needed to alleviate the performance problems on reads where both the DB and the WAL must be read to get at the data I mentionned.
I assumed from earlier discussions that when needFiles didn’t change in my case it was because checkpoints were tried and blocked by reads.
Checkpoints in sqlite never block readers (for better or worse). So my bet is on vacuum
I’m not sure I follow : until now I’ve only seen mentions of readers blocking checkpoints backed by SQLite’s documentation. The other way around (checkpoints blocking readers) was not a subject ?
My understanding was that the theory being discussed was that Syncthing inadvertedly blocked the checkpoints it was trying to run by not stopping reads. I see that #10554 mentions both vacuum and checkpoints so I didn’t pay enough attention.
VACUUM and checkpoints probably shouldn’t be treated the same way as they do fundamentally different things and may not have the same effects on concurrent accesses.
In fact my opinion is that unless you have a full understanding of what goes on globally you should avoid to try to manipulate vacuum and checkpoints and let the database handle this for you until it reached its limit (maybe it happened during v2 development, I don’t know).
VACUUM should be left alone until you are really sure you need it because it is only useful if your database grew too much compared to the data it has to store (or fragmented the data too much on rotating storage). On most installations of Syncthing I assume the content in a folder is probably growing, not shrinking so VACUUMing the database is probably a wasted effort. Maybe a one-time VACUUM when starting Syncthing could be beneficial but I wouldn’t make it mandatory as it could slow down startup significantly on large DBs. If you want to use the incremenal vacuum pragma this is even trickier, the number of pages to recover and the frequency should be dynamic to adapt to the actual DB usage. One value and frequency will never work for everyone, it will either be too low to be effective on large systems that have the capacity to put more effort into it and it will be too large for small systems with few files/activity and cause trouble.
Apparently checkpoints should be triggered automatically by SQLite and if the right conditions are met a WAL truncate happens automatically so I would refrain from triggering it but instead focus on making sure there are time windows where SQLite can effectively do its own thing. It is probably needed to do so if you want to control the checkpoints anyway as the truncate can’t happen unless there is no writer and no reader actively trying to access old data. This is a limitation of SQLite that might be tricky to work around : PostgreSQL has zero problem checkpointing with heavy concurrent write and read accesses but it is used heavily in this context.
SQLite should probably not be used with concurrent accesses as it was not designed to be efficient at all when there are concurrent accesses. If you focus on the symptoms and try to alleviate them I fear you will only accumulate bandaids over the real cause of the problem. I’ve no idea how Syncthing deals with DB accesses but from past discussions they happen probably concurrently from several Go threads (is routines the right Go term ?). I would probably force them to pass through a single point and completely forbid direct DB access to separate processes/threads/routines. Then you will have a component where you can cleanly adapt the way you access the DB.
I feel this is very wrong. PASSIVE (incl. background auto) - never, that’s right.
All other levels can harm / block both readers and writers. Was reading a lot about this when exploring that issue about SQLITE_BUSY, and almost 100% sure on it now.
Actually, many sources just plain say “never do RESTART on regular production”. A bit too strong as of my taste, but to get the idea.
Unfortunately, official doc is not very verbose here on this. But it states clearly only on PASSIVE. On all others clarity is missing and the result is I just checked the code that there is an exclusive lock there in all other modes at least once, and yes it can block just anything it likes.
… well, was reading a lot on this again right now, to find the exact proofs, and still unsure.
In the SQLite code, everything except PASSIVE, blocks WAL. It is just like
if( eMode!=SQLITE_CHECKPOINT_PASSIVE ){
...
rc = walBusyLock(pWal, xBusy, pBusyArg, WAL_READ_LOCK(0), 1);
it is doing this to make some commits and promised “for a short time”, but I am just not sure how short is that time in respect to these huge WALs.
I am just unsure. I only got readers BUSY with these in both my test app and vanilla Syncthing, so that is my bet.
Your bet on Vacuum is also valid. That’s a lot easier explanation anyway. So yes better to discard my considerations here.
(also an advice that in all the commits about intcremental vacuums etc, just add the logs before and after)
UPD: what escapes me in vacuum theory, why so long? DB is mostly unchanged. Pages are mostly packed as they were. Does it have page registry or whatever, and fill ratios, just to skip all the thing most of the time like any normal DB? If yes, it is not vacuum. If no, who knows, yes maybe.
… it may as well just hang on any write, and then, on of the mutexes on s.updateLock etc, for example, in GetIndexID path.
Maybe really something like this is a best bet here.
Generally the checkpoint types higher than PASSIVE are blocked by readers/writers. But that doesn’t mean that you’re wrong. There seems to be a lock involved around WAL access during a checkpoint that might block readers for a very brief amount of time which in turn may trigger SQLITE_BUSY if you’re unlucky enough.
@gyver we need to intervene here. It would be cool if sqlite would handle this on its own, but with enough load on the DB you’ll always end up with starvation. The auto checkpoints are passive by default and even if we’d pick a different mode, the result would be the same. Sqlite can’t shrink the WAL file as long as there’s a transaction open.
Conceptually “mostly unchanged” is unfortunately not much different than “completely rewritten” when your goal is to maintain consistency and consistency is one of the main pillars of transactional databases.
For full-fledged transactional databases they work very hard at making it so that there’s a difference (using fine-grained locking, multi-version support or any other technique minimizing conflicts between concurrent accesses). SQLite isn’t such a database…
According to the documentation, to free up space SQLite incremental vacuum moves data present in the pages at the end of the db file to free pages in the middle of it before truncating the db file.
From the top of my head to do so it first must make sure that no one (reader or writer) will access :
- the data it is about to move,
- the references to the location of the data it is about to move.
- then it can move the data and the references.
It must most probably write the modifications it is about to do at least in part in the WAL so that if a crash occurs it can recover properly.
So not only an incremental vacuum must block readers but it needs to write to the WAL. Any write to the WAL can’t be relied on until it is properly synced to storage (which usually means that most of the writes pending to the same devices involved must be committed to disk at the same time).
Even on SSDs fsync can be very costly as some have large RAM buffers hiding slow TLC/QLC and emptying these buffers (which fsync must do) can take quite some time (in the hundred of milliseconds or even worse on the cheapest and overused models).
100% agreed. That’s one of the things I’m addressing in fix(db): WAL friendly vacuum by bt90 · Pull Request #10556 · syncthing/syncthing · GitHub
A vacuum run only ever happens if at least 10% of the database consists of free pages. This would effectively prevent the vacuum from running in your case until the initial scan is finished.