Syncthing on SQLite -- help test!

Certainly seems like it would be, yes. I guess most setups spend a fair amount of time idle, but for sure there will be those that don’t.

The truncate is indeed to make sure all the pages in the WAL are written to the main db, and then to truncate the WAL file (to some nonzero size if desired; currently 64 MiB in our code). It does need to happen from the end, so nothing is allowed to hold a read lock on those pages.

In practice this sounds like we need to periodically grab a super lock and prevent all readers and then run a PRAGMA wal_checkpoint(TRUNCATE)… (Currently we do that every 8h but without the super lock thing.)

We could be at least reasonably smart about it and keep a counter so we do it after every like 5000 write transactions or whatever.
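
For illustration, a minimal sketch of the counter idea, assuming a database/sql handle; the threshold and all the names here are made up, not actual Syncthing code:

```go
package main

import (
	"database/sql"
	"sync/atomic"
)

// checkpointEvery is a hypothetical threshold: checkpoint after this
// many write transactions.
const checkpointEvery = 5000

type checkpointer struct {
	db     *sql.DB
	writes atomic.Int64
}

// noteWrite would be called after each committed write transaction.
// Once the counter reaches the threshold it is reset and a TRUNCATE
// checkpoint is attempted.
func (c *checkpointer) noteWrite() error {
	if c.writes.Add(1) < checkpointEvery {
		return nil
	}
	c.writes.Store(0)
	// Active readers can still prevent this from truncating the whole
	// WAL, hence the deadlock worry below.
	_, err := c.db.Exec(`PRAGMA wal_checkpoint(TRUNCATE);`)
	return err
}
```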

Of course this will easily turn into deadlock palooza.

It’s not just a reader. If there is a write after the checkpoint has started, the checkpoint can’t finish, as it doesn’t know how to roll over to a new WAL file. The data from the WAL gets applied to the db, but the file stays as big as it was.

https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md

I assume this is still a branch, and not a mainline feature?

I think some of my real-world setups are like that. Basically, a lot of small-sized folders syncing something all the time. The amount of the synced data is usually quite small, but (at least some of) the folders keep working constantly.

Sounds like it’s been in beta for at least the last five years and still is…

SQLite doesn’t have concurrent writes; there’s a database-global lock for writers, so those are actually easy to handle.

While that may well be the case, note that “working constantly” from a human point of view is maybe quite different from the database point of view, the latter being “overlapping write and read transactions at literally every point in time always”. A few files changing every few seconds will very much not be that, for example, even though the disks may be churning all the time.

The SQLite docs explain it quite well:

https://www.sqlite.org/wal.html#concurrency

Yup. I wonder if we could solve this in practice by making our read transactions much shorter. Currently we do the natural thing: loop over file infos, put them in batches, send them over the wire. This essentially means our read transaction is held open for however long the other side takes to write the file infos. We could make each batch a separate read transaction, so that instead of selects lasting up to 5 seconds or a little more, we’d break them down into selects lasting milliseconds and block outside of the select while sending. This might give the checkpointer a better fighting chance…
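
Something like this, as a sketch only; the FileInfo type and the table/column names are placeholders, not our actual schema:

```go
package main

import "database/sql"

// FileInfo stands in for the real file info type.
type FileInfo struct {
	Name string
	Data []byte
}

// sendFileInfos streams file infos in batches, opening a short read
// transaction per batch instead of holding one open for the whole
// exchange.
func sendFileInfos(db *sql.DB, send func([]FileInfo) error) error {
	const batchSize = 1000
	lastName := ""
	for {
		tx, err := db.Begin()
		if err != nil {
			return err
		}
		rows, err := tx.Query(
			`SELECT name, data FROM files WHERE name > ? ORDER BY name LIMIT ?`,
			lastName, batchSize)
		if err != nil {
			tx.Rollback()
			return err
		}
		var batch []FileInfo
		for rows.Next() {
			var fi FileInfo
			if err := rows.Scan(&fi.Name, &fi.Data); err != nil {
				rows.Close()
				tx.Rollback()
				return err
			}
			batch = append(batch, fi)
		}
		rows.Close()
		tx.Rollback() // read-only transaction, nothing to commit

		if len(batch) == 0 {
			return nil // everything sent
		}
		lastName = batch[len(batch)-1].Name

		// Sending happens with no transaction open, so a slow peer no
		// longer pins the WAL pages the checkpointer wants to reclaim.
		if err := send(batch); err != nil {
			return err
		}
	}
}
```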

Smaller transactions paired with PRAGMA wal_checkpoint(RESTART) if necessary (see checkpoint starvation):

https://www.sqlite.org/wal.html#avoiding_excessively_large_wal_files


Maybe we need to disable auto checkpointing for that:

https://sqlite.org/forum/info/6a66501e4df030ae

But it’s probably better to have this managed by a dedicated goroutine: do a regular checkpoint, and if that fails, resort to a RESTART checkpoint.
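
Roughly along these lines, as a sketch; the interval and the escalation rule are placeholders:

```go
package main

import (
	"context"
	"database/sql"
	"time"
)

// checkpointLoop runs a regular (PASSIVE) checkpoint on a timer and
// escalates to RESTART when the passive one couldn't move all WAL
// pages back into the main database file.
func checkpointLoop(ctx context.Context, db *sql.DB) {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		var busy, logPages, moved int
		err := db.QueryRowContext(ctx, `PRAGMA wal_checkpoint(PASSIVE);`).
			Scan(&busy, &logPages, &moved)
		if err != nil || moved < logPages {
			// Not everything was checkpointed; try harder so the next
			// writer can restart the WAL from the beginning.
			db.ExecContext(ctx, `PRAGMA wal_checkpoint(RESTART);`)
		}
	}
}
```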

What happens if there are multiple devices exchanging indexes at the very same time? For example, I’ve done a test with 3 devices (actually Syncthing instances on the same computer), all starting with the same folder, pre-populated with files but unshared. I then shared the folder between all three at the same time.

Doing so brought the whole system to a crawl, and this is on a Ryzen 9950X with 96 GB of RAM, with the database located on a RAM disk.

The folder itself is the same as I was testing previously today, i.e. on the larger side but nothing too extreme.

The situation only stabilised once I limited the index exchange to just two devices at the same time (by pausing the folder on the third one).

Using shorter read transactions and periodic checkpointing solves this for me. Each of them on its own reduced the WAL size; together the result is altogether reasonable in my setup. With a DB size of 2.5 GB I previously got a WAL size of ~2.2 GB after --debug-reset-delta-idxs; now it stays at ~128 MiB.

https://github.com/syncthing/syncthing/pull/10027


This would mean you get conflict resolution and index entries bouncing back and forth for every file and pair of devices, so it’s a bit worse than the normal worst case for most setups.

This should also boost our read performance. Nice :+1:

@bt90 BTW I noticed an oddity: I tried PRAGMA wal_checkpoint(TRUNCATE) like we do in the slow periodic, and while it worked, it truncated the WAL back to zero bytes, regardless of our PRAGMA journal_size_limit = 67108864. I tried moving that pragma to just before the checkpoint as well, with no difference. From the docs I think it should have the effect of leaving the WAL at that size, but it appears not to in practice.

Nope, connection pooling issues were the cause. Need to run the journal size limit pragma in the same connection just before the checkpoint.
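
In sketch form, the fix looks something like this (pinning one connection from the pool via database/sql’s Conn; the function name is illustrative):

```go
package main

import (
	"context"
	"database/sql"
)

// truncateWAL runs the journal size limit pragma and the checkpoint on
// the same connection, since the pragma only applies to the connection
// it is executed on. 67108864 is the 64 MiB limit mentioned earlier.
func truncateWAL(ctx context.Context, db *sql.DB) error {
	conn, err := db.Conn(ctx) // one dedicated connection, not a different pooled one per statement
	if err != nil {
		return err
	}
	defer conn.Close()

	if _, err := conn.ExecContext(ctx, `PRAGMA journal_size_limit = 67108864;`); err != nil {
		return err
	}
	_, err = conn.ExecContext(ctx, `PRAGMA wal_checkpoint(TRUNCATE);`)
	return err
}
```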

The reduced WAL size and performance improvements look great :slight_smile:. I’m done for today, but I’ll try to go on with my testing tomorrow.


I’ve tested the improved handling of WAL. During the index exchange phase, the WAL file kept growing up to ~800 MB, then shrinking to <100 MB. The process kept repeating itself until the index exchange was done. This is with a database sized at 1.5 GB.

Just a completely separate question, but is there any particular reason not to continue using db for database debug logging instead of the current sqlite? Is the old db logging used for anything else?

A few notes regarding checkpoint handling that might be worth having a look at.

1. We don’t check the return values of PRAGMA wal_checkpoint(RESTART); see Pragma statements supported by SQLite:

   The wal_checkpoint pragma returns a single row with three integer columns. The first column is usually 0 but will be 1 if a RESTART or FULL or TRUNCATE checkpoint was blocked from completing, for example because another thread or process was actively using the database. In other words, the first column is 0 if the equivalent call to sqlite3_wal_checkpoint_v2() would have returned SQLITE_OK or 1 if the equivalent call would have returned SQLITE_BUSY. The second column is the number of modified pages that have been written to the write-ahead log file. The third column is the number of pages in the write-ahead log file that have been successfully moved back into the database file at the conclusion of the checkpoint. The second and third column are -1 if there is no write-ahead log, for example if this pragma is invoked on a database connection that is not in WAL mode.

2. We’re still using the auto checkpointer, which might interfere with our RESTART checkpoint as mentioned earlier: SQLite User Forum: PRAGMA wal_checkpoint(RESTART) unexpectedly returning SQLITE_BUSY immediately

3. We could replace the update point logic and check the size of the WAL file instead, if that’s not too costly. (A rough sketch of all three points follows below.)
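
A rough sketch of what those three points could look like, assuming a database/sql handle; names and thresholds here are mine, not actual Syncthing code:

```go
package main

import (
	"database/sql"
	"os"
)

// restartCheckpoint addresses point 1: inspect the row the pragma
// returns instead of discarding it. busy == 1 is the SQLITE_BUSY case
// from the docs quoted above.
func restartCheckpoint(db *sql.DB) (blocked bool, err error) {
	var busy, logPages, checkpointed int
	err = db.QueryRow(`PRAGMA wal_checkpoint(RESTART);`).
		Scan(&busy, &logPages, &checkpointed)
	return busy == 1, err
}

// For point 2, the built-in auto checkpointer can be switched off per
// connection with: PRAGMA wal_autocheckpoint = 0;

// walTooLarge addresses point 3: trigger checkpoints off the size of
// the -wal file rather than an update counter; a single stat call is
// cheap.
func walTooLarge(dbPath string, limit int64) bool {
	fi, err := os.Stat(dbPath + "-wal")
	return err == nil && fi.Size() > limit
}
```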

This might explain why @terry still has problems with a gargantuan WAL file despite running beta 4.

So, I ignored the return values because I didn’t fully understand them and they didn’t seem actionable :stuck_out_tongue:. I can understand the part about auto checkpoints blocking manual checkpoints and neither progressing, except I thought auto checkpoints happened as part of the transaction commit, and that is guaranteed not to be at the same time as our manual checkpoint…

Turning off auto checkpoints seems like a net loss, since then we’re down to only managing it ourselves, and for small setups I suspect we’re guaranteed to let the WAL grow larger than it needs to be.

We can try it though and see what happens, it should be a fairly easy change.