Using these settings: “journal_mode = OFF”, “default_temp_store = MEMORY”, “temp_store = MEMORY”, “foreign_keys = 0”, “synchronous = 0”, “locking_mode = EXCLUSIVE”, we are migrating data from leveldb into sqlite using the following schema, followed by the DELETEs/INSERTs/SELECTs/UPDATEs used by syncthing. Users with large datasets are complaining the process is very slow. What can be done to speed things up?
Included after that question were our DDL in ./internal/db/sqlite/sql/schema/ and most of the SQL statements in our code. ChatGPT suggested:
Top wins (in order)
Load into a “quiet” schema, then rebuild derived data
Temporarily drop the triggers that maintain indexids and counts, and drop secondary indexes on the hot tables (keep only the primary keys).
Bulk-insert everything.
Recompute indexids and counts set-wise in one or two statements.
Recreate the indexes and triggers at the end.
This alone is commonly a 5–50× speedup on big datasets.
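The drop-load-recreate pattern above can be sketched as follows. This is a minimal illustration using Python's stdlib sqlite3 against a made-up, simplified files table (the real schema, columns, and triggers differ), just to show the SQLite-level steps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (sequence INTEGER PRIMARY KEY, name TEXT, flags INTEGER);
    CREATE INDEX idx_files_name ON files(name);
""")

rows = [(i, f"file-{i}", i % 4) for i in range(10_000)]

# Drop the secondary index so the bulk insert only maintains the
# primary key, then rebuild the index in one pass at the end.
conn.execute("DROP INDEX idx_files_name")
with conn:  # one big transaction for the whole load
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
conn.execute("CREATE INDEX idx_files_name ON files(name)")

count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(count)  # → 10000
```

Rebuilding the index afterwards is a single sorted pass over the table, which is much cheaper than maintaining the B-tree incrementally for every row.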
Replace OR REPLACE with UPSERT
REPLACE does DELETE + INSERT, firing AFTER DELETE/INSERT triggers and touching every index twice.
Use INSERT … ON CONFLICT (…) DO UPDATE … so existing rows are updated in-place.
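The delete-plus-insert behavior of OR REPLACE is observable via the rowid. A small demonstration against a throwaway kv table (hypothetical, not Syncthing's schema): OR REPLACE assigns a new rowid (and fires DELETE/INSERT triggers), while the UPSERT updates the row in place:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('a', 'old')")
conn.execute("INSERT INTO kv VALUES ('b', 'bee')")
rid_before = conn.execute("SELECT rowid FROM kv WHERE k = 'a'").fetchone()[0]

# INSERT OR REPLACE deletes the conflicting row and inserts a new one:
# the rowid changes, and every index is touched twice.
conn.execute("INSERT OR REPLACE INTO kv VALUES ('a', 'replaced')")
rid_replace = conn.execute("SELECT rowid FROM kv WHERE k = 'a'").fetchone()[0]

# UPSERT modifies the existing row in place: the rowid is preserved.
conn.execute(
    "INSERT INTO kv VALUES ('a', 'upserted') "
    "ON CONFLICT (k) DO UPDATE SET v = excluded.v"
)
rid_upsert = conn.execute("SELECT rowid FROM kv WHERE k = 'a'").fetchone()[0]

print(rid_before, rid_replace, rid_upsert)
```

Note the caveat for the files table: if the point of the insert is to obtain a fresh sequence via AUTOINCREMENT (as discussed below), an in-place UPDATE defeats the purpose, so this substitution only applies where the rowid is not load-bearing.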
Remove AUTOINCREMENT where you don’t truly need it
INTEGER PRIMARY KEY already auto-increments; AUTOINCREMENT forces extra work and contention on sqlite_sequence.
For files(sequence … PRIMARY KEY AUTOINCREMENT), switch to INTEGER PRIMARY KEY unless you must forbid rowid reuse across the entire lifetime.
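The behavioral difference is only about rowid reuse, which a quick sketch with two hypothetical tables shows (plain INTEGER PRIMARY KEY may hand back a deleted id; AUTOINCREMENT tracks a high-water mark in sqlite_sequence and never does):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plain (id INTEGER PRIMARY KEY, v TEXT);
    CREATE TABLE mono (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT);
""")

for table in ("plain", "mono"):
    conn.execute(f"INSERT INTO {table} (v) VALUES ('a')")  # id = 1
    conn.execute(f"DELETE FROM {table} WHERE id = 1")
    conn.execute(f"INSERT INTO {table} (v) VALUES ('b')")

plain_id = conn.execute("SELECT id FROM plain").fetchone()[0]
mono_id = conn.execute("SELECT id FROM mono").fetchone()[0]
print(plain_id, mono_id)  # → 1 2
```

Without AUTOINCREMENT, the second insert into `plain` reuses id 1 (the table is empty again, so max(rowid)+1 restarts at 1); `mono` continues at 2.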
Batch and pipeline
Use one big transaction (or a handful of large chunks) and prepared statements, and avoid per-row RETURNING round-trips by chaining the RETURNING into the subsequent insert (example below).
Right-size the engine for bulk load
Before creating any tables in a new DB:
PRAGMA page_size=32768; (or 65536 if your build supports it)
PRAGMA cache_size=-1048576; (≈1 GB cache; tune to free RAM)
PRAGMA mmap_size=2147483648; (2 GiB, or more if available)
You already have temp_store=MEMORY, synchronous=OFF, locking_mode=EXCLUSIVE.
After the load: ANALYZE; PRAGMA optimize; and flip to your normal runtime pragmas.
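Put together, the bulk-load pragma sequence might look like this (a sketch, using illustrative values; page_size in particular only takes effect if set before the first table is created in the database file):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bulk.db")
conn = sqlite3.connect(path)

# Must happen before any table exists in this file.
conn.execute("PRAGMA page_size = 32768")
conn.execute("PRAGMA cache_size = -1048576")   # negative = KiB, so ~1 GiB
conn.execute("PRAGMA mmap_size = 2147483648")  # 2 GiB
conn.execute("PRAGMA synchronous = OFF")
conn.execute("PRAGMA journal_mode = OFF")

conn.execute("CREATE TABLE files (sequence INTEGER PRIMARY KEY, name TEXT)")
# ... bulk load here ...

# After the load: refresh planner statistics, then restore the
# normal runtime pragmas before handing the DB to regular code.
conn.execute("ANALYZE")
conn.execute("PRAGMA optimize")

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
print(page_size)  # → 32768
```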
Add targeted partial indexes for your read queries
Many of your selects filter on flag bits; a few small partial indexes will avoid scanning the large general ones.
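As a sketch of the idea, assuming a simplified files table with a deleted flag standing in for the real flag-bit predicates: the partial index only contains the matching rows, so it stays small, and the planner will use it when the query's WHERE clause implies the index's WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (sequence INTEGER PRIMARY KEY, name TEXT, deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO files (name, deleted) VALUES (?, ?)",
    [(f"file-{i}", 1 if i % 100 == 0 else 0) for i in range(1000)],
)

# Index only the (rare) deleted rows; far smaller than a full index.
conn.execute("CREATE INDEX idx_deleted ON files(name) WHERE deleted = 1")

plan_text = str(
    conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM files WHERE deleted = 1"
    ).fetchall()
)
print(plan_text)  # the plan should mention idx_deleted
```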
Thoughts? I know ChatGPT makes mistakes, and I’m not conversant enough with our schema, or sqlite in general, to form an informed opinion regarding its feedback.
Just a side note, but I always wonder how many “programmers” just follow AI suggestions blindly and apply them to their code without any additional consideration or even testing.
I maintain that we use autoincrement where we need it and nowhere else.
Not really, I believe. We use it in a couple of places where performance absolutely won’t matter, and in the one place where it could matter (updating file entries) it’s correct imho, because we need a new sequence entry (a required use of AUTOINCREMENT…). The mtime update could in fact be rewritten as an UPDATE, but there are no triggers involved there and it only kicks in under specific circumstances.
Yeah. This one seemed the most promising to me, but it shows no difference in benchmarks, at least on my computer.
If the AUTOINCREMENT keyword appears after INTEGER PRIMARY KEY, that changes the automatic ROWID assignment algorithm to prevent the reuse of ROWIDs over the lifetime of the database. In other words, the purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from previously deleted rows.
Yes, precisely that is an essential property. Sequence numbers must be strictly monotonically increasing.
(It is not as essential for folder and device IDs, but it makes it nicer for humans and there are typically <10 inserts in those tables over the lifetime of a Syncthing install…)
My apologies for any frustration caused. I wasn’t suggesting there were issues with our schema or code. I asked to elicit a discussion to gain a better understanding of our schema and design decisions, and maybe squeeze out a smidge more performance, if possible.
But based on everyone’s feedback, it seems the only real takeaway is we can avoid extra roundtrips using WITH. For example:
```go
insertFileStmt, err := txp.Preparex(`
	INSERT OR REPLACE INTO files (...)
	VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
	RETURNING sequence
`)
if err != nil {
	return wrap(err, "prepare insert file")
}
insertFileInfoStmt, err := txp.Preparex(`
	INSERT INTO fileinfos (sequence, fiprotobuf)
	VALUES (?, ?)
`)
```
in folderdb_update.go could become
```sql
WITH ins AS (
	INSERT OR REPLACE INTO files (...)
	VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
	RETURNING sequence
)
INSERT INTO fileinfos (sequence, fiprotobuf)
SELECT sequence, ? FROM ins;
```
I will submit a PR, if that’s of interest. That appears to be the only place we use RETURNING.