Syncthing on SQLite -- help test!

Oh and in case I sounded unnecessarily dismissive before, I’m all for producing statically linked musl binaries and whatnot, it’s just that that’s a nonissue if the whole thing is unsound for other reasons before then. :slight_smile: Once we’re at a point that it seems like this is an acceptable future we can absolutely spend all the effort to fine tune the build process to generate the best possible binaries for the most platforms.

5 Likes

Hi @calmh, this is a great effort. I have tried this branch on the iOS side (simulator only for now) and apart from some code changes that were needed on Synctrain (see PR), this seems to be working just fine at first glance.

5 Likes

I don’t have a very low limit (hard limit: 500000, soft limit: 1024). I was already using my own Syncthing build where I ran Syncthing as library when those errors occurred. However, I’m pretty sure my c-bindings run the code where it increases the limit (as the c-bindings simply invoke Start() from lib/syncthing/syncthing.go).

Good to know that it is “just” twice as big. I would say that’s still acceptable. Maybe it is really a good thing that one can then move the database to the SD-card on Android, though.

Thanks. If there’s a commit I can cherry-pick I can test it. I saw that you added actually quite a few more commits on calmh/jb/sqlite. If you want us to test them, let us know. (Although I’m afraid I’ll be a bit busy in the next two weeks.)

For now I’m still testing the v2.0.0-beta.1 version a little bit. So far I have nothing more to add based on the testing.

Just a few thoughts on the idea and motivation in general:

  • “maintaining indexes ourselves has been a source of inconsistencies” - I also wrote a piece of software using a DB that is a mere key-value store (LMDB) and I must say that this is definitely true. It is just hard. I “solved” this with a lot of C++ template magic but reading/understanding this code is not easy, too. So using a more high-level DB seems like a good idea for Syncthing, indeed.
  • “We see a lot of odd crashes in the database layer.” - That doesn’t speak for leveldb but I must say that as a user I rarely ever experienced and problems.
  • “One advantage of a simple key-value database like leveldb is that inserts and changes are incredibly fast. Now, changes entail triggers, multiple indexes to update, and transactions.” - I guess that’s a price we can pay. Testing shows that the SQLite-based version is definitely slower and the DB takes more space but the resource usage still seems to be in reasonable bounds.
  • I’m wondering whether it would be possible to support other databases (e.g. PostgreSQL) or even the old and the new database at the same time. That would probably eliminate concerns about resource usage as one could still use leveldb on low-end devices.
4 Likes

That’s probably caused by syncthing running SQLite in WAL mode. With WAL, SQLite synchronizes by sharing memory between processes (if needed) instead of using POSIX advisory locks (flocks) [which SQLite can also use].

2 Likes

The tip of the jb/sqlite branch is the thing to test, it’s what I’m running, which is also always in the container image ghcr.io/calmh/synchthing:jb-sqlite. I’ll make a new “release” in the morning to have ready binaries for those who prefer that, I just didn’t want to overload everyone with a new release every five minutes.

I don’t usually point to my fork for stuff, preferring people don’t even look at it, but in this case that was the simplest way to produce binaries and containers without mixing it in with the more stable stuff in the regular repo. I don’t want people suddenly getting onto the sqlite train by mistake just because they want the latest nightly or from main or whatever.

This should in fact be quite straightforward, and may be a win for those trying to run it at scale. I’m sure the table creation scripts etc need to be tweaked, but at least all the principles are in place so it would mostly be a question of abstracting some of the query SQL.

3 Likes

I’ve been trying out the binaries in my test environment (under Windows 10 x64). Here is what I’ve noticed so far.

  1. There seem to be quite a large performance impact when scanning new folders, e.g. with this one:

    image

    it takes just 5m 30s with the old database to finish the scan, while with the new one, it takes 21m 30s, i.e. it’s 4 times slower.

  2. The old database produces a lot of information with db,model debug logging enabled, while the new one doesn’t seem to produce anything. I suppose the lack of detailed debug information is expected at this stage?

  3. If you try to cancel scanning, e.g. by removing the folder from Syncthing while the initial scan is still going on, it always takes a long time, and this error is recorded in the log:

    svcutil.go:163: DEBUG: serviceMap[string, <nil>]@0xc00030fa70: Service sendonly/ua7rz-t6zwc@0xc0008ba008 failed to terminate in a timely manner
    
3 Likes

Indeed, thanks for bringing this up. I could see the same and have fixed it, I think.

It’s under sqlite now, but I haven’t had the reason to add much debug output. There are metrics instead. :slight_smile:

I couldn’t immediately reproduce the slow cancel, but if it remains I can look again. There’s a new 2.0.0-beta.2 release coming with the latest fixes.

1 Like

@pixelspark sorry I didn’t merge your changes yet, will get them in during the day.

For performance problems you guys see, a support bundle from while it’s ongoing is helpful.

2 Likes

Thanks! Performance is much better now. Actually, even better than the old database, as it takes just 4m 20s to scan the same files.

I’ve also tested the 386 version (albeit still under Windows x64) and the result is 7m with the old database vs 6m 40s with the new one.

Just two more questions:

  1. The database is a single file now. Is this going to be a problem when running on filesystems that have filesize limitations (e.g. 4 GB maximum file size on FAT32)?

  2. I’m seeing an empty syncthing.lock file created next to syncthing.exe on each run, and it stays there even after shutting Syncthing down. What is the purpose of this file?

4 Likes

In terms of database size, when do we perform a full database vacuum? Because that’s the only point at which the DB file can shrink.

We might also benefit from partial indexes for queries on flags like files.local_flags & {{.FlagLocalGlobal}} != 0

You have the vacuuming in db_service.go, I set it to incremental vacuuming once an hour.

Maybe? If so, don’t do that I guess.

We (mis)used the old database as a lock for having only one Syncthing instance running. SQLite doesn’t work like that, so I added a separate lock file. It should be in the data directory right beside the database, but I guess if that’s where you keep the exe file it could be beside that as well?

Please tell me more, I know nothing of this.

1 Like

Incremental vacuum runs are fine, but they don’t defragment the database file. It would be nice to have CLI flag and/or API endpoint to trigger a full vacuum run.

https://www.sqlite.org/partialindex.html#queries_using_partial_indexes

2 Likes

Yeah, I’m just worried that this has a potential to break some existing installations if the database format change ends up being a part of a standard automatic upgrade.

Understood, just curious if the lock file is supposed to stay there even after shutting Syncthing down. The location is the same for both, yes, but this is the default for Syncthing running in Windows :slight_smile:.

By the way, I’m still getting the svcutil.go:163: DEBUG: serviceMap[string, <nil>]@0xc000285080: Service sendreceive/nd4du-fsaee@0xc00011a008 failed to terminate in a timely manner thing when trying to interrupt a folder scan. Should I provide any specific logs to help with finding the culprit?

Where do we expect people to be running fat32 for the system partition or equivalent? Is this an expected Android thing? It sounds like something that would be for external SD cards or something, but also those could then not be used to store video or anything like that so what’s the situation? I don’t want to get too deep into possibly unlikely what-ifs, especially given “we” don’t really have an Android app… :wink:

The lock file is just a file that we can hold an exclusive lock on, I didn’t add anything to remove it on shutdown.

Can you run with the internal profiler enabled, and take a goroutine profile before and after asking it to cancel?

Having the exe running in %localappdata% surprises me, but on the other hand I don’t know that we have a default as such, so I guess it’s fine. :person_shrugging:

1 Like

I was thinking maybe of some portable installations running from USB sticks and such. I do this myself, although I still don’t use FAT32 as the filesystem there. Just for the record, Windows still defaults to FAT32 when formatting removable drives, at least the smaller ones (just confirmed with an 8GB-sized USB stick).

Now I see what you mean… Putting the exe file there is my own “invention” :innocent:. However, by default, there is no separation between Syncthing data and config in Windows, so both the config, keys, and database are all located under the same path.

2 Likes

As to the database size, much of it is blocks and the index on the blocks:

sqlite> SELECT name, pgsize/1024,  (pgsize-unused)*100.0/pgsize FROM dbstat WHERE aggregate=true order by pgsize desc;
name                           pgsize/1024  (pgsize-unused)*100.0/(pgsize)
-----------------------------  -----------  ------------------------------
blocks_hash_position           735868       88.2333268171483
blocks                         726972       99.0076247724723
blocklists                     482808       90.3920487996781
fileinfos                      238152       97.4371836316722
files                          217344       96.8620920349768
files_device_name              166384       78.7915832595832
files_name_only                161312       79.7674062918444
files_remote_sequence          23692        87.4221289095475
files_blocklist_hash_only      10464        99.0309222393444
sqlite_autoindex_blocklists_1  4900         89.9025031887755
mtimes                         604          90.0023605649834
sqlite_autoindex_mtimes_1      576          73.8147311740451
sqlite_schema                  12           49.5686848958333
folders                        4            3.271484375
sqlite_autoindex_folders_1     4            3.2470703125
devices                        4            3.662109375
sqlite_autoindex_devices_1     4            3.6376953125
indexids                       4            8.8623046875
sqlite_autoindex_indexids_1    4            3.369140625
sqlite_sequence                4            0.5615234375
kv                             4            40.52734375
sqlite_autoindex_kv_1          4            21.38671875
sqlite_stat1                   4            14.3310546875
counts                         4            51.4404296875
sqlite_autoindex_counts_1      4            40.8447265625
schemamigrations               4            4.5166015625

It might be worth it to separate out the blocks to a separate database file, keeping the data for most queries more compact and presumably faster.

6 Likes

One more for today, now with static binaries for Linux again (though gnu and not musl, for you libc connoisseurs out there).

2 Likes

Here are three goroutine profiles:

2 Likes