Oh and in case I sounded unnecessarily dismissive before, I’m all for producing statically linked musl binaries and whatnot, it’s just that that’s a nonissue if the whole thing is unsound for other reasons before then. Once we’re at a point that it seems like this is an acceptable future we can absolutely spend all the effort to fine tune the build process to generate the best possible binaries for the most platforms.
I don’t have a very low limit (hard limit: 500000, soft limit: 1024). I was already using my own Syncthing build, where I run Syncthing as a library, when those errors occurred. However, I’m pretty sure my c-bindings run the code that increases the limit (as the c-bindings simply invoke Start() from lib/syncthing/syncthing.go).
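For reference, raising the soft open-file limit to the hard limit looks roughly like this on Unix-like systems. This is a hedged sketch of the general technique, not the actual lib/syncthing code; `raiseNoFile` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"syscall"
)

// raiseNoFile bumps the soft open-file limit (RLIMIT_NOFILE) up to the
// hard limit. Raising the soft limit up to the hard limit is always
// permitted for an unprivileged process.
func raiseNoFile() (syscall.Rlimit, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return lim, err
	}
	lim.Cur = lim.Max
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return lim, err
	}
	return lim, nil
}

func main() {
	lim, err := raiseNoFile()
	if err != nil {
		panic(err)
	}
	fmt.Printf("soft limit now %d\n", lim.Cur)
}
```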
Good to know that it is “just” twice as big. I would say that’s still acceptable. It may then actually be a good thing that one can move the database to the SD card on Android, though.
Thanks. If there’s a commit I can cherry-pick I can test it. I saw that you added actually quite a few more commits on calmh/jb/sqlite. If you want us to test them, let us know. (Although I’m afraid I’ll be a bit busy in the next two weeks.)
For now I’m still testing the v2.0.0-beta.1 version a little bit. So far I have nothing more to add based on the testing.
Just a few thoughts on the idea and motivation in general:
“maintaining indexes ourselves has been a source of inconsistencies” - I also wrote a piece of software using a DB that is a mere key-value store (LMDB) and I must say that this is definitely true. It is just hard. I “solved” this with a lot of C++ template magic, but reading/understanding that code is not easy either. So using a more high-level DB seems like a good idea for Syncthing, indeed.
“We see a lot of odd crashes in the database layer.” - That doesn’t speak well of leveldb, but I must say that as a user I rarely ever experienced any problems.
“One advantage of a simple key-value database like leveldb is that inserts and changes are incredibly fast. Now, changes entail triggers, multiple indexes to update, and transactions.” - I guess that’s a price we can pay. Testing shows that the SQLite-based version is definitely slower and the DB takes more space, but the resource usage still seems to be within reasonable bounds.
I’m wondering whether it would be possible to support other databases (e.g. PostgreSQL) or even the old and the new database at the same time. That would probably eliminate concerns about resource usage as one could still use leveldb on low-end devices.
That’s probably caused by Syncthing running SQLite in WAL mode. With WAL, SQLite synchronizes between processes by sharing memory (if needed) instead of using POSIX advisory locks, which SQLite can also use.
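For context, switching a database to WAL mode typically amounts to something like the following generic SQLite pragmas (not necessarily the exact settings Syncthing uses):

```sql
-- Switch to write-ahead logging; readers and writers then coordinate
-- through the shared-memory "-shm" file that appears next to the database.
PRAGMA journal_mode = WAL;

-- Commonly paired with WAL: fsync only on checkpoints, not on every
-- commit, which is a large part of WAL's write-performance advantage.
PRAGMA synchronous = NORMAL;
```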
The tip of the jb/sqlite branch is the thing to test; it’s what I’m running, and it’s also always available as the container image ghcr.io/calmh/syncthing:jb-sqlite. I’ll make a new “release” in the morning to have ready binaries for those who prefer that, I just didn’t want to overload everyone with a new release every five minutes.
I don’t usually point to my fork for stuff, preferring people don’t even look at it, but in this case that was the simplest way to produce binaries and containers without mixing it in with the more stable stuff in the regular repo. I don’t want people suddenly getting onto the sqlite train by mistake just because they want the latest nightly or from main or whatever.
This should in fact be quite straightforward, and may be a win for those trying to run it at scale. I’m sure the table creation scripts etc need to be tweaked, but at least all the principles are in place so it would mostly be a question of abstracting some of the query SQL.
I’ve been trying out the binaries in my test environment (under Windows 10 x64). Here is what I’ve noticed so far.
There seems to be quite a large performance impact when scanning new folders, e.g. with this one:
it takes just 5m 30s with the old database to finish the scan, while with the new one it takes 21m 30s, i.e. it’s about 4 times slower.
The old database produces a lot of information with db,model debug logging enabled, while the new one doesn’t seem to produce anything. I suppose the lack of detailed debug information is expected at this stage?
If you try to cancel scanning, e.g. by removing the folder from Syncthing while the initial scan is still going on, it always takes a long time, and this error is recorded in the log:
svcutil.go:163: DEBUG: serviceMap[string, <nil>]@0xc00030fa70: Service sendonly/ua7rz-t6zwc@0xc0008ba008 failed to terminate in a timely manner
Thanks! Performance is much better now. Actually, even better than the old database, as it takes just 4m 20s to scan the same files.
I’ve also tested the 386 version (albeit still under Windows x64) and the result is 7m with the old database vs 6m 40s with the new one.
Just two more questions:
The database is a single file now. Is this going to be a problem when running on filesystems that have filesize limitations (e.g. 4 GB maximum file size on FAT32)?
I’m seeing an empty syncthing.lock file created next to syncthing.exe on each run, and it stays there even after shutting Syncthing down. What is the purpose of this file?
You’ll find the vacuuming in db_service.go; I set it to run an incremental vacuum once an hour.
Maybe? If so, don’t do that I guess.
We (mis)used the old database as a lock for having only one Syncthing instance running. SQLite doesn’t work like that, so I added a separate lock file. It should be in the data directory right beside the database, but I guess if that’s where you keep the exe file it could be beside that as well?
Incremental vacuum runs are fine, but they don’t defragment the database file. It would be nice to have a CLI flag and/or API endpoint to trigger a full vacuum run.
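To illustrate the difference being discussed, these are the standard SQLite statements involved (generic SQLite behavior, not Syncthing-specific settings):

```sql
-- Incremental vacuum: returns free pages to the OS but does not
-- defragment. auto_vacuum must be set before any tables exist
-- (or be followed by a full VACUUM to take effect).
PRAGMA auto_vacuum = INCREMENTAL;
PRAGMA incremental_vacuum;  -- run periodically, e.g. once an hour

-- Full vacuum: rebuilds the entire database file, defragmenting it.
-- Needs up to twice the database size in temporary space while running.
VACUUM;
```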
Yeah, I’m just worried that this has a potential to break some existing installations if the database format change ends up being a part of a standard automatic upgrade.
Understood, just curious if the lock file is supposed to stay there even after shutting Syncthing down. The location is the same for both, yes, but this is the default for Syncthing running on Windows.
By the way, I’m still getting the svcutil.go:163: DEBUG: serviceMap[string, <nil>]@0xc000285080: Service sendreceive/nd4du-fsaee@0xc00011a008 failed to terminate in a timely manner thing when trying to interrupt a folder scan. Should I provide any specific logs to help with finding the culprit?
Where do we expect people to be running fat32 for the system partition or equivalent? Is this an expected Android thing? It sounds like something that would be for external SD cards or something, but also those could then not be used to store video or anything like that so what’s the situation? I don’t want to get too deep into possibly unlikely what-ifs, especially given “we” don’t really have an Android app…
The lock file is just a file that we can hold an exclusive lock on, I didn’t add anything to remove it on shutdown.
Can you run with the internal profiler enabled, and take a goroutine profile before and after asking it to cancel?
Having the exe running in %localappdata% surprises me, but on the other hand I don’t know that we have a default as such, so I guess it’s fine.
I was thinking maybe of some portable installations running from USB sticks and such. I do this myself, although I still don’t use FAT32 as the filesystem there. Just for the record, Windows still defaults to FAT32 when formatting removable drives, at least the smaller ones (just confirmed with an 8GB-sized USB stick).
Now I see what you mean… Putting the exe file there is my own “invention”. However, by default there is no separation between Syncthing data and config on Windows, so the config, keys, and database are all located under the same path.