The folder numbers are derived from the primary key in the main DB. But it would be cool to have the folder name as an ASCII-fied suffix in the filename. e.g. folder.0001.ceqta-nnvyd.db
Do you mean the folder ID? The problem is that you are allowed to specify custom IDs, which means that people can put characters in there that aren’t supported by the OS or filesystem.
Yeah, it’s a can of worms. Autogenerated ones are fine, but you’d have to run the others through a full UTF-8 transliteration to be sure…
Yeah, ../../../../../etc/passwd
and \\192.168.0.1\C$\WINDOWS\SYSTEM32
are totally valid folder IDs.
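So to use an ID as a filename suffix it would need sanitizing regardless. A minimal sketch of the whitelist approach, where sanitizeSuffix is a hypothetical helper and not anything that exists in the Syncthing codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeSuffix illustrates the whitelist idea: keep only [a-z0-9-],
// flatten everything else to '_', and cap the length so the result is
// safe as a filename component on any OS and filesystem.
func sanitizeSuffix(folderID string) string {
	const maxLen = 32
	var b strings.Builder
	for _, r := range strings.ToLower(folderID) {
		switch {
		case r >= 'a' && r <= 'z', r >= '0' && r <= '9', r == '-':
			b.WriteRune(r)
		default:
			b.WriteRune('_') // anything exotic gets flattened
		}
	}
	s := b.String()
	if len(s) > maxLen { // all runes are ASCII here, so byte slicing is safe
		s = s[:maxLen]
	}
	return s
}

func main() {
	// The path traversal and UNC tricks above come out harmless:
	fmt.Println(sanitizeSuffix("../../../../../etc/passwd"))
	fmt.Println(sanitizeSuffix(`\\192.168.0.1\C$\WINDOWS\SYSTEM32`))
}
```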
Could we log the folder → DB mapping during startup? My only concern is that we might want to know which folder a DB belongs to while providing support and I don’t see average users doing sqlite queries.
Perhaps even have the mapping between folder and DB available in the GUI?
Just my two cents, but I doubt that this is so much of a concern that we should add more clutter to our UI.
What about a DB for each remote device rather than one per folder? St then considers that device and its folders as a single entity. The device’s ID isn’t likely to change, so maybe take the first group of characters from the ID as the DB’s name.
Sometimes when I think St is playing up or needs a clean-out, I will delete the v1 index folder or detach a folder from a remote device to let the indexing clear out the DB; either way is time consuming. If the v2 DB were per remote device, I could delete just that device’s DB and have it reindex only its folders, saving time.
Folder state is global across all devices (i.e., the list of blocks is shared).
Another thing that came to mind: if someone nukes main.db but leaves the folder DBs around, you might have a bad time with the indices all mixed up.
We probably want to have a random ASCII slug attached to each folder, and then name the DBs after the slug, in case someone nukes only part of the whole thing.
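Something like this, maybe. A minimal sketch, assuming a two-group lowercase slug in the style of the “ceqta-nnvyd” example above (randomSlug is hypothetical, nothing that exists today):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// randomSlug generates a short random lowercase-ASCII slug, e.g.
// "ceqta-nnvyd". The idea is to store it with the folder in main.db and
// use it in the folder DB's filename, so a folder DB can't be silently
// matched against a regenerated main.db.
func randomSlug() (string, error) {
	const letters = "abcdefghijklmnopqrstuvwxyz"
	buf := make([]byte, 11)
	for i := range buf {
		if i == 5 {
			buf[i] = '-' // separator between the two groups
			continue
		}
		n, err := rand.Int(rand.Reader, big.NewInt(int64(len(letters))))
		if err != nil {
			return "", err
		}
		buf[i] = letters[n.Int64()]
	}
	return string(buf), nil
}

func main() {
	slug, err := randomSlug()
	if err != nil {
		panic(err)
	}
	fmt.Printf("folder.0001.%s.db\n", slug)
}
```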
Isn’t the same true for the old database though? I mean, if someone goes into index-v0.14.0.db
and starts deleting single LDB files willy-nilly, this is also going to wreak havoc, isn’t it?
+1 for a little bit of tamper-proofing
I’m not saying it doesn’t work, but the time @calmh is putting into this is mostly to get rid of the flaky corner cases we’re seeing with the old database. So there’s hope that no one will need to regularly nuke their DB after v2 goes live.
I ran a couple of performance tests to get a feel for the actual difference against the v1 database. One of them is similar to the worst-case setup tried here: four devices with four folders, identical contents, scanned but having never talked to each other (this is with fakefs). Start them all and observe what happens until they are all in sync with each other.
Blue line is Go heap size, orange line is peak RSS (there is no instant RSS I could access in the performance counters), green is database size on disk.
Memory usage here is a bit inflated in both cases as the filesystems are fakefs, meaning all file metadata is held in memory. However, some takeaways:
- v2 (sqlite) takes about twice as long to get initially in sync as v1. This is mostly because the calculateGlobal function for each file takes longer, something that could potentially be optimised.
- the v2 database is a bit over twice as large in steady state (1520 MiB vs 615 MiB)
- however, peak database size is twice as large in v1 compared to v2, at 3106 MiB vs 1520 MiB
- peak RSS is 35% higher in v1 than in v2 (1350 MiB vs 994 MiB)
So, apart from the actual desired improvements with SQLite (clarity and correctness) it’s also better on peak disk space and peak memory usage in this test.
Here’s my latest shot at greatness. There’s no between-betas migration for this one; if you’re coming from a previous beta, this is a full reset and rescan.
Could we use triggers for calculating global stuff on insert instead?
The logic is a bit too involved to do in pure SQL; we need to parse the version for vector ordering, apply conflict resolution, and so on. So by necessity it becomes: select all items with that name, do some amount of processing in code, then update the items with the appropriate flags.
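Roughly, the shape of it in Go is something like the sketch below. To be clear, this is not the actual code; the table layout, fileRow and winner are made-up stand-ins for the real schema and version-vector logic:

```go
package index

import "database/sql"

// fileRow and winner are hypothetical stand-ins; the real logic would
// parse each row's version vector, order the vectors, and apply
// conflict resolution to pick the global version.
type fileRow struct {
	device   string
	version  string // encoded version vector
	sequence int64
}

func winner(candidates []fileRow) fileRow {
	// Placeholder so the sketch compiles; the vector comparison is the
	// involved part that doesn't fit in pure SQL.
	return candidates[0]
}

// recalcGlobal shows the select-all / process / update pattern for one
// file name within a folder.
func recalcGlobal(tx *sql.Tx, folder, name string) error {
	rows, err := tx.Query(
		`SELECT device, version, sequence FROM files
		 WHERE folder = ? AND name = ?`, folder, name)
	if err != nil {
		return err
	}
	defer rows.Close()

	var candidates []fileRow
	for rows.Next() {
		var r fileRow
		if err := rows.Scan(&r.device, &r.version, &r.sequence); err != nil {
			return err
		}
		candidates = append(candidates, r)
	}
	if err := rows.Err(); err != nil {
		return err
	}
	if len(candidates) == 0 {
		return nil
	}

	// Decide the winner in Go, then write the flags back.
	global := winner(candidates)
	_, err = tx.Exec(
		`UPDATE files SET is_global = (device = ?)
		 WHERE folder = ? AND name = ?`, global.device, folder, name)
	return err
}
```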
what am I looking for?
Got a link to the location in the code?
I’m on mobile but search for recalcGlobal
Zombies: I noticed that when 5WDSW renames the file, the operation succeeds on both sides, but the new name is not even mentioned in its log (look for “renamed-by” and “zombie”).
I probably should stop testing multiple instances on one host, as it seems similar problems do not occur otherwise.
new shorter logs: 3-5WDSW.log (17.4 KB) 3-WTXPS.log (17.9 KB)