Bahaviour of st on sqlite files in various scenarios

graboluk · October 20, 2015, 5:08am

I’m writing a small python program which is gonna use st to syncronize sqlite data across devices.

I have some questions. I run st along with syncthing-inotify. Suppose we have an sqlite file db.sqlite in ~/Sync.

if I open db.sqlite for read and write then will st sync it to other devices while it is open?
if I open db.sqlite for reading only then will st sync it to other devices while it is open?

In case 2) is “no”:
3) If I open db.sqlite for reading and writing then will st sync it to other devices after I close it?

I suppose 3) is obviously yes, but since I don’t know how sqlite and inotify work I prefer to ask.

calmh · October 20, 2015, 6:08am

Number three should be an obvious yes. Numbers one and two depends. If you’re on Windows, they may be locked and we can’t open/sync them. On Unixes, they won’t be locked so Syncthing will at least try to sync the files. That probably works fine if it’s just open for reading. However syncing the shared memory files seldom works well, so you’ll probably have the remote side get stuck in the “no available source device” dance when the data returned doesn’t match the expected hashes if you have the database open for writes. Once the database is closed and rescanned it’ll clear up anyhow.

graboluk · October 20, 2015, 8:53pm

I’m on debian. Let me describe what I’m worried about. Say we have two devices A and B, and corresponding instances P(A) and P(B) of a python program P.

P(A) will write only to ~/Sync/A.sqlite, and P(B) will write only to ~/Sync/B.sqlite. However, P(A) can open both A.sqlite and B.sqlite for reading. Similarly P(B).

What I’m worried about is that P(A) opens r/w A.sqlite, does some crazy stuff to it, syncthing syncronizes it, and then P(B) tries to read it. From what you wrote, am I right that in this situation P(B) might encounter garbage? Or is it always going to be a legal sqlite file?

If it’s always a legal sqlite file and furthermore syncthing will eventually get the correct version of it over to B then I’m fine.

But if it’s not then I’m thinking of renaming A.sqlite to A-danger.sqlite, opening it r/w, doing whatever should be done, and then renaming to A.sqlite. This way, if at any point P(B) sees the file A-danger.sqlite, it will know not to open it, and if it sees A.sqlite then it’s guaranteed it’s what it should be. Do you think it’s a good strategy?

Incidentally, it would be nice if syncthing had some interfaces for this type of situations. For example, imitating inotify functionality, so that my program P(A) would register via REST that he’s interested in ~/Sync/B.sqlite, and then syncthing would notify P(A) via localhost:8384/rest/events about all changes to ~/Sync/B.sqlite (as it stands now I think I’ll code P so that P(A) uses inotify to get notified about changes to B.sqlite).

AudriusButkevicius · October 20, 2015, 9:13pm

It depends on how SQLite does it. If it guarantees atomic writes, then when it will be able to sync (which will probably only happen when you stop messing with the database, as it will be changing faster than we are syncing), it should not have garbage, unless SQLite allows partial writes which leave it in an inconsistent state.

Plus SQLite can potentially have multiple files, and we have no guarantees of atomicity between files, so you might still end up with your pants down.

Whatever you say, syncing a database for use is a bad idea, whichever way you look at it.

You should already be able to do what you are after by listening for DownloadProgress events, but I think syncthing is fine, so I don’t think we need any more bells or whistles or interfaces.

People keep shoehorning random stuff into it, and these stories I keep hearing makes me shiver. Databases are not supposed to be synced. There are reasons why MySQL and others are not doing it, and are using a log. Keep a transaction log, and replay the transactions or something.

graboluk · October 20, 2015, 9:43pm

Frankly I don’t see any potential for corruption in the protocol I’ve described. Note that, as I wrote above, only P(A) will ever write to A.sqlite. I think what I described is completely secure (secure meaning here no data corruption). I’m asking questions so that I can do it efficiently as well.

(As a matter of fact P(A) has also its own non-synced A-local.sqlite file, and only after P(A) updates A-local.sqlite it will propagate the change to ~/Sync/A.sqlite. I don’t think it’s important for avoiding data corruption, I’m doing it so that whatever syncthing has to sync stays small, but this means that even if st goes bollocks I’ll be fine)

Thanks for the hint with DownloadProgress. It seems I’ll be able to use ItemFinished for the purpuse I described.

graboluk · October 20, 2015, 9:58pm

Btw, am I right to think that if A.sqlite is 100Mb and I’m gonna change only 1 row which physically happens to be at the beginning of the file then syncthing will have to send almost the whole file?

This is why I’m planning to have A-local.sqlite in non-synced location and having A.sqlite be always a small file. But maybe I’m wrong to assume that syncthing will have to carry over the whole file?

(so, A.sqlite is something like a transaction log actually. I don’t think sqlite offers a replayable transaction log by its own)

calmh · October 21, 2015, 7:31am

No, that’d be weird. SQLite would either do an in place change, and we sync only the changed block, or an append (and then we sync that block).

I haven’t fully parsed all the discussion above, but in general there’s no guarantee that a synced open database file is a valid database file. In particular, while in the open state, I’d expect the database to only be valid for the combination of database file, journal file and shared memory files. There’s no guarantee we’ll sync those at the same time and so on.

canton7 · October 21, 2015, 10:14am

https://www.sqlite.org/howtocorrupt.html is the link to read here. Not all of those are relevant, of course.

AudriusButkevicius · October 21, 2015, 10:37am

1.2, 1.3 and maybe even 2.4 implies to me that this will not work.

When I mean transaction log, I mean your code stores the statements it executes on the database, and the other side just replays the log.

system · November 20, 2015, 10:37am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.