Database abstraction layer

I found myself wanting the database abstraction layer by @imsodin today because I want to test badger instead of leveldb. (For reasons, mostly that the leveldb implementation is not seeing much action on issues and such.) But that PR is a bit outdated and the interface it declares isn’t optimal for integrating something like badger. I’d like to propose a tweaked interface and fix up that PR…

Reader is what we can do with a read-only transaction, but also with a read-write transaction, so it becomes its own interface.

type Reader interface {
    Get(key []byte) ([]byte, error)
    Has(key []byte) (bool, error)
    NewIterator(prefix []byte) Iterator
}

The ReadTransaction is that plus a Release that must be called. All operations (even Get) need to happen in a transaction. Leveldb doesn’t enforce this, but Badger does. In our current code we generally take a read transaction (snapshot) anyway.

type ReadTransaction interface {
    Reader

    Release()
}

The WriteTransaction adds write stuff and Commit. It’s fine to Release (== discard) after Committing so that one can always be deferred; if Commit succeeds, Release is a no-op.

The transaction here might be somewhat best effort. This stuff is shaky as is in leveldb, and badger has its own wrinkles – a transaction might grow “too large” and need to be committed and reopened. I think we should handle as much of that as possible in the implementation so our “user” code doesn’t need to know about it.

type WriteTransaction interface {
    Reader

    Put(key, val []byte) error
    Delete(key []byte) error
    Release()
    Commit() error
}

The Iterator interface is simplified. We take a prefix instead of a range, because that’s the sort of scans we do. This can become a range iterator in the backend, or it can be just a seek plus an end check implemented in our wrapper type.

type Iterator interface {
    Next() bool
    Key() []byte
    Value() []byte
    Error() error
    Release()
}

Finally, the top-level database type, plus a function to inspect returned errors.

type Backend interface {
    NewReadTransaction() ReadTransaction
    NewWriteTransaction() (WriteTransaction, error)
    Close() error
    IsNotFound(err error) bool
}

No implementation specific types in this interface.

No batches (rwTran can be a batch when appropriate).

Typical iterator usage:

ro := db.NewReadTransaction()
defer ro.Release()

it := ro.NewIterator(nil)
defer it.Release()

for it.Next() {
  // it.Key() and it.Value() are valid
}
if it.Error() != nil {
  // iteration failed due to database error
}

It’s debatable whether to return errors for new transactions and iterators. The underlying implementations differ on this. On balance, I think it’s reasonable for NewWriteTransaction() to return an error (it probably does writes as part of this call). The read transaction can return the error in its Get() method, which should anyway be checked. The iterator can return false on Next() and return the error in Error() and it will be handled as part of that required error checking anyway.

Looks good.

To be honest, I’d plug errors everywhere, even in NewReadTransaction. It’s easier to return nil than to later need an erroredTransaction{} that returns the error on every call.


Also, WriteTransaction should encompass ReadTransaction rather than Reader.

Also, IsNotFound might be an annoyance, having to push the db handle down everywhere just for error checking on a transaction.

Yes that one is a conundrum. The thing is though that only the implementation knows what constitutes a “not found”. So it needs to come from the implementation somehow.

Or, we modify the Get() signature to also return a bool.

Or, we modify the Get() contract so that returning nil, nil means not found. But that seems very fragile and not great.

Also, all reads are transactional. I wonder whether that isn’t overkill in terms of performance. I don’t always care about the transaction if my read is supposed to be atomic.

I’d say we could return a DbError that wraps the underlying error and has IsNotFound, IsCorrupt and other future cruft.

Yet perhaps pushing the handle down is ok.

The same could be said about writes.

The transaction stuff is a question. Badger for example doesn’t support operations outside of transactions. We can paper over that by having the operations directly on Backend and having those do the needful in terms of opening a micro-transaction. I don’t think we do too many dirty reads today and, I think, no dirty writes. (Apart from all the writes in fact being dirty in the end because they are just batches, which is part of the motivation for this.)

Okay, we use that functionality today, so let’s have it in the interface to start with.

The interfaces look good to me.

I was also previously thinking about a refactor to also make the FileSet iterators transactional, such that pulling can happen in one transaction. That would require returning (“higher-level”) iterators instead of taking iterator functions there. I think such a change should be possible on top of this proposed low-level interface.


Yeah. The interface changed slightly:

I delegated the responsibility for marking up errors as “not found” or “closed” to the implementation, so that there can be package scoped functions to interrogate them…

That stuff was all good, and I am now running on badger myself for fun. But of course this affected error propagation, so that’s the swamp I’m in now. 🙂 🙁