Database abstraction layer

I found myself wanting the database abstraction layer by @imsodin today because I want to test badger instead of leveldb. (For reasons, mostly that the leveldb implementation is not seeing much action on issues and such.) But that PR is a bit outdated and the interface it declares isn’t optimal for integrating something like badger. I’d like to propose a tweaked interface and fix up that PR…

Reader is what we can do with a read-only transaction, but also with a read-write transaction, so it becomes its own interface.

type Reader interface {
    Get(key []byte) ([]byte, error)
    Has(key []byte) (bool, error)
    NewIterator(prefix []byte) Iterator
}

The ReadTransaction is that plus a Release that must be called. All operations (even Get) need to happen in a transaction. Leveldb doesn’t enforce this, but Badger does. In our current code we generally take a read transaction (snapshot) anyway.

type ReadTransaction interface {
    Reader

    Release()
}

The WriteTransaction adds write stuff and Commit. It’s fine to Release (== discard) after Committing so that one can always be deferred; if Commit succeeds, Release is a no-op.

The transaction here might be somewhat best effort. This stuff is shaky as is in leveldb, and badger has its own wrinkles – a transaction might grow “too large” and need to be committed and reopened. I think we should handle as much of that as possible in the implementation so our “user” code doesn’t need to know about it.

type WriteTransaction interface {
    Reader

    Put(key, val []byte) error
    Delete(key []byte) error
    Release()
    Commit() error
}

The Iterator interface is simplified. We take a prefix instead of a range, because that’s the sort of scans we do. This can become a range iterator in the backend, or it can be just a seek plus an end check implemented in our wrapper type.

type Iterator interface {
    Next() bool
    Key() []byte
    Value() []byte
    Error() error
    Release()
}

Finally, the top-level database type, plus a function to inspect returned errors.

type Backend interface {
    NewReadTransaction() ReadTransaction
    NewWriteTransaction() (WriteTransaction, error)
    Close() error
    IsNotFound(err error) bool
}

No implementation specific types in this interface.

No batches (rwTran can be a batch when appropriate).

Typical iterator usage:

ro := db.NewReadTransaction()
defer ro.Release()

it := ro.NewIterator(nil)
defer it.Release()

for it.Next() {
  // it.Key() and it.Value() are valid
}
if it.Error() != nil {
  // iteration failed due to database error
}

It’s debatable whether to return errors for new transactions and iterators. The underlying implementations differ on this. On balance, I think it’s reasonable for NewWriteTransaction() to return an error (it probably does writes as part of this call). The read transaction can return the error in its Get() method, which should anyway be checked. The iterator can return false on Next() and return the error in Error() and it will be handled as part of that required error checking anyway.

Looks good.

To be honest, I’d plug errors everywhere, even in NewReadTransaction. It’s easier to return nil than to later need an erroredTransaction{} that returns the error on every call.


Also, WriteTransaction should encompass ReadTransaction rather than Reader.

Also, IsNotFound might be an annoyance, having to push the db handle down everywhere just for error checking on a transaction.

Yes that one is a conundrum. The thing is though that only the implementation knows what constitutes a “not found”. So it needs to come from the implementation somehow.

Or, we modify the Get() signature to also return a bool.

Or, we modify the Get() contract so that returning nil, nil means not found. But that seems very fragile and not great.

Also, all reads are transactional. I wonder whether that isn’t overkill in terms of performance. I don’t always care about the transaction if my read is supposed to be atomic.

I’d say we could return a DbError that wraps the underlying error and has IsNotFound, IsCorrupt and other future cruft.

Yet perhaps pushing the handle down is ok.

The same could be said about writes.

The transaction stuff is a question. Badger for example doesn’t support operations outside of transactions. We can paper over that by having the operations directly on Backend and having those do the needful in terms of opening a micro-transaction. I don’t think we do too many dirty reads today and, I think, no dirty writes. (Apart from all the writes in fact being dirty in the end because they are just batches, which is part of the motivation for this.)

Okay, we use that functionality today, so let’s have it in the interface to start with.

The interfaces look good to me.

I was also previously thinking about a refactor to also make the FileSet iterators transactional, such that pulling can happen in one transaction. That would require returning (“higher-level”) iterators instead of taking iterator functions there. I think such a change should be possible on top of this proposed low-level interface.


Yeah. The interface changed slightly:

I delegated the responsibility for marking up errors as “not found” or “closed” to the implementation, so that there can be package scoped functions to interrogate them…

That stuff was all good, and I am now running on badger myself for fun. But of course this affected error propagation, so that’s the swamp I’m in now. 🙂 🙁