Delta index implementation

Since I’m not coding this week, I figured I could write up an actual spec proposal for how I expect delta indexes to work post 0.14, and then implement it when I’m back.

Please check my thinking etc:


To ensure index consistency in case of a disconnection during an index exchange, index entries must be sent in order of increasing local version.

Would you mind expanding on that?

A very welcome development; it has been at the top of my wishlist for a long time now :slight_smile:

One question though: Is local_version defined on a per-file basis, or per-folder?

If it’s the latter, it seems like a single file change would cause a full index exchange for the folder (and hence not solve the issue with huge repos)?

It’s per file.


As we connect to X, X says it has seen highest local version 100. We scan our own index and send everything above 100, but we have to send it ordered by local version. We cannot send 110 and then 105: if we disconnected in between, on the next connect X would say it has seen 110, when in reality it missed 105 and anything else between 100 and 110.
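The invariant above can be sketched in a few lines of Go. The types and function names here are hypothetical, not Syncthing’s actual code; the point is just that entries above the peer’s highest-seen version must go out in ascending local-version order:

```go
package main

import (
	"fmt"
	"sort"
)

// FileEntry is a simplified stand-in for an index entry.
type FileEntry struct {
	Name         string
	LocalVersion uint64
}

// entriesToSend returns all entries above the version the peer has
// already seen, sorted by ascending local version. Sending in this
// order guarantees that after a disconnect, the highest version the
// peer reports really covers everything below it.
func entriesToSend(index []FileEntry, peerSeen uint64) []FileEntry {
	var out []FileEntry
	for _, e := range index {
		if e.LocalVersion > peerSeen {
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		return out[i].LocalVersion < out[j].LocalVersion
	})
	return out
}

func main() {
	index := []FileEntry{
		{"a", 110}, {"b", 105}, {"c", 99}, {"d", 120},
	}
	for _, e := range entriesToSend(index, 100) {
		fmt.Println(e.Name, e.LocalVersion)
	}
	// prints:
	// b 105
	// a 110
	// d 120
}
```

Note that the sort is exactly the expensive part being discussed below: the on-disk index is not ordered by local version.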

This potentially means loading stuff into memory and sorting, or performing multiple passes over the index.


Aah, I follow, of course.

It would be very nice if we didn’t have to send updates in ascending order. Perhaps even at the cost of making delta index updates atomic (so if there’s a disconnection halfway through a delta index update, the delta index update is discarded)?

Will a delta update always be the best choice, or could there be cases where a full index update should be preferred because the difference between the two local_versions is too big?

Another thing to be mindful of is that files might be updated during the sorting process (which may take some time). That should be handled properly to avoid incorrect sort results.


index_id can be int64; max_local_version should be uint64, and we should think about how we deal with wraparounds.


Can’t the index always be sorted?

No, because we rely on it being sorted alphabetically for much more frequent operations.

OK, so why not maintain a continuously updated sorted index of the index?

That grows what we store on disk (not by much, though), and potentially causes disk thrashing as we jump around to different keys in the DB.

I have a gut feeling that multiple sequential passes might be better.

We also know how many entries we’ll have to send, and therefore how much RAM that will take, and can choose a strategy accordingly: sort in memory, sort on disk, or do multiple passes over the data set while sorting in memory.
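That strategy choice could look something like the following sketch. The strategy names, the average entry size, and the thresholds are all made up for illustration; they are not taken from Syncthing:

```go
package main

import "fmt"

// Strategy names are illustrative only.
type Strategy int

const (
	SortInMemory Strategy = iota // 0
	SortOnDisk                   // 1
	MultiplePasses               // 2
)

// entrySize is an assumed average size in bytes per index entry.
const entrySize = 256

// chooseStrategy picks how to order pending entries based on a rough
// estimate of how much RAM they would need. The 8x factor for the
// on-disk threshold is an arbitrary placeholder.
func chooseStrategy(numEntries int, ramBudget int) Strategy {
	needed := numEntries * entrySize
	switch {
	case needed <= ramBudget:
		return SortInMemory
	case needed <= 8*ramBudget:
		return SortOnDisk
	default:
		return MultiplePasses
	}
}

func main() {
	fmt.Println(chooseStrategy(1_000, 4<<20))       // prints 0: fits in RAM
	fmt.Println(chooseStrategy(100_000, 4<<20))     // prints 1: spill to disk
	fmt.Println(chooseStrategy(100_000_000, 4<<20)) // prints 2: multiple passes
}
```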

Understood, you know best how Syncthing works. It was just my 2 cents for the brainstorming.

Wait, there’s something I’m still not understanding…

Index update messages are single messages: one message holds all of the index updates. BEP requires a reliable transport, so messages can’t be partially delivered. So you’re not going to get half of an index update message: you’ll get the whole lot, or nothing.

So why is thought being given to disconnections in the middle of partial index updates?

Is the intention to send multiple smaller index update messages, rather than one large one? If so, it would probably be good to call that out in the proposal.

Hmm, if you don’t require that versions within an index update message are sorted, only that all versions in an index update message must be higher than all versions in the previous message, you can save yourself a lot of time sorting.

E.g. You probably know the lowest and highest versions you’re going to transmit before you start scanning the database. Create evenly-spaced buckets within that range, then place versions into those buckets as you do a sequential database scan, but don’t sort the buckets. Each bucket then becomes an index update message, or is split if it is too large.

A bit like quicksort really, but with multiple pivots and only one pass.
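The bucketing idea above can be sketched as a single sequential pass. This is a hypothetical illustration, not a proposed implementation: every version in bucket k is strictly less than every version in bucket k+1, which is all the relaxed ordering rule requires, while the buckets themselves stay unsorted:

```go
package main

import "fmt"

// bucketize places each version into one of numBuckets evenly spaced
// buckets covering [lo, hi], in a single sequential pass. Buckets are
// not sorted internally, but cross-bucket ordering is guaranteed, so
// each bucket can be sent as one index update message.
func bucketize(versions []uint64, lo, hi uint64, numBuckets int) [][]uint64 {
	buckets := make([][]uint64, numBuckets)
	span := hi - lo + 1
	for _, v := range versions {
		if v < lo || v > hi {
			continue // outside the range we intend to transmit
		}
		idx := int(uint64(numBuckets) * (v - lo) / span)
		buckets[idx] = append(buckets[idx], v)
	}
	return buckets
}

func main() {
	versions := []uint64{110, 105, 199, 150, 101, 180}
	for i, b := range bucketize(versions, 101, 200, 4) {
		fmt.Println("message", i, b)
	}
	// prints:
	// message 0 [110 105 101]
	// message 1 [150]
	// message 2 []
	// message 3 [199 180]
}
```

An uneven version distribution can leave some buckets empty and others oversized, hence the suggestion to split a bucket when it exceeds the message size limit.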

We already split index messages into multiple smaller messages, limited to some specific size.

That still requires loading everything into RAM, though.

Some quick comments here as I’m on mobile… Index sending is already split into multiple messages (index + index updates) as a single message would be too large. A single message must fit in RAM and the index could be potentially gigabytes of data.

Currently the index sender takes a database snapshot (a transaction essentially) so the data it sends is consistent. Things may change during the transmission, but that will be picked up on the next pass and sent as an index update.

Keeping an index sorted by local version is a possibility, but I expect it’s not going to be worth it, since that increases the amount of work and disk access we do all the time, while in most cases index updates are small and trivially sorted in memory. The only common exception is sending a full index to a new device, which will happen approximately once per device once this is implemented…

I think we can mostly just say that the local version doesn’t wrap or isn’t allowed to wrap. Consider that an implementation using UNIX nanoseconds won’t wrap until the next hundred years or so and few people will actually have a billion database updates per second for the next hundred years. If you do run into that situation it’s time to wipe the index and start over.
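A quick back-of-envelope check of that claim (the billion-updates-per-second figure is from the post above; the arithmetic is just uint64 range divided by rate):

```go
package main

import "fmt"

func main() {
	const maxUint64 = ^uint64(0) // 2^64 - 1, the range of a uint64 max_local_version
	const updatesPerSecond = 1_000_000_000
	const secondsPerYear = 365 * 24 * 3600

	// At a sustained one billion index updates per second, the number
	// of years until the counter wraps:
	years := maxUint64 / updatesPerSecond / secondsPerYear
	fmt.Println(years) // prints 584
}
```

So even the pathological case leaves centuries of headroom, which supports simply declaring that the local version does not wrap.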


Ah, it’s the index ID that starts random, not the versions; in which case, yes, wraparound is unlikely any time soon.
