Delta index implementation

Since I’m not coding this week, I figured I could write up an actual spec proposal for how I expect delta indexes to work post 0.14, and then implement it when I’m back.

Please check my thinking etc:


To ensure index consistency in case of a disconnection during an index exchange, index entries must be sent in order of increasing local version.

Would you mind expanding on that?

A very welcome development; it has been at the top of my wishlist for a long time now :slight_smile:

One question though: Is local_version defined on a per-file basis, or per-folder?

If it’s the latter, it seems like a single file change would cause a full index exchange for the folder (and hence not solve the issue with huge repos)?

It’s per file.


As we connect to X, X says it has seen highest local version 100. We scan our own index and send everything above 100, but we have to send it ordered by local version. We cannot send 110 and then 105: if we disconnected in between, on the next connect X would say it has seen 110, when in reality it missed 105 and anything else between 100 and 110.
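The invariant above can be sketched in a few lines of Go. The types and function names here are hypothetical, not Syncthing’s actual code; the point is just that entries above the peer’s highest-seen version must go out in ascending local-version order:

```go
package main

import (
	"fmt"
	"sort"
)

// FileEntry is a simplified stand-in for an index entry.
type FileEntry struct {
	Name         string
	LocalVersion uint64
}

// entriesToSend returns all entries above the version the peer has
// already seen, sorted by ascending local version. Sending in this
// order guarantees that after a disconnect, the highest version the
// peer reports really covers everything below it.
func entriesToSend(index []FileEntry, peerSeen uint64) []FileEntry {
	var out []FileEntry
	for _, e := range index {
		if e.LocalVersion > peerSeen {
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		return out[i].LocalVersion < out[j].LocalVersion
	})
	return out
}

func main() {
	index := []FileEntry{
		{"a", 110}, {"b", 105}, {"c", 99}, {"d", 120},
	}
	for _, e := range entriesToSend(index, 100) {
		fmt.Println(e.Name, e.LocalVersion)
	}
	// prints:
	// b 105
	// a 110
	// d 120
}
```

Note that the sort is exactly the expensive part being discussed below: the on-disk index is not ordered by local version.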

This potentially means loading stuff into memory and sorting, or performing multiple passes over the index.


Aah, I follow, of course.

It would be very nice if we didn’t have to send updates in ascending order. Perhaps even at the cost of making delta index updates atomic (so if there’s a disconnection halfway through a delta index update, the delta index update is discarded)?

Will a delta update always be the best choice, or could there be cases where a full index update should be preferred because the difference between the two local_versions is too big?

Another thing to be mindful of is that files might be updated during the sorting process (which may take some time). That should be handled properly to avoid incorrect sort results.


index_id can be int64; max_local_version should be uint64, and we should think about how we deal with wraparounds.


Can’t the index always be sorted?

No, because we rely on it being sorted alphabetically for much more frequent operations.

OK, so why not maintain a continuously updated sorted index of the index?

That grows what we store on disk (not by much, though), and potentially causes disk thrashing as we jump around to different keys in the DB.

I have a gut feeling that multiple sequential passes might be better.

We also know how many entries we’ll have to send, and therefore how much RAM that will take, and can choose a strategy accordingly: sort in memory, sort on disk, or do multiple passes over the data set while sorting in memory.
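That strategy choice could look something like the following sketch. The strategy names, the average entry size, and the thresholds are all made up for illustration; they are not taken from Syncthing:

```go
package main

import "fmt"

// Strategy names are illustrative only.
type Strategy int

const (
	SortInMemory Strategy = iota // 0
	SortOnDisk                   // 1
	MultiplePasses               // 2
)

// entrySize is an assumed average size in bytes per index entry.
const entrySize = 256

// chooseStrategy picks how to order pending entries based on a rough
// estimate of how much RAM they would need. The 8x factor for the
// on-disk threshold is an arbitrary placeholder.
func chooseStrategy(numEntries int, ramBudget int) Strategy {
	needed := numEntries * entrySize
	switch {
	case needed <= ramBudget:
		return SortInMemory
	case needed <= 8*ramBudget:
		return SortOnDisk
	default:
		return MultiplePasses
	}
}

func main() {
	fmt.Println(chooseStrategy(1_000, 4<<20))       // prints 0: fits in RAM
	fmt.Println(chooseStrategy(100_000, 4<<20))     // prints 1: spill to disk
	fmt.Println(chooseStrategy(100_000_000, 4<<20)) // prints 2: multiple passes
}
```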

Understood, you know best how Syncthing works. It was just my 2 cents for the brainstorming.

Wait, there’s something I’m still not understanding…

Index update messages are single messages: one message holds all of the index updates. BEP requires a reliable transport, so messages can’t be partially delivered. So you’re not going to get half of an index update message: you’ll get the whole lot, or nothing.

So why is thought being given to disconnections in the middle of partial index updates?

Is the intention to send multiple smaller index update messages, rather than one large one? If so, it would probably be good to call that out in the proposal.

Hmm, if you don’t require that versions within an index update message are sorted, only that all versions in an index update message must be higher than all versions in the previous message, you can save yourself a lot of time sorting.

E.g. You probably know the lowest and highest versions you’re going to transmit before you start scanning the database. Create evenly-spaced buckets within that range, then place versions into those buckets as you do a sequential database scan, but don’t sort the buckets. Each bucket then becomes an index update message, or is split if it is too large.

A bit like quicksort really, but with multiple pivots and only one pass.
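The bucketing idea above can be sketched as a single sequential pass. This is a hypothetical illustration, not a proposed implementation: every version in bucket k is strictly less than every version in bucket k+1, which is all the relaxed ordering rule requires, while the buckets themselves stay unsorted:

```go
package main

import "fmt"

// bucketize places each version into one of numBuckets evenly spaced
// buckets covering [lo, hi], in a single sequential pass. Buckets are
// not sorted internally, but cross-bucket ordering is guaranteed, so
// each bucket can be sent as one index update message.
func bucketize(versions []uint64, lo, hi uint64, numBuckets int) [][]uint64 {
	buckets := make([][]uint64, numBuckets)
	span := hi - lo + 1
	for _, v := range versions {
		if v < lo || v > hi {
			continue // outside the range we intend to transmit
		}
		idx := int(uint64(numBuckets) * (v - lo) / span)
		buckets[idx] = append(buckets[idx], v)
	}
	return buckets
}

func main() {
	versions := []uint64{110, 105, 199, 150, 101, 180}
	for i, b := range bucketize(versions, 101, 200, 4) {
		fmt.Println("message", i, b)
	}
	// prints:
	// message 0 [110 105 101]
	// message 1 [150]
	// message 2 []
	// message 3 [199 180]
}
```

An uneven version distribution can leave some buckets empty and others oversized, hence the suggestion to split a bucket when it exceeds the message size limit.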

We already split index messages into multiple smaller messages, limited to some specific size.

That still requires loading everything into RAM, though.

Some quick comments here as I’m on mobile… Index sending is already split into multiple messages (index + index updates) as a single message would be too large. A single message must fit in RAM and the index could be potentially gigabytes of data.

Currently the index sender takes a database snapshot (a transaction essentially) so the data it sends is consistent. Things may change during the transmission, but that will be picked up on the next pass and sent as an index update.

Keeping an index sorted by local version is a possibility, but I expect it’s not going to be worth it, since that increases the amount of work and disk access we do all the time, while in most cases index updates are small and trivially sorted in memory. The only common exception is sending a full index to a new device, which will happen approximately once per device once this is implemented…

I think we can mostly just say that the local version doesn’t wrap or isn’t allowed to wrap. Consider that an implementation using UNIX nanoseconds won’t wrap until the next hundred years or so and few people will actually have a billion database updates per second for the next hundred years. If you do run into that situation it’s time to wipe the index and start over.
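A quick back-of-envelope check of that claim (the billion-updates-per-second figure is from the post above; the arithmetic is just uint64 range divided by rate):

```go
package main

import "fmt"

func main() {
	const maxUint64 = ^uint64(0) // 2^64 - 1, the range of a uint64 max_local_version
	const updatesPerSecond = 1_000_000_000
	const secondsPerYear = 365 * 24 * 3600

	// At a sustained one billion index updates per second, the number
	// of years until the counter wraps:
	years := maxUint64 / updatesPerSecond / secondsPerYear
	fmt.Println(years) // prints 584
}
```

So even the pathological case leaves centuries of headroom, which supports simply declaring that the local version does not wrap.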


Ah, it’s the index ID that starts random, not the versions; in which case, yes, wraparound is unlikely any time soon.
