I’ve been thinking a little about the protocol, how it’s evolved, and how to continue evolving it while keeping forward and backward compatibility.
Some background
The current protocol consists of messages, each made up of a number of fields. The type and order of the fields are strict: any deviation from what is expected causes a protocol error. There is slight flexibility in that unknown fields after the end of the message are ignored, and missing fields at the end of a message can be ignored if one wants to. Data is encoded using XDR, an old, simple and quite fast method of encoding data.
Advantages of the current approach:
- Fast - a message is serialized/deserialized to/from a flat buffer using generated, compiled code.
- Simple - it can be hand decoded. There are not many complex rules to keep in mind.
- Fields are length prefixed and checked against expected maximums, to avoid large allocations due to malformed input.
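To illustrate the last point, here is a minimal sketch of reading one length-prefixed string with a limit check. The names readString, maxNameLen, errTooLong and errShort are made up for this example, and the limit of 64 is an arbitrary assumption; this is not the generated code itself.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// maxNameLen is a hypothetical limit chosen for this example only.
const maxNameLen = 64

var (
	errTooLong = errors.New("field exceeds maximum length")
	errShort   = errors.New("buffer too short")
)

// readString decodes an XDR-style string: a four byte big-endian length,
// the data, and padding up to a multiple of four bytes. The length is
// checked against limit before any data is copied or allocated.
func readString(buf []byte, limit uint32) (string, []byte, error) {
	if len(buf) < 4 {
		return "", nil, errShort
	}
	n := binary.BigEndian.Uint32(buf)
	if n > limit {
		return "", nil, errTooLong
	}
	// n is known to be small here, so converting to int is safe.
	padded := (int(n) + 3) &^ 3
	if len(buf)-4 < padded {
		return "", nil, errShort
	}
	return string(buf[4 : 4+int(n)]), buf[4+padded:], nil
}

func main() {
	ok := []byte{0, 0, 0, 3, 'f', 'o', 'o', 0}
	s, _, err := readString(ok, maxNameLen)
	fmt.Println(s, err)

	// A malformed message claiming a string of about 4 GiB is rejected
	// after reading only the four length bytes.
	bad := []byte{0xff, 0xff, 0xff, 0xff}
	_, _, err = readString(bad, maxNameLen)
	fmt.Println(err)
}
```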
Disadvantages:
- We can’t reorder fields, add fields inside a message, or change the type of a field without breaking compatibility. The latter happens more often than you think, as messages contain structs for many things and changing one of these counts as adding or removing fields in the middle of a message.
Obviously, before hand-rolling this in the first place I did take a look at the available options. These are the ones I can remember, and why we don’t use them:
- Gob - a Go native thing, only exists in Go.
- JSON - too verbose and slow to parse. We want a binary protocol.
- Protocol Buffers - Generated types that were shit ugly and full of pointers; now a little better with “proto3” syntax but still too many pointers [1]. Uses serialization which means it’s comparatively slow.
- Cap’n Proto, FlatBuffers - No Go implementation?
- FlatBuffers - has a frankly nasty API
- ASN.1 - quite idiosyncratic and the Go package for it doesn’t support the features we need.
However
The “no changing the fields!” limitation is really quite annoying. The intention has always been to stabilize the protocol, after which there would be no need to change it. However, that may not be correct (we might not have thought of everything), and it leads to annoying attempts to future-proof stuff by throwing in “options” and “flags” fields everywhere on the assumption that we might need them at some point in the future.
Seeing as I still don’t like any of the common alternatives (But feel free to try to prove me wrong here! Actual proof of concept code is good here, not anecdotes or feelings.) I’m thinking of implementing a change to our current serialization scheme to gain this advantage.
Proposal
Add field tags in front of fields.
By example, let’s say we have the message (in Go syntax):
type Message struct {
	flags   int32
	name    string
	enabled bool
}
In the current XDR encoding this becomes
"flags":
    four bytes
"name":
    four bytes length
    variable length data
    padding if it was not a multiple of four bytes
"enabled":
    one byte data
    three bytes padding
The padding is mandated by the XDR standard and not my favourite thing in the world, but it doesn’t really matter in the grand scheme of things.
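As a rough sketch of that layout in Go (marshalXDR is a name invented for this example, not the generated code): the bool is written as a four byte big-endian 0 or 1, which is how the XDR standard defines it and which occupies the same four bytes as the data-plus-padding described above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type Message struct {
	flags   int32
	name    string
	enabled bool
}

// marshalXDR writes the fields in declaration order with no tags, so a
// reader has to know the exact type and order of every field.
func (m Message) marshalXDR() []byte {
	var buf []byte

	// "flags": four bytes, big-endian.
	buf = binary.BigEndian.AppendUint32(buf, uint32(m.flags))

	// "name": four bytes length, the data, then zero padding up to a
	// multiple of four bytes.
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(m.name)))
	buf = append(buf, m.name...)
	for len(buf)%4 != 0 {
		buf = append(buf, 0)
	}

	// "enabled": four bytes in total.
	var b uint32
	if m.enabled {
		b = 1
	}
	buf = binary.BigEndian.AppendUint32(buf, b)

	return buf
}

func main() {
	m := Message{flags: 5, name: "hello", enabled: true}
	fmt.Printf("% x\n", m.marshalXDR())
}
```

For this message the result is 20 bytes: 4 for flags, 4 + 5 + 3 for name and its padding, and 4 for enabled.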
I propose that we adopt field tags similar to protobuf. The definition would look something like
type Message struct {
	flags   int32  //tag:1
	name    string //tag:2
	enabled bool   //tag:47
}
The tags are allocated manually and can never ever change. When encoded, the tags would be prepended to the field:
"flags":
    four bytes tag <1>
    four bytes value
"name":
    four bytes tag <2>
    four bytes length
    variable length data
    padding if it was not a multiple of four bytes
"enabled":
    four bytes tag <47>
    one byte data
    three bytes padding
When reading, we use the tag values to put the data in the right place. Unknown tags are ignored. When serializing, we could skip fields with zero values completely. Zero values are kept for fields not present in the serialized message.
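A minimal sketch of how writing and reading with four byte tags might look. The method names marshalTagged and unmarshalTagged and the tag constants are invented for this example. The text above does not say how a reader learns the size of a field it does not know, so this sketch simply stops at the first unknown tag and keeps what it has decoded so far; that behaviour is an assumption of the sketch, and length limits are left out for brevity.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type Message struct {
	flags   int32  //tag:1
	name    string //tag:2
	enabled bool   //tag:47
}

const (
	tagFlags   = 1
	tagName    = 2
	tagEnabled = 47
)

var errShort = errors.New("buffer too short")

// marshalTagged writes each non-zero field as a four byte tag followed by
// the XDR-style value. Zero-valued fields are skipped entirely.
func (m Message) marshalTagged() []byte {
	var buf []byte
	if m.flags != 0 {
		buf = binary.BigEndian.AppendUint32(buf, tagFlags)
		buf = binary.BigEndian.AppendUint32(buf, uint32(m.flags))
	}
	if m.name != "" {
		buf = binary.BigEndian.AppendUint32(buf, tagName)
		buf = binary.BigEndian.AppendUint32(buf, uint32(len(m.name)))
		buf = append(buf, m.name...)
		for len(buf)%4 != 0 {
			buf = append(buf, 0)
		}
	}
	if m.enabled {
		buf = binary.BigEndian.AppendUint32(buf, tagEnabled)
		buf = binary.BigEndian.AppendUint32(buf, 1)
	}
	return buf
}

// unmarshalTagged puts values in the right field based on the tag, in
// whatever order they arrive. Fields that are absent keep their zero value.
func (m *Message) unmarshalTagged(buf []byte) error {
	for len(buf) > 0 {
		// Every known field starts with a tag and four more bytes
		// (a value or a length).
		if len(buf) < 8 {
			return errShort
		}
		tag := binary.BigEndian.Uint32(buf)
		val := binary.BigEndian.Uint32(buf[4:])
		buf = buf[8:]
		switch tag {
		case tagFlags:
			m.flags = int32(val)
		case tagEnabled:
			m.enabled = val != 0
		case tagName:
			padded := (uint64(val) + 3) &^ 3
			if uint64(len(buf)) < padded {
				return errShort
			}
			m.name = string(buf[:val])
			buf = buf[padded:]
		default:
			// Assumption: the size of an unknown field is not
			// known, so stop here and keep what we have.
			return nil
		}
	}
	return nil
}

func main() {
	in := Message{name: "hello", enabled: true}
	buf := in.marshalTagged() // flags is zero and therefore absent

	var out Message
	err := out.unmarshalTagged(buf)
	fmt.Println(len(buf), out, err)
}
```

Skipping an unknown tag in the middle of a message, rather than stopping, would require the encoding to carry the size or type of each field in some way.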
This is no longer XDR, but our own thing.
Given that, if the padding makes us feel nauseous we could use varint encoding of all numbers (like protobuf), making all numbers take between one and ten bytes depending on size. The above tags would then fit in one byte, the bool would fit in one byte, string lengths usually fit in one byte, and we wouldn’t pad the strings.
"flags":
    one byte tag <1>
    x bytes value
"name":
    one byte tag <2>
    x bytes length
    variable length data
"enabled":
    one byte tag <47>
    one byte data
Or, if enabled is false and flags is zero,
"name":
    one byte tag <2>
    x bytes length
    variable length data
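The varint variant can be sketched with the varint helpers in Go's encoding/binary package. marshalVarint is a name invented for this example, and using the signed (zigzag) varint for flags is an assumption; the text above only says the value takes "x bytes".

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type Message struct {
	flags   int32  //tag:1
	name    string //tag:2
	enabled bool   //tag:47
}

// marshalVarint writes tags, numbers and lengths as varints, does not pad
// strings, and skips fields that hold their zero value.
func (m Message) marshalVarint() []byte {
	var buf []byte
	if m.flags != 0 {
		buf = binary.AppendUvarint(buf, 1)
		// Assumption: signed varint so small negative values stay short.
		buf = binary.AppendVarint(buf, int64(m.flags))
	}
	if m.name != "" {
		buf = binary.AppendUvarint(buf, 2)
		buf = binary.AppendUvarint(buf, uint64(len(m.name)))
		buf = append(buf, m.name...)
	}
	if m.enabled {
		buf = binary.AppendUvarint(buf, 47)
		buf = append(buf, 1)
	}
	return buf
}

func main() {
	full := Message{flags: 5, name: "hello", enabled: true}
	fmt.Println(len(full.marshalVarint()))

	nameOnly := Message{name: "hello"}
	fmt.Println(len(nameOnly.marshalVarint()))
}
```

With these values the full message comes to 11 bytes (2 for flags, 7 for name, 2 for enabled) and the name-only message to 7 bytes, compared with 20 bytes for the full message in the current untagged XDR encoding.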
Advantages:
- It’s better
Disadvantages:
- It’s one more unique snowflake protocol, for anyone else having to implement it. Then again, it’s not really rocket science either.
Thoughts?
1) Pointers, when used unnecessarily, cause allocations which cause garbage collection and drive up both memory and CPU usage.