Future stability of the protocol

calmh · May 10, 2016, 2:15pm

I’ve been thinking a little about the protocol, how it’s evolved, and how to continue evolving it while keeping forward and backward compatibility.

Some background

The current protocol consists of messages each made up of a number of fields. The type and order of fields is strict - any deviation from the expected will cause a protocol error. There is a slight flexibility in that unknown fields after the end of the message are ignored. Missing fields at the end of a message can be ignored, if one wants to. Data is encoded using XDR, an old, simple and quite fast method of encoding data.

Advantages of the current approach:

Fast - a message is serialized/deserialized to/from a flat buffer using generated, compiled code.
Simple - it can be hand decoded. There are not many complex rules to keep in mind.
Fields are length prefixed and checked against expected maximums, to avoid large allocations due to malformed input.

Disadvantages:

We can’t reorder fields, add fields inside a message, or change the type of a field without breaking compatibility. The latter happens more often than you think, as messages contain structs for many things, and changing these counts as adding or removing fields in the middle of a message.

Obviously, before hand rolling this to start with I did take a look at the available options. These are the ones I can remember and why we don’t use them:

Gob - a Go native thing, only exists in Go.
JSON - too verbose and slow to parse. We want a binary protocol.
Protocol Buffers - Generates types that were shit ugly and full of pointers, now a little better with “proto3” syntax but still too many pointers¹. Uses reflection, which means it’s comparatively slow.
Cap’nProto, Flatbuffers - No Go implementation?
FlatBuffers - has a frankly nasty API
ASN.1 - quite idiosyncratic and the Go package for it doesn’t support the features we need.

However

The “no changing the fields!” limitation is really quite annoying. The intention has always been to stabilize the protocol, and then there will be no need to change it. However that may not be correct (we might not have thought of everything) and it leads to annoying attempts to future proof stuff by throwing in “options” and “flags” fields everywhere on the assumption that we might need them at some point in the future.

Seeing as I still don’t like any of the common alternatives (But feel free to try to prove me wrong here! Actual proof of concept code is good here, not anecdotes or feelings.) I’m thinking of implementing a change to our current serialization scheme to gain this advantage.

Proposal

Add field tags in front of fields.

By example, let’s say we have the message (in Go syntax):

type Message struct {
    flags int32
    name string
    enabled bool
}

In the current XDR encoding this becomes

"flags":
    four bytes
"name":
    four bytes length
    variable length data
    padding if it was not a multiple of four bytes
"enabled":
    one byte data
    three bytes padding

The padding is mandated by the XDR standard and not my favourite thing in the world, but it doesn’t really matter in the grand scheme of things.
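To make the layout above concrete, here is a minimal sketch of a hand-rolled XDR-style encoder for the example message. It is not the actual Syncthing code; `encodeXDR` and `appendUint32` are names made up for illustration, and the fields are exported only so the example reads naturally. The bool is written as a four-byte 0 or 1, which is how RFC 4506 defines it; either way it occupies four bytes on the wire.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Message mirrors the example struct from the post.
type Message struct {
	Flags   int32
	Name    string
	Enabled bool
}

// appendUint32 appends v as four big-endian bytes (hypothetical helper).
func appendUint32(buf []byte, v uint32) []byte {
	var tmp [4]byte
	binary.BigEndian.PutUint32(tmp[:], v)
	return append(buf, tmp[:]...)
}

// encodeXDR writes the fields in their fixed declaration order.
// There are no tags: the reader must know the order in advance.
func encodeXDR(m Message) []byte {
	var buf []byte
	buf = appendUint32(buf, uint32(m.Flags))

	// Strings are a four-byte length, the data, then zero padding
	// up to the next multiple of four bytes.
	buf = appendUint32(buf, uint32(len(m.Name)))
	buf = append(buf, m.Name...)
	for len(buf)%4 != 0 {
		buf = append(buf, 0)
	}

	// Booleans take a full four-byte word.
	var enabled uint32
	if m.Enabled {
		enabled = 1
	}
	return appendUint32(buf, enabled)
}

func main() {
	b := encodeXDR(Message{Flags: 1, Name: "hello", Enabled: true})
	fmt.Printf("%d bytes: % x\n", len(b), b)
}
```

A five-character name costs 4 + 4 + 8 + 4 = 20 bytes here, of which three are string padding.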

I propose that we adopt field tags similar to protobuf. The definition would look something like

type Message struct {
    flags int32   //tag:1
    name string   //tag:2
    enabled bool  //tag:47
}

The tags are allocated manually and can never ever change. When encoded, the tags would be prepended to the field:

"flags":
    four bytes tag <1>
    four bytes value
"name":
    four bytes tag <2>
    four bytes length
    variable length data
    padding if it was not a multiple of four bytes
"enabled":
    four bytes tag <47>
    one byte data
    three bytes padding

When reading, we use the tag values to put the data in the right place. Unknown tags are ignored. When serializing, we could skip fields with zero values completely. Zero values are kept for fields not present in the serialized message.

This is no longer XDR, but our own thing.

Given that, if the padding makes us feel nauseous we could use varint encoding of all numbers (like protobuf), making all numbers take between one and ten bytes depending on size. The above tags would then fit in one byte, the bool would fit in one byte, string lengths usually fit in one byte, and we wouldn’t pad the strings.

"flags":
    one byte tag <1>
    x bytes value
"name":
    one byte tag <2>
    x bytes length
    variable length data
"enabled":
    one byte tag <47>
    one byte data

Or if enabled is false and flags is zero,

"name":
    one byte tag <2>
    x bytes length
    variable length data
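A rough sketch of what the varint variant of the encoder could look like, assuming tags and lengths are written with the standard library’s unsigned varint routines and that zero-valued fields are skipped. The names `encodeTagged` and `appendUvarint` and the tag constants are made up for the example; this covers the writing side only.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Tags are allocated by hand and must never change.
const (
	tagFlags   = 1
	tagName    = 2
	tagEnabled = 47
)

type Message struct {
	Flags   int32
	Name    string
	Enabled bool
}

// appendUvarint appends v in varint form, one to ten bytes long.
func appendUvarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// encodeTagged writes tag + value for every non-zero field.
func encodeTagged(m Message) []byte {
	var buf []byte
	if m.Flags != 0 {
		buf = appendUvarint(buf, tagFlags)
		// Negative values sign-extend to 64 bits and take ten bytes.
		buf = appendUvarint(buf, uint64(m.Flags))
	}
	if m.Name != "" {
		buf = appendUvarint(buf, tagName)
		buf = appendUvarint(buf, uint64(len(m.Name)))
		buf = append(buf, m.Name...) // no padding
	}
	if m.Enabled {
		buf = appendUvarint(buf, tagEnabled)
		buf = append(buf, 1)
	}
	return buf
}

func main() {
	full := encodeTagged(Message{Flags: 1, Name: "hello", Enabled: true})
	sparse := encodeTagged(Message{Name: "hello"})
	fmt.Printf("full:   %d bytes: % x\n", len(full), full)
	fmt.Printf("sparse: %d bytes: % x\n", len(sparse), sparse)
}
```

With a five-character name the full message comes to 11 bytes and the name-only message to 7.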

Advantages:

It’s better

Disadvantages:

It’s one more unique snowflake protocol, for anyone else having to implement it. Then again, it’s not really rocket science either.

Thoughts?

¹⁾ Pointers, when used unnecessarily, cause allocations which cause garbage collection and drive up both memory and CPU usage.

canton7 · May 10, 2016, 3:30pm

What about msgpack, bson, etc? There’s more overhead than something that’s tagged, but might be worth considering.

Also is there any advantage to rolling our own tagged protocol as well as creating our own serialiser and deserialiser, compared to, say, rolling our own protobuf serialiser and deserialiser? Seems we’re writing the serialiser and deserialiser either way, but adding the effort of designing and documenting the thing, as well as raising the barrier for alternative implementations.

canton7 · May 10, 2016, 3:43pm

Also, I don’t think your proposed protocol is forwards compatible: a client which is old will encounter a tag which it doesn’t recognise, but won’t know how many bytes to skip to reach the next field. If you add that in, you’re well on the way to redesigning protobuf.
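To illustrate the missing piece: protobuf solves this by packing a “wire type” into the low three bits of the tag, so a reader can tell how to skip a field it has never heard of. Below is a minimal sketch of an “old” decoder that only knows fields 1 and 2 and skips everything else. `OldMessage` and `decodeOld` are invented names, and only two wire types (0 = varint, 2 = length-delimited) are handled.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	wireVarint = 0 // value is a single varint
	wireBytes  = 2 // value is a varint length followed by that many bytes
)

// OldMessage is what an old client knows about: no "enabled" field.
type OldMessage struct {
	Flags int32
	Name  string
}

var errMalformed = errors.New("malformed message")

func decodeOld(data []byte) (OldMessage, error) {
	var m OldMessage
	for len(data) > 0 {
		// The key holds the field number and the wire type.
		key, n := binary.Uvarint(data)
		if n <= 0 {
			return m, errMalformed
		}
		data = data[n:]
		field, wire := key>>3, key&7

		switch wire {
		case wireVarint:
			v, n := binary.Uvarint(data)
			if n <= 0 {
				return m, errMalformed
			}
			data = data[n:]
			if field == 1 {
				m.Flags = int32(v)
			} // any other varint field is silently skipped
		case wireBytes:
			l, n := binary.Uvarint(data)
			if n <= 0 || l > uint64(len(data)-n) {
				return m, errMalformed
			}
			val := data[n : n+int(l)]
			data = data[n+int(l):]
			if field == 2 {
				m.Name = string(val)
			} // any other length-delimited field is silently skipped
		default:
			return m, errMalformed
		}
	}
	return m, nil
}

func main() {
	// Field 1 = 1, unknown varint field 47 = 1, field 2 = "hi".
	wire := []byte{0x08, 0x01, 0xf8, 0x02, 0x01, 0x12, 0x02, 'h', 'i'}
	m, err := decodeOld(wire)
	fmt.Println(m, err)
}
```

The unknown field 47 sits in the middle of the message and the old decoder still reaches the name after it, which is exactly what the plain “tag then value” layout cannot do.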

canton7 · May 10, 2016, 3:53pm

Thinking about it a bit more, I don’t see the harm in implementing a protobuf serialiser and deserialiser which support a limited subset of the full protobuf spec (the stuff which Syncthing would find useful), and which use Go structs as the definition (or whatever is most convenient), and which could generate .proto files as a nicety to people using other languages.

It wouldn’t be useful for people intending to interact with any old protobuf messages, but it would work for Syncthing.

AudriusButkevicius · May 10, 2016, 4:01pm

I think XDR is already somewhat a special snowflake, as when I tried to find a generator for C#, there was none within the first page of google results.

If you can prove that a protobuf3 based implementation is reasonably slower or produces a noticeable load on GC, then I’d say let’s go with our own magic format, otherwise I’d say let’s go with proto3, as the adoption becomes a breeze.

Also, maybe it’s easier to write all the necessary sugar around flatbuffers to make the API not horrible if it ticks all other boxes?

calmh · May 10, 2016, 4:12pm

The proto3 implementation builds []*FileInfo everywhere we have []FileInfo today. That’s a no go from my point of view, so at the very least that requires separating the types we use internally from the types we send over the wire. Even so, it means deserializing a file index with a million blocks in it is a million separate allocations, as opposed to today’s one allocation. Add reflection on top of that…

I’ve worked quite hard on bringing down the memory usage and allocations in the serialization code as we spend a lot of time running it - to and from the wire as well as the database. It would make me sad to throw all that out and go back to something that apparently doesn’t give a fuck.

Apart from that I agree it would be better.

The flat buffers thing I didn’t look closer at, but it’s so low level that we essentially need to auto generate the flat buffers code unless I misunderstood it…

It takes a couple of hours to even start to prototype each of these things.

calmh · May 10, 2016, 4:25pm

Both seem to act like json and send the actual field names in every message. That triggers even my wastefulness detector.

You’re right, I forgot a few things and it’s almost certainly trickier than I thought.

I don’t really want to develop something complex. Hand rolling the XDR based thing is fine, as XDR is simple enough to do that. Something significantly more complex is a lot of overhead for something that isn’t our core business. I’d much rather use protobuf or something similar. It just needs to not suck.

canton7 · May 10, 2016, 5:10pm

Another option might be to keep using XDR (or similar), but properly version it: keep the definitions from previous versions around, and provide migration-like shims to translate any version to and from the latest. Do proper version negotiation on connection.
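As a rough sketch of what that could look like: each side advertises the range of versions it speaks, the connection uses the highest one both support, and small shims convert old definitions to and from the latest. Everything here (`RecordV1`, `RecordV2`, `negotiate`, the `Size` field) is hypothetical and only meant to show the shape of the idea.

```go
package main

import "fmt"

// RecordV1 is the definition kept around from the previous version.
type RecordV1 struct {
	Name string
}

// RecordV2 is the latest definition; Size was added in version 2.
type RecordV2 struct {
	Name string
	Size int64
}

// negotiate picks the highest version supported by both sides.
// It returns false when the two ranges do not overlap.
func negotiate(localMin, localMax, remoteMin, remoteMax int) (int, bool) {
	v := localMax
	if remoteMax < v {
		v = remoteMax
	}
	if v < localMin || v < remoteMin {
		return 0, false
	}
	return v, true
}

// upgradeV1 is the shim from the old wire type to the latest one.
// Fields the old version did not have keep their zero value.
func upgradeV1(old RecordV1) RecordV2 {
	return RecordV2{Name: old.Name}
}

// downgradeV2 is the shim in the other direction; newer fields are dropped.
func downgradeV2(cur RecordV2) RecordV1 {
	return RecordV1{Name: cur.Name}
}

func main() {
	v, ok := negotiate(1, 2, 1, 1) // we speak 1-2, the peer only speaks 1
	fmt.Println(v, ok)

	// Internally we always work with the latest type.
	cur := upgradeV1(RecordV1{Name: "file.txt"})
	fmt.Println(cur, downgradeV2(cur))
}
```

The cost is that every old definition and its pair of shims has to be kept and tested for as long as that version is supported.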

AudriusButkevicius · May 10, 2016, 5:12pm

So I understand the downside of proto3, yet I hope they use a byte pool or something of that type to optimize.

I am not suggesting we throw away the amazing work you’ve done, but I question how much of a bottleneck the protocol is, so that we are so worried about performance versus ease of consumption and flexibility in changing it.

calmh · May 10, 2016, 5:21pm

Fair enough. Let’s assume, at least for the sake of the discussion and until proven otherwise, that protobuf’s allocation strategy isn’t a problem.

I think we would need to separate the types used for serialization from the types used for internal “work” as I don’t think []*Everything is a type I want to work with except under exceptional circumstances. It means that just make()ing the slice isn’t enough, you also need to new() every item in it, items in the middle of a slice could potentially be nil, and so on.
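A small sketch of the difference, using a made-up two-field `FileInfo` rather than the real one. The value slice is usable straight after `make()`, while the pointer slice needs every element allocated separately and every consumer has to be prepared for a nil in the middle.

```go
package main

import "fmt"

// FileInfo is a stand-in with hypothetical fields, not the real type.
type FileInfo struct {
	Name string
	Size int64
}

// newValues returns a slice of values: one allocation, every element usable.
func newValues(n int) []FileInfo {
	return make([]FileInfo, n)
}

// newPointers returns a slice of pointers: make() only gives nil pointers,
// so each element needs its own allocation before it can be used.
func newPointers(n int) []*FileInfo {
	s := make([]*FileInfo, n)
	for i := range s {
		s[i] = new(FileInfo)
	}
	return s
}

// totalSize has to guard against nil elements when given pointers.
func totalSize(items []*FileInfo) int64 {
	var sum int64
	for _, it := range items {
		if it == nil {
			continue // dereferencing a nil element would panic
		}
		sum += it.Size
	}
	return sum
}

func main() {
	vals := newValues(3)
	vals[1].Size = 10 // fine, the element already exists

	ptrs := newPointers(3)
	ptrs[1].Size = 10
	ptrs[2] = nil // nothing stops an element from being nil

	fmt.Println(vals[1].Size, totalSize(ptrs))
}
```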

Separating the types like that might not be a bad thing - what we send on the wire shouldn’t necessarily be coupled to our internal data structures.

On the other hand, we also need to serialize stuff into and from the database, so we need the same round trip there. So potentially we’ll end up copying to and from various types that represent the same thing multiple times for the same information.

Assuming that we don’t decide to live with []*Whatever everywhere - but why would we let the code generator of some random serialization protocol dictate what sort of data structures we use, when they are clearly suboptimal?

I really don’t understand the thinking behind the protobufs API. I asked on go-nuts, but no one has answered yet.

capi · May 10, 2016, 5:58pm

While I don’t know anything about the protobuf implementation of Go, I want to add that I do think that it is a very interesting on-the-wire format if you want to target multiple languages.

At work, we switched to using protobufs as the lowest layer of communication for our entire software stack; we even use it as (typed) communication between our server applications and a web frontend written in TypeScript. It allows for very good integration between a lot of different services, written in a wide variety of languages. It has its limitations, though, as the primitive types are quite limited. And you better stay clear of any unsigned types if you want to interact with Java. And you better make everything optional so that you can change it later. But it allows for fast, reliable serialization and we have managed quite some huge changes while still maintaining backward compatibility.

Our pattern for this is like this:

message MessageFrame {
  enum MessageType {
    SOME_MESSAGE = 100;
    SOME_OTHER_MESSAGE = 200;
  }
  required MessageType type = 1;
  optional SomeMessage some_message = 100;
  optional SomeOtherMessage some_other_message = 200;
}
message SomeMessage {
  // ...
}
message SomeOtherMessage {
  // ...
}

I could imagine that having the protocol defined in something more consumable in other languages than XDR could trigger some implementations.

AudriusButkevicius · May 10, 2016, 6:27pm

It seems that it’s just a framing format, with not much usefulness otherwise. I think if the users go the OhAndNowAddUsername way, they might as well just handle our custom format.

Lastly, we could still be XDR compliant and just add an extra FieldInfo field before every existing field to support compatibility… But then do we want to keep all of this extra info in the db too?

All in all, I think you are right, there is nothing out of the box that we could use.

AudriusButkevicius · May 10, 2016, 7:07pm

So I looked at the code generated by Apache Thrift. The generated code is fairly clean, and a small hack to:

https://github.com/apache/thrift/blob/master/compiler/cpp/src/generate/t_go_generator.cc#L367

Could probably give us what we want. Should I exercise the avenue?

calmh · May 10, 2016, 7:10pm

I read through the protobuf spec (https://developers.google.com/protocol-buffers/docs/encoding) and frankly it’s quite neat and sensible. I don’t think it would be impossible to implement a small, fast, code generating protobuf package similar to our current one.

Maybe proto4 will do all we need out of the box. Maybe we can patch the existing protobuf parser to handle slices non stupidly (or understand why they must be the way they are, in the process). Otherwise I think doing the parts we need ourselves looks doable.

So I think my final take on this is that we should probably use protobufs if/when we do change, but maybe not the current parser implementation.

calmh · May 10, 2016, 7:15pm

I didn’t know about that one. From a quick google on the phone it looks like a tentacle monster from outer space. Is there a neat binary protocol hiding somewhere in there?

AudriusButkevicius · May 10, 2016, 7:17pm

It’s both RPC via many different transports as well as a wire format. We’d just need to satisfy TTransport and wrap it in

https://github.com/apache/thrift/blob/aadcf34cbf643b5eff1c771047a05a4c77be9d9e/lib/go/thrift/binary_protocol.go#L45


	strictRead    bool
	strictWrite   bool
	buffer        [64]byte
}


type TBinaryProtocolFactory struct {
	strictRead  bool
	strictWrite bool
}


func NewTBinaryProtocolTransport(t TTransport) *TBinaryProtocol {
	return NewTBinaryProtocol(t, false, true)
}


func NewTBinaryProtocol(t TTransport, strictRead, strictWrite bool) *TBinaryProtocol {
	p := &TBinaryProtocol{origTransport: t, strictRead: strictRead, strictWrite: strictWrite}
	if et, ok := t.(TRichTransport); ok {
		p.trans = et
	} else {
		p.trans = NewTRichTransport(t)
	}

https://github.com/apache/thrift/blob/master/lib/go/thrift/transport.go#L39


type Flusher interface {
	Flush() (err error)
}


type ReadSizeProvider interface {
	RemainingBytes() (num_bytes uint64)
}


// Encapsulates the I/O layer
type TTransport interface {
	io.ReadWriteCloser
	Flusher
	ReadSizeProvider


	// Opens the transport for communication
	Open() error


	// Returns true if the transport is open
	IsOpen() bool
}

calmh · May 10, 2016, 7:28pm

Try it I guess, if the code looks sensible and there is some adoption behind it. If it’s Apache, at least someone else must be using it.

The fact that there appears to be no spec or even documentation of the actual binary protocol (either of them…) is slightly worrying though.

AudriusButkevicius · May 10, 2016, 8:05pm

So I just tested gogoproto (a fork - specifically the gogofaster compiler), the following protofile:


syntax = "proto2";

package test;

import "github.com/gogo/protobuf/gogoproto/gogo.proto";

option (gogoproto.goproto_getters_all) = false;
option (gogoproto.goproto_stringer_all) = false;

message A {
        optional string Description = 1 [(gogoproto.nullable) = false];
        optional int64 Number = 2 [(gogoproto.nullable) = false];
}

message B {
        repeated A Values = 1 [(gogoproto.nullable) = false];
}

generates the following code:

type A struct {
        Description string `protobuf:"bytes,1,opt,name=Description" json:"Description"`
        Number      int64  `protobuf:"varint,2,opt,name=Number" json:"Number"`
}

func (m *A) MarshalTo(data []byte) (int, error) {
        var i int
        _ = i
        var l int
        _ = l
        data[i] = 0xa
        i++
        i = encodeVarintExample(data, i, uint64(len(m.Description)))
        i += copy(data[i:], m.Description)
        data[i] = 0x10
        i++
        i = encodeVarintExample(data, i, uint64(m.Number))
        return i, nil
}

...

type B struct {
        Values []A `protobuf:"bytes,1,rep,name=Values" json:"Values"`
}

func (m *B) MarshalTo(data []byte) (int, error) {
        var i int
        _ = i
        var l int
        _ = l
        if len(m.Values) > 0 {
                for _, msg := range m.Values {
                        data[i] = 0xa
                        i++
                        i = encodeVarintExample(data, i, uint64(msg.Size()))
                        n, err := msg.MarshalTo(data[i:])
                        if err != nil {
                                return 0, err
                        }
                        i += n
                }
        }
        return i, nil
}

Running the same proto file past the Python compiler produces valid Python code to read these. So what’s missing?
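For anyone wondering where the magic `0xa` and `0x10` bytes in the generated code come from: per the protobuf encoding docs, each field is prefixed with a key of `(field number << 3) | wire type`, where wire type 0 is a varint and 2 is length-delimited. A tiny sketch (the `key` helper is made up for the example):

```go
package main

import "fmt"

const (
	wireVarint = 0 // int64 and friends
	wireBytes  = 2 // strings, bytes and embedded messages
)

// key builds the protobuf field key from a field number and a wire type.
func key(field, wireType uint64) uint64 {
	return field<<3 | wireType
}

func main() {
	// A.Description: string, field 1
	fmt.Printf("%#x\n", key(1, wireBytes))
	// A.Number: int64, field 2
	fmt.Printf("%#x\n", key(2, wireVarint))
	// B.Values: embedded message A, field 1
	fmt.Printf("%#x\n", key(1, wireBytes))
}
```

Field 1 as length-delimited gives `0xa` and field 2 as varint gives `0x10`, matching the constants written by the two `MarshalTo` functions.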

calmh · May 10, 2016, 8:51pm

That looks approximately sane. I was looking at the gogoproto thing but with proto3 syntax and had missed the nullable option.

capi · May 10, 2016, 8:56pm

What’s the sense of something being optional but not nullable? Seems like a contradiction to me.