⁉ Talk to me about protocol buffers

Do you speak protocol buffers? Let's talk about which of the following alternatives is the least surprising one to encounter. The context is that we have a stream of messages over a network connection. The messages are of varying size and varying type.

Option A – Binary Packed Header

The following is sent over the wire:

  • A uint32 containing a message type identifier and some flags (like whether or not the following message is compressed). You need to read this and do some bit slicing to understand which type of message to expect next.
  • A uint32 containing the message length.
  • If the message was compressed, a uint32 with the uncompressed length.
  • The message in protocol buffer format (potentially compressed).
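
To make the bit slicing in Option A concrete, here is a minimal sketch. The bit layout (low 16 bits for the type, bit 16 for the compressed flag), the big-endian byte order and the name `read_header_a` are all assumptions made for illustration; the post does not specify any of them.

```python
import struct

# Hypothetical layout of the first header word; the post does not define one.
TYPE_MASK = 0x0000FFFF       # low 16 bits: message type identifier
FLAG_COMPRESSED = 1 << 16    # bit 16: the following message is compressed


def read_header_a(buf):
    """Parse an Option A style header from the start of buf.

    Returns (msg_type, compressed, msg_len, uncompressed_len, header_size).
    uncompressed_len is None when the message is not compressed.
    """
    # The first two uint32 words are always present (big-endian is assumed).
    word, msg_len = struct.unpack_from(">II", buf, 0)
    msg_type = word & TYPE_MASK
    compressed = bool(word & FLAG_COMPRESSED)
    offset = 8
    uncompressed_len = None
    if compressed:
        # The third word only exists for compressed messages.
        (uncompressed_len,) = struct.unpack_from(">I", buf, offset)
        offset += 4
    return msg_type, compressed, msg_len, uncompressed_len, offset
```

Note that the header is 8 or 12 bytes long depending on the flag, so the reader has to decode the first word before it knows how much more header to expect.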

Option B – Protobuf Header

The following is sent over the wire:

  • A uint32 containing a header length.
  • A header message in protocol buffer format that describes the type, length and possible compression of the following message. This is essentially the same data as in the header word in option A, but you get it in protocol buffer format.
  • The message in protocol buffer format (potentially compressed).

Option C – Protobuf Envelope

The following is sent over the wire:

  • A uint32 containing a message length.
  • A wrapper message in protocol buffer format that contains the message type, a compression indicator, and the data in a bytes field. You unmarshal the bytes field again, possibly after decompression, to get the real message.

Option D – Protobuf Union Type

The following is sent over the wire:

  • A uint32 containing a message length.
  • A wrapper message in protocol buffer format that has a number of optional fields, one for each possible message type, with exactly one of them set. For compressed messages it includes a bytes field that must be decompressed and then protobuf-unmarshalled again like option C.

Other

What other options are there?

Intentionally not doing this as a poll because I want you to explain which is the best and most idiomatic, and why. :slight_smile: Bonus points for pointing to existing, well deployed things that are not obviously :wtf:.

Whatever we choose here we will hopefully live with for a while, so let's do it right.

My gut feeling is that things like compression should be independent of protobuf.

Other than that, I had the same question myself when I last evaluated protobuf.

Meaning that it should be outside the outermost protobuf message?

Yeah: a small header with necessary stuff, then a byte stream which may be a protobuf message, or may be a compressed stream containing a protobuf message (which may in turn have its own header)

I don’t have strong feelings, but I’ve used Option A in the past and it works fine. My case has been slightly different because I’ve been writing these streams to files instead of to the network. One of the nice things is that your frame decoding is quite straightforward compared to other encodings.

  • Can the stream ever get out of sync or would the transport layer take care of that? If so I’d suggest adding some magic (uint32?) as part of the header to re-sync/random search. This might not be relevant in the over-the-network scenario.

  • If you ever want to adjust this framing protocol on the fly without requiring clients to restart (probably not an issue), you might want to have a frame version, maybe as part of your message type or flags.

  • For easier processing you might want to keep your frame header fixed length, i.e. always include both the compressed and uncompressed length fields and let your flags indicate the compression algorithm. One of the algorithms can be "not compressed".
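
The three suggestions above (a magic value, a frame version, and a fixed-length header) could be combined roughly as follows. This is only a sketch: the magic value, the field order, the 16-byte size and every name in it are invented for illustration.

```python
import struct

# Everything below is illustrative; none of it comes from a real protocol.
MAGIC = 0x5AFEC0DE
MAGIC_BYTES = struct.pack(">I", MAGIC)

# magic, version, msg_type, algorithm, 1 pad byte, compressed_len, uncompressed_len
HEADER = struct.Struct(">IBBBxII")  # always 16 bytes

COMPRESSION_NONE = 0
COMPRESSION_LZ4 = 1


def pack_header(msg_type, algorithm, compressed_len, uncompressed_len, version=1):
    """Build a fixed-length header; both length fields are always present."""
    return HEADER.pack(MAGIC, version, msg_type, algorithm,
                       compressed_len, uncompressed_len)


def parse_header(buf, offset=0):
    """Return (version, msg_type, algorithm, compressed_len, uncompressed_len)."""
    magic, version, msg_type, algorithm, clen, ulen = HEADER.unpack_from(buf, offset)
    if magic != MAGIC:
        raise ValueError("bad magic, stream is out of sync")
    return version, msg_type, algorithm, clen, ulen


def find_frame(buf, start=0):
    """Return the offset of the next magic value at or after start, or -1."""
    return buf.find(MAGIC_BYTES, start)
```

Scanning for the magic is only a heuristic, since the same four bytes can also occur inside a payload; a reader that resyncs this way would still want to sanity-check the header it finds.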

Okay so that didn’t result in as much input as I’d hoped, even with spreading the link on Twitter… Then I’d probably like to propose something closer to our original approach (option A), basically sending:

  • uint8 message_type (we can have 256 message types)
  • uint8 flags (currently only one flag available, lz4_compressed)
  • uint32 message_length
  • uint32 uncompressed_length (only present when lz4_compressed=1; its four bytes are counted in message_length)
  • … the message in protobuf format
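
A minimal sketch of the layout listed above, to show how the optional uncompressed_length sits inside message_length. Big-endian byte order is an assumption, `encode_frame` and `decode_frame` are hypothetical names, and zlib stands in for LZ4 only because LZ4 is not in the Python standard library. The payload is treated as opaque bytes; in practice it would be the serialized protobuf message.

```python
import struct
import zlib

FLAG_COMPRESSED = 0x01  # plays the role of the lz4_compressed flag (zlib used here)
PREFIX = struct.Struct(">BBI")  # uint8 message_type, uint8 flags, uint32 message_length


def encode_frame(msg_type, payload, compress=False):
    """Frame an already serialized message."""
    flags = 0
    body = payload
    if compress:
        flags |= FLAG_COMPRESSED
        # uncompressed_length is part of the body, so message_length covers it.
        body = struct.pack(">I", len(payload)) + zlib.compress(payload)
    return PREFIX.pack(msg_type, flags, len(body)) + body


def decode_frame(buf):
    """Return (msg_type, payload, bytes_consumed) for the frame at the start of buf."""
    msg_type, flags, msg_len = PREFIX.unpack_from(buf, 0)
    body = buf[PREFIX.size:PREFIX.size + msg_len]
    if len(body) < msg_len:
        raise ValueError("incomplete frame")
    if flags & FLAG_COMPRESSED:
        (uncompressed_len,) = struct.unpack_from(">I", body, 0)
        payload = zlib.decompress(body[4:])
        if len(payload) != uncompressed_len:
            raise ValueError("uncompressed length mismatch")
    else:
        payload = body
    return msg_type, payload, PREFIX.size + msg_len
```

Because message_length always covers everything after the six-byte prefix, a reader can skip a frame without caring whether it is compressed.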

I don’t want to include the serialized message in another protobuf message by default as it’s almost guaranteed to result in an unnecessary buffer copy + allocation per message.

We can make the above fairly flexible and future proof by simply mandating that clients silently ignore messages with unknown type or flags. Adding messages then becomes roughly the same as adding attributes inside a protobuf message: you can do it, but must be prepared for the fact that older clients won’t pick up on them.
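
The "silently ignore" rule could look something like this on the receiving side. It is a sketch only: `dispatch`, `KNOWN_FLAGS` and the handler table are hypothetical names, and the frame is assumed to be decoded into type, flags and payload already.

```python
KNOWN_FLAGS = 0x01  # hypothetical: the only flag this client understands


def dispatch(msg_type, flags, payload, handlers):
    """Call the handler for msg_type, or return None if the frame is skipped.

    Frames with an unknown type, or with any flag bit this client does not
    understand, are dropped without raising an error.
    """
    if flags & ~KNOWN_FLAGS & 0xFF:
        return None  # unknown flag set: we cannot interpret the payload safely
    handler = handlers.get(msg_type)
    if handler is None:
        return None  # unknown message type, probably sent by a newer peer
    return handler(payload)
```

For example, with `handlers = {1: len}`, a frame of type 1 is handled while a frame of type 9 is quietly skipped.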

However I’m slightly torn between this and option D, with the latter having the advantage that pretty much everything lives in protobuf format with no special stuff outside of it other than a length word.

If this was a poll, I’d vote C. If you care about buffer copy then I’d say B is the right option. Having to do bit shifts and all that magic is unnecessary if protobuf deals with it for you, plus it gives you the freedom to change stuff more easily.

I didn’t want to answer because my approach could look a bit legacy: since I work in the telco market, I always take a theoretical approach.

From my perspective, a frame should contain first a session layer, then a presentation layer, and then an application layer.

So in order you should have:

  1. A uint32 containing a header length.
  2. A header message in protocol buffer format that describes the type, length and possible compression of the following message.
  3. A uint32 containing a message length.
     3-1. “If the message was compressed, a uint32 with the uncompressed length.”
  4. The message in protocol buffer format (potentially compressed).
  4. The message in protocol buffer format (potentially compressed).
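
Read in order, the four items above could be split apart as in the sketch below. It makes several assumptions: big-endian lengths, the uncompressed length as a separate word that is not counted in the message length, and a caller-supplied `is_compressed` callback in place of real protobuf decoding of the header (which would need generated message classes). `read_layered_frame` is a hypothetical name.

```python
import struct


def read_layered_frame(buf, is_compressed):
    """Split one frame into (header_bytes, message_bytes, uncompressed_len, consumed).

    is_compressed is a callable that inspects the raw header bytes; here it
    stands in for decoding the protobuf header message.
    """
    offset = 0
    (header_len,) = struct.unpack_from(">I", buf, offset)       # item 1
    offset += 4
    header = buf[offset:offset + header_len]                    # item 2
    offset += header_len
    (msg_len,) = struct.unpack_from(">I", buf, offset)          # item 3
    offset += 4
    uncompressed_len = None
    if is_compressed(header):
        (uncompressed_len,) = struct.unpack_from(">I", buf, offset)  # item 3-1
        offset += 4
    message = buf[offset:offset + msg_len]                      # item 4
    offset += msg_len
    if len(header) != header_len or len(message) != msg_len:
        raise ValueError("incomplete frame")
    return header, message, uncompressed_len, offset
```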

Of course, I know this approach is more similar to the “classic” way of doing telecommunications in the mobile network, like in SS7 or MTP/STP, so I am not sure it helps.

Hope it does.

That actually ended up being what we implemented, pretty much. Anyway, this is now done so the point is mostly moot now. :slight_smile:

https://docs.syncthing.net/specs/bep-v1.html#post-authentication-messages
