Introduction to Protocol Buffers

One of the things I have invested some time in is using Google Protocol Buffers (PBs) to share data between the backend APIs and the frontend, and vice versa. If you don’t know what PBs are, check out this definition from their official website:

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data — think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Basically, PBs are a way of encoding structured data in an efficient yet extensible format. Most existing web apps use JSON as the mechanism to exchange data between frontend and backend. Protocol Buffers have several advantages over pure JSON:

It is very easy to describe the data using the PB language-neutral syntax. This is an example of a protocol buffer message definition:

// The "required" and "optional" labels used below are proto2 syntax.
syntax = "proto2";

message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}

The schemas are automatically converted into modules that can be used directly in your code, and PBs support several languages out of the box, such as Go and Python.

For instance, the Circly frontend is written in TypeScript, while the backend is written in JavaScript (Node.js). Using PBs, I defined the data schema once and generated the JavaScript prototypes and the corresponding TypeScript declarations automatically.

PBs can be used to describe both the messages exchanged by the services and the data stored in the backend databases.

Circly APIs are built with a CRUD model in mind, so most often the messages exchanged between frontend and backend represent entities stored in the database.

No need for hand-written validators, parsers, etc. Once a protocol buffer is specified with its schema, we can use pre-existing methods to parse, generate, and validate the structure of the instances (messages) described by that protocol buffer.
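To make this last point concrete, here is a hand-written TypeScript sketch of roughly what schema-derived code gives you for the Person message above: a typed shape plus a structural check. The names verifyPerson and isPerson are my own, chosen for illustration; this is not the output of any particular generator, and real generated code differs in its details.

```typescript
// Hypothetical shape mirroring the Person message (id = 1, name = 2, email = 3).
interface Person {
  id: number;      // required int32
  name: string;    // required string
  email?: string;  // optional string
}

// Returns null when the value matches the schema,
// or a short description of the first problem found.
function verifyPerson(obj: unknown): string | null {
  if (typeof obj !== "object" || obj === null) return "object expected";
  const { id, name, email } = obj as Record<string, unknown>;
  if (typeof id !== "number" || !Number.isInteger(id)) return "id: integer expected";
  if (id < -2147483648 || id > 2147483647) return "id: out of int32 range";
  if (typeof name !== "string") return "name: string expected";
  if (email !== undefined && typeof email !== "string") return "email: string expected";
  return null;
}

// Type guard built on the check, so callers get a typed Person afterwards.
function isPerson(obj: unknown): obj is Person {
  return verifyPerson(obj) === null;
}

// Usage: check a payload of unknown origin before trusting it.
const incoming: unknown = JSON.parse('{"id": 7, "name": "Grace"}');
const problem = verifyPerson(incoming); // null, since the payload matches the schema
```

The point is that all of this follows mechanically from the schema, which is why it can be generated instead of maintained by hand. Checks that go beyond structure (for example, whether an email address is well formed) are still up to the application.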

As you can see, there are several good reasons to prefer PBs over pure JSON in our web apps. The reason I keep saying “pure JSON” is that Protocol Buffers can be used in two different flavors:

1. Without any conversion to JSON: the backend and the frontend exchange only binary data (the PB representation). With this mechanism we exploit most of the advantages of PBs, including the fact that PB messages are usually smaller than the equivalent JSON objects.

2. Converting the PBs into JSON before sending them over the HTTP channel: the PBs are converted into JSON before delivery and converted back into PB objects once they reach their destination. With this mechanism we keep the flexibility of protocol buffers (without the compact binary encoding), and we can use them with systems that don’t play well with binary payloads.
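To give a feeling for the size difference between the two flavors, here is a minimal TypeScript sketch that encodes the Person message by hand in the PB binary wire format and compares it with the JSON form. It is an illustration under stated assumptions, not how you would do it in practice: the function names are mine, it only handles non-negative ids and ASCII strings, and in a real application the generated code or a PB library does this work.

```typescript
// Shape mirroring the Person message; field numbers come from the schema
// (id = 1, name = 2, email = 3).
interface Person {
  id: number;
  name: string;
  email?: string;
}

// Varint: 7 payload bits per byte, least significant group first,
// high bit set on every byte except the last.
// Sketch limitation: non-negative integers only.
function encodeVarint(value: number): number[] {
  if (!Number.isInteger(value) || value < 0) throw new Error("non-negative integer expected");
  const bytes: number[] = [];
  let v = value;
  while (v > 0x7f) {
    bytes.push((v % 128) | 0x80);
    v = Math.floor(v / 128);
  }
  bytes.push(v);
  return bytes;
}

// Field key: (field number << 3) | wire type.
// Wire type 0 = varint, wire type 2 = length-delimited.
function encodeKey(fieldNumber: number, wireType: number): number[] {
  return encodeVarint((fieldNumber << 3) | wireType);
}

// Length-delimited string field: key, byte length, then the bytes.
// Sketch limitation: ASCII only, to avoid pulling in a UTF-8 encoder.
function encodeStringField(fieldNumber: number, text: string): number[] {
  const chars: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0x7f) throw new Error("ASCII only in this sketch");
    chars.push(code);
  }
  return encodeKey(fieldNumber, 2).concat(encodeVarint(chars.length), chars);
}

function encodePerson(p: Person): number[] {
  let bytes = encodeKey(1, 0).concat(encodeVarint(p.id), encodeStringField(2, p.name));
  if (p.email !== undefined) bytes = bytes.concat(encodeStringField(3, p.email));
  return bytes;
}

const sample: Person = { id: 150, name: "Ada", email: "ada@example.com" };
const binary = encodePerson(sample);  // flavor 1: 25 bytes on the wire
const json = JSON.stringify(sample);  // flavor 2: 49 characters
```

For this sample the binary form takes 25 bytes against 49 characters of JSON. Most of the saving comes from replacing the field names with one-byte numeric keys, which is also why the JSON flavor gives up the size advantage while keeping the shared schema.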

While building Circly I had already decided to use Swagger/OpenAPI, which works better with JSON, so I went for option 2. I am still planning to study a way to use option 1 with Swagger/OpenAPI in the future.

Let’s see, step by step, how to implement this in a web application, taking Circly as an example.