What the heck is ProtoBuf?
Serialization Of Information and Deserialization Of Data
ProtoBuf is a language-neutral, platform-neutral, extensible mechanism for serializing structured data in a forward-compatible and backward-compatible way. In simple terms, it is a data encoding format like JSON, with its own advantages and disadvantages. ProtoBuf, short for Protocol Buffers, is developed by Google. It serializes and deserializes data very quickly and usually produces a much smaller payload than JSON and other data interchange formats such as XML and YAML.
So the question arises: why do we need another data interchange format when we already have JSON, which is used at a very large scale in most big companies? The answer is that JSON fits perfectly in normal scenarios, but when we need more efficient serialization and deserialization, protobuf is often the better choice.
You define how you want your data to be structured once, then you use specially generated source code to easily write and read your structured data to and from a variety of data streams, using a variety of languages. It supports most popular languages, such as Java, Python, Go, and C++.
The basic concepts of protobuf are shown in the figure below.
Protocol buffers are the most commonly-used data format at Google. They are used extensively in inter-server communications as well as for archival storage of data on disk. Protocol buffer messages and services are described by engineer-authored .proto files. The following shows an example message:
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
  optional string gender = 4;
}
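The `optional` keyword on `gender` lets a reader tell "not set" apart from "set to an empty string". In Go, generated code represents such a field as a pointer. The sketch below is a hand-written stand-in for generated code, not actual protoc output: the trimmed-down `Person` struct and the `HasGender` helper are assumptions made for illustration.

```go
package main

import "fmt"

// Person is a hand-written stand-in for the struct the Go code generator
// would produce; only the fields needed to show presence are included.
type Person struct {
	Name   string
	Gender *string // optional field: nil means "not set"
}

// GetGender follows the nil-safe getter style of generated code and
// returns the zero value when the field is unset.
func (p *Person) GetGender() string {
	if p == nil || p.Gender == nil {
		return ""
	}
	return *p.Gender
}

// HasGender is a hypothetical helper that reports whether the optional
// field was explicitly set, even if it was set to an empty string.
func (p *Person) HasGender() bool {
	return p != nil && p.Gender != nil
}

func main() {
	unset := &Person{Name: "A"}

	empty := ""
	setToEmpty := &Person{Name: "B", Gender: &empty}

	// Both getters return "", but only one field is actually present.
	fmt.Println(unset.HasGender(), unset.GetGender() == "")
	fmt.Println(setToEmpty.HasGender(), setToEmpty.GetGender() == "")
}
```

Fields without `optional`, such as `name`, cannot make this distinction in proto3: an empty string and a missing value look the same to the reader.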
To use protobuf in your project you need three things: the protoc compiler, a .proto file that works as the schema, and the code generator for your programming language of choice. You can check the list of supported languages; Google maintains code generators for them.
Use the code generator for your favorite programming language to compile this .proto file. For this tutorial I'm gonna use Go, so protoc will generate Go structs that you can use in code to create instances of your messages. Download the protoc compiler for your operating system, and get the code generator for Go from the list of supported languages (the Go generator, protoc-gen-go, can typically be installed with go install google.golang.org/protobuf/cmd/protoc-gen-go@latest). Now we are all set to dive into the coding part of this tutorial.
Create a Go project by typing go mod init learn_protobuf in your project directory, and create a main.go file. This is not a Go tutorial, so I'm not gonna explain too much about Go. Create an example.proto file and write the schema:
syntax = "proto3";
package main;
option go_package = "./protobuf";
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
  string phone = 4;
}
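The numbers after each field in the schema are field numbers, and they are what gets written to the wire instead of field names. The following is a minimal sketch, using only the Go standard library, that hand-encodes the Person message above in the protobuf wire format and compares its size with JSON. It is not the generated code: the helper names (`appendTag`, `appendString`, `appendInt32`, `encodePerson`) are made up for illustration, and unlike a real encoder it does not skip fields that hold default values.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

const (
	wireVarint = 0 // used for int32, int64, bool, enum
	wireBytes  = 2 // used for string, bytes, nested messages
)

// appendUvarint appends v as a base-128 varint.
func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// appendTag writes the field key: (field number << 3) | wire type.
func appendTag(b []byte, fieldNum, wireType uint64) []byte {
	return appendUvarint(b, fieldNum<<3|wireType)
}

// appendString writes a length-delimited field: key, length, raw bytes.
func appendString(b []byte, fieldNum uint64, s string) []byte {
	b = appendTag(b, fieldNum, wireBytes)
	b = appendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

// appendInt32 writes a varint field; negative values are sign-extended.
func appendInt32(b []byte, fieldNum uint64, v int32) []byte {
	b = appendTag(b, fieldNum, wireVarint)
	return appendUvarint(b, uint64(int64(v)))
}

// encodePerson hand-encodes name = 1, id = 2, email = 3, phone = 4.
func encodePerson(name string, id int32, email, phone string) []byte {
	var b []byte
	b = appendString(b, 1, name)
	b = appendInt32(b, 2, id)
	b = appendString(b, 3, email)
	b = appendString(b, 4, phone)
	return b
}

// personJSON carries the same data for the JSON comparison.
type personJSON struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	Id    int32  `json:"id"`
	Phone string `json:"phone"`
}

func main() {
	pb := encodePerson("Shikhar Yadav", 1, "fakemail@gmail.com", "9876543210")
	js, err := json.Marshal(personJSON{"Shikhar Yadav", "fakemail@gmail.com", 1, "9876543210"})
	if err != nil {
		panic(err)
	}
	fmt.Println("protobuf-style bytes:", len(pb))
	fmt.Println("JSON bytes:", len(js))
}
```

For these sample values the hand-encoded message comes to 49 bytes against 81 bytes of JSON. Most of the saving comes from replacing quoted field names with one-byte keys.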
Generate the Go code with this command: protoc -I. --go_out=. ./example.proto
If any error occurs while running this command, it most likely means you have not installed the compiler or the code generator correctly. On Linux, one possible reason is that the ~/go/bin directory is not on your PATH; you can add it with this command: export PATH=$PATH:~/go/bin
Let's write our main code that will do the data serialization and deserialization. First import the required packages and create a struct that will help us work with JSON.
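package main

import (
	"encoding/json"
	"fmt"

	// Generated package: module name from go.mod plus the go_package
	// directory from example.proto.
	protoB "learn_protobuf/protobuf"

	"google.golang.org/protobuf/proto"
)

// PersonJson mirrors the Person message for the JSON comparison.
type PersonJson struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	Id    int32  `json:"id"`
	Phone string `json:"phone"`
}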
We will write two functions in Go: one for the protobuf example and the other for the JSON example.
Let's write a function for the protobuf demonstration that covers serialization and deserialization of data. First we create a variable of type Person (the Person struct is generated by the protoc compiler and the code generator according to the schema you wrote in example.proto). Then we set the fields in the struct and serialize the structured data. Similarly, for deserialization we create another variable of type Person and pass it, together with the previously serialized bytes, to protobuf to deserialize the data.
ProtoBuf
func protoBufExample(isPrint bool) {
	// Serialization of structured data
	var personSerializer protoB.Person
	personSerializer.Name = "Shikhar Yadav"
	personSerializer.Email = "fakemail@gmail.com"
	personSerializer.Id = 1
	personSerializer.Phone = "9876543210"
	personBytes, err := proto.Marshal(&personSerializer)
	if err != nil {
		panic("[PROTOBUF] -> Error while proto.Marshal: " + err.Error())
	}
	if isPrint {
		fmt.Println("[PROTOBUF] -> personBytes: ", personBytes)
		fmt.Println("[PROTOBUF] -> Size: ", len(personBytes))
	}

	// Deserialization of raw serialized data
	var personDeserializer protoB.Person
	err = proto.Unmarshal(personBytes, &personDeserializer)
	if err != nil {
		panic("[PROTOBUF] -> Error while proto.Unmarshal: " + err.Error())
	}
	if isPrint {
		fmt.Println("[PROTOBUF] -> Person name: " + personDeserializer.Name)
		fmt.Println("[PROTOBUF] -> Person email: " + personDeserializer.Email)
	}
}
Now, just for comparison, let's write the same thing for JSON. I'm not gonna explain this part of the code because it is almost the same as the previous one.
Json
func jsonExample(isPrint bool) {
	// JSON serialization of structured data
	var personSerializer PersonJson
	personSerializer.Name = "Shikhar Yadav"
	personSerializer.Email = "fakemail@gmail.com"
	personSerializer.Id = 1
	personSerializer.Phone = "9876543210"
	personBytes, err := json.Marshal(personSerializer)
	if err != nil {
		panic("[JSON] -> Error while json.Marshal: " + err.Error())
	}
	if isPrint {
		fmt.Println("[JSON] -> personBytes: ", personBytes)
		fmt.Println("[JSON] -> Size: ", len(personBytes))
	}

	// JSON deserialization of raw serialized data
	var personDeserializer PersonJson
	err = json.Unmarshal(personBytes, &personDeserializer)
	if err != nil {
		panic("[JSON] -> Error while json.Unmarshal: " + err.Error())
	}
	if isPrint {
		fmt.Println("[JSON] -> Person name: " + personDeserializer.Name)
		fmt.Println("[JSON] -> Person email: " + personDeserializer.Email)
	}
}
Let's write some runner functions that will be used to run these functions:
Size Comparison
func sizeText() {
	jsonExample(true)
	protoBufExample(true)
}
Speed Comparison
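// speedTest needs "time" added to the import block.
func speedTest(rounds int) {
	// Time `rounds` protobuf round trips (marshal + unmarshal).
	timeProto0 := time.Now()
	for i := 0; i < rounds; i++ {
		protoBufExample(false)
	}
	fmt.Println("Time Taken By Protobuf: ", time.Since(timeProto0))

	// Time the same number of JSON round trips.
	timeJson0 := time.Now()
	for i := 0; i < rounds; i++ {
		jsonExample(false)
	}
	fmt.Println("Time Taken By Json: ", time.Since(timeJson0))
}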
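A hand-rolled timing loop like the one above gives a rough idea, but the numbers can be noisy. As an alternative, here is a minimal sketch that uses testing.Benchmark from the standard library, which picks the iteration count itself and can report allocations. To keep it runnable without generated code it only measures JSON: `jsonRoundTrip` and `measure` are illustrative names, and `jsonRoundTrip` stands in for jsonExample(false).

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// PersonJson matches the struct used in this tutorial.
type PersonJson struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	Id    int32  `json:"id"`
	Phone string `json:"phone"`
}

// jsonRoundTrip marshals and unmarshals one value.
func jsonRoundTrip() error {
	in := PersonJson{
		Name:  "Shikhar Yadav",
		Email: "fakemail@gmail.com",
		Id:    1,
		Phone: "9876543210",
	}
	data, err := json.Marshal(in)
	if err != nil {
		return err
	}
	var out PersonJson
	return json.Unmarshal(data, &out)
}

// measure runs fn under testing.Benchmark, which grows b.N until the
// measurement has run long enough to be stable.
func measure(fn func() error) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if err := fn(); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	res := measure(jsonRoundTrip)
	fmt.Printf("JSON round trip: %d ns/op, %d allocs/op over %d runs\n",
		res.NsPerOp(), res.AllocsPerOp(), res.N)
}
```

In the tutorial project the same `measure` helper could wrap a protobuf round trip as well, so both formats are measured the same way.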
Below is the complete code of the tutorial:
package main

import (
	"encoding/json"
	"fmt"
	"time"

	// Generated package: module name from go.mod plus the go_package
	// directory from example.proto.
	protoB "learn_protobuf/protobuf"

	"google.golang.org/protobuf/proto"
)

// PersonJson mirrors the Person message for the JSON comparison.
type PersonJson struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	Id    int32  `json:"id"`
	Phone string `json:"phone"`
}

func jsonExample(isPrint bool) {
	// JSON serialization of structured data
	var personSerializer PersonJson
	personSerializer.Name = "Shikhar Yadav"
	personSerializer.Email = "fakemail@gmail.com"
	personSerializer.Id = 1
	personSerializer.Phone = "9876543210"
	personBytes, err := json.Marshal(personSerializer)
	if err != nil {
		panic("[JSON] -> Error while json.Marshal: " + err.Error())
	}
	if isPrint {
		fmt.Println("[JSON] -> personBytes: ", personBytes)
		fmt.Println("[JSON] -> Size: ", len(personBytes))
	}

	// JSON deserialization of raw serialized data
	var personDeserializer PersonJson
	err = json.Unmarshal(personBytes, &personDeserializer)
	if err != nil {
		panic("[JSON] -> Error while json.Unmarshal: " + err.Error())
	}
	if isPrint {
		fmt.Println("[JSON] -> Person name: " + personDeserializer.Name)
		fmt.Println("[JSON] -> Person email: " + personDeserializer.Email)
	}
}

func protoBufExample(isPrint bool) {
	// Serialization of structured data
	var personSerializer protoB.Person
	personSerializer.Name = "Shikhar Yadav"
	personSerializer.Email = "fakemail@gmail.com"
	personSerializer.Id = 1
	personSerializer.Phone = "9876543210"
	personBytes, err := proto.Marshal(&personSerializer)
	if err != nil {
		panic("[PROTOBUF] -> Error while proto.Marshal: " + err.Error())
	}
	if isPrint {
		fmt.Println("[PROTOBUF] -> personBytes: ", personBytes)
		fmt.Println("[PROTOBUF] -> Size: ", len(personBytes))
	}

	// Deserialization of raw serialized data
	var personDeserializer protoB.Person
	err = proto.Unmarshal(personBytes, &personDeserializer)
	if err != nil {
		panic("[PROTOBUF] -> Error while proto.Unmarshal: " + err.Error())
	}
	if isPrint {
		fmt.Println("[PROTOBUF] -> Person name: " + personDeserializer.Name)
		fmt.Println("[PROTOBUF] -> Person email: " + personDeserializer.Email)
	}
}

// sizeText prints the serialized size of the same data in both formats.
func sizeText() {
	jsonExample(true)
	protoBufExample(true)
}

// speedTest times `rounds` round trips (marshal + unmarshal) per format.
func speedTest(rounds int) {
	timeProto0 := time.Now()
	for i := 0; i < rounds; i++ {
		protoBufExample(false)
	}
	fmt.Println("Time Taken By Protobuf: ", time.Since(timeProto0))

	timeJson0 := time.Now()
	for i := 0; i < rounds; i++ {
		jsonExample(false)
	}
	fmt.Println("Time Taken By Json: ", time.Since(timeJson0))
}

func main() {
	fmt.Println("ProtoBuf Example")
	sizeText()
	speedTest(1000000)
}
Advantages
- Generates a smaller serialized payload compared to JSON, XML, etc.
- Serializes and deserializes data very fast.
- Supports forward compatibility and backward compatibility.
- Language-neutral and platform-neutral.
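Forward compatibility largely comes from the wire format: every field is prefixed with its field number and wire type, so a reader built from an older schema can skip fields it does not know. The sketch below illustrates the idea with a hand-written decoder using only the standard library. It is a simplification, not the real runtime: `decodeKnown` is an illustrative name, it only handles wire types 0 (varint) and 2 (length-delimited), and it only extracts string fields.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeKnown walks a protobuf-style byte stream and returns the string
// fields whose numbers appear in known. Fields with any other number are
// skipped instead of causing an error.
func decodeKnown(data []byte, known map[uint64]string) (map[string]string, error) {
	out := make(map[string]string)
	for len(data) > 0 {
		// Each field starts with a key: (field number << 3) | wire type.
		key, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("bad field key")
		}
		data = data[n:]
		fieldNum, wireType := key>>3, key&7

		switch wireType {
		case 0: // varint: read it to find its end, then discard it
			_, n := binary.Uvarint(data)
			if n <= 0 {
				return nil, errors.New("bad varint")
			}
			data = data[n:]
		case 2: // length-delimited: length prefix, then payload
			size, n := binary.Uvarint(data)
			if n <= 0 || uint64(len(data)-n) < size {
				return nil, errors.New("bad length")
			}
			payload := data[n : n+int(size)]
			data = data[n+int(size):]
			if name, ok := known[fieldNum]; ok {
				out[name] = string(payload)
			}
		default:
			return nil, fmt.Errorf("unsupported wire type %d", wireType)
		}
	}
	return out, nil
}

func main() {
	// Bytes as a "newer" writer might produce them: fields 1 and 2 plus a
	// field 5 that this "older" reader has never heard of.
	msg := []byte{
		0x0a, 0x02, 'A', 'l', // field 1, string "Al"
		0x10, 0x07, // field 2, varint 7
		0x2a, 0x03, 'n', 'e', 'w', // field 5, string "new"
	}
	fields, err := decodeKnown(msg, map[uint64]string{1: "name", 3: "email"})
	fmt.Println(fields, err)
}
```

The older reader still gets `name` even though the message contains a field it does not recognize. This is also why field numbers should never be reused for a different purpose once data has been written.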
Disadvantages
- Requires a schema for code generation.
- Harder to get started with compared to other data interchange formats.
- Much smaller community support compared to others.
- Mostly driven by Google, so official support is centered on the languages Google uses internally, and fewer languages are covered than with JSON.
Conclusion
The conclusion of this tutorial is that you don't always need protobuf; you only need it when your application has to be optimized to serve thousands of users. Another use case is storing data on disk when you want it to be as small as possible; in that case you can use protobuf to serialize your data.
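One thing to keep in mind for the disk use case: a serialized protobuf message does not mark where it ends, so when several messages go into one file, a common approach is to write each one with a length prefix. Here is a minimal sketch of that idea using only the standard library. The function names are illustrative, the payloads are plain byte slices standing in for the output of proto.Marshal, and a bytes.Buffer stands in for the file.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeRecord writes one record as a varint length followed by the payload.
func writeRecord(w io.Writer, payload []byte) error {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(payload)))
	if _, err := w.Write(hdr[:n]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readRecord reads the next record. It returns io.EOF when the stream
// ends cleanly between records. Real code should also cap size before
// allocating, in case the file is corrupted.
func readRecord(r *bufio.Reader) ([]byte, error) {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	payload := make([]byte, size)
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// roundTrip writes all payloads to an in-memory "file" and reads them back.
func roundTrip(payloads [][]byte) ([][]byte, error) {
	var file bytes.Buffer
	for _, p := range payloads {
		if err := writeRecord(&file, p); err != nil {
			return nil, err
		}
	}

	r := bufio.NewReader(&file)
	var out [][]byte
	for {
		rec, err := readRecord(r)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, rec)
	}
}

func main() {
	records, err := roundTrip([][]byte{[]byte("first"), []byte("second")})
	if err != nil {
		panic(err)
	}
	for i, rec := range records {
		fmt.Printf("record %d: %q\n", i, rec)
	}
}
```

With real messages, each payload would be the bytes returned by proto.Marshal, and each record read back would be passed to proto.Unmarshal.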
Never use protobuf with REST APIs; you will have a hard time testing your APIs.
Thank you for reading the whole blog :)