What the heck is ProtoBuf!

Serialization Of Information and Deserialization Of Data

ProtoBuf, short for Protocol Buffers, is a language-neutral, platform-neutral, extensible mechanism for serializing structured data in a forward-compatible and backward-compatible way. In simple terms, it is a data encoding format like JSON, with its own advantages and disadvantages. It was developed by Google, serializes and deserializes data quickly, and typically produces a much smaller payload than JSON or other text-based data interchange formats such as XML and YAML.

So why do we need another data interchange format when JSON is already used at a very large scale by most big companies? In ordinary scenarios JSON fits perfectly, but when serialization and deserialization need to be more efficient, protobuf is a strong choice.

You define how you want your data to be structured once, then use generated source code to easily write and read your structured data to and from a variety of data streams, in a variety of languages. It supports most popular languages, such as Java, Python, Go, and C++.

The basic concepts of protobuf are shown in the figure below:

protocol-buffers-concepts.png

Protocol buffers are the most commonly-used data format at Google. They are used extensively in inter-server communications as well as for archival storage of data on disk. Protocol buffer messages and services are described by engineer-authored .proto files. The following shows an example message:

  message Person {
      string name = 1;
      int32 id = 2;
      string email = 3;
      optional string gender = 4;
  }
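To give a feel for why the encoded form is compact, here is a minimal sketch of how a single integer field ends up on the wire. It is hand-rolled for illustration using only Go's standard library and is not the code that protoc generates; the helper name encodeVarintField is hypothetical. Each field is written as a tag (the field number shifted left by 3, combined with the wire type) followed by the value, and integers use a variable-length (varint) encoding, so small values take a single byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeVarintField is a hypothetical helper that encodes one integer field
// the way the protobuf wire format does: a tag followed by a varint value.
func encodeVarintField(fieldNumber int, value uint64) []byte {
	const wireTypeVarint = 0
	// The tag packs the field number and the wire type into one varint.
	tag := uint64(fieldNumber)<<3 | wireTypeVarint
	buf := binary.AppendUvarint(nil, tag)
	// Go's unsigned varint uses the same base-128 encoding as protobuf.
	return binary.AppendUvarint(buf, value)
}

func main() {
	// Field number 2 matches "int32 id = 2" in the Person message above.
	fmt.Printf("id=1   -> % x\n", encodeVarintField(2, 1))   // 10 01
	fmt.Printf("id=150 -> % x\n", encodeVarintField(2, 150)) // 10 96 01
}
```

Note that the field name "id" never appears in the output; only the field number does, which is a large part of the size saving over JSON.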

To use protobuf in your project you need three things: the protoc compiler, a .proto file that acts as the schema, and a code generator for your programming language of choice. Google maintains code generators for the officially supported languages; you can check the list of supported languages.

Use the compiler to generate code from this .proto file for your favorite programming language. For this tutorial I'm gonna use Go, so the generator will create a Person struct that you can instantiate and fill in from your code. Download the protoc compiler for your operating system from here, and install the code generator for Go from the list of supported languages (for Go, running go install google.golang.org/protobuf/cmd/protoc-gen-go@latest installs it). Now we are all set to dive into the coding part of this tutorial.

Create a Go project in your favorite text editor by running go mod init learn_protobuf, then create a main.go file. This is not a Go tutorial, so I'm not gonna explain much about Go itself. Create an example.proto file and write the schema:

syntax = "proto3";
package main;
option go_package = "./protobuf";

message Person {
    string name = 1;
    int32 id = 2;
    string email = 3;
    string phone = 4;
}

Generate the Go code with this command:

protoc -I. --go_out=. ./example.proto

If any error occurs while running this command, the compiler or the code generator is probably not installed correctly. On Linux, one possible reason is that the ~/go/bin directory, where the code generator is installed, is not on your PATH; you can add it with export PATH=$PATH:~/go/bin.
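For orientation, the generated file contains one Go struct per message plus accessor methods. The following is a simplified, hand-written stand-in and not actual protoc-gen-go output: the real struct carries extra internal fields and struct tags and implements the proto.Message interface. It only illustrates the shape you will work with, including getters that are safe to call on a nil pointer.

```go
package main

import "fmt"

// Person is a simplified stand-in for the struct generated from example.proto.
// Real generated code has additional internal bookkeeping fields.
type Person struct {
	Name  string
	Id    int32
	Email string
	Phone string
}

// GetName mirrors the style of generated getters: it returns the zero value
// when the receiver is nil instead of panicking.
func (p *Person) GetName() string {
	if p != nil {
		return p.Name
	}
	return ""
}

func main() {
	var missing *Person
	fmt.Printf("%q\n", missing.GetName()) // nil receiver yields ""

	present := &Person{Name: "Shikhar Yadav", Id: 1}
	fmt.Println(present.GetName())
}
```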

Let's write the main code that will do the serialization and deserialization. First import the required packages and create a struct that will help us work with JSON.

package main

import (
    "encoding/json"
    "fmt"
    // Generated package: module name (learn_protobuf) + go_package directory (./protobuf)
    protoB "learn_protobuf/protobuf"

    "google.golang.org/protobuf/proto"
)

type PersonJson struct {
    Name  string `json:"name"`
    Email string `json:"email"`
    Id    int32  `json:"id"`
    Phone string `json:"phone"`
}

We will write two functions in Go, one for the protobuf example and one for the JSON example. Let's start with the protobuf function, which covers both serialization and deserialization. First we create a variable of type Person (the Person struct is generated by the protoc compiler and the code generator from the schema you wrote in example.proto). Then we set the fields of the struct and serialize it with proto.Marshal. For deserialization we create another Person variable and pass it, together with the previously serialized bytes, to proto.Unmarshal, which fills it in.

ProtoBuf

func protoBufExample(isPrint bool) {

    // Serialization of Structured Data
    var personSerializer protoB.Person
    personSerializer.Name = "Shikhar Yadav"
    personSerializer.Email = "fakemail@gmail.com"
    personSerializer.Id = 1
    personSerializer.Phone = "9876543210"

    personBytes, err := proto.Marshal(&personSerializer)
    if err != nil {
        panic("[PROTOBUF] -> Error while proto.Marshal: " + err.Error())
    }

    if isPrint {
        fmt.Println("[PROTOBUF] -> personBytes: ", personBytes)
        fmt.Println("[PROTOBUF] -> Size: ", len(personBytes))
    }

    // Deserialization of raw serialized data
    var personDeserializer protoB.Person
    err = proto.Unmarshal(personBytes, &personDeserializer)
    if err != nil {
        panic("[PROTOBUF] -> Error while proto.Unmarshal: " + err.Error())
    }

    if isPrint {
        fmt.Println("[PROTOBUF] -> Person name: " + personDeserializer.Name)
        fmt.Println("[PROTOBUF] -> Person email: " + personDeserializer.Email)
    }
}

Now, just for comparison, let's write the same thing for JSON. I'm not gonna explain this part of the code because it is almost the same as the previous one.

Json

func jsonExample(isPrint bool) {
    // Json serialization of structured data
    var personSerializer PersonJson
    personSerializer.Name = "Shikhar Yadav"
    personSerializer.Email = "fakemail@gmail.com"
    personSerializer.Id = 1
    personSerializer.Phone = "9876543210"

    personBytes, err := json.Marshal(personSerializer)
    if err != nil {
        panic("[JSON] -> Error while json.Marshal: " + err.Error())
    }

    if isPrint {
        fmt.Println("[JSON] -> personBytes: ", personBytes)
        fmt.Println("[JSON] -> Size: ", len(personBytes))
    }

    // Json deserialization of raw serialized data
    var personDeserializer PersonJson
    err = json.Unmarshal(personBytes, &personDeserializer)

    if err != nil {
        panic("[JSON] -> Error while json.Unmarshal: " + err.Error())
    }

    if isPrint {
        fmt.Println("[JSON] -> Person name: " + personDeserializer.Name)
        fmt.Println("[JSON] -> Person email: " + personDeserializer.Email)
    }
}

Let's write some runner functions that will be used to run these functions:

Size Comparison

func sizeText() {
    jsonExample(true)
    protoBufExample(true)
}
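The size difference can also be reasoned about without running protoc. The sketch below assumes the same four field values, hand-encodes the Person message following the wire format (tag, length, bytes for strings; tag, varint for integers), and compares the result with the JSON output. The helpers appendString, appendVarint, and encodeSizes are hypothetical and exist only for this illustration; real programs should use proto.Marshal.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

type PersonJson struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	Id    int32  `json:"id"`
	Phone string `json:"phone"`
}

const (
	wireVarint = 0 // integers
	wireBytes  = 2 // length-delimited values such as strings
)

// appendString writes a length-delimited field: tag, length, raw bytes.
func appendString(buf []byte, field int, s string) []byte {
	buf = binary.AppendUvarint(buf, uint64(field)<<3|wireBytes)
	buf = binary.AppendUvarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// appendVarint writes an integer field: tag, then the varint value.
func appendVarint(buf []byte, field int, v uint64) []byte {
	buf = binary.AppendUvarint(buf, uint64(field)<<3|wireVarint)
	return binary.AppendUvarint(buf, v)
}

// encodeSizes returns the JSON size and the hand-encoded wire-format size.
// Unlike proto3, this sketch does not skip zero-valued fields.
func encodeSizes(p PersonJson) (int, int, error) {
	jsonBytes, err := json.Marshal(p)
	if err != nil {
		return 0, 0, err
	}
	var wire []byte
	wire = appendString(wire, 1, p.Name)       // string name = 1
	wire = appendVarint(wire, 2, uint64(p.Id)) // int32 id = 2
	wire = appendString(wire, 3, p.Email)      // string email = 3
	wire = appendString(wire, 4, p.Phone)      // string phone = 4
	return len(jsonBytes), len(wire), nil
}

func main() {
	p := PersonJson{Name: "Shikhar Yadav", Email: "fakemail@gmail.com", Id: 1, Phone: "9876543210"}
	jsonSize, wireSize, err := encodeSizes(p)
	if err != nil {
		panic(err)
	}
	fmt.Printf("JSON: %d bytes, wire format: %d bytes\n", jsonSize, wireSize)
}
```

For these values the JSON form is 81 bytes and the wire format is 49 bytes; the gap comes mostly from JSON repeating field names and quoting in every message.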

Speed Comparison

func speedTest(rounds int) {
    timeProto0 := time.Now()
    for i := 0; i < rounds; i++ {
        protoBufExample(false)
    }
    fmt.Println("Time Taken By Protobuf: ", time.Since(timeProto0))

    timeJson0 := time.Now()
    for i := 0; i < rounds; i++ {
        jsonExample(false)
    }
    fmt.Println("Time Taken By Json: ", time.Since(timeJson0))

}
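A hand-written timing loop is sensitive to warm-up and to the chosen round count. As an alternative sketch, the standard library's testing.Benchmark picks the iteration count automatically and reports the time per operation. Only the JSON side is shown so that the example compiles without generated code; the protobuf side would follow the same pattern with proto.Marshal. The function name benchJSONMarshal is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

type PersonJson struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	Id    int32  `json:"id"`
	Phone string `json:"phone"`
}

// benchJSONMarshal measures json.Marshal alone, without the unmarshal step.
func benchJSONMarshal() testing.BenchmarkResult {
	p := PersonJson{Name: "Shikhar Yadav", Email: "fakemail@gmail.com", Id: 1, Phone: "9876543210"}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := json.Marshal(p); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	res := benchJSONMarshal()
	fmt.Println("iterations:", res.N, "ns/op:", res.NsPerOp())
}
```

Measuring marshal and unmarshal separately makes it easier to see where each format spends its time.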

Below is the complete code of the tutorial:

package main

import (
    "encoding/json"
    "fmt"
    // Generated package: module name (learn_protobuf) + go_package directory (./protobuf)
    protoB "learn_protobuf/protobuf"
    "time"

    "google.golang.org/protobuf/proto"
)

type PersonJson struct {
    Name  string `json:"name"`
    Email string `json:"email"`
    Id    int32  `json:"id"`
    Phone string `json:"phone"`
}

func jsonExample(isPrint bool) {
    // Json serialization of structured data
    var personSerializer PersonJson
    personSerializer.Name = "Shikhar Yadav"
    personSerializer.Email = "fakemail@gmail.com"
    personSerializer.Id = 1
    personSerializer.Phone = "9876543210"

    personBytes, err := json.Marshal(personSerializer)
    if err != nil {
        panic("[JSON] -> Error while json.Marshal: " + err.Error())
    }

    if isPrint {
        fmt.Println("[JSON] -> personBytes: ", personBytes)
        fmt.Println("[JSON] -> Size: ", len(personBytes))
    }

    // Json deserialization of raw serialized data
    var personDeserializer PersonJson
    err = json.Unmarshal(personBytes, &personDeserializer)

    if err != nil {
        panic("[JSON] -> Error while json.Unmarshal: " + err.Error())
    }

    if isPrint {
        fmt.Println("[JSON] -> Person name: " + personDeserializer.Name)
        fmt.Println("[JSON] -> Person email: " + personDeserializer.Email)
    }
}

func protoBufExample(isPrint bool) {

    // Serialization of Structured Data
    var personSerializer protoB.Person
    personSerializer.Name = "Shikhar Yadav"
    personSerializer.Email = "fakemail@gmail.com"
    personSerializer.Id = 1
    personSerializer.Phone = "9876543210"

    personBytes, err := proto.Marshal(&personSerializer)
    if err != nil {
        panic("[PROTOBUF] -> Error while proto.Marshal: " + err.Error())
    }

    if isPrint {
        fmt.Println("[PROTOBUF] -> personBytes: ", personBytes)
        fmt.Println("[PROTOBUF] -> Size: ", len(personBytes))
    }

    // Deserialization of raw serialized data
    var personDeserializer protoB.Person
    err = proto.Unmarshal(personBytes, &personDeserializer)
    if err != nil {
        panic("[PROTOBUF] -> Error while proto.Unmarshal: " + err.Error())
    }

    if isPrint {
        fmt.Println("[PROTOBUF] -> Person name: " + personDeserializer.Name)
        fmt.Println("[PROTOBUF] -> Person email: " + personDeserializer.Email)
    }
}

func sizeText() {
    jsonExample(true)
    protoBufExample(true)
}

func speedTest(rounds int) {
    timeProto0 := time.Now()
    for i := 0; i < rounds; i++ {
        protoBufExample(false)
    }
    fmt.Println("Time Taken By Protobuf: ", time.Since(timeProto0))

    timeJson0 := time.Now()
    for i := 0; i < rounds; i++ {
        jsonExample(false)
    }
    fmt.Println("Time Taken By Json: ", time.Since(timeJson0))

}

func main() {
    fmt.Println("ProtoBuf Example")
    sizeText()
    speedTest(1000000)
}

Advantages

  • Generates a smaller serialized payload compared to JSON, XML, etc.
  • Serializes and deserializes data very quickly.
  • Supports forward compatibility and backward compatibility.
  • Language-neutral and platform-neutral.
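Forward compatibility comes from the wire format itself: every field carries its number and wire type, so a reader built from an older schema can skip fields it does not know. The sketch below is a hand-written decoder that handles only varint and length-delimited fields; it is not the library's implementation, and decodeKnownName is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errMalformed = errors.New("malformed message")

// decodeKnownName extracts field 1 (name) and skips every other field,
// which is how an old reader tolerates fields added by a newer schema.
func decodeKnownName(data []byte) (string, error) {
	var name string
	for len(data) > 0 {
		tag, n := binary.Uvarint(data)
		if n <= 0 {
			return "", errMalformed
		}
		data = data[n:]
		field, wireType := tag>>3, tag&7
		switch wireType {
		case 0: // varint: read the value and discard it
			_, n := binary.Uvarint(data)
			if n <= 0 {
				return "", errMalformed
			}
			data = data[n:]
		case 2: // length-delimited: length prefix, then payload
			size, n := binary.Uvarint(data)
			if n <= 0 || size > uint64(len(data)-n) {
				return "", errMalformed
			}
			payload := data[n : n+int(size)]
			data = data[n+int(size):]
			if field == 1 {
				name = string(payload)
			}
		default:
			return "", fmt.Errorf("wire type %d not handled in this sketch", wireType)
		}
	}
	return name, nil
}

func main() {
	// name = "Ann" (field 1), a string in field 5 that this reader does not
	// know about, and id = 7 (field 2).
	msg := []byte{0x0a, 0x03, 'A', 'n', 'n', 0x2a, 0x02, 'h', 'i', 0x10, 0x07}
	name, err := decodeKnownName(msg)
	fmt.Println(name, err) // Ann <nil>
}
```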

Disadvantages

  • Requires a schema for code generation.
  • Harder to get started with compared to other data interchange formats.
  • Much smaller community compared to others.
  • Mostly used by Google, so official support is limited to a relatively small set of languages, mainly those that Google uses internally.

Conclusion

The conclusion of this tutorial is that you don't always need protobuf; you only need it when your application has to be optimized to serve thousands of users. Another use case is storing data on disk when you want it to be as small as possible; in that case you can use protobuf to serialize your data.

Never use protobuf with REST APIs; since the payloads are binary rather than human-readable, you will have a hard time testing your APIs.

Thank you for reading the whole blog :)