Tuesday, February 11, 2020

Data Serialization - Getting started with Wire Format Compiler

Getting started with Wire Format Compiler

Data serialization is the process of turning an in-memory data structure into a stream of bytes that can be sent over a network connection or written to disk, and of restoring the structure from those bytes later. For PCs, servers, and smartphones there are many options for doing this. There is no right or wrong way to do it, but different concepts offer different advantages and disadvantages.

For embedded systems, Wire Format Compiler (WFC) is a tool that generates C++ source code that performs this task. The generated code is lightweight enough to run on small embedded systems. Its language is inspired by Google's Protocol Buffers, but it offers numerous enhancements for use on embedded systems and provides about 3-4 times better performance in terms of processing time and ROM footprint. Additionally, WFC can target very small embedded systems such as 16-bit or even 8-bit devices. For this, many options exist that let you tune the generated code by taking advantage of specific aspects of small devices, such as a smaller address space, register width, or implementation concepts.

A first simple example

As a first example, let's take a look at the configuration data of a hypothetical embedded device. The data description in WFC source format looks like this:

message Config
{
        string     hostname   = 1;
        unsigned   baudrate   = 2;
        string     wifi_ssid  = 3;
        string     wifi_pass  = 4;
        bool       station    = 5;
        fixed32    ipv4       = 6;
        fixed8     netmask    = 7;
        fixed32    gateway    = 8;
}

This description specifies a data structure with several members. Each member lists its data type first, then its name, and finally its unique identifier. The unique identifier must never change once the data is in real-world use. The data type can be modified in later software updates with some restrictions, which I may talk about in a later post. Backward and even forward compatibility is very strong for WFC-generated code if you follow some rules when extending messages for a later release. Forward compatibility is achieved by skipping over unknown data structures while still extracting everything that is known.
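
To make the idea of skipping unknown members more concrete, here is a minimal sketch of a parser for an older firmware release that only knows hostname (id 1) and baudrate (id 2). The wire layout is a deliberately simplified assumption and not WFC's actual encoding: a single tag byte holds the identifier in its upper five bits and a hypothetical wire type in its lower three bits. The names OldConfig, readVarint, and parse are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified wire types (NOT WFC's actual encoding).
enum WireType : uint8_t { WT_VARINT = 0, WT_BYTES = 1 };

// What an older release knows about the message.
struct OldConfig {
        std::string hostname;   // id 1, length-prefixed bytes
        uint64_t baudrate = 0;  // id 2, variable-width integer
};

// Read a base-128 varint; returns the bytes consumed, 0 on error.
static size_t readVarint(const uint8_t *p, size_t n, uint64_t &v)
{
        v = 0;
        for (size_t i = 0; i < n && i < 10; ++i) {
                v |= static_cast<uint64_t>(p[i] & 0x7f) << (7 * i);
                if ((p[i] & 0x80) == 0)
                        return i + 1;
        }
        return 0;
}

// Parse a buffer; members with unknown ids are skipped, not rejected.
bool parse(const uint8_t *p, size_t n, OldConfig &cfg)
{
        size_t at = 0;
        while (at < n) {
                uint8_t tag = p[at++];
                unsigned id = tag >> 3;     // assumed: id in the upper bits
                unsigned wt = tag & 7;      // assumed: wire type in the lower bits
                uint64_t v;                 // the value, or the payload length
                size_t used = readVarint(p + at, n - at, v);
                if (used == 0)
                        return false;
                at += used;
                if (wt == WT_VARINT) {
                        if (id == 2)
                                cfg.baudrate = v;
                        // any other id: the value is already consumed, ignore it
                } else if (wt == WT_BYTES) {
                        if (v > n - at)
                                return false;   // truncated payload
                        if (id == 1)
                                cfg.hostname.assign(reinterpret_cast<const char *>(p + at), v);
                        at += v;                // step over the payload, known or not
                } else {
                        return false;           // length cannot be determined
                }
        }
        return true;
}
```

The point of the sketch is that the wire type alone tells the parser how many bytes to step over, so a member added in a later release does not stop an older parser from extracting the members it knows.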

The data types used in this example are:
  • string: This is implemented as a std::string by default, but it can be changed to a more appropriate, target-specific type. I will write about how to do this in a different post.
  • unsigned: This data type is generated as a platform-specific unsigned integer type. Take care to make sure that the default unsigned type has the appropriate range of values. This data type is serialized as a variable-width integer that requires at most 10 bytes of storage, plus usually 1 or 2 bytes for the message member tag.
  • bool: This is just the regular boolean type. Its serialization also relies on the variable-length integer concept, but boolean data will of course only take 1 byte. Because it relies on the variable-length integer concept, a boolean member can later be changed to an enumerated data type or to an integer type that is not serialized in signed format.
  • fixed8 and fixed32: These are fixed-width integers that are represented as uint8_t and uint32_t, respectively. In their serialized form, fixed8 requires 1 byte and fixed32 requires 4 bytes, plus the tag.
Of course, there are many more types. I will take a look at the other available types in a later post.
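
To show where the "10 bytes at most" for a variable-width integer comes from, here is a minimal sketch of a base-128 encoding as used by protocol buffers. That WFC uses exactly this scheme is an assumption on my part; the function names encodeVarint and decodeVarint are made up for this illustration and are not part of the generated code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode v as a base-128 variable-width integer (protobuf-style assumption):
// 7 payload bits per byte, least significant group first, and the high bit
// set on every byte except the last one.
std::vector<uint8_t> encodeVarint(uint64_t v)
{
        std::vector<uint8_t> out;
        while (v >= 0x80) {
                out.push_back(static_cast<uint8_t>((v & 0x7f) | 0x80));
                v >>= 7;
        }
        out.push_back(static_cast<uint8_t>(v));
        return out;
}

// Decode a varint from buf; returns the bytes consumed, 0 on error.
size_t decodeVarint(const uint8_t *buf, size_t len, uint64_t &v)
{
        v = 0;
        for (size_t i = 0; i < len && i < 10; ++i) {
                v |= static_cast<uint64_t>(buf[i] & 0x7f) << (7 * i);
                if ((buf[i] & 0x80) == 0)
                        return i + 1;
        }
        return 0;
}
```

With this scheme a 64-bit value needs at most ten groups of 7 bits, hence 10 bytes, while small values stay small: a baudrate of 115200 fits into 3 bytes.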

Without any additional options, WFC generates code for serializing and deserializing the data structure to and from memory. Additionally, helper functions are generated to calculate the serialized size of a specific data instance and to serialize the structure directly to a C++ std::string. Furthermore, options exist to also create ASCII or JSON output for human interaction or for interfacing with other tools.

Integrating the generated code into an application

To integrate the generated code into your application, include the generated header file in your code and add the generated C++ source file to your build process. To make use of the generated code, simply create an instance of one of the classes defined in the generated header file. In this case the class is called Config, just like the message.

So let's take a look at how initializing, serializing, and deserializing work in practice. This example is also included in the WFC source distribution, which contains multiple variants of how to use the generated code. Here we just take a look at the variant with C++ streams:

To initialize the structure, just set every member of it:
void initDefaults(Config &cfg)
{
        cout << "initializing defaults\n";
        // set each member of cfg here
}

Next, let's serialize the data to a file:
void saveConfig(const Config &cfg)
{
        string str;
        cfg.toString(str);      // serialize the data
        ofstream out;
        out.open(CfgName);      // open file
        out << str;             // write to file
        if (out.good())
                cout << "writing config successful\n";
        else
                cerr << "error writing config\n";
}

That was quite easy, too. We just serialize the data to a string and write the string to a file. That's it. Now comes the last important step, deserializing the data:
int readConfig(Config &cfg, const char *CfgName)
{
        ifstream in;
        in.open(CfgName,ios::in);   // open file
        in.seekg(0,ios::end);       // seek to end...
        ssize_t n = in.tellg();     // to get the size.
        in.seekg(0,ios::beg);       // back to begin
        if (n <= 0)
                return 1;
        char *buf = new char[n];    // allocate buffer
        in.read(buf,n);             // read data
        cfg.fromMemory(buf,n);      // parse the data
        delete[] buf;               // clean-up
        // That's it! Ready to go.
        cout << "read config from " << CfgName << endl;
        return 0;
}

Reading back is more complex, because we need to know how much memory to allocate, and the C++ streams interface has no easy way to determine the size of a file. WFC even has a concept that enables it to find the end of the stream by itself, but this is an extension that I do not want to dive into here, because it needs some additional considerations. Restoring the serialized data is again just one line that asks the class object to parse the data from memory. What happens behind the scenes can be found in the code generated by WFC, but you don't have to understand any of it to use the generated code.
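
If seeking to the end of the file to find its size feels clumsy, one alternative in standard C++ is to let the stream grow a std::string until it reaches the end of the file. The following is a minimal sketch of that idea. The helper name readFile is made up for this illustration and is not part of WFC or its examples.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Read an entire file into a string without asking for its size first.
// Returns false if the file cannot be opened or a read error occurs.
bool readFile(const char *path, std::string &data)
{
        // binary mode keeps the serialized bytes untouched on every platform
        std::ifstream in(path, std::ios::in | std::ios::binary);
        if (!in.is_open())
                return false;
        // copy everything from the stream buffer up to end-of-file
        data.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
        return !in.bad();
}
```

The result could then be handed to the parser in the same way as the buffer above, presumably as cfg.fromMemory(data.data(), data.size()). The price is a string that may reallocate while it grows, which can matter on small targets, so the explicit buffer remains a reasonable choice there.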

That's it for this post. The next posts will take a look at more data types, how to do some target specific optimizations, and how to use custom string types to reduce the footprint even further.

Get Wire Format Compiler. Happy hacking!
