Tuesday, February 11, 2020

Automatic Versioning with Make/CMake and Mercurial

Keeping Version Strings Up-To-Date

A software package and its binary executables should carry a version number or a release name. This is common practice, because it is valuable to be able to identify the version of a specific piece of software.

But what is the best way to get the version tag into the software and keep it up to date? Traditionally, version information has been hard-coded into the source code. This is the simplest and most straightforward approach, but it raises a big question: when and how do you update the version to, e.g., version 2.0?

It makes sense to do it in a dedicated commit to the project's revision control system, and it should be done after all release and regression tests have passed. But if the version string is hard-coded, you cannot tell from which commit of your repository a specific binary was built, unless you update the version string on every commit. During development, when you cannot ensure that only the latest and greatest binary is in use, it can be quite important to identify the commit a specific binary relates to.

Furthermore, if you change the hard-coded version string after release testing, how do you know you didn't break the code while changing it? Another release test of the version with the correct string might become necessary. And, as we all know, manual tasks are prone to error. Sooner or later, the manual update of the version string will be forgotten, will be incorrect, or will introduce a bug. Maybe the code will not compile or, even worse, the binary will ship with a wrong version string.

So the simple hard-coded version string concept may work well in many cases, but might not make everybody happy. 

Automatic Version Generation

Therefore, it makes sense to think about a better solution. A better approach is to generate the version string at build time. This concept can also provide more information about the build itself, e.g. the hostname of the build server, the versions of the tools used for building, or the time of the build.
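As a rough sketch of the kind of build information meant here, a POSIX shell function could print it as C preprocessor defines. The function name build_info and the macro names BUILD_HOST, BUILD_TIME and BUILD_CC are assumptions made for this illustration and are not taken from the template.

```shell
#!/bin/sh
# Sketch only: print build metadata as C preprocessor defines.
# build_info and the macro names are hypothetical, chosen for illustration.
build_info() {
    host=$(uname -n)                          # hostname of the build machine
    stamp=$(date -u '+%Y-%m-%dT%H:%M:%SZ')    # time of the build in UTC
    # first line of the compiler's version output, or "unknown" if it fails
    ccver=$( (${CC:-cc} --version 2>/dev/null || echo unknown) | head -n 1)
    printf '#define BUILD_HOST "%s"\n' "$host"
    printf '#define BUILD_TIME "%s"\n' "$stamp"
    printf '#define BUILD_CC   "%s"\n' "$ccver"
}

build_info
```

Keep in mind that a time stamp differs on every run, so a header containing it would change with every build and trigger a recompile each time.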

For this, support from the build system and the version control system is necessary. The build system must trigger an update of the relevant information and link it to the binary.

But how can this information be gathered when it should be available at build time but not hard-coded into the sources? One way is to use the infrastructure of Mercurial as a revision control system, in particular its tagging mechanism. The advantage of this concept is that it also works for archives that have been generated by Mercurial but have no reference to the repository. With other revision control systems you might have to come up with another approach, but you will probably find a similar solution.

With GNU Make or CMake as the build system, Mercurial's infrastructure can easily be employed to provide the necessary information. Mercurial's tags associate a given commit with a version string. This way, a specific revision can be given a name or version after its commit has been submitted and tested. So you can do the commit, test it, and once you are sure all release prerequisites are fulfilled and it is ready for publication, you tag the revision with a version name without changing any code manually. This dramatically reduces the risk of breaking anything.

Furthermore, Mercurial also lets you query the distance to the latest tag. So if the latest tag is always the latest version number, the distance can be used as a patch level of the named version. For a Mercurial repository, the latest tag and the distance as patch level can be queried with 'hg log -r . --template "{latesttag}.{latesttagdistance}"'.
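A minimal sketch of how such a query could be turned into a complete version string, in the style of the string printed by the demo program further down, might look like this. The helper names hg_query and format_version are hypothetical, and the script in the template may well do this differently.

```shell
#!/bin/sh
# Sketch only; hg_query and format_version are hypothetical helper names.

# Print "<latesttag> <distance> <rev> <short node>" for the parent of the
# working directory. Assumes the tag name contains no whitespace.
hg_query() {
    hg log -r . --template '{latesttag} {latesttagdistance} {rev} {node|short}\n' 2>/dev/null
}

# Compose the version string as: <tag>.<distance> (hg:<rev>/<short node>)
format_version() {
    printf '%s.%s (hg:%s/%s)\n' "$1" "$2" "$3" "$4"
}

# Query the repository only if hg is installed and we are inside a repository.
if command -v hg >/dev/null 2>&1 && fields=$(hg_query); then
    # word splitting of $fields is intended here
    format_version $fields
fi
```

Keeping the formatting separate from the query means the same format_version function can be fed from another source when no repository is available.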

Now, if the repository gets exported to a zip or tgz archive, the repository cannot be queried anymore. The good thing is that Mercurial creates a file called .hg_archival.txt in the archive that contains just this information. To extract the version information from this file, some shell scripting with awk or grep and sed is necessary. All this is demonstrated here in a sample repository with a small shell script, which should work on all UNIX-based systems like Linux, BSD, Solaris or Mac OS X.
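To give an idea of the awk part, here is a sketch that reads the tag and the distance from such a file. The helper name archival_version is hypothetical. The key names tag, latesttag and latesttagdistance are what I assume Mercurial writes into .hg_archival.txt (a tag line for an archive of a tagged revision, the other two otherwise), so check them against a file from your own Mercurial version.

```shell
#!/bin/sh
# Sketch only: print "<tag>.<distance>" from an archive's metadata file.
# archival_version is a hypothetical helper; the key names are assumptions
# about the "key: value" lines found in .hg_archival.txt.
archival_version() {
    awk -F': ' '
        BEGIN                     { dist = 0 }
        $1 == "tag"               { tag = $2; dist = 0 }  # archived revision is tagged
        $1 == "latesttag"         { tag = $2 }            # nearest tag in the history
        $1 == "latesttagdistance" { dist = $2 }           # commits since that tag
        END {
            if (tag == "") exit 1     # no version information found
            print tag "." dist
        }
    ' "${1:-.hg_archival.txt}"
}

# Use the file in the current directory if there is one.
if [ -f .hg_archival.txt ]; then
    archival_version
fi
```

A build script could try the repository query first and fall back to this function when the sources come from an archive.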

The version string itself must be written to either a header file or a source file, so that make and cmake can recognize a change in the version string and trigger the appropriate build steps; otherwise you would have to do a full build every time. Passing the version string as a command-line argument for a #define will not yield the intended result, because a changed version string might be overlooked by GNU Make and CMake with that approach. Therefore, just make sure to generate a source file that is picked up during the build process and contains all the version information that you would like to have integrated into the binary.
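One way to get this behavior, sketched here under my own assumptions rather than copied from the template, is to write the new content to a temporary file and replace the header only if the content differs. This way the time stamp of the header changes exactly when the version information changes. The helper name update_header is hypothetical.

```shell
#!/bin/sh
# Sketch only: replace a generated header only if its content has changed,
# so that make sees a new time stamp exactly when the version changes.
# update_header is a hypothetical helper name.
update_header() {
    target=$1
    tmp="$target.tmp.$$"
    cat > "$tmp"                      # the new content arrives on stdin
    if [ -f "$target" ] && cmp -s "$tmp" "$target"; then
        rm -f "$tmp"                  # identical: keep the old time stamp
    else
        mv "$tmp" "$target"           # new or changed: dependents get rebuilt
        echo "creating $target"
    fi
}

# Possible usage:
#   printf '#define VERSION "%s"\n' "V0.1.1" | update_header version.h
```

Because an unchanged header keeps its old time stamp, the script can run on every invocation of make without forcing hello.o to be recompiled.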

Example

Let's take a look at a small example of how to use the template repository. The demo repository contains a file called hello.c that prints the version that has been compiled into its executable. Both CMake and GNU Make are supported. Support for BSD Make is missing, so on BSD and Solaris you will have to call gmake instead of make.

First, we start by cloning the template repository:
> hg clone mkversion.hg myproject 

Then we build the project with configure and make and take a look at what we get:

> ./configure
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... x86_64-pc-linux-gnu
checking for cc... cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether cc accepts -g... yes
checking for cc option to accept ISO C89... none needed
configure: creating ./config.status
config.status: creating Makefile
> make
cc -MM -MG hello.c -o .depend
sh mkversion.sh
creating version.h
cc -g -O2   -c -o hello.o hello.c
cc -g -O2  hello.o -o hello
> cat version.h
#ifndef VERSION_H
#define VERSION_H
#define VERSION         "V0.1.1 (hg:1/7906498bc6e3)"
#define HG_REV          "1"
#define HG_BRANCH       "default"
#define HG_NODE          "7906498bc6e36f95daf03ffce97a18c3000990fb"
#define HG_ID           "7906498bc6e3"
#define HG_TAGS         "tip"
#define HG_LATESTTAG    "V0.1"

#endif
> ./hello
version V0.1.1 (hg:1/7906498bc6e3)


What we see here is that we got version V0.1.1 after cloning the repository, although the latest tag is V0.1. So, let's take a look at the log:
> hg log
changeset:   1:7906498bc6e3
tag:         tip
user:        Thomas Maier-Komor <thomas@maier-komor.de>
date:        Thu Jul 25 07:37:32 2019 +0200
summary:     Added tag V0.1 for changeset c6295b293642
 
changeset:   0:c6295b293642
tag:         V0.1
user:        Thomas Maier-Komor <thomas@maier-komor.de>
date:        Thu Jul 25 07:37:25 2019 +0200
summary:     initial checkin of build template with version generation
> hg id
7906498bc6e3 tip
As you can see, the clone updated the sandbox to the latest revision, which is the commit that added the tag for changeset 0. Therefore, to get the expected version information, we must update to the revision that we tested and tagged afterwards, i.e.:
> hg up -r V0.1
After updating and rebuilding the binary, we get the expected result:
> ./hello
version V0.1.0 (hg:0/c6295b293642)
Mercurial also provides the infrastructure to determine whether the sandbox used for building has any uncommitted changes. This makes it easy to integrate this important information into the version string. The template adds a plus character at the end of the version string if uncommitted changes are detected at build time. Of course, you can change the plus character to something different, or even cancel the build if you want to make sure that only reproducible binaries are created. This kind of restriction could also be applied on a specific server only.
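The check itself can be sketched with "hg id -i", which appends a plus character to the hash when the working directory has uncommitted changes. The helper name dirty_suffix and the STRICT variable are hypothetical and only illustrate the idea of cancelling the build; the template may implement this differently.

```shell
#!/bin/sh
# Sketch only: derive a "dirty" marker from the output of "hg id -i".
# dirty_suffix and the STRICT variable are hypothetical names.
dirty_suffix() {
    case $1 in
        *+) echo "+" ;;     # uncommitted changes present
        *)  echo ""  ;;     # clean working directory
    esac
}

# Run the check only if hg is installed and we are inside a repository.
if command -v hg >/dev/null 2>&1 && id=$(hg id -i 2>/dev/null); then
    suffix=$(dirty_suffix "$id")
    if [ -n "$suffix" ] && [ "${STRICT:-0}" = 1 ]; then
        echo "refusing to build from a modified sandbox" >&2
        exit 1
    fi
    echo "version suffix: '$suffix'"
fi
```

Setting STRICT=1 only in the environment of a release build server would restrict the stricter behavior to that machine.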

Let's see how it works. Just make a simple modification to one of the files that are tracked in the repository, e.g. add a newline at the end of hello.c:
> echo >> hello.c
After that, trigger a new build with make. The version string then looks like this:
> ./hello
version V0.1.1+ (hg:1/7906498bc6e3)
There is no big magic in this template, just a shell script and its integration into the build infrastructure of GNU Make and CMake. You can also easily extend it to include the username of the person who triggered the build, the hostname of the build machine, or whatever else you would like to see.

Get the template as a Mercurial repository here. I hope you like it. I am rolling this concept out to all of my software development projects.
