Saturday 16 May 2015

shmake - A Shell-based Build Tool

Beyond Make

Make is a marvelous tool in the Unix tradition: it does its one thing well. But modern software leans on build tools much more heavily, for conditional configuration and deployment, as well as for only rebuilding files when the files they depend on change. If Make was sufficient, then it would not be constantly being reinvented.

Make has always assumed that all the other Unix tools would be available, which makes cross-platform hard. So other build systems have appeared, either native build file generators (like CMake generating makefiles and MSBuild files) or directly tracking dependencies and running the compiler, like SCons or my own Lake.

shmake does not try to be cross-platform; its expressive power comes from the Unix shell. In fact, shmakefiles are shell scripts, except they are not executed standalone - rather they are run from within shmake itself. What it does is use shell as its DSL for expressing builds, rather than extending Make with shell-like constructs (like GNU Make) or inventing its own language (like CMake).

Assume a program consists of two source files and a header; here is a shmakefile for building it:

 #!/bin/sh
 . /tmp/shmake.sh

 COMPILE='gcc -c @(INPUT) -o @(TARGET)'
 LINK='gcc @(DEPS) -o @(TARGET)'

 T hello.o hello.c common.h "$COMPILE"
 T common.o common.c "$COMPILE"
 T hello hello.o common.o "$LINK"
 all hello

shmake thinks very much like Make; there are targets, but they're explicitly indicated by T: the target hello.o is the result of applying a compile command to hello.c, with common.h as an extra dependency. The object files are linked to give a target hello, and finally there's a special 'all' target which depends on hello. The commands use @() for variable expansion; for the first target, TARGET is hello.o, INPUT is hello.c and DEPS would be 'hello.c common.h'. Once the shmakefile is run within shmake, the target dependencies are checked and the tools invoked when a target is older than any of its inputs.

By itself, this isn't an improvement over Make. In particular, manually tracking dependencies (that hello.o depends also on common.h) is tedious and hard to get right. The point of this "hello build' is to emphasize that shmake is not limited to building programs, and is good for general dependency-based programming.

Convenient Compilation

For this simple program, the following shmakefile will do all the above, and more:

 simple$ cat shmakefile
 #!/bin/sh
 . /tmp/shmake.sh

 C hello hello.c common.c

 simple$ shmake
 compiling common.o
 compiling hello.o
 linking hello
 simple$ touch common.h
 simple$ shmake
 compiling hello.o
 linking hello

Note that shmake has detected the dependency of hello.c on common.h!

By default, shmake is not a chatty program (and the -q flag will make it even less chatty). To see the executed commands, use -v for verbose:

 simple$ shmake clean
 simple$ shmake -v
 gcc -c -Wall -MMD -O2 common.c -o common.o
 gcc -c -Wall -MMD -O2 hello.c -o hello.o
 gcc common.o hello.o -Wl,-s -o hello
 simple$ shmake clean
 simple$ shmake -v -g
 gcc -c -Wall -MMD -g common.c -o common.o
 gcc -c -Wall -MMD -g hello.c -o hello.o
 gcc common.o hello.o  -o hello

These are fairly generic flags (never compile C without -Wall; C's warnings are what most other languages would consider errors!) except for -MMD. This asks gcc to write a .d file, which shmake reads to build up target dependencies.

 simple$ cat hello.d
 hello.o: hello.c common.h

So 'C hello hello.c common.c' is a useful shortcut - under the hood, it expands to exactly the first explicit version. Note that since shmake is tracking all created targets, it knows how to construct a default 'clean' target, and the '-g' flag causes a non-optimized debug build to be made. If you pass a target name with extension .a, then C will build a static language; if you pass .so, it will build a shared library.

The implicit FAQ at this point might be:

  • OK, that's convenient. But can these shortcuts be composed?
  • This is a trivial program. How to make it link against libraries?
  • What if I have my own special flags?

First, they are composable because they create targets, and targets can be made to depend on them, just as in Make.

 C first first/*.c
 C second second/*.c
 all first second

Second, the 'C' command supports the usual -I, -L -l flags which are passed to the compile or the link stage. But the 'S' command is more flexible and easier to read. 'S cflags' and 'S lflags' can be used to set custom compiler and link flags, which answers the third point. 'shmake CC=clang' works as expected - name-value pairs become environment variables passed to the script.

Here is a complete shmake file for building Lua - compare to the original makefile.

 #!/bin/sh
 . /tmp/shmake.sh

 LUA_LIB=liblua52.a
 S defines LUA_COMPAT_ALL
 S exports true
 S libs readline
 S libs m

 echo "Building Lua for $PLAT platform"

 case $PLAT in
 freebsd) 
     S defines LUA_USE_LINUX
     ;;
 Linux)
     S defines LUA_USE_LINUX
     S libs dl
     ;;
 ansi)
     S defines LUA_ANSI
     ;;
 posix)
     S defines LUA_USE_POSIX
     ;;
 solaris)
     S defines LUA_USE_POSIX LUA_USE_DLOPEN
     S libs dl
     ;;
 darwin)
     S defines LUA_USE_MACOSX
     ;;
 *)
     echo "unsupported platform $PLAT. One of freebsd,Linux,Darwin,posix,ansi,solaris"
     exit 1
     ;;
 esac

 C $LUA_LIB *.c -x 'lua.c luac.c'
 C lua lua.c $LUA_LIB
 C luac luac.c $LUA_LIB

 all lua luac

This is definitely easier to read than the original. One reason is the 'C' commands support wildcards, and the very useful -x,--exclude flag. Another is how the 'S' command makes conditional use of libraries and includes easier. The original makefile had to shell out make for each case; here we actually have a case statement since we have a real scripting language.

It is straightforward to get a debug build, just clean and use -g. It is also easier to modify. If the first line is replaced with:

 LIB_NAME=liblua52
 if [ -n "$BUILD_SO" ]; then
     LUA_LIB=$LIB_NAME.so
 else
     LUA_LIB=$LIB_NAME.a
     S exports true
 fi

then 'shmake BUILD_SO=1' will make you a shared library. (It will fail to build luac, which needs to be statically linked, but everything else works.)

Scripting with C

Our computers have been steadily getting more memory and faster I/O, even if individual core speeds have stalled. However, C has not been getting bigger. So on modern machines, C programs compile & link very quickly, even without tcc. Of course tcc is convenient because it can skip the irritating compile/link cycle. There is still that irritating boilerplate before you can write code. When testing llib, I found this shmakefile useful:

 c-script$ cat shmakefile
 #!/bin/sh
 . /tmp/shmake.sh

 # simple C script!
 S quiet true

 cfile=$P.c
 sfile=$P.s

 T  $cfile $sfile "
 cat >@(TARGET) <<block
 #include <stdio.h>
 #include <stdlib.h>
 #include <llib/all.h>
 #include <llib/template.h>

 int main(int argc, char **argv)
 {
 #line 1 \"@(INPUT)\"
 block
 cat @(INPUT) >> @(TARGET)
 cat >>@(TARGET) <<block
     return 0;
 }
 block
 "

 # llib needs C99 support
 C99 -g $P $cfile -n llib

 T all $P "./$P $A"

 c-script$ cat hello.s
 FOR (i,argc) printf("hello, world %s\n",argv[i]);
 c-script$ shmake P=hello A='one two three'
 hello, world ./hello
 hello, world one
 hello, world two
 hello, world three

Again, the shell script is doing all the work - 'hello.c' is a target which is generated by inserting 'hello.s' into some boilerplate. The compile step then creates the target 'hello' from 'hello.c', and finally the default target 'all' causes the program to be compiled and run. On this modest laptop, which most of you would scorn as a development platform, this compiles and executes practically instantly.

What is this '-n llib'? It means that the program needs llib. This is a useful concept borrowed from Lake. shmake resolves the need 'llib' by checking in turn if 'llib.need' or '~/.shmake/llib.need' exist. A need file is a simple config file that allows for expansions in the style of pkg-config's .pc files:

 c-script$ cat llib.need
 prefix=../..
 cflags=-I${prefix}
 libs=-L${prefix}/llib -lllib

If shmake cannot find llib.need, it invokes pkg-config.

To make this accessible everywhere, you could have a wrapper called 'crun' that looks like 'shmake -f ~/mystuff/cscript/shmakefile P=$1' and ensure that llib.need sits in ~/.shmake, with actual prefix of installed llib.

Needs are a useful solution to finding libraries, in the same style as pkg-config, but not requiring it to be present. (pkg-config is not universal, and many packages forget to use it even if it's around.) Even within the Linux ghetto, packagers cannot agree where to put things. For instance, on Debian the tcl headers are in /usr/include/tcl8.6, the Lua headers are in /usr/include/lua5.2; other distros may simply drop them in /usr/include or elsewhere.

Short and Sweet

shmake came out of a need to scratch an itch ('why is building software still so irritating?') and the need to test llib. To make things easier, llib is bundled with the tarball. There is no special install required - just copy shmake onto your path. It was written using the time-honoured Barbarian-in-a-Hurry (BIAH) methodology so there isn't much care about memory leaks. It may occaisionally try to eat your homework.

If nothing else, it is a concrete example of using shell as an embedded scripting language. Which is the the point I want to emphasize; these are shell scripts. There are many good resources on how to learn shell, and shell has many uses outside the niche of building, unlike gmake. If you have limited time to master things (and this is usually true these days) then it is better to master shell than gmake, precisely because it is more universally useful, and shmake makes that choice easier.

No comments:

Post a Comment