Sunday, December 23, 2018

Memory Allocation In C

I have been working through an inherited project at work where the bulk of it is written in C.  The project itself is over 15 years old and has been maintained over the years by different people and teams.  It's interesting to see different coding styles between different programmers as well as the capabilities of the libraries you are using.

One thing I have noticed is a lot of poorly written memory management code.  A lot of what this program does is deal with strings in C.  That requires memory management.  You have to be careful when doing this in C.  I have found a lot of memory leaks in the code and just bad practices.  First, I've noticed a mix of memory management functions:
  • malloc(3) from the C library
  • realloc(3) from the C library
  • memset(3) from the C library
  • free(3) from the C library
  • g_malloc0() from glib - (this one is malloc+memset in one step)
  • g_new0() from glib - (this one is like calloc, but with the arguments reversed and rather than a size per element, it expects a type that it will then take the size of itself)
  • g_realloc() from glib
  • g_free() from glib
Before getting into those, I will mention how I generally deal with memory management for strings in C.  To start with, I write code that will only continue if the memory allocation request succeeded.  It is possible to write code that takes a different approach when allocation fails, but for userspace programs I can't think of a case where that would be useful.  If the allocation fails, the program should stop.  Second, I follow a practice of initializing all allocated memory to 0.  I do this to avoid security problems down the road or unexpected behavior for partially used buffers.  It goes something like this:
  • Ask for some amount of memory.
  • Check to see if that succeeded or not.  If it failed, abort() the program.
  • Do whatever you need to do.
  • Free the memory as soon as you don't need it anymore.
Sounds easy.  Here's a very common way to ask for 47 bytes of memory:
char *s;
s = malloc(47);

It's not a very good way to do it, but it's a way.  After this call to malloc(), s will either be NULL or a pointer to 47 bytes of contiguous memory the program can use for characters.  So I wrap this in a test to see if I got NULL and print an error message and abort:
char *s;
if ((s = malloc(47)) == NULL) {
    fprintf(stderr, "memory allocation failed\n");
    abort();
}

OK, that's better.  But I still need to initialize s to all 0s to follow my own rules:
char *s;
if ((s = malloc(47)) == NULL) {
    fprintf(stderr, "memory allocation failed\n");
    abort();
} else {
    memset(s, 0, 47);
}
Forget the hardcoded 47 for a moment (I would normally be using sizeof() here).  In this case, on failure of malloc(), we print an error message and abort.  On success, we initialize the buffer to 0s.  This is great and all, but do I really need to do this for all allocations I have?  The answer is yes, but there is a shorter way to do this and still get the same protections.  Enter calloc(3) and assert(3).

calloc() is a function that will allocate an array of some number of elements where each element is a specific size.  It will also initialize the array to 0s.  You can use this to allocate an array with only a single element to get the same effect on a single variable.  After all, strings are just char arrays in C.
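
As a rough sketch of both uses, here is calloc() allocating a one-element array for a string buffer and a multi-element array of ints.  The names all_zero() and calloc_demo() are made up for this illustration, and the sizes are arbitrary.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if every byte in buf[0..len) is zero. */
static int all_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }

    return 1;
}

/* Hypothetical demo: returns 0 on success, -1 if an allocation failed. */
static int calloc_demo(void)
{
    char *s = calloc(1, 47);          /* one element of 47 bytes */
    int *v = calloc(8, sizeof(*v));   /* eight int-sized elements */

    if (s == NULL || v == NULL) {
        free(s);                      /* free(NULL) is a no-op */
        free(v);
        return -1;
    }

    /* Both buffers come back already filled with 0s. */
    assert(all_zero(s, 47));
    assert(all_zero(v, 8 * sizeof(*v)));

    free(s);
    free(v);
    return 0;
}
```

Like malloc(), calloc() returns NULL on failure, so the return value still has to be checked.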

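assert() is a macro that will help you find bugs in your code.  It takes an expression and if it evaluates to false, it will display an error and abort the program with abort().  You can also disable assert() if you define NDEBUG when you compile the program, just in case you want to compile something extra core-dumpy.  Be aware that with NDEBUG defined the expression is not evaluated at all, so an allocation written inside assert() disappears along with the check; the pattern here assumes NDEBUG is never defined.  So, using those functions, I can reduce the above to this: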
char *s;
assert((s = calloc(1, 47)) != NULL);
Technically I don't need the '!= NULL', but I add that here to make it a little more obvious to the reader.  But there you have it.  A single line that checks the return value of the memory allocation and initializes it to all 0s.
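
One caveat with putting the allocation inside assert(): when NDEBUG is defined, the whole expression is compiled out, allocation included.  If the code might ever be built that way, a small wrapper keeps the one-line call site without depending on assert().  This is only a sketch; xcalloc is a hypothetical name chosen for this example, not a standard C function.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * xcalloc is a hypothetical helper, not part of the C library.
 * It returns zero-filled memory or stops the program, and the check
 * stays in place whether or not NDEBUG is defined.
 * Assumes the caller asks for at least one byte; calloc() is allowed
 * to return NULL for a zero-size request.
 */
static void *xcalloc(size_t nmemb, size_t size)
{
    void *p = calloc(nmemb, size);

    if (p == NULL) {
        fprintf(stderr, "memory allocation failed\n");
        abort();
    }

    return p;
}
```

The call site is then just `char *s = xcalloc(1, 47);` with the same guarantees as the assert() version.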

What about the realloc() calls?  You can use the assert() wrapper for that too and reduce the code checking the return value of realloc.
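
A minimal sketch of what that could look like, assuming NDEBUG is never defined.  grow_buffer() is a hypothetical helper name for this example.  Since realloc() does not initialize the memory it adds, the helper zero-fills the new tail to keep the all-0s rule.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * grow_buffer is a hypothetical helper for illustration.
 * It resizes *buf from old_len to new_len bytes and zero-fills any
 * newly added bytes.  Assumes NDEBUG is not defined and new_len > 0.
 */
static void grow_buffer(char **buf, size_t old_len, size_t new_len)
{
    /* Same pattern as the calloc() one-liner: abort on failure. */
    assert((*buf = realloc(*buf, new_len)) != NULL);

    /* realloc() leaves the added bytes uninitialized. */
    if (new_len > old_len) {
        memset(*buf + old_len, 0, new_len - old_len);
    }
}
```

Assigning the result of realloc() straight back to the same pointer normally loses the original block when the call fails.  Here that does not matter, because the program aborts on failure anyway.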

Why not glib?

glib (not to be confused with glibc or gnulib; both separate projects) is a C utility library that provides a lot of convenience functions for data types, memory management, string manipulation, filesystem navigation, and so on.  I have used glib a lot and while the functions it provides are useful in some cases, I prefer using the standard C library if it is entirely sufficient for what I am doing.  I find most of glib to be unnecessary.

The memory management functions can also lead to problems.  If you use g_malloc0(), you should make sure to use g_free() rather than free(), because if the glib developers change how g_malloc0() works, you want to make sure your free operation continues to work.  The data type functions also leak easily unless you use them in very specific ways.

This is all just my personal preference on glib.  I have not found an instance where I absolutely need it and when given the choice between fewer build and runtime dependencies vs. more, I will pick fewer.

I have been told that glib can make it easier to port code between Linux and Windows.  That may be true, but I've never done that and have no plans to.  If I ever write something on Windows, maybe I will give glib another try.

Thursday, December 13, 2018

du(1) output for just the current directory

I wish this were the default behavior of du(1), but it isn't.  While working on projects, sometimes I want to find which subdirectory is consuming the most disk space.  du is the shell command for this as it computes disk usage per subdirectory starting from the directory you give it, or the current directory if you don't give it a directory name.

But the output leaves a bit to be desired.  Typing just 'du' will print a line for each subdirectory from the tree down.  This could be several screenfuls of text and you really only want to know what the summary is from the current working directory.

The key is the du max depth switch, which is the -d option (or --max-depth=NUM in GNU speak).  I do this:
du -d 1
And it just summarizes the subdirectories in the current directory I am in.  I usually also use the -h option to get more human-readable sizes.

So now instead of screenfuls of subdirectories nested 47 directories deep, I see something like this:
$ du -h -d 1
21M    ./tests
60K    ./rpmlint
16K    ./requirements
9.2M   ./src
8.0K   ./etc
20M    ./.git
684K   ./misc
12K    ./bin
280K   ./doc
51M    .