Sunday, December 23, 2018

Memory Allocation In C

I have been working through an inherited project at work where the bulk of it is written in C.  The project itself is over 15 years old and has been maintained over the years by different people and teams.  It's interesting to see different coding styles between different programmers as well as the capabilities of the libraries you are using.

One thing I have noticed is a lot of poorly written memory management code.  A lot of what this program does is deal with strings in C.  That requires memory management.  You have to be careful when doing this in C.  I have found a lot of memory leaks in the code and just bad practices.  First, I've noticed a mix of memory management functions:
  • malloc(3) from the C library
  • realloc(3) from the C library
  • memset(3) from the C library
  • free(3) from the C library
  • g_malloc0() from glib - (this one is malloc+memset in one step)
  • g_new0() from glib - (this one is like calloc, but with the arguments reversed and rather than a size per element, it expects a type that it will then take the size of itself)
  • g_realloc() from glib
  • g_free() from glib
Before getting in to those, I will mention how I generally deal with memory management for strings in C.  To start with, I write code that will only continue if the memory allocation request succeeded.  It is possible to write code that takes a different approach when allocation fails, but for userspace programs I can't think of a reason where that would be useful.  If the allocation fails, the program should stop.  Second, I follow a practice of initializing all allocated memory to 0.  I do this to avoid security problems down the road or unexpected behavior for partially used buffers.  It goes something like this:
  • Ask for some amount of memory.
  • Check to see if that succeeded or not.  If it failed, abort() the program.
  • Do whatever you need to do.
  • Free the memory as soon as you don't need it anymore.
Sounds easy.  Here's a very common way to ask for 47 bytes of memory:
char *s;
s = malloc(47);

It's not a very good way to do it, but it's a way.  After this call to malloc(), s will either be NULL or a pointer to 47 bytes of contiguous memory the program can use for characters.  So I wrap this in a test to see if I got NULL and print an error message and abort:
char *s;
if ((s = malloc(47)) == NULL) {
    fprintf(stderr, "memory allocation failed\n");
    fflush(stderr);
    abort();
}

OK, that's better.  But I still need to initialize s to all 0s to follow my own rules:
char *s;
if ((s = malloc(47)) == NULL) {
    fprintf(stderr, "memory allocation failed\n");
    fflush(stderr);
    abort();
} else {
    memset(s, 0, 47);
}
Forget the hardcoded 47 for a moment (I would normally be using sizeof() here).  In this case, on failure of malloc(), we print an error message and abort.  On success, we initialize the buffer to 0s.  This is great and all, but do I really need to do this for all allocations I have?  The answer is yes, but there is a shorter way to do this and still get the same protections.  Enter calloc(3) and assert(3).

calloc() is a function that will allocate an array of some number of elements where each element is a specific size.  It will also initialize the array to 0s.  You can use this to allocate an array with only a single element to get the same effect on a single variable.  After all, strings are just char arrays in C.
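As a minimal sketch of that idea, here is calloc() used for a string buffer.  The helper name new_string() is made up for this example and is not part of the C library.  It asks for len + 1 elements so there is room for the terminating NUL, and it keeps the explicit NULL check from earlier:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration: allocate a zero-filled string
 * buffer that can hold len characters plus the terminating NUL. */
char *new_string(size_t len)
{
    /* calloc(count, size) returns room for count elements of size bytes
     * each, with every byte set to 0 */
    char *s = calloc(len + 1, sizeof(*s));

    /* same rule as before: stop the program if the allocation failed */
    if (s == NULL) {
        abort();
    }

    /* sanity check for the demonstration: a zeroed buffer is an empty string */
    assert(strlen(s) == 0);

    return s;
}
```

Calling new_string(46) gives the same 47 bytes as the earlier examples, already zeroed.  The caller is still responsible for calling free() on the result.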

assert() is a macro that will help you find bugs in your code.  It takes an expression and if it evaluates to false, it will display an error and abort the program with abort().  You can also disable assert() if you define NDEBUG when you compile the program, just in case you want to compile something extra core-dumpy.  Keep in mind that with NDEBUG defined the expression is not evaluated at all, so any side effects inside it disappear as well.  So, using those functions, I can reduce the above to this:
char *s;
assert((s = calloc(1, 47)) != NULL);
Technically I don't need the '!= NULL', but I add that here to make it a little more obvious to the reader.  But there you have it.  A single line that checks the return value of the memory allocation and initializes it to all 0s.
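One caveat with the one-liner: when NDEBUG is defined, the calloc() call inside assert() is compiled out along with the check.  Two more defensive variants are sketched below.  Both helper names (new_buffer() and xcalloc()) are made up for this example, and both assume the requested size is nonzero, since calloc() is allowed to return NULL for a zero-byte request:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: the allocation is its own statement, so it still
 * happens under NDEBUG.  Only the check is removed in that case. */
char *new_buffer(size_t size)
{
    char *s = calloc(1, size);

    assert(s != NULL);
    return s;
}

/* Hypothetical wrapper: the check is a plain if, so it stays in place no
 * matter how the program is compiled.  Returns zero-filled memory or
 * aborts the program. */
void *xcalloc(size_t count, size_t size)
{
    void *p = calloc(count, size);

    if (p == NULL) {
        fprintf(stderr, "memory allocation failed\n");
        abort();
    }

    return p;
}
```

With either helper the call site stays a single line, for example char *s = xcalloc(1, 47);, which keeps the brevity of the assert() version.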

What about the realloc() calls?  You can use the assert() wrapper for that too and reduce the code checking the return value of realloc.
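A rough sketch of what that could look like follows.  The helper name grow_buffer() and its parameters are assumptions for this example.  Unlike calloc(), realloc() does not zero the bytes it adds, so the sketch clears them by hand to keep the all-0s rule.  It also assumes new_size is nonzero and that the program is not built with NDEBUG:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resize buf from old_size to new_size bytes and
 * abort (via assert) if the request fails. */
char *grow_buffer(char *buf, size_t old_size, size_t new_size)
{
    /* realloc() leaves the old block untouched on failure, so hold the
     * result in a separate pointer until it has been checked */
    char *tmp = realloc(buf, new_size);

    assert(tmp != NULL);

    /* the added bytes are uninitialized; set them to 0 */
    if (new_size > old_size) {
        memset(tmp + old_size, 0, new_size - old_size);
    }

    return tmp;
}
```

The existing contents are preserved up to the smaller of the two sizes, so a call like s = grow_buffer(s, 47, 94); keeps the original string and leaves the new tail zeroed.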

Why not glib?

glib (not to be confused with glibc or gnulib; both separate projects) is a C utility library that provides a lot of convenience functions for data types, memory management, string manipulation, filesystem navigation, and so on.  I have used glib a lot and while the functions it provides are useful in some cases, I prefer using the standard C library if it is entirely sufficient for what I am doing.  I find most of glib to be unnecessary.

The memory management functions can also lead to problems.  If you use g_malloc0(), you should make sure to use g_free() because if they change how g_malloc0() works, you want to make sure your free operation continues to work.  The data type functions also leak easily unless you use them in very specific ways.

This is all just my personal preference on glib.  I have not found an instance where I absolutely need it and when given the choice between fewer build and runtime dependencies vs. more, I will pick fewer.

I have been told that glib can make it easier to port code between Linux and Windows.  That may be true, but I've never done that and have no plans to.  If I ever write something on Windows, maybe I will give glib another try.

Thursday, December 13, 2018

du(1) output for just the current directory

I wish this were the default behavior of du(1), but it isn't.  While working on projects, sometimes I want to find which subdirectory is consuming the most disk space.  du is the shell command for this as it computes disk usage per subdirectory starting from the directory you give it, or the current directory if you don't give it a directory name.

But the output leaves a bit to be desired.  Typing just 'du' will print a line for each subdirectory from the tree down.  This could be several screenfuls of text and you really only want to know what the summary is from the current working directory.

The key is the du max depth switch, which is the -d option (or --max-depth=NUM in GNU speak).  I do this:
du -d 1
And it just summarizes the subdirectories in the current directory I am in.  I usually also use the -h option to get more human-readable sizes.

So now instead of screenfuls of subdirectories nested 47 directories deep, I see something like this:
$ du -h -d 1
21M    ./tests
60K    ./rpmlint
16K    ./requirements
9.2M   ./src
8.0K   ./etc
20M    ./.git
684K   ./misc
12K    ./bin
280K   ./doc
51M    .
Enjoy.

Thursday, October 4, 2018

Finding the IP Address of a VM

Virtual machines are a way of life for developers and oftentimes we have many running at the same time on the same host.  We communicate with them over TCP/IP and have to juggle RFC1918 addresses.  It's easy to lose track of a VM's IP address if you are creating and destroying VMs frequently.

I recently found myself ssh'ing in to my workstation and then needing to log in to a virtual machine.  I did not remember the IP address.  Using virsh, I was able to find it quickly:

$ sudo virsh list
  Id    Name                           State
----------------------------------------------------
 1     rhel7.5                        running


$ sudo virsh domifaddr rhel7.5
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet0      52:54:00:0a:87:3e    ipv4         192.168.122.118/24


In this example I just have the one virtual machine, but it's pretty simple to see.  I do not often use virsh, but in this case it was helpful.

Wednesday, October 3, 2018

Misc git Tips

git is a version control system that originally began life with the Linux kernel development team (replacing BitKeeper) and has grown to become, arguably, the dominant version control system in use today.  Those who haven't heard of git but have heard of github usually make the association, but it's important to remember that github is just one of many collaboration frontends built on top of git.  Read all about version control over here.

OK, so you're using git for your project.  If you're like me you may follow a model of creating a branch, doing your work there, then merging that branch back in to the main line.  This model is made extremely simple through github's web interface and is called forking a repository and then sending a pull request.

Here are some tips for working with forked repositories.  I am mostly posting this for my own future reference.

Keeping Your Forked Repository Up To Date

Often times you will be doing continual work on a project and want to keep your fork up to date with the main repository.  I do it by creating a new remote called 'upstream', fetching that, and then merging it in to my fork.  Like this (using the timezone database project as an example):

git remote add upstream https://github.com/eggert/tz
git fetch upstream

Now you have the upstream project under the name 'upstream' in your repo.  Your repo is under the name 'origin' and you likely have local tracking branches for work done there.  e.g., 'master' is tracking 'origin/master'.

You just fetched the latest updates from upstream, now you want to merge them in to your repo:

git checkout master
git merge upstream/master

You can do this for different branches too by checking out a different branch you have and merging from a specific upstream branch.  Such as a stable branch:

git checkout -b stable-release origin/stable-release
git merge upstream/stable-release

Lastly, remember that this is all a one-way pull, so you need to push your newly merged repo back to your origin:

git push

Just don't push to upstream.  In fact, it's good practice to add remotes called 'upstream' using read-only repo URLs.

Branches Get Removed

It's not uncommon for branches to be created and destroyed frequently with git.  While a git fetch operation will pull down new branches, it does not automatically remove any branches that have disappeared at the origin.  Here's how you can do that:

git fetch -p

You can also specify a different remote, such as:

git fetch -p upstream

Remember that you also need to delete any branches you created locally that were tracking removed branches.  Use git branch -D to remove the local branches.

Saturday, June 30, 2018

Wall Hydrants

We have a recurring problem with our outside wall hydrants.  And that's before even getting to what these things are called, which itself took a while to figure out.

What is a wall hydrant?  The thing you attach a garden hose to on your house.  I've learned they are called many different names, such as spigot, faucet, and sillcock.  However, wall hydrant appears to be the name that the manufacturers use more or less consistently.  And that's important when looking for parts.

So what's the problem?  At the end of each season, I shut off the outside water, drain the hydrants, and toss the hoses and attachments.  (I have tried storing these over winter, but they just crack or corrode.  It's not worth it, so I just buy the cheapest hoses each year and I'm done with it.)  We have frost free hydrants, but I still shut them off and leave them open.  When spring rolls around, I turn on the water and attach hoses.  Every single year these things are broken in some way.  Almost always they are leaking and it's usually the gaskets or packing that has to be replaced.  I've been replacing these parts yearly.  This year was no different.

The hardest part is figuring out what you have.  Identifying these things is difficult.  It's further complicated because companies acquire each other and either continue or rebrand existing product lines.  Our house came with Mansfield branded hydrants, which I learned is now part of Prier.  However, Mansfield was popular and established so they still refer to both Mansfield products and part numbers throughout their documentation.  Prier also gives everything their own part number, so it's a mess.

There are some key things to note when trying to identify which hydrant you have:
  • The handle turn.  Quarter, half, full?  Sometimes this is referred to by degrees.  This is a key indicator for the type you have.
  • The stem type.  Take the thing apart and remove the stem.  You should shut off the water unless you want to get drenched doing this.  The supply to the hydrant needs to be off, not just the hydrant itself.  Take a note of what it looks like and the length of the stem.
Prier provides a guide to help you figure this stuff out for Mansfield-style hydrants.

Now, once you've identified it, you need to order parts.  Oddly, the best option I've found is through Amazon.  Failing that, I've gone with Home Depot, where I place an order.  They never have anything useful in the store.  Don't be fooled by things they have in stock that look like they would work.  Home Depot tends to only stock their house brand merchandise for stuff like this.  Or they stock one specific model, but only as a complete hydrant.  Prier makes all sorts of parts and repair kits, so you don't need to buy an entire hydrant.

I spent today replacing washers and graphite packing on both of our hydrants only to find the damn things continue leaking.  With no other ideas, I'm going to order replacement stems and just rebuild it from the inside.  The stems we have are pretty corroded anyway.

When winter comes around, I'm going to bag the hydrants in addition to shutting them off.  Actually, I think I'm going to remove the stems and then bag the hydrant.  If that doesn't work, maybe we just won't have outside water.

Sunday, May 6, 2018

I'd like to move this blog

I'm still using Blogger.  Over the years this site has existed (twenty years!), I have hosted some sort of blog-like thing either locally on my own stuff, locally on some web site software, or through a service like Blogger.  I'd like to get the site back to running on its own and while Blogger is ok, I sort of want my own site again.  But I'm not interested in something as complicated as Wordpress.  If you run a site like this, what simple blog system are you using on your site?  Ideally I'd like to write posts as individual files and drop them somewhere.  Either text or supporting HTML markup.  Comments are useful, so I'm not sure how that would go.  Bonus points for systems that avoid PHP.

Saturday, March 24, 2018

Copying Files From Google Drive

If you use Google Drive for anything, you likely have accumulated a number of files in your "drive" over time.  This is the case for me and what I would like to be able to do is something like rsync to just mirror my entire Google Drive to my local system.  Does anyone know how to do this?

Why, you may ask?  Well, while I find Google Drive useful, I still like having control over all of my data and I really just view Google Drive as a convenience.  Since Google may at any time decide to discontinue it, I want to have all of my data in a location that I know I control.  So I want to keep a local sync of those files for my own backup purposes.

Looking for ideas...