Configuration Management – Vallard's Blog

Last week I researched a few different configuration management tools. Configuration management is the art, or act, of managing lots of computers in some organized fashion. Managing a computer means deciding what software goes on the machine and also figuring out permissions, environment settings, etc. The problem isn’t complex when you deal with maybe 1 or 5 machines. However, when you have a cluster, or a cloud, then having a good way to manage them all becomes very important.

In the world I came from, High Performance Computing, the job was a bit easier because every machine was identical. Every ‘node’ did the same thing. The only differences were the IP address, MAC address, and hostname. We never did any management other than the initial install plus some postscripts to make sure the nodes were configured perfectly. We could spend a few good solid days making sure our postscripts were perfect. That way if a machine died, or a new one needed to be added, installing it was trivial. Because of this we never needed any post-install configuration management. In addition, the packages required were rather simple because a lot of the required files, libs, and programs lived on the distributed file system (NFS, GPFS, or something similar).

Another point to all this is that we usually kept our nodes ‘stateless’, or in other words ‘ram-root’ as it is called. Ram-root just means that the entire operating system resides in memory. You may say “wow, that’s a lot of memory” but keep in mind, the entire OS for HPC environments, including the memory-hogging InfiniBand modules, could fit in an image of less than 200MB. So when your modern Nehalem machines are usually equipped with 24GB of RAM, what is a measly 200MB? Plus your system runs better because it’s only doing what you want. This is all made possible via xCAT.

But I digress. The world of cloud computing is different. There are different OSes, different applications, and we’re dealing with a very heterogeneous environment. Thus configuring the software on all of these machines is not as trivial a problem. It’s no longer just one image that you need to be concerned about; it’s many!

Rather than creating my own (which is never a good idea when there are so many great solutions available), I went to take a look at what was out there.

The most promising that I saw were:

Bconfig (bcfg2)
cfengine
puppet

Here is some info on what I found for each:

cfengine

This tool was created by Mark Burgess. There is an interesting talk he gave at Google that is available on YouTube. cfengine seems to be the most venerable and developed, but from the mailing lists I’ve read it seems to have lost its luster in favor of puppet.

Puppet

Puppet seems to be what all the cool kids are using these days. The web site is very well developed, and the documentation seems to be well organized and far better than that of cfengine or anything else I looked at. This really impressed me. If you want to make a good open source tool that everyone uses you need to do two things right:

1. You have to present it well on a web site with clear documentation, customer testimonials, and all kinds of good information.

2. You have to make it easy to get, install, and use. IT is too complicated these days. No one wants to spend hours learning something. The easier you can make it to use the more successful it will be.

Puppet may not be better than cfengine (though I think they think it is) and it may not be better than bcfg2. But the presentation is worlds better, and that makes people want to use it. It invites you to use it. xCAT can take a page from that and it’s made me want to double my efforts in revamping the web page.

This shouldn’t be a surprise either. After all, this is what Apple does. They’re a marketing company. Presentation is everything. A good presentation, a good feel, and ease of use will make a tool stand out, even if it isn’t that much better than the rest in the pack.

Part of the marketing is that the person who started puppet used to code vigorously for cfengine, adding lots of modules before striking out on his own. This gives people the idea that puppet is the next generation of cfengine. It’s a good story. The ease of use is there, and so just on that alone, I can see why it’s all the rage nowadays.

bcfg2

bcfg2 or ‘bconfig’ seems to be the lone wolf of the pack. Its web site even mentions that it doesn’t get as much press as it probably should. Well, what do you expect? This is a national lab full of unsexy engineers (no offense, guys/gals). They’re engineers developing tools. Having said that, Ti Leggett and I spoke and he showed me all the cool things bcfg2 could do. The modules in there seemed very cool, as did the client/server implementation.

My decision

So where does this leave me? Which one do you choose? Well, I hate to say it, but in my situation, I was looking for a solution that could handle an NFS root boot up. It was apparent that they could all handle this in a postscript bring up, but the solutions seemed to fall short when we got a little more specific:

Consider the case of an organization that wants its images locked down (meaning NFS root where nearly everything is read-only and can’t be touched). This could be a large global organization, so /etc/resolv.conf in a lab in Spain isn’t going to be the same as one in Montreal, even though they’re all using the same installation source. Nevertheless you want /etc/resolv.conf to boot up as a non-writable file, preferably NFS-mounted. Sure, the user could unmount the file and then change it as root, but no changes they make would stick.

It was a situation such as that where I couldn’t make use of these tools. Perhaps someone knows of a way to do it, but it seems to me that such a tool would need to be integrated into the creation of the ram disk. In addition this global traversing would have to go through a hierarchy of directories:

/foo/globalfiles/

/foo/usafiles/

/foo/newyorkcity/

/foo/datacenter3/

All of these directories may contain an /etc/resolv.conf or SSH known-hosts keys that have to be integrated and concatenated down. Perhaps we could look at it from an object perspective instead, which would let us see whether a node belongs to a particular class. If so, how do you establish the hierarchy? It didn’t seem to me that the above tools could handle that. Maybe I’m wrong.
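To make the idea concrete, here is a rough Python sketch of what I mean by concatenating a file down the hierarchy. It is only an illustration, not something any of the tools above provide: the function name merge_layered_file is made up, the layer order (most general first) is my assumption, and plain concatenation is the simplest possible merge rule. A real tool would probably need per-file rules.

```python
from pathlib import Path

# Hypothetical layer order, most general first, taken from the
# directory list above. The order itself is an assumption.
LAYERS = [
    "/foo/globalfiles",
    "/foo/usafiles",
    "/foo/newyorkcity",
    "/foo/datacenter3",
]


def merge_layered_file(layers, relative_path):
    """Concatenate relative_path from every layer that has it.

    Layers are read in the order given, so more specific layers end up
    after more general ones. Layers without the file are skipped.
    """
    # Drop a leading slash so "/etc/resolv.conf" is looked up inside
    # each layer instead of at the real filesystem root.
    relative = relative_path.lstrip("/")
    chunks = []
    for layer in layers:
        candidate = Path(layer) / relative
        if candidate.is_file():
            text = candidate.read_text()
            # Make sure each layer's contribution ends with a newline.
            if text and not text.endswith("\n"):
                text += "\n"
            chunks.append(text)
    return "".join(chunks)


if __name__ == "__main__":
    # Prints the merged file, or nothing if no layer provides it.
    print(merge_layered_file(LAYERS, "/etc/resolv.conf"), end="")
```

The output of something like this would still have to be baked into the image or the ram disk at build time, which is exactly the part I couldn’t see how to do with the tools above.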

But I think, like a lot of other people, I would go with Puppet. Not because it’s technically better but because the crowd mindset goes like this:

1. If everyone’s doing it, then it’s going to stick around and I’m not wasting my time learning a dying tool.

2. It’s so easy to learn because of all this documentation, so it’s not going to take me a long time.

Thus we see, my friends, my point: sexy wins.