Recently ( or not so recently, really ), my company decided that we needed a configuration management tool – something to rein in the mess that some of our systems are and give us the tools needed to manage and configure our systems in a sane manner.
Now, if you’re new to configuration management you need to know that there are a lot of ways of going about it and quite a few players in the market. There are also a lot of different opinions on exactly what configuration management really is. Some believe it to be detecting configuration drift, others think it is actual management ( creating, updating, deleting ) of configuration files, and still others may think it’s just making sure that the system ‘looks’ a certain way. There are hundreds of ways of going about it.
Our parent company was doing this investigation along the same timeline that we were – they, of course, had much more time to bring in potential vendors and build up some in-depth proofs of concept. In the end, they picked a product called Opsware, now known as HP Data Center Automation. Since they had already purchased it and we’re always on the lookout to save a buck or two, we figured that if they had tested Opsware and it was good enough for them, then it was probably good enough for us too.
Oops.
Now, let me start off by saying that I’m not calling out my sysadmin colleagues at the parent company. They are all extremely skilled individuals. What I am saying is that we didn’t take the time to really mesh with them on exactly what the tool does versus what we wanted it for, and that’s our fault. I say this because they POC’d the product more for what they needed – provisioning of systems – than for what we needed – configuration management.
Now, after a few weeks of actually getting to play with the tool, I’ve got some real qualms about its quality as a configuration management suite. The overall feeling I get is that each part of the tool ( and for those who know Opsware, I’m talking specifically about the Server Automation module ) was developed in a silo and no thought has been put into actually making these different pieces flow together. Every time I started to figure out how to do something, I hit a roadblock because the tool just does not work a certain way.
The ironic thing is that Opsware does some of these items really, really well, while others seem to be the result of a developer brain fart, and the things it does poorly are tasks that are rather simple. In any case, onwards to the issues!
In Opsware, you have 3 categories ( there is a 4th I won’t cover – patching ) of what is termed ‘compliance’. These are defined as Software Policies, Application Configurations and the redheaded stepchild of the product, Audits. Combined, this little trio is what tells you ( sorta ) if a system is ‘compliant’. When I say compliant, I’m saying that the system looks exactly how I’ve modeled it. Packages I want are installed, services I want are running, users are there or as in some cases, removed, etc, etc, etc.
First off is the Software Policy feature. Opsware seems to have attacked this at a strange ( I won’t call it bad yet ) angle. The general idea is that you have a group of people in your organization who set the policies for what software should be installed on the systems, which users should be on them, which scripts should be run, etc. These policy setters can define the policy and then attach it to the systems. Once attached, you ‘remediate’ the system and it applies everything in the policy to the system. In general, I get the concept. In practice, however, I’ve never come across an organization where someone who isn’t a sysadmin is saying what software should be installed on a system or what scripts that system should run.
Barring that, the concept seems to work until we start getting into the nuts and bolts of how it actually determines if my system is compliant or not. More on that in a second, I want to cover the general overview of the other two categories first.
Second, we have the Application Configuration section. The app config area is designed so that you can take configuration files ( /etc/hosts, /etc/fstab, etc ) and variablize them – in essence creating a template of what the config file should look like. You can then attach that app config to a host, fill in the variables and generate the config file and push it to the host. A concept I appreciate – especially because it has some cool inheritance features. For instance, I could define a template for my /etc/hosts file for my whole datacenter – then I could group some hosts and add some specific entries for them, then repeat at the server level and it all comes together into one application configuration file. Pretty cool, except for the fact that this is completely the wrong way to go about it in some cases. For certain configuration files, I don’t want to manage them as a config file – I want to manage the items in the config file as resources on my system and you’ll see why in a little bit. ( On a side note, you can include Application Configurations into your Software Policies. )
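The layering idea can be sketched in a few lines of Python. This is only an illustration of the inheritance concept, not how Opsware implements it; the function names, the three layers and every address in them are made up for the example.

```python
def merge_host_layers(*layers):
    """Merge host-entry layers; later (more specific) layers override earlier ones."""
    merged = {}
    for layer in layers:
        for ip, names in layer.items():
            merged[ip] = list(names)
    return merged


def render_hosts(entries):
    """Render the merged mapping as /etc/hosts-style text."""
    lines = [f"{ip}\t{' '.join(names)}" for ip, names in sorted(entries.items())]
    return "\n".join(lines) + "\n"


# Hypothetical layers: datacenter-wide, then a group of hosts, then one server.
datacenter = {"127.0.0.1": ["localhost"], "10.0.0.5": ["ntp"]}
web_group = {"10.0.1.10": ["db", "db.example.com"]}
server = {"10.0.0.5": ["ntp", "ntp-local"]}

merged_entries = merge_host_layers(datacenter, web_group, server)
hosts_text = render_hosts(merged_entries)
print(hosts_text)
```

The server layer wins for 10.0.0.5 because it is merged last, which is the same "most specific level overrides" behavior described above.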
Lastly, we have the Auditing and Compliance section. This little baby is the sick nightmare of some developer who evidently didn’t attend many planning meetings. This module is so bad that I don’t even know how to explain it. The general idea behind this section seems to be that you are supposed to model an audit ( or audit policy ) and then run that audit against your systems to see if they meet the rules in the audit. You can also do this for a category called ‘Snapshots’ – basically a point in time picture of what the system looks like. You can then audit your systems against these saved snapshots. If you read that last line and shook your head, then you’re on the same page as me. In my opinion, a good configuration management system is a system where you model how a system should look and the tool makes it look that way. It is *not* a system where you manually configure a host to look some way, take a picture, then make sure other systems look that way. Why? Because management of that snapshot means you have to manually manage a host every time you want to make a change to the snapshot. It also means that you generally have *a lot* of snapshots, which means it’s even more of a pain to change. Opsware does minimize this in that you don’t have to snapshot *everything* on the system, but it’s still painful and I’ll explain more in detail in a second when we combine all of these tools together.
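To make the "model it, then let the tool converge" idea concrete, here is a minimal Python sketch. It is not taken from Opsware or any other product; the dictionary layout, the action names and the sample data are all assumptions for illustration.

```python
def plan_changes(desired, actual):
    """Return the ordered actions that would make `actual` match `desired`."""
    actions = []
    for package in sorted(desired["packages"] - actual["packages"]):
        actions.append(("install_package", package))
    for service in sorted(desired["services"] - actual["services"]):
        actions.append(("start_service", service))
    for user in sorted(desired["absent_users"] & actual["users"]):
        actions.append(("remove_user", user))
    return actions


# Hypothetical model of how the system should look.
desired = {
    "packages": {"httpd", "ntp"},
    "services": {"httpd", "ntpd"},
    "absent_users": {"guest"},
}
# Hypothetical facts gathered from a live host.
actual = {
    "packages": {"httpd"},
    "services": {"httpd"},
    "users": {"root", "guest"},
}

plan = plan_changes(desired, actual)
print(plan)
```

The model lives in one place and can be edited directly; no reference host has to be hand-configured and re-photographed when it changes.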
To continue on the Audit section, you are able to create audit policies. In these policies, you define things like what users should exist on a system, custom scripts that you would like to run ( these are cool ), packages to check for ( not software policies.. ), files that should be on the system, etc. You can even include Application Configurations, although they work differently in Audits. It seems great at first until you realize that some of these things actually require a source – for instance, I can’t tell the Audit that /etc/hosts should be owned by root. That requires me to have a snapshot of another /etc/hosts to compare it against. Hell, I can’t even tell it that a symbolic link should be in place – I need to snapshot the symbolic link from a host where it exists already. Generally, anything that is a file, I need a source to compare it against and the only way to get a source is to compare it against another live machine or a snapshot – both bad ideas. It gets even better when you get into the User and Groups – you can audit these – halfway! Specifically, I can tell my audit that a user should not exist and that will work fine – if it finds my user, I can have the audit delete it. I can’t, however, tell the model that if a user doesn’t exist, to add him. What the.. ? I can tell it to compare the host against a source snapshot and THEN it gives me the option to add the user if it fails to find it on the host I’m auditing but, and I love this, it doesn’t add the user with the same UID! It just spits out a generic ‘useradd steve’, and I end up getting a completely new user ID. Ah, padawan, you are starting to see behind the mask…
Now that I’ve disabused you of the notion that just because it’s commercial, it’s great, let’s really bring out the big guns here. We’re going to run through a few examples of common things you may want to do in a configuration management system and we’ll see how we do them in Opsware.
First off, let’s say we want to manage the users on the system. As I’ve already explained in the audit section, we tried to do that there and it ended up failing because the user IDs don’t stay the same. Well, guess what? They never do. That’s right, the Software Policy section adds users with the same bone-headed command and doesn’t bother to add the UID in there. Even funnier, however, is that I can store the user in the Opsware ‘Library’, and it has the UID of that user in there! It just doesn’t bother to use it! Instead, the only way to do this is to create app configs for all the files that deal with creation of a user account and have Opsware manually edit those files and put the values in, then run scripts to create the directories and chown them. Not really that intuitive.
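For comparison, here is roughly what I would expect a tool to generate when it already knows the UID. This is a sketch in Python, assuming the Linux useradd command and its -u, -d, -m and -s options; the helper name, the UID 1042 and the paths are invented for the example, and nothing here reflects Opsware’s internals.

```python
import shlex


def useradd_command(name, uid, home=None, shell=None):
    """Build a useradd argument list that pins the UID instead of letting the OS pick one."""
    cmd = ["useradd", "-u", str(uid)]
    if home:
        cmd += ["-d", home, "-m"]  # set the home directory and create it
    if shell:
        cmd += ["-s", shell]
    cmd.append(name)
    return cmd


# Hypothetical user record, as it might be stored in a library of known users.
command = useradd_command("steve", 1042, home="/home/steve", shell="/bin/bash")
print(" ".join(shlex.quote(part) for part in command))
```

The only difference from a generic ‘useradd steve’ is that the stored UID is actually passed along, so the account gets the same ID on every host.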
Alright, fine you say, let’s manage host entries. I’ve now got 3 ways I can do this – I can create an Audit, add the application configuration template for the /etc/hosts file and tell the audit to make sure that there is a line that begins with x.x.x.x and has hostname a, b, and c. Or I can just attach the Application Configuration to the host and use the template GUI to specify those values. Or I can make a Software Policy and add an Application Configuration in there and when I attach the software policy to a host, it automatically attaches the application configuration and then I can use the same template GUI. And I have to ask – why? Why do I need 3 different ways of managing host file entries? Especially with the Audit module? The Application Configuration module already will tell me if the system isn’t compliant with what I specified in the template GUI – why do I need to create an audit, fill in the same values again, and have it tell me the same thing? Even better, this means I can contradict myself – I can have the Audit remediate the host file and by doing that, make it noncompliant with the Application Config attached to the host.
It gets uglier – for this example, let’s say you have a utility that you’d like to install on a system and when you install that utility, it needs a host file entry. This looks like a great job for a Software Policy. We add the utility package into the policy, then add the app config template for the /etc/hosts file and define that the default template should have this specific host file entry. I attach the policy to the system and it installs my utility and adds my host file entry. Woo! Everything looks great. Now, let’s say I want to remove the software policy. I detach it from the host, it uninstalls the RPMs and.. it leaves the host file entry in /etc/hosts. Instead of managing that host file entry as a resource, I have to write uninstall scripts that get rid of it.
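Treating the host entry as a resource means the same definition can both add it and take it away. A rough Python sketch of that idea follows; it works on the text of a hosts file rather than the real /etc/hosts, and the function name, address and hostname are made up for illustration.

```python
def ensure_host_entry(hosts_text, ip, names, present=True):
    """Return hosts-file text in which the entry for `ip` is present or absent."""
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == ip:
            continue  # drop any existing entry for this address
        kept.append(line)
    if present:
        kept.append(f"{ip}\t{' '.join(names)}")
    return "\n".join(kept) + "\n"


original = "127.0.0.1\tlocalhost\n"

# Attaching the policy adds the entry; running it again changes nothing.
attached = ensure_host_entry(original, "10.0.0.7", ["licsrv"])
attached_again = ensure_host_entry(attached, "10.0.0.7", ["licsrv"])

# Detaching the policy removes the entry, with no hand-written uninstall script.
detached = ensure_host_entry(attached, "10.0.0.7", [], present=False)
print(attached, detached, sep="---\n")
```

Because the entry is addressed by what it is rather than by where it sits in a template, removing it leaves the rest of the file exactly as it was.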
Next up – let’s say I have a few files I want to lay out on a file system. I can create a zip, import it into Opsware and create a Software Policy for the zip file. Attach it, it deploys, tells me that my system is now compliant with Policy ‘Apache Blog Vhost’. Great. Now, in a fit of anger, someone goes and removes the files that the Software Policy deployed from the zip. A few hours later, I see my blog isn’t working. So I run my software policy compliance check and get all greens. That’s right – the Software Policy doesn’t actually care about what files it lays out. It only cares that *it* considers the zip to have been deployed and until you remove the zip with Opsware, it will continue to tell you that your system is compliant.
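The check I expected is not complicated: compare what the archive contains with what is actually on disk. Here is a small Python sketch using the standard zipfile module; the function name, file names and temporary directory layout are assumptions for the example, and it says nothing about how Opsware stores its own deployment state.

```python
import os
import tempfile
import zipfile


def missing_files(zip_path, deploy_root):
    """List archive members that are no longer present under deploy_root."""
    with zipfile.ZipFile(zip_path) as archive:
        members = [name for name in archive.namelist() if not name.endswith("/")]
    return [name for name in members
            if not os.path.isfile(os.path.join(deploy_root, name))]


# Demo: deploy a tiny archive, then delete one of the deployed files.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "vhost.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("conf/blog.conf", "# vhost config\n")
        archive.writestr("html/index.html", "<h1>blog</h1>\n")

    deploy_root = os.path.join(tmp, "deploy")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(deploy_root)

    missing_before = missing_files(zip_path, deploy_root)
    os.remove(os.path.join(deploy_root, "html", "index.html"))
    missing_after = missing_files(zip_path, deploy_root)

print(missing_before, missing_after)
```

A compliance check built this way would go red as soon as a deployed file disappeared, instead of trusting the record that the zip was once installed. A fuller version would also compare checksums, which this sketch leaves out.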
It just keeps getting better though! Let’s say I create an Audit that checks for some baseline install items. One of these items may be to ensure that a certain package is installed. Heck, I even have a Software Policy that I use to deploy the package to systems. The bad part here is that the Audit module has no tie-in to the Software Policy module! I can’t remediate the audit failure by applying the Software Policy that I already defined. This is a prime example of what I meant when I said that the modules do not interface with each other that well.
All of this gets back to a couple ideas – the main one being that you need to detach the ‘what’ from the ‘why’. The admin shouldn’t need to care how something is implemented nor what files are being modified. The admin only cares about the resource being available – he only cares ‘why’ he needs the resource. I don’t care that a user account requires that the passwd and shadow files have entries for him – I just care that I have the user on the system. This is especially important when you start crossing OS platforms – a user is a user is a user. But Windows, AIX and Linux all implement users in a different way. The same goes for other things – like my host file entries. I want to manage the host file entries as specific resources – I don’t want to manage the host file itself with a bunch of different templates – I just want to have a resource that I can add, modify or delete. If I want to manage the host file as a whole, the template style of Opsware works to a degree and for some configurations, but it fails for simple files that I would just rather manually edit.
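A sketch of what that separation could look like, in Python: the admin declares a user once and a per-platform layer decides how to create it. The User class and function name are hypothetical, and the commands ( useradd -u on Linux, mkuser with an id= attribute on AIX, net user /add on Windows ) are simplified illustrations rather than complete provisioning logic. Windows has no numeric UID in the UNIX sense, so the sketch simply ignores it there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """What the admin cares about: the user exists, with a known ID."""
    name: str
    uid: int


def create_user_command(user, platform):
    """Translate one User resource into a platform-specific command (simplified)."""
    if platform == "linux":
        return ["useradd", "-u", str(user.uid), user.name]
    if platform == "aix":
        return ["mkuser", f"id={user.uid}", user.name]
    if platform == "windows":
        return ["net", "user", user.name, "/add"]
    raise ValueError(f"no user provider for platform {platform!r}")


steve = User(name="steve", uid=1042)
commands = {p: create_user_command(steve, p) for p in ("linux", "aix", "windows")}
for platform_name, command in commands.items():
    print(platform_name, command)
```

The resource definition never mentions passwd, shadow or any other file; those details stay inside the platform layer where they belong.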
This is where Opsware fails – it just doesn’t have the right philosophy for dealing with the complex nature of systems and their configurations. It tries to make you manage the implementation and that just doesn’t work. The whole product is disjointed – areas don’t tie in well together and you’re either provided with too many options that just don’t make sense or you’re provided with no options or ones that just don’t work.
Before Opsware came our way, I got to fool around with another configuration management suite – Puppet by Reductive Labs. All I can say is that they know what they are doing. They have the right mindset and they have the background to understand the system administrator’s point of view. The only reason we were unable to use their product was its lack of Windows support and spotty AIX support – if, one day, our datacenter becomes Linux only, I will be recommending we ditch Opsware and get Puppet.
Dreams can come true, right?

Great post, I hope that your dream comes true soon and you’ll then have more fun doing modern config management aka Data Center Automation.
Please let us then know how you succeeded with Puppet.
It’s time the world moved on towards BMC Bladelogic.
Thanks for sharing, Opsware sounds like it sucks. But I heard Bladelogic can manage a heterogeneous data center very well. Google is using Puppet for its unique infrastructure, and I can’t imagine Google could use the same OS of their own on every server. So Google just put a fair amount of effort into Puppet, and then it works everywhere. Good luck!
Bladelogic was actually less of a fit than Opsware was – also remember that I’m only talking about *one* specific part of one module – Opsware is a much bigger product overall and has many other modules that I have no experience with.
Puppet would work well in our datacenter although we’d have to create, from the ground up, support for Windows – not something we have the time to do. If we were only a UNIX based shop, it would work but there isn’t any chance of us becoming that.
hey… hate to bust into this conversation; but I’m a little desperate – need bladelogic engineers and coming up with nothing. Target is hiring like crazy. jessica.elias@target.com if you’re interested in hearing more…
sorry for the intrusion