Software Updates: The Good, The Bad, and The Fatal
A big part of IT management is handling upgrades and updates of all systems, critical and otherwise. Just about every aspect of IT, every piece of hardware and software, will need to be updated at some point along the line. It may be applying a firewall firmware upgrade or installing patches to an operating system or application stack. In every one of those cases, there is some chance that the update blows up everything.
From a purely logistical point of view, there are only three possible outcomes to a firmware or software update:
- Everything goes as planned. Bugs are fixed or new functionality is added, and everything proceeds normally.
- There's no noticeable difference in operation or administration aside from a version number ticking upward.
- You've just turned a working system into a brick.
The odd thing is that there's really no good way to guard against that last possibility. Generally speaking, I only update software and firmware if there's a clear-cut reason to do so, such as a significant feature addition, a performance increase, a security fix, or a major bug that's causing problems. I do not upgrade just because there's a new version out. I've been bitten too many times.
Due diligence is the name of the game in every upgrade plan. Researching the new version is imperative, especially regarding how well it plays with other elements of the device or with software that may be running on the same system. If you run across posts in forums or in blogs regarding problems with the new version, it pays huge dividends to inspect them carefully and ensure you're not about to fall into the same trap. That said, there's absolutely no way to guarantee you won't get stung when you poke at the hornet's nest.
I've had firmware updates go south because of something as simple as bad timing. In one case, the vendor's mechanism for updating required that the device download the update itself after rebooting to a special upgrade mode. Naturally, the vendor's firmware repository went down halfway through the download and left me with a device in an extremely unstable state, with no path forward or back. After hours of digging into a problem that had apparently never occurred before, I tricked the update code into thinking it hadn't yet started the download and recovered the device.
In other cases, the star-crossed devices turned into paperweights that had to be returned. This is especially common with vendors whose update code is spotty about checking bootloader and firmware versions against each other, leading to situations where the bootloader doesn't get updated, but the firmware does, and the device (typically a switch) simply won't boot again.
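The kind of check that would prevent a bootloader and firmware mismatch is not complicated. Here is a minimal sketch in Python, assuming the firmware image declares the oldest bootloader it can boot with. The `FirmwareImage` type, its `min_bootloader` field, and the version numbers are all hypothetical and not taken from any real vendor's update code.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.4.2' into (1, 4, 2) for comparison."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class FirmwareImage:
    version: str
    # Hypothetical field: the oldest bootloader able to boot this image.
    min_bootloader: str


def safe_to_flash(installed_bootloader: str, image: FirmwareImage) -> bool:
    """Return True only if the installed bootloader meets the image's minimum."""
    return parse_version(installed_bootloader) >= parse_version(image.min_bootloader)


# Illustrative values only.
image = FirmwareImage(version="12.2.55", min_bootloader="1.4.0")
print(safe_to_flash("1.3.9", image))   # False: bootloader is too old
print(safe_to_flash("1.10.0", image))  # True: 1.10.0 is newer than 1.4.0
```

Comparing versions as tuples of integers rather than as strings matters here: as plain text, "1.10.0" sorts before "1.4.0", which is exactly the sort of mistake that bricks a switch.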
I've done mass BIOS upgrades on bunches of identical blade servers, only to have one out of a dozen fail to reboot, completely hosed, with no POST, no service processor communication, nada. There was no rhyme or reason to it. The blades were procured at the same time, and all BIOS versions were the same, but one blade went from perfectly normal operation to completely dead on one reboot.
In another case, I "succeeded" in ruining a network management card with a firmware update that was designed for a completely different card. My error was selecting a file that was one character different from the correct file, but instead of doing a model check and issuing a warning, the firmware update went through and rendered the card useless. In times like these, you have to trust that the vendor has done its homework and has implemented significant safety checks in its update code. Sadly, this isn't always the case.
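When the vendor's update code won't protect you from picking the wrong file, one safeguard you can apply yourself is to compare the file's checksum with the one the vendor publishes for your exact model. The sketch below assumes the vendor publishes a SHA-256 digest for each image, which is not always true. The file name and payload are made up for the demonstration.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large firmware images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, published_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring case and padding."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())


# Demonstration with a throwaway file; the name and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    image_path = Path(tmp) / "card_model_a.bin"
    image_path.write_bytes(b"example firmware payload")
    published = hashlib.sha256(b"example firmware payload").hexdigest()
    print(matches_published_digest(image_path, published))  # True
    print(matches_published_digest(image_path, "0" * 64))   # False
```

A matching digest only proves that you have the file the vendor listed for that model. It does nothing if you looked up the digest for the wrong model in the first place.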
Speaking of the wrong firmware, many vendors seem to make it their business to obfuscate the right firmware versions and compatibilities, complicating the whole process and adding unnecessary risk. I've seen situations where there are three different versions of a BIOS update for a blade server, but only one right update depending on the serial number and hardware revision of the blade. In other cases, you can't update from version X to version Z without updating to version Y first, yet version Y is inexplicably absent from the vendor site. I fully admit I've downloaded firmware updates from shady sites in order to be able to complete the process. That always leaves a bad taste in your mouth, no matter how positive the final outcome.
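The X-to-Y-to-Z problem is worth working out on paper before you touch anything. As a rough sketch, assuming you have transcribed the vendor's supported upgrade hops into a table yourself, a breadth-first search finds the shortest supported chain and tells you up front which intermediate images you need to track down. The `SUPPORTED_UPGRADES` table below is hypothetical.

```python
from collections import deque

# Hypothetical table: each version maps to the versions it can upgrade to directly.
SUPPORTED_UPGRADES = {
    "X": ["Y"],
    "Y": ["Z"],
    "Z": [],
}


def upgrade_path(current, target, supported):
    """Breadth-first search for the shortest chain of supported upgrades.

    Returns the list of versions from current to target, or None if the
    table offers no route.
    """
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_version in supported.get(path[-1], []):
            if next_version not in seen:
                seen.add(next_version)
                queue.append(path + [next_version])
    return None


print(upgrade_path("X", "Z", SUPPORTED_UPGRADES))  # ['X', 'Y', 'Z']
print(upgrade_path("Z", "X", SUPPORTED_UPGRADES))  # None
```

Every version in the returned list is an image you need in hand before starting, including the ones the vendor site no longer offers.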
With software -- as opposed to firmware -- you generally have significant forms of protection against explosive updates. You can take a snapshot of a VM before updating the OS, for example, or back up a database before applying a patch. In most appliances, this isn't an option, and you steel yourself for the interminable waiting period as a device applies an update, at the same time trying to urge the progress meter forward through sheer force of will. Alternatively, you distract yourself with another task so that you won't hover over it like a watched pot.
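The snapshot-then-patch habit can be sketched in a few lines. The example below is a stand-in only: it assumes the state you care about lives in a single file, which is far simpler than a VM snapshot or a real database backup, and the `backed_up` helper and file names are hypothetical.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def backed_up(target: Path):
    """Copy target aside before a risky change and restore it if the change fails."""
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)
    try:
        yield backup
    except Exception:
        shutil.copy2(backup, target)  # roll back to the pre-update copy
        raise


# Demonstration with a throwaway file standing in for the thing being patched.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "app.conf"
    config.write_text("version=1")
    try:
        with backed_up(config):
            config.write_text("version=2")
            raise RuntimeError("patch blew up")  # simulate a failed update
    except RuntimeError:
        pass
    print(config.read_text())  # version=1
```

The backup is deliberately left in place after a successful update, since a patch that appears to work can still turn out to be the problem a day later.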
When something goes awry, you're generally left with an uncomfortable decision to make: Do I reboot it and hope it comes back, or do I let it sit seemingly forever, hoping it merely has a timeout to complete before it snaps back to attention?
After watching a Windows Server 2008 R2 box sit at the final stage of applying updates for two hours -- all the while researching possible causes -- you finally bite the bullet and reboot. But it's like working completely in the dark, and in IT, that's always a bad idea.
What else is new? All you can do is arm yourself with as much information as you can, including the vendor's support number, and venture into the wild.