The other day, I caught a message that Ksplice was available for Fedora. I thought I’d be a wiseguy and replied, “Yeah, great. Call me in 20 years when it’s available for RHEL.” Well, as several people pointed out, the joke is on me.

As you can see, it’s actually available for many Linux-based OSes at various prices. I suppose my confusion stemmed from misunderstanding what Ksplice actually was.

My impression from a long time ago, when it first came out on Ubuntu, was that it was essentially a kernel patch that dynamically loaded patches and provided the ability to rebootstrap a kernel that was already loaded. As it turns out, it’s a commercial product that lets you apply kernel updates without rebooting your machine. Let me be frank: I’m all about that.

The part that I kind of object to is in the press release, of all things. It’s the opening line of the company profile:

Ksplice is an enterprise software company making reboots a thing of the past.

Please, let’s be honest. Reboots are inevitable. Using this product as a stop-gap to avoid untimely reboots may be handy (at the low, low price of $50 per year per server), but it can’t (and shouldn’t!) replace regular reboots.

The reasons for scheduled rebooting of machines are numerous. The primary one is that regular reboots ensure that the machine is configured to boot correctly. If you’ve got a machine with over 100 days of uptime, how do you know it will start correctly? You last booted it last quarter. What has happened to that machine since then? Installed services, mountpoints, and so on may have changed, so it’s hard to tell whether it will come back up in a known-good state after a power failure.
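If you want a quick way to spot machines that have drifted past that kind of threshold, a minimal sketch in Python can read /proc/uptime on Linux, whose first field is the number of seconds since boot. The 100-day limit and the function names are my own illustrative choices, not anything standard.

```python
# Illustrative threshold taken from the 100-day example above; adjust to taste.
UPTIME_LIMIT_DAYS = 100

SECONDS_PER_DAY = 86400


def uptime_days(uptime_text: str) -> float:
    """Convert the contents of /proc/uptime to days since boot.

    /proc/uptime holds two numbers: seconds since boot, then the
    aggregate idle time across CPUs. Only the first one matters here.
    """
    seconds_since_boot = float(uptime_text.split()[0])
    return seconds_since_boot / SECONDS_PER_DAY


def overdue_for_reboot(limit_days: float = UPTIME_LIMIT_DAYS,
                       path: str = "/proc/uptime") -> bool:
    """Return True if this (Linux) machine has been up longer than limit_days."""
    with open(path) as f:
        return uptime_days(f.read()) > limit_days
```

Calling `overdue_for_reboot()` from a cron job or your monitoring agent is one way to turn "when did we last boot this?" into an alert instead of a surprise.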

Another reason to reboot occasionally is to clean up the running state of the machine. What’s that you say? Your machine is running fine? Well, sure, it may be, but how much cruft is left hanging that isn’t obvious? Have you ever used kill -9? Do you know for sure that there aren’t any memory leaks in your running services? Have any processes hung while waiting on I/O and ended up stuck in uninterruptible sleep?
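For that last question, one rough check on Linux is to look for processes whose state is `D` (uninterruptible sleep) in /proc/<pid>/stat. This is a sketch assuming a Linux /proc filesystem; the helper names are hypothetical. Keep in mind that D state is often brief, so a process that shows up once isn’t necessarily wedged. The ones that show up every time you look are the interesting ones.

```python
import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the LAST ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    state = line[rparen + 2]  # single-letter state follows ") "
    return pid, comm, state


def uninterruptible_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            # The process may have exited between listdir() and open(),
            # or its stat file may be unreadable; either way, skip it.
            continue
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Nothing short of a reboot (or the I/O eventually completing) will clear a process that is genuinely stuck there, which is rather the point.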

Yes, there are lots of things that happen to servers over the course of doing their jobs. A reboot fixes many of them. The only argument against it is uptime.

I’ve written about uptime before, and I still feel the same way. Modern system administration has advanced beyond a single server providing a service. Uptime needs to be measured from the outside in, according to the availability of the service, not of the individual servers in the pool behind it.
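To make the outside-in idea concrete, here’s a toy calculation. It assumes you have external probe results for each server in a pool, taken at the same intervals; the server names and results below are made up for illustration. The service counts as up for an interval if any server in the pool answered.

```python
def availability(samples: list[bool]) -> float:
    """Fraction of probe intervals in which the check succeeded."""
    return sum(samples) / len(samples)


def service_availability(pool: dict[str, list[bool]]) -> float:
    """Availability of the service as a whole.

    Each list holds one probe result per interval, aligned by index.
    The service is up in an interval if at least one server answered.
    """
    intervals = zip(*pool.values())
    return availability([any(results) for results in intervals])


# Hypothetical probe results: each server was down for one interval.
pool = {
    "web1": [True, True, False, True],
    "web2": [True, False, True, True],
}
```

With this data, each server is only up 75% of the time, yet the service itself never went down. That second number is the one your users care about, and it’s why a reboot of one node shouldn’t scare anybody.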

Feel free to disagree. Let me know if you’ve got an uptime of a year plus and you’re proud of it, or if you would be ashamed to be in that position.

Edit

This entry is causing quite a stir on Reddit. Cxunix from Twitter also weighed in on his blog, servermanaged.it (link is in Italian, English translation here).