Building a Better VMM: 8 Ideas

One day, I was out traversing the wide, untamed world of my Twitter feed when I came across this most lovely post by Aidan Finn that does a great job summarizing Hyper-V’s Biggest Weakness. That post is required reading for this one. Actually, that post should be required reading for anyone that has anything to do with Hyper-V. Start there, then come back here.

Aidan’s post is about the state of management tools for Hyper-V in general. I’m going to focus on System Center Virtual Machine Manager (VMM). Even if you don’t use VMM, this article is for you. If you have Hyper-V but not VMM, then that is something that Microsoft seriously needs to address whether you (or they) realize it or not.

I believe that there should be a set of free tools with a premium management pack. The free tools should be enough for anyone to get by with minimal stress, and the premium tool should not be required, but it should provide a value-add that exceeds its price point. It should also either be a plug-in to the free tools or be able to do everything that they do, so that a premium user doesn’t need to flip between tools. As these products stand today, none of those things are true. That’s part of the reason so many people aren’t using VMM. What I really want to focus on, though, are the problems in the VMM product itself.

As I was reading that part of Aidan’s post and several of the comments, my central thought was, “I’m not alone.” Using VMM is a fairly miserable experience for me as well and one that I avoid as much as possible. It seems that each time I do break down and open the application because there’s something that I need that only it can do, I discover some error condition that poses absolutely no problem for my Hyper-V environment but has VMM in a total panic. I then have to spend time, usually hours, correcting that before I can do what it was that I needed to start VMM for in the first place. The program is extremely brittle, which isn’t a good label at all for an infrastructure management component.

If you look at the bottom of Aidan’s article, he links to a site where you can go to provide feedback to the relevant Microsoft team. That’s something I think you should do. Of course, it’s something I think that I should do as well. But as I sat there in front of those feedback blanks, I suddenly had a hard time articulating exactly what I wanted to say. That’s what happens when you hit your maximum frustration point with something. So, the purpose of this article is to lay out some of the things I see that are wrong with VMM and some of my suggestions, and hopefully that will get more community ideas flowing.

As a side note, there really is a benefit in speaking up. I hear a common complaint that Microsoft doesn’t listen to its customers. The truth is, they do. They just may not have listened to that one complaint that you had (I’m still pretty bitter over losing TechNet subscriptions and am not going to get over it any time soon) and they may not be listening in the places where you’re speaking. Use the feedback forms. Sign up for a research panel. Do not go out to some random Internet forum and stir up a bunch of drama. Whatever you do, be an adult about it. There is no one — repeat, no one — that looks at a post littered with “M$” and “Microshaft” and “Winblows” and “Windoze” et al and thinks, “Wow, that is a deep, insightful, carefully crafted commentary that has influenced the way I view things and will surely foster a great deal of admiring respect for the author.” What the few people who don’t roll their eyes and keep scrolling really think is, “Oh boy, yet another immature person that should never have been allowed off the schoolyard, much less into a professional datacenter, that thinks they’re being original and clever by parroting epithets that weren’t very funny when we first heard them 20 years ago and haven’t gotten any better since.” Please submit your ideas, but use your grown-up words. With that said, the VMM product is not in good shape.

My Two Big Wishes

1: Aidan’s article talks about “refactoring” the Hyper-V management stack. I’m in the mood for something much more aggressive. I think the VMM team should make a plain-text list of all the features that VMM has (and you know, that’s an impressive list), and maybe a few they’d like to add. They should then just throw all the existing code out. Put it in the recycle bin and set the bin on fire. Start from scratch with the user interface and user experience teams leading the way. Get input from research panels. Once all that’s collected, go and make a good product.

2: Beef up Hyper-V Manager and Failover Cluster Manager. I understand why these are separate products and I don’t think it’s asking too much of people to use separate management tools because a failover cluster and a hypervisor are two separate infrastructure components. However, Hyper-V Manager should have much more control over the failover features of the VMs that it manages and it wouldn’t hurt Failover Cluster Manager to have a little more control over host nodes.

My Wishes for VMM

I need to accept the reality that VMM is probably not going to be reset like that. However, my boss made a really good observation that holds up pretty well under testing: Microsoft usually doesn’t get anything right until the fourth or fifth try. VMM is really only on version 2 right now. It’s gone through more version numbers than that so that it can keep up with the rest of the System Center suite, but realistically, everything up to 2008 R2 was version 1 and everything since has been version 2. Hopefully, Microsoft will realize that this is a weak point and throw some resources at it, or we’re not going to have that solid version 4 until 2020 or later. Anyway, rather than spend a lot of energy chasing things that have a nearly zero chance of happening, I’ll list out some things that might.

1. Error Messaging

The error messaging in VMM is… well, it’s just terrible. The messages that have intelligible text usually have words with little or nothing to do with the actual problem. If you get an error, you’ll likely use its code to search the Internet for solutions. Unfortunately, your best hope is that someone else has (a) figured it out and (b) written a blog article or forum post about it. If not, you usually wind up having to take some pretty poor troubleshooting steps, like reinstalling something or manually tinkering with the database. My first wish is for the error messaging to be improved dramatically. Solutions should be available on TechNet. If not, just follow the Windows team and remove the useless descriptions and replace them with “Oh gee, that didn’t work, terribly sorry” platitudes. Those aren’t helpful either, but at least they don’t have us chasing red herrings.

The other problem with error messaging is that you have to take extra steps just to find out something is in an error state. If I open a dialog box, perform an action, click OK, and am returned to the main screen, my natural assumption is that everything went well. That’s a bad plan with VMM. Everything in this application is considered a “job”. That in itself isn’t so bad, but clicking OK in a dialog doesn’t mean “do what I said,” like it does in other applications. Here, it means “start a job to do what I said.” That’s why the program won’t throw an error, because as far as it’s concerned, its responsibility is satisfied the moment that it submits that job. If it threw an error 15 milliseconds later, that’s your problem. You have to train yourself to start something and then immediately flip to the Job tab to see what happened with it… or then, maybe not. Sometimes, the job window will automatically appear. It even has a little check box on it to prevent such job windows from automatically appearing in the future. It does not have a check box to have that box behave with any sort of consistency. The thing is, this is a problem that has a very old, very reliable, very common solution: status bars. If a job goes south and you don’t want to notify me with a pop-up window, put a nice little flashing red something in a status bar. You also have a list of items, one of which I just worked on. Add a scrolling progress bar field to that list that ignites when a job is operating on that item.

2. Networking

Networking in VMM is just ridiculously over-engineered. Networking in Hyper-V is simple (once you get over the concepts): you set up your physical infrastructure, you make a virtual switch on top of it, you attach virtual network adapters to it, and maybe you configure them with VLANs. All done, no problem. Not so in VMM. VMM tries to wrap everything up in a nice, tidy package, but manages to deliver a mess that you need a real product expert to sort out.

How many places do you really need to go to configure networking? There are at least four that I can recall, and I’m coasting now because my network is mostly configured the way I want it. I dread needing to make any changes to it, though. There’s just way too much going on. I understand that VMM has extra needs because it enables network virtualization features and such that Hyper-V alone can’t provide, but that’s no excuse for the mess that it is.

Does adding a new VLAN really need to be a major networking configuration event? Why do I have to work harder to add a new VLAN to my VMM environment than my network engineers do to add it to the real environment? You might have some grandiose idea that you’re protecting me from myself by restricting where VMM allows a VLAN ID to appear, but I don’t need that, and when I hit my frustration limit I’m just going to use PowerShell to directly assign the VLAN ID where I need it anyway.
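For anyone who hasn't done it, here is a minimal sketch of that PowerShell fallback using the native Hyper-V module, run on the Hyper-V host itself. The VM name 'svtest' and VLAN ID 20 are placeholders, not values from any real environment, and this assumes the VM has a single virtual adapter that should be an access port.

```powershell
# Hypothetical VM name and VLAN ID; substitute your own.
# Puts the VM's virtual network adapter in access mode on VLAN 20.
Set-VMNetworkAdapterVlan -VMName 'svtest' -Access -VlanId 20

# Show the VLAN settings that the adapter is using now.
Get-VMNetworkAdapterVlan -VMName 'svtest'
```

Keep in mind that this changes the setting underneath VMM, so VMM will only learn about it the next time it refreshes the virtual machine.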

Why can’t I convert between a “logical switch” and a “virtual switch”? The thing is, the logical switch can be really annoying to set up using VMM. Creating a virtual switch natively in Hyper-V is ridiculously simple regardless of the circumstances. Why can’t I make the switch using native Hyper-V PowerShell cmdlets and just let VMM take it over as a “logical switch”? Hyper-V doesn’t even seem to think there is a difference, so how much work would that really be?
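To show what "ridiculously simple" means here, this is a rough sketch of building an external virtual switch with the native Hyper-V cmdlets. The adapter name 'Ethernet 2', the switch name 'vSwitch', and the VM name 'svtest' are all made up for illustration.

```powershell
# Assumed names: 'Ethernet 2' is a physical adapter on the host,
# 'vSwitch' is the name to give the new external virtual switch.
# With -AllowManagementOS $false the host gets no virtual adapter on this
# switch, so don't point this at the adapter carrying your management traffic.
New-VMSwitch -Name 'vSwitch' -NetAdapterName 'Ethernet 2' -AllowManagementOS $false

# Connect an existing VM's virtual network adapter to the new switch.
Connect-VMNetworkAdapter -VMName 'svtest' -SwitchName 'vSwitch'
```

Two commands and the VM is on the wire. That's the bar the logical switch workflow has to be measured against.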

While we’re on that subject, why is it such a pain to create a logical switch? Specifically, I’m thinking about fully converged switches. You can kick off the creation, but as soon as VMM loses contact with the Hyper-V host, it panics. This would be more acceptable if it didn’t have an agent sitting on the host that should be aware of what’s going on and could be dealing with such things.

With all this over-engineering, you can’t assign a VLAN to a virtual machine when you deploy it. Or, maybe you can, but it’s a big secret. You worked too hard on features that not many care about and ignored the features that nearly everyone cares about.

3. The Agent

The agent seems to be pretty fragile. I do far more agent reinstalls than I believe that I should have to (0 seems like a reasonable limit). Since VMM is something I use in my production environment and not my lab, I leave it alone. It seems only fair to ask that the software I’m not actively trying to break should be fairly solid.

4. Bare Metal Deployment

I spent nearly two months getting bare metal deployments to work. Once I did, I had a nice long two-month hiatus between that and my next deployment. In the interim, an update roll-up was released for VMM. I don’t know if that was the cause, but I had another nice long fight with bare metal deployment afterward. The thing is, I’d rather use DSC or just my own PowerShell scripts and not even deal with bare metal deployment, but it’s the only way to get fully converged logical switches working without a lot of other excessively complicated busy work.

The first issue with this feature is that it’s poorly documented. Combine that with the poor error messaging and it’s really hard to determine why something went wrong. One thing that hit us especially hard is that not all of the necessary firewall ports are documented. We wound up calling Microsoft support and they, largely on a hunch, decided to run traces on our network. So, the conclusion is, they don’t entirely know how it works, either. Side note: the firewall issue seems to be fairly common at Microsoft. If you use internal hardware firewalls and a Microsoft product doesn’t work right, break out Wireshark and go it alone. Support will take at least a couple of days to sort it out if you get into the top tier and a week or two if you’re in the bottom tier. I think it’s kind of embarrassing that Microsoft doesn’t know the firewall ports for its own products, but the people I’ve talked to there seem to think there’s nothing strange about having developers that set up software communications channels and don’t tell anyone what they are.

The next thing about bare metal deployment, and one that I did leave feedback on, is that it’s annoyingly all-or-nothing. If there’s a problem somewhere, you’ll probably invest twenty minutes in waiting to find out. If it fails, there’s no real retry button. You can retry the job from its last failed point, but since you can’t change any parameters, odds are pretty high that it will just encounter the same problem again. There are many manual data entry points for a bare metal deployment, and you can’t make any errors. Each attempt requires at least one reboot of the target host system, which, since so many modern servers take longer than a 1980s 80286 system to boot, can cause a great deal of lost time and frustration. This inability to click Back to fix things that the “wizard” should have identified during a pre-flight phase has become a tiresome, endemic problem across the entire Microsoft stack.

5. PowerShell

I’m a big advocate of PowerShell and all the fantastic things it can do, so this doesn’t come easily. The PowerShell module in VMM is… well, it needs some help. Since I don’t use VMM any more often than I have to, I don’t use the PowerShell module much, either. It tends to be on the frustrating side though, as items I expect to pipeline from one cmdlet to another don’t seem to be well-received. That, ultimately, is more of a learning-curve issue than anything else, which is to be expected. What’s not expected is that the authors of the module broke one of the cardinal rules of PowerShell. When a module shadows a cmdlet from a pre-existing module, it should replicate all of the functionality of the original. VMM’s PowerShell module breaks functionality in the base Hyper-V module such that at least the Get-VM cmdlet no longer functions as expected. That wreaks havoc on a lot of my scripts, and I’ve seen it break those from other people too. You have to know PowerShell enough to realize where the break is occurring, because it is not obvious. Since I tend to not load the VMM PowerShell module unless I’ve run out of options, I don’t yet know if there are others that it breaks, but it does create aliases for several other cmdlets in the Hyper-V module.
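If you suspect this is what's biting one of your scripts, a minimal sketch of how to check and work around it follows. It relies only on standard PowerShell behavior: aliases win over cmdlets when a name is resolved, and a module-qualified name bypasses that resolution. The VM name 'svtest' is a placeholder, and the sketch assumes the native module is loaded under its usual name, Hyper-V.

```powershell
# List everything in the session that answers to the name Get-VM,
# including aliases, along with the module each one comes from.
Get-Command -Name Get-VM -All | Format-Table CommandType, Name, ModuleName

# Call the native Hyper-V cmdlet explicitly by module-qualifying its name,
# regardless of what else has claimed the plain name Get-VM.
Hyper-V\Get-VM -Name 'svtest'
```

Module-qualifying every Hyper-V call in a script is tedious, but it keeps the script behaving the same way whether or not the VMM module happens to be loaded.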

As for the learning curve, that’s partially the interface’s fault, too. What most people do, regardless of the technology, is undergo all their on-the-job learning in the graphical interface, gradually figure out how to get PowerShell to duplicate that, and then make the jump to using PowerShell. VMM’s interface is so complicated and its processes so error-prone that you’re completely exhausted once you finally get it to do anything. There’s really no energy left over to go wrestling with the PowerShell cmdlets as well.

6. Packaging and Pricing

VMM 2008 R2 came in a “Workgroup” edition that allowed you to manage up to 5 hosts for about $500 at maximum retail cost. This made it good for small and medium businesses. Buying it alone meant that you lost out on the enhancements of Operations Manager, but that’s what I meant by the “value-add” comment above.

Starting with 2012, the “Workgroup” edition was no more. It was replaced by an odd licensing scheme that allowed you to have two virtual machines on a single host at a somewhat reasonable price, but anything after that had a very high price tag attached. The benefit* was that you got all the components of System Center at that price. I have to put an asterisk by “benefit” because not everyone wanted that. The medium-sized organization I was working for at the time of 2012’s debut had elected to forgo Operations Manager in the 2008 R2 edition because it didn’t add anything we really cared about. We evaluated the other components of System Center and decided that we simply didn’t want them and wouldn’t have used them even if they were free. The pricing change in 2012 amounted to a gigantic upsell of products that we had no interest in, effectively escalating the cost of VMM well beyond our budget or interest. The fact that it was, from our point of view, inferior to 2008 R2 didn’t help matters any.

7. It’s OK to be Microsoft

You can use VMM to manage not only Hyper-V hosts but ESX and XenServer as well. That’s all well and good, but it does seem like a great deal is sacrificed to shoehorn them in. Do you think that VMware or Citrix would short-change their own interfaces just to cram support for Hyper-V in? Of course they wouldn’t. No one expects them to. So, why is Microsoft sacrificing anything to give support for these products? It’s OK to let them in, but don’t dumb down Hyper-V support to do it. Make them second-class citizens if you have to. People are likely to continue using vSphere and XenCenter anyway. Don’t force us to keep using Hyper-V Manager and Failover Cluster Manager when we’ve bought into your premium platform (at a really high price, too).

8. I Know What I’m Doing. No, Really

Have you ever met anyone that says, “Hooray! VMM has marked my VM as ‘Unsupported Configuration’ and won’t let me manage it!” Me neither. And what, really, is the point of it anyway? So, let’s say I create a virtual machine on a CSV and don’t make it highly available. VMM won’t let me manage it until I rectify the problem. Why can’t I just use VMM to rectify the problem? Does this really make sense to anyone? Yes, I’m aware that a particular VM only has one preferred owner. Why does that cause a hard block on me being able to Live Migrate it to a non-preferred host if I want to? Does someone not understand what the word “preferred” means? I tried to change the VLAN on a guest, VMM freaked out about it for some unintelligible reason, and now it’s unmanageable? Really?

VMM’s Future

There is a pretty simple statement to be made here. If Microsoft is truly serious about becoming the de facto virtualization solution, VMM absolutely must be attended to in a very serious fashion and in very short order. It wants to be a major data center component but it is too unstable, too unwieldy, and too limited in scope to fulfill that dream. We all need to do our part to make it better, and the part that we in the community can do is submit ideas.

If you want to give feedback to the VMM team, use this link (remember to be nice; programming and interface design are really hard work and we don’t know what kind of conditions they’re working under): https://systemcentervmm.uservoice.com/forums/280803-general-vmm-feedback

If you want to vote on Aidan’s idea, this is the link: http://windowsserver.uservoice.com/forums/295050-virtualization/suggestions/7932453-new-integrated-ui-for-hyper-v