[I’ve updated this series with information based on the new Havana release and changes in vSphere integration with the latest version of OpenStack. Any changes are prefaced with the word “Havana” in bold brackets]
I am often asked by customers to compare the capabilities of OpenStack to the vCloud Suite and KVM to vSphere. Specifically, the questions revolve around vSphere features such as High Availability (HA), Distributed Resource Scheduler (DRS), and vMotion. These customers want to know whether, if they choose KVM with OpenStack, they will have features comparable to those available with vSphere. This is a common question since KVM, like OpenStack, is open source and is the default hypervisor used when installing OpenStack. Continuing from part 1 and part 2 of this series, where I reviewed the architecture of vSphere with OpenStack Nova and DRS, I will spend some time on HA and vMotion before moving on to design and implementation details of vSphere with OpenStack in upcoming posts. Please also see part 4 for my post on Resource Overcommitment and part 5 for Designing a Multi-Hypervisor Cloud.
High Availability
As most readers of this post will know, High Availability (HA) is one of the most important features of vSphere, providing resiliency at the compute layer by automatically restarting VMs on a surviving ESXi host when the original host fails. HA is so critical, especially for applications that lack application-level resiliency and assume a bulletproof infrastructure, that many enterprises consider it non-negotiable when evaluating a move to another hypervisor such as KVM. So it often surprises customers to hear that OpenStack cannot natively restart VMs automatically on another compute node when the original node fails.
In lieu of vSphere HA, OpenStack uses a feature called “Instance Evacuation” in the event of a compute node failure (keep in mind that outside of vSphere, a Nova Compute node also functions as the hypervisor and hypervisor management node). Instance Evacuation is a manual process that essentially leverages cold migration to make instances available again on a new compute node.
At a high level, the steps for Instance Evacuation are as follows (performed by a Cloud Administrator):
- Use the nova host-list command to list all compute nodes under management.
- Choose an appropriate target compute node (if user data needs to be preserved, the target compute node must have access to the same shared storage used by the failed compute node).
- Invoke the nova evacuate command. Nova Compute reads the database that stores the configuration data for the downed instances and essentially “rebuilds” the instances on the chosen compute node using that stored configuration data.
- With Instance Evacuation, the recovered instance behaves differently when it boots on the target compute node, depending on whether it was deployed with or without shared storage:
  - With Shared Storage – User data is preserved and the server password is retained.
  - Without Shared Storage – The new instance boots from a new disk but preserves configuration data, e.g. hostname, ID, IP address, UID, etc. User data, however, is not preserved, and a new server password is generated.
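To make the evacuation steps and the shared-storage distinction concrete, here is a minimal Python sketch. The `ComputeNode` and `InstanceRecord` types and all helper functions are hypothetical; the only piece taken from OpenStack is the shape of the Havana-era `nova evacuate [--on-shared-storage] <server> <host>` command line. The sketch builds that command as an argument list and models the outcome; it does not talk to Nova.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ComputeNode:
    # Hypothetical model: a node as listed by `nova host-list`, plus the shared
    # datastores it can reach (the real CLI does not report storage access).
    name: str
    up: bool
    shared_storage: Set[str] = field(default_factory=set)


@dataclass
class InstanceRecord:
    # Hypothetical subset of the per-instance data Nova keeps in its database.
    uuid: str
    hostname: str
    ip_address: str
    disk: str
    password: str


def choose_target(nodes: List[ComputeNode],
                  failed: ComputeNode) -> Tuple[Optional[ComputeNode], bool]:
    """Prefer a live node that shares storage with the failed node."""
    candidates = [n for n in nodes if n.up and n.name != failed.name]
    for node in candidates:
        if node.shared_storage & failed.shared_storage:
            return node, True
    # Fall back to any live node; user data will not be preserved.
    return (candidates[0], False) if candidates else (None, False)


def evacuate_command(server: str, target: str, on_shared_storage: bool) -> List[str]:
    """Build argv for the Havana-era `nova evacuate ... <server> <host>` call."""
    argv = ["nova", "evacuate"]
    if on_shared_storage:
        argv.append("--on-shared-storage")
    return argv + [server, target]


def rebuild(record: InstanceRecord, on_shared_storage: bool,
            new_password: str = "<generated>") -> InstanceRecord:
    """Model what the rebuilt instance keeps on the target node."""
    if on_shared_storage:
        return record  # same disk: user data and password retained
    # Configuration data survives; the disk is new and the password is reset.
    return InstanceRecord(record.uuid, record.hostname, record.ip_address,
                          disk="new-disk", password=new_password)
```

For example, if node `c1` fails and `c3` can reach the same NFS datastore, `choose_target` picks `c3` and `evacuate_command("vm1", "c3", True)` produces the argument list for `nova evacuate --on-shared-storage vm1 c3`.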
As mentioned in previous posts, vCenter essentially proxies the ESXi cluster under its management and abstracts all the member ESXi hosts from the Nova Compute node. When an ESXi host fails, vSphere HA kicks in and restarts the VMs previously hosted on the failed server on the surviving servers; DRS then balances the VMs appropriately across the surviving servers. Nova Compute is unaware of any VM movement in the cluster or that an ESXi host has failed; vCenter, however, reports the lowered available resources in the cluster, and Nova takes that into account in its scheduling the next time a request to spin up an instance is made.
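A minimal sketch of why Nova never notices the failed host: under the assumption that the driver reports a single aggregate figure for the whole cluster, Nova only sees the total shrink. `EsxiHost`, `cluster_free_ram`, and `fits` are hypothetical names, and the model deliberately ignores the memory consumed by VMs that HA restarts; the real vCenter driver's accounting may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EsxiHost:
    # Hypothetical per-host capacity inside a vSphere cluster.
    name: str
    free_ram_mb: int
    alive: bool = True


def cluster_free_ram(hosts: List[EsxiHost]) -> int:
    """The single aggregate figure Nova sees for the whole cluster.

    Individual ESXi hosts are hidden behind vCenter, so a failed host
    shows up only as less available capacity.
    """
    return sum(h.free_ram_mb for h in hosts if h.alive)


def fits(request_ram_mb: int, hosts: List[EsxiHost]) -> bool:
    """Simplified scheduling check against the aggregated capacity."""
    return request_ram_mb <= cluster_free_ram(hosts)
```

With three 32 GB hosts the cluster reports 98304 MB free; after one host is marked dead it reports 65536 MB, and a 70000 MB request that previously fit is rejected on the next scheduling attempt.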
So the question users may ask, particularly those with a VMware background, is “Why would OpenStack be missing a ‘basic’ feature such as High Availability?” Speaking for myself, there are a few factors to consider:
- Since OpenStack does not have its own specific hypervisor, it exposes the functionality that comes with each hypervisor it supports. For example, HA is available with Hyper-V and vSphere since those hypervisors natively support HA.
- Since OpenStack was designed from the beginning to be a Cloud platform, it follows certain design principles that differ from a “traditional” virtualized infrastructure. Some of these differences are detailed in Massimo Re Ferre’s excellent post on pets vs. cattle.
Some of these “Cloud vs. virtualized infrastructure” principles include the following:
- A VM instance and a compute node are commodity components that provide services. If a VM or a compute node dies, just shoot it and restart.
- Resiliency is multi-layered and is required to be built into both the application and infrastructure layers.
- Scale-out is preferred over scale-up, not only for performance but also for resiliency. By distributing workloads across multiple instances and compute nodes, the impact of a failed instance is minimized, allowing time to evacuate instances.
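The scale-out argument comes down to simple arithmetic. The sketch below uses a hypothetical helper and the worst-case assumption that the node hosting the most instances is the one that fails, then computes the share of an application's instances lost to a single node failure.

```python
from typing import Dict


def fraction_lost(instances_per_node: Dict[str, int]) -> float:
    """Worst-case fraction of an app's instances lost when one node fails.

    Assumes the node hosting the most instances is the one that fails.
    """
    total = sum(instances_per_node.values())
    return max(instances_per_node.values()) / total


# Scale-up: all four instances on one large node.
scale_up = {"big-node": 4}
# Scale-out: one instance on each of four nodes.
scale_out = {"node1": 1, "node2": 1, "node3": 1, "node4": 1}
```

Here `fraction_lost(scale_up)` is 1.0, a total outage, while `fraction_lost(scale_out)` is 0.25, leaving three instances serving traffic while the fourth is evacuated.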
[Havana] It’s also noteworthy that some vendors, such as Piston Computing, have built HA into their OpenStack distributions. Also, with the new Heat project that is now part of OpenStack, instance-level HA could be built into an OpenStack Cloud using auto-scaling.
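The idea behind instance-level HA through auto-scaling is a reconcile loop: keep a desired number of healthy instances and boot replacements for any that fail. This is a minimal sketch of that general idea only, not Heat's API or template syntax; `reconcile` and the `launch` callable are hypothetical stand-ins for what an orchestration engine would do.

```python
from typing import Callable, Dict, List


def reconcile(group: Dict[str, bool], desired: int,
              launch: Callable[[], str]) -> List[str]:
    """Replace failed members so the group returns to `desired` healthy ones.

    `group` maps instance name -> healthy flag; `launch` boots a replacement
    and returns its name. Unhealthy members are simply dropped.
    """
    healthy = {name for name, ok in group.items() if ok}
    # range() of a negative number is empty, so an oversized group is left as-is.
    replacements = [launch() for _ in range(desired - len(healthy))]
    return sorted(healthy | set(replacements))
```

Note that the application running on the group must tolerate losing a member, which is exactly the cloud design principle described above.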
Still, enterprises may wonder how their legacy applications, which require more resiliency at the infrastructure layer, fit in with OpenStack. This is an area where vSphere can both provide a competitive advantage over other hypervisors supported by OpenStack and offer a more robust option in a multi-hypervisor Cloud. By deploying vSphere with OpenStack, users can create a tiered Cloud architecture where new apps are hosted on KVM or Xen instances while legacy apps that require vSphere HA functionality are hosted on vSphere.
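The tiering decision can be sketched as a small routing rule. In a real Nova deployment, steering instances to particular hypervisors is typically done through scheduler configuration such as host aggregates with matching flavor extra specs; the function and its `legacy` and `app_level_resiliency` keys below are hypothetical and only model the decision itself.

```python
from typing import Dict


def choose_tier(app: Dict[str, bool]) -> str:
    """Route a workload to a hypervisor tier in a multi-hypervisor cloud.

    `app` uses hypothetical keys:
      - "legacy": the app assumes a resilient infrastructure
      - "app_level_resiliency": the app survives instance loss on its own
    """
    if app.get("legacy") and not app.get("app_level_resiliency"):
        return "vsphere"  # rely on vSphere HA to restart VMs on host failure
    return "kvm"  # cloud-native apps tolerate losing an instance
```

A legacy ERP system with no clustering of its own would land on the vSphere tier, while a stateless web tier behind a load balancer would land on KVM.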
VM Migration
OpenStack Nova supports VM migrations by leveraging the native migration features of the individual hypervisors under its management. The table below outlines which migration features OpenStack supports with each hypervisor:
The biggest difference between vSphere and the other hypervisors in the table above is that cold migration and vMotion cannot be initiated through Nova; they have to be initiated via the vSphere Client or vSphere CLI. So, other than vSphere HA, the hypervisors above supported by OpenStack are mostly at parity with one another, which makes HA and Storage vMotion the most compelling reasons to use vSphere. What will be worth watching in upcoming OpenStack releases is both the development of new features by other hypervisors to match vSphere functionality and VMware’s continued commitment to making vSphere a first-class hypervisor solution in OpenStack.
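A minimal sketch of how the migration paths differ, assuming the Havana-era nova CLI commands `nova migrate <server>` (cold migration) and `nova live-migration [--block-migrate] <server> [<host>]`. The `migration_command` helper and its parameters are hypothetical; it only builds argument lists, and it returns `None` for vSphere because, as described above, those migrations must be started from the vSphere Client or vSphere CLI rather than through Nova.

```python
from typing import List, Optional


def migration_command(server: str, hypervisor: str, live: bool = True,
                      shared_storage: bool = True,
                      target_host: Optional[str] = None) -> Optional[List[str]]:
    """Build a nova CLI argv for migrating an instance, or None for vSphere."""
    if hypervisor == "vsphere":
        return None  # use the vSphere Client or vSphere CLI instead
    if not live:
        return ["nova", "migrate", server]  # cold migration
    argv = ["nova", "live-migration"]
    if not shared_storage:
        argv.append("--block-migrate")  # copy disks over the network
    argv.append(server)
    if target_host:
        argv.append(target_host)  # optional; otherwise the scheduler picks
    return argv
```

For a KVM instance on local storage, `migration_command("vm1", "kvm", shared_storage=False, target_host="c2")` yields the argument list for `nova live-migration --block-migrate vm1 c2`.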
Related articles
- OpenStack For VMware Admins: Nova Compute With vSphere, Part 1 (cloudarchitectmusings.com)
- OpenStack For VMware Admins: Nova Compute With vSphere, Part 2 (cloudarchitectmusings.com)
- OpenStack is not vSphere… It’s Automation. (allyourcloud.wordpress.com)
Great post and a great series Kenneth!
I would like to see more information about how the failed instance is restarted (where it is placed) in comparison with the way it works in HA.
Another question: does OpenStack support connecting directly to an ESXi server as a compute node?
Maish,
Great idea! I will either add more details to this post or in an upcoming post where I’ll do more details on implementation and operations.
It is not possible to run the Nova Compute service on an ESXi host or on the vCenter server, whether vCenter runs on ESXi or on Windows. You can connect a compute node to an ESXi host directly, but then you don’t get HA, vMotion, or DRS.
Ken
I think there is an error in your hypervisor table. OpenStack with KVM has supported block migration for a long time: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3324
Good catch! Table updated.
Ken