[Updated For Havana] OpenStack Compute For vSphere Admins, Part 3: HA And VM Migration

[I’ve updated this series with information based on the new Havana release and changes in vSphere integration with the latest version of OpenStack.  Any changes are prefaced with the word “Havana” in bold brackets.]

I am often asked by customers to compare the capabilities of OpenStack to the vCloud Suite and KVM to vSphere.  Specifically, the questions revolve around features in vSphere such as High Availability (HA), Distributed Resource Scheduler (DRS), and vMotion.  These customers want to know whether, if they choose KVM with OpenStack, they will be able to use features comparable to what is available with vSphere.  This is a common question since KVM, like OpenStack, is open source and is the default hypervisor used when installing OpenStack.  Continuing on from part 1 and part 2 of this series, where I reviewed the architecture of vSphere with OpenStack Nova and DRS, I will be spending some time on HA and vMotion before moving on to design and implementation details of vSphere with OpenStack in upcoming posts.  Please also see part 4 for my post on Resource Overcommitment and part 5 for Designing a Multi-Hypervisor Cloud.

High Availability

As most readers of this post will know, High Availability (HA) is one of the most important features of vSphere, providing resiliency at the compute layer by automatically restarting VMs on a surviving ESXi host when the original host suffers a failure.  HA is such a critical feature, especially when hosting applications that do not have application-level resiliency but assume a bulletproof infrastructure, that many enterprises treat it as nonnegotiable when considering a move to another hypervisor such as KVM.  So, it’s often a surprise when customers hear that OpenStack cannot natively auto-restart VMs on another compute node when the original node fails.

In lieu of vSphere HA, OpenStack uses a feature called “Instance Evacuation” in the event of a compute node failure (keep in mind that outside of vSphere, a Nova Compute node also functions as the hypervisor and hypervisor management node).  Instance Evacuation is a manual process that essentially leverages cold migration to make instances available again on a new compute node.

At a high level, the steps for Instance Evacuation are as follows (performed by a Cloud Administrator):

  1. Use the nova host-list command to list all compute nodes under management.
  2. Choose an appropriate target compute node (if user data needs to be preserved, the target compute node must have access to the same shared storage used by the failed compute node).
  3. Invoke the nova evacuate command.  Nova Compute reads the database that stores the configuration data for the downed instances and essentially “rebuilds” the instances on the chosen compute node, using that stored configuration data.
  4. The recovered instance behaves differently when it is booted on the target compute node, depending on whether it was deployed with or without shared storage:
    • With Shared Storage: User data is preserved and the server password is retained.
    • Without Shared Storage: The new instance is booted from a new disk but preserves configuration data, e.g. hostname, ID, IP address, UID, etc.  User data, however, is not preserved and a new server password is generated.
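The steps above can be sketched as a small Python helper that builds the nova CLI invocations.  This is only an illustration under stated assumptions: the instance names and the target host name are hypothetical, and the --on-shared-storage flag reflects the nova client of this era, so check nova help evacuate on your own installation before relying on it.

```python
import subprocess


def host_list_command():
    """Step 1: list the compute nodes under management."""
    return ["nova", "host-list"]


def evacuate_command(server, target_host, on_shared_storage):
    """Step 3: build one 'nova evacuate' call for a downed instance.

    The --on-shared-storage flag is an assumption about the installed
    nova client; verify it with 'nova help evacuate'.
    """
    cmd = ["nova", "evacuate"]
    if on_shared_storage:
        # Shared storage: user data and server password are kept.
        cmd.append("--on-shared-storage")
    cmd += [server, target_host]
    return cmd


def evacuate_all(servers, target_host, on_shared_storage,
                 run=subprocess.check_call):
    """Evacuate every listed instance to the chosen target host.

    'run' is injectable so the commands can be inspected (dry run)
    instead of executed.
    """
    commands = [evacuate_command(s, target_host, on_shared_storage)
                for s in servers]
    for cmd in commands:
        run(cmd)
    return commands


if __name__ == "__main__":
    # Dry run with hypothetical names: print instead of executing.
    evacuate_all(["web01", "db01"], "compute-02", True, run=print)
```

Passing run=print gives a dry run, which is a reasonable way to review the commands before an administrator actually executes them against a failed node.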


As mentioned in previous posts, vCenter essentially proxies the ESXi cluster under its management and abstracts all the member ESXi hosts from the Nova Compute node.  When an ESXi host fails, vSphere HA kicks in and restarts all the VMs previously hosted on the failed server on the surviving servers; DRS then takes care of balancing the VMs appropriately across the surviving servers.  Nova Compute is unaware of any VM movement in the cluster or that an ESXi host has failed; vCenter, however, reports back on the lowered available resources in the cluster, and Nova takes that into account in its scheduling the next time a request to spin up an instance is made.
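To make that behavior concrete, here is a toy model in Python.  It is not the Nova scheduler or the vCenter API; the class and function names are hypothetical, and it tracks memory only.  It simply shows that Nova sees one aggregate capacity figure for the cluster, and that the figure shrinks after a host failure even though Nova never learns which host failed.

```python
from dataclasses import dataclass


@dataclass
class EsxiHost:
    name: str
    memory_mb: int
    alive: bool = True


@dataclass
class VSphereCluster:
    hosts: list

    def fail_host(self, name):
        """Mark a host as failed; HA restarts its VMs elsewhere."""
        for host in self.hosts:
            if host.name == name:
                host.alive = False

    def reported_memory_mb(self):
        """The single aggregate figure reported to Nova."""
        return sum(h.memory_mb for h in self.hosts if h.alive)


def nova_can_schedule(cluster, used_mb, requested_mb):
    """Nova only compares the request against the cluster total."""
    return used_mb + requested_mb <= cluster.reported_memory_mb()


if __name__ == "__main__":
    cluster = VSphereCluster([EsxiHost("esxi-%d" % i, 65536)
                              for i in range(3)])
    used = 120000  # MB consumed by running VMs (unchanged by HA restarts)
    print(nova_can_schedule(cluster, used, 16000))  # before the failure
    cluster.fail_host("esxi-0")
    print(nova_can_schedule(cluster, used, 16000))  # after the failure
```

In this example the same 16000 MB request fits before the failure and no longer fits afterward, because the restarted VMs still consume memory while the cluster total has dropped by one host.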


So the question that users may ask, particularly those with a VMware background, is “Why would OpenStack be missing a ‘basic’ feature such as High Availability?”  Speaking for myself, there are a few factors to consider:

  • Since OpenStack itself does not have its own specific hypervisor, it chooses to expose the functionality that comes with each hypervisor.  So for example, HA is available with Hyper-V and vSphere since those hypervisors natively support HA.
  • Since OpenStack was designed from the beginning to be a Cloud platform, it follows certain design principles that differ from a “traditional” virtualized infrastructure.  Some of these differences are detailed in Massimo Re Ferre’s excellent post on pets vs. cattle.

Some of these “Cloud vs. virtualized infrastructure” principles include the following:

  • A VM instance and a compute node are commodity components that provide services.  If a VM or a compute node dies, just shoot it and restart.
  • Resiliency is multi-layered and is required to be built into both the application and infrastructure layers.
  • Scale-out is preferred over scale-up, not only for performance, but for resiliency.  By distributing workloads across multiple instances and compute nodes, the impact of a failed instance is minimized, allowing time to evacuate instances.
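The scale-out point can be illustrated with simple arithmetic.  The sketch below assumes the workload is spread evenly across identical compute nodes, which is a simplification; the function name is hypothetical.

```python
def capacity_lost_fraction(node_count, failed_nodes=1):
    """Fraction of total capacity lost when some nodes fail.

    Assumes identical nodes with the workload spread evenly.
    """
    if node_count <= 0 or not 0 <= failed_nodes <= node_count:
        raise ValueError("invalid node counts")
    return failed_nodes / node_count


if __name__ == "__main__":
    # Scale-up: two large nodes. Scale-out: ten smaller nodes.
    print(capacity_lost_fraction(2))   # half the capacity is gone
    print(capacity_lost_fraction(10))  # one tenth of the capacity is gone
```

With two large nodes a single failure takes out half of the workload, while with ten smaller nodes it takes out a tenth, which leaves far more headroom while the affected instances are evacuated.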

[Havana] It’s also noteworthy that some vendors, such as Piston Computing, have built HA into their OpenStack distribution.  Also, with the new Heat project that is now part of OpenStack, instance-level HA could be written into an OpenStack Cloud using auto-scaling.

Still, enterprises may wonder how their legacy applications, which require more resiliency at the infrastructure layer, fit in with OpenStack.  This is an area where vSphere can provide both a competitive advantage over other hypervisors supported by OpenStack and a more robust option in a multi-hypervisor Cloud.  By deploying vSphere with OpenStack, users have the option of creating a tiered Cloud architecture where new apps can be hosted on KVM or Xen instances while legacy apps that require vSphere HA functionality can be hosted on vSphere.

VM Migration

OpenStack Nova supports VM migrations by leveraging the native migration features inherent within the individual hypervisors under its management.  The table below outlines what migration features are supported by OpenStack with each hypervisor:

The biggest difference between vSphere and the other hypervisors in the table above is that cold migration and vMotion cannot be initiated through Nova; they have to be initiated via the vSphere Client or vSphere CLI.  So, other than vSphere HA, the hypervisors above supported by OpenStack are mostly in parity with one another, which makes HA and Storage vMotion the most compelling reasons to use vSphere.  What will be worth watching in upcoming OpenStack releases are both the development of new features by other hypervisors to match vSphere functionality and VMware’s continued commitment to make vSphere a first-class hypervisor solution in OpenStack.

12 comments

  1. Great post and a great series Kenneth!

    I would like to see some more information about how the failed instance is restarted, and where it is placed, in comparison with the way it works in HA.

    Another question: does OpenStack support connecting directly to an ESXi server as a compute node?

    • Maish,

      Great idea! I will either add more details to this post or in an upcoming post where I’ll do more details on implementation and operations.

      It is not possible to run the Nova Compute services on an ESXi or vCenter host on ESXi or Windows. You can connect a compute node to an ESXi host directly but then you don’t get HA, vMotion, or DRS.

      Ken
