OpenStack Compute For vSphere Admins, Part 4: Resource Overcommitment in Nova Compute


This is part 4 in an ongoing series on OpenStack for vSphere Admins.  You can catch up on previous posts by following the links here for part 1, part 2, part 3, and part 5.

One of the areas where OpenStack seems to be lacking is solid information on how to design an actual OpenStack Cloud deployment.  Almost all of the available documentation focuses on the installation and configuration of OpenStack, with little in the way of guidance on how to design for high availability, performance, redundancy, and scalability.  One of the gaps in the documentation is around CPU and RAM overcommitment, aka oversubscription, when designing for Nova Compute.  The current OpenStack documentation points out that the default overcommitment ratios, as configured in the Nova Scheduler, are 16:1 for CPU and 1.5:1 for RAM, but it does not provide the rationale for these settings or give much guidance on how to customize these default ratios.  The current documentation also does not provide guidance for each hypervisor supported within OpenStack.

In contrast, there is an abundance of guidance to help VMware administrators with sizing vSphere and determining the correct CPU and RAM overcommitment ratios.  Perhaps the scarcity of OpenStack sizing guidance should not be a surprise, since Public Cloud providers, who have been the early adopters of OpenStack, typically maintain enough physical compute resources in reserve to handle unexpected resource spikes.  However, with a Private Cloud, more attention must be paid to ensure there are enough resources to handle current and future workloads.  One way to think about this is to view Public Cloud providers as essentially cattle ranch conglomerates, such as Koch Industries, while a Private Cloud is more like a privately-owned cattle ranch that can range from a few dozen to thousands of head of cattle.

Typical Compute Sizing Guidelines

As previously mentioned, OpenStack sets a default ratio of 16:1 virtual CPU (vCPU) to physical CPU (pCPU) and 1.5:1 virtual RAM (vRAM) to physical RAM (pRAM).  Coming from the VMware world, I’ve often heard rule-of-thumb guidelines such as assuming 6:1 CPU and 1.5:1 RAM overcommitment ratios for general workloads.  It seems to me that many of these guidelines for general workloads were developed at a time when most shops were consolidating Windows servers with only 1 pCPU, which was often only 10% to 20% utilized, and with 2 GB pRAM, which was often only 50% utilized; under those assumptions, a 6:1 ratio made sense.  For example, 6 Windows servers, each running on a single 2 GHz CPU that is only 20% utilized (400 MHz) and using 2 GB RAM that is only 50% utilized (1 GB), would require only an aggregate of 2.4 GHz of CPU cycles and 6 GB RAM; you could virtualize those 6 servers, put them on a 3 GHz CPU core with 8 GB RAM, and still have 20% headroom for spikes; this would allow you to host ~50 VMs on an ESXi server with 8 cores and 64 GB of RAM.  But how feasible are these assumptions today, when applications are being written to take better advantage of higher numbers of CPU cores and more RAM?  In those cases, the rule-of-thumb guidelines will likely not work.  This is particularly true for business critical workloads, where a much more conservative approach with NO overcommitment is generally considered a recommended practice.
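
To make that consolidation arithmetic easy to replay with your own numbers, here is a minimal Python sketch; every figure in it is just an assumption taken from the example above, not measured data.

```python
# Consolidation-era sizing math from the example above (all figures are
# the example's assumptions, not measurements).
servers = 6
cpu_ghz_per_server = 2.0          # one 2 GHz pCPU each
cpu_utilization = 0.20            # 20% busy
ram_gb_per_server = 2.0           # 2 GB pRAM each
ram_utilization = 0.50            # 50% used

aggregate_cpu_ghz = servers * cpu_ghz_per_server * cpu_utilization   # 2.4 GHz
aggregate_ram_gb = servers * ram_gb_per_server * ram_utilization     # 6.0 GB

core_ghz, core_ram_gb = 3.0, 8.0  # one 3 GHz core with 8 GB RAM
print(f"CPU demand: {aggregate_cpu_ghz:.1f} GHz "
      f"({aggregate_cpu_ghz / core_ghz:.0%} of one {core_ghz} GHz core)")
print(f"RAM demand: {aggregate_ram_gb:.1f} GB "
      f"({aggregate_ram_gb / core_ram_gb:.0%} of {core_ram_gb} GB)")
```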

Compute Sizing Methodologies

So, what is the best way to determine compute sizing for OpenStack with vSphere or another hypervisor, such as KVM?  Over the years, I’ve used different methodologies that range from data collection and analysis, to best-guess estimates, to following general rules-of-thumb, depending on what data my customers are able and willing to provide:

  • Using current utilization data – I get this from very few customers, but it provides the best input for the most accurate compute design.  What I am looking for is the aggregate CPU cycles and RAM usage rate for all servers that are or will be virtualized and running in an OpenStack Cloud.
  • Using their current inventory – More common is to get an inventory without any utilization data.  In that case, I use the same methodology as above but make assumptions about pCPU and pRAM utilization rates.  I recommend making sure those assumptions are agreed on by the customer before I deliver my design.
  • Using Rule-of-Thumb guidelines – In reality, this is the most common scenario because I often get insufficient data from customers.  At that point I take extra care to confirm I have an agreement with the customer as to what our assumptions will be regarding CPU and RAM overcommitment.

Rule-of-Thumb For Sizing CPU Resources

For all hypervisors, including KVM and vSphere, the guidelines are the same, since overcommitment in both cases depends on the amount of CPU cycles that the OpenStack architect assumes are being used at any one slice of time.  For business critical applications, I would start with NO overcommitment and adjust if an instance turns out not to require all the physical CPU cycles it has been given.  For today’s general workloads, I start by assuming a 3:1 CPU overcommitment ratio and adjust as real workload data demands.

For example, let’s assume a dual-socket server with 8-core 3 GHz CPUs as our Nova Compute node; that works out to 16 physical cores and 48 GHz of processing resources.  Using the 3:1 overcommitment ratio gives 48 vCPUs to be hosted, with the assumption that each vCPU would be utilizing 1 GHz of actual processing resources.  If each VM instance has 2 vCPUs, that would equate to 24 instances per compute node.  Again, that ratio could change based on workload requirements.
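
For anyone who wants to play with these numbers, here is a rough Python sketch of the CPU-side math; the function name and structure are mine (not anything from Nova), and the inputs are the hardware and ratio assumed above.

```python
def instances_by_cpu(cores, ghz_per_core, cpu_ratio, vcpus_per_instance):
    """CPU-bound instance count for one compute node (sizing sketch only)."""
    vcpus = cores * cpu_ratio                      # schedulable vCPUs
    ghz_per_vcpu = (cores * ghz_per_core) / vcpus  # implied average budget per vCPU
    return int(vcpus // vcpus_per_instance), ghz_per_vcpu

count, budget_ghz = instances_by_cpu(cores=16, ghz_per_core=3.0,
                                     cpu_ratio=3.0, vcpus_per_instance=2)
print(count, "instances at", round(budget_ghz * 1000), "MHz per vCPU")
# -> 24 instances at 1000 MHz per vCPU
```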

This is obviously much more conservative than the 16:1 default CPU overcommitment ratio for OpenStack Nova Compute.  My opinion, however, is that experience and the math do not support such a high overcommitment ratio for general use cases.  For example, to pack 16 vCPUs onto a single 3 GHz physical core running at 80% utilization would mean assuming each vCPU requires an average of only 150 MHz of CPU cycles, or we would have to assume significant idle time for those vCPUs.  That’s certainly possible, but I would not be comfortable assuming such low CPU cycle or utilization requirements without some solid data to back that assumption.
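
Note that these ratios are tunable rather than hard-wired: the Nova scheduler reads them from the cpu_allocation_ratio and ram_allocation_ratio settings in nova.conf.  The quick back-of-the-envelope calculation below shows the average CPU budget each vCPU would get at a few ratios, using the same 80% utilization ceiling as the paragraph above; it is a sketch, not a Nova formula.

```python
core_ghz = 3.0
usable_ghz = core_ghz * 0.80        # keep 20% headroom, as in the example above
for ratio in (3, 6, 16):            # vCPUs packed onto one physical core
    mhz_per_vcpu = usable_ghz * 1000 / ratio
    print(f"{ratio:>2}:1 -> {mhz_per_vcpu:.0f} MHz average per vCPU")
# 3:1 -> 800 MHz, 6:1 -> 400 MHz, 16:1 -> 150 MHz
```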

Rule-of-Thumb For Sizing RAM Resources

This is where guidelines may differ depending on the hypervisor used with OpenStack.  For hypervisors with very advanced native memory management capabilities, such as ESXi, I use a ratio of 1.25:1 RAM overcommitment when designing for a production OpenStack Cloud.  These advanced memory management techniques include Transparent Page Sharing, guest ballooning, and memory compression.  In a non-production environment, I would consider using the 1.5:1 overcommitment ratio that is the default in Nova Compute.

For example, let’s assume a server with 128 GB pRAM as our Nova Compute node.  Using the 1.25:1 overcommitment ratio and assuming 96 GB RAM available after accounting for overhead and restricting to 80% utilization, it would work out to 120 GB vRAM to be hosted.  If each VM instance requires 4 GB vRAM, that would equate to 30 instances per compute node. Note that you should use the more conservative of the CPU or RAM numbers.
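
Here is the RAM-side math as a small sketch; the 96 GB usable figure and the 1.25:1 ratio are the assumptions from the example above, and the helper function is purely illustrative.

```python
def instances_by_ram(usable_ram_gb, ram_ratio, gb_per_instance):
    """RAM-bound instance count for one compute node (sizing sketch only)."""
    vram_gb = usable_ram_gb * ram_ratio   # schedulable vRAM
    return int(vram_gb // gb_per_instance)

print(instances_by_ram(usable_ram_gb=96, ram_ratio=1.25, gb_per_instance=4))
# -> 30 instances with ESXi-style memory overcommitment
```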

For hypervisors, such as KVM, that do not have the same degree of advanced memory management capabilities, I would assume NO overcommitment at all.  This is again in contrast to the default 1.5:1 ratio in OpenStack for all environments.  I would not recommend adopting that aggressive a ratio without some data showing it can be justified.

For example, let’s assume a server with 128 GB pRAM as our Nova Compute node.  Using a 1:1 overcommitment ratio (no overcommitment) and assuming 96 GB RAM available after accounting for overhead and restricting to 80% utilization, it would work out to 96 GB vRAM to be hosted.  If each VM instance requires 4 GB vRAM, that would equate to 24 instances per compute node.  Note that you should use the more conservative of the CPU or RAM numbers.
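
Since the final figure should be the more conservative of the two limits, a combined sketch might look like the following; again, all inputs are the assumptions from the examples above and the function is only illustrative.

```python
def instances_per_node(cores, cpu_ratio, vcpus_per_instance,
                       usable_ram_gb, ram_ratio, gb_per_instance):
    """Take the lower of the CPU-bound and RAM-bound counts (sizing sketch)."""
    by_cpu = (cores * cpu_ratio) // vcpus_per_instance
    by_ram = (usable_ram_gb * ram_ratio) // gb_per_instance
    return int(min(by_cpu, by_ram))

# KVM-style node from the examples above: 3:1 CPU, no RAM overcommitment
print(instances_per_node(cores=16, cpu_ratio=3.0, vcpus_per_instance=2,
                         usable_ram_gb=96, ram_ratio=1.0, gb_per_instance=4))
# -> 24 (both limits happen to agree here)
```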

Concluding Thoughts

It is also important to factor in the number of Compute Node or ESXi host failures a customer can tolerate.  For example, if I have four compute nodes that are each 80% utilized and one node fails, then I am out of resources and am either unable to create new instances or will experience significant performance drops across all instances.  In the case of vSphere, depending on how Admission Control is configured, you may not be able to spin up new VMs.  In that case, you may need to design, for example, a 5 or 6 node configuration that can tolerate a single node failure.
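
To put numbers on that failure scenario, here is a quick sketch of the N+1 arithmetic; the 80% utilization target comes from the example above, and the node counts are just illustrations.

```python
def utilization_after_failure(nodes, target_util, failures=1):
    """Load on the surviving nodes if `failures` nodes are lost (all else equal)."""
    return nodes * target_util / (nodes - failures)

for nodes in (4, 5, 6):
    util = utilization_after_failure(nodes, target_util=0.80)
    status = "ok" if util <= 1.0 else "over capacity"
    print(f"{nodes} nodes at 80%: {util:.0%} on survivors after one failure ({status})")
# 4 nodes -> 107% (over capacity), 5 nodes -> 100%, 6 nodes -> 96%
```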

Given that resource overcommitment guidelines can differ across hypervisors, I would recommend separating out the compute nodes that manage different hypervisors, using Cells or Host Aggregates.  This avoids the Nova Compute scheduling issues I referenced in part 2 of this series, which can occur when you have multiple hypervisors with different capabilities being managed in the same OpenStack Cloud.  I will provide more details on how to design for this in the future.

In the next post, I will start putting everything together and walking through what a multi-hypervisor OpenStack Cloud design, with vSphere integrated, may look like.

13 comments

  1. Waiting for configuration and networking details for adding an ESXi server as a compute node.  Does the Open vSwitch plugin support ESXi servers?

  2. Really enjoying this series.

    KVM and Xen are capable of ballooning in different ways.  There are varying differences from ESXi around specific guest OSes, like Windows.  It looks like they’re both capable of TPS-like functionality (KVM calls it KSM and Xen calls it DMC).

    It may be important to note that the scheduler service is not aware of per-VM overhead.

    There’s some work being done in this area, but each underlying hypervisor has different contributing factors to per-VM overhead and different overhead amounts.  In order to be aware of the overhead, the virt driver will have to implement some overhead accounting.

  3. […] Having walked through several aspects of vSphere with OpenStack, I want to start putting some of the pieces together to demonstrate how an architect might begin to design a multi-hypervisor OpenStack Cloud.  Our use case will be for a deployment with KVM and vSphere; note that for the sake of simplicity, I am deferring discussion about networking and storage for future blog posts.  I encourage readers to review my previous posts on Architecture, Resource Scheduling, VM Migration, and Resource Overcommitment. […]
