This week Cisco announced enhancements to its Nexus 1000v virtual switch that will allow VXLAN to be implemented without requiring IP Multicast. Why does this matter, and what might it mean for the current IETF standard for VXLAN?
IP Multicast as a requirement for VXLAN has long been cited as a barrier to deployment in production environments. For information on VXLAN and some of its benefits and limitations, please read my earlier blog post and the references I cited in that post. In many ways, what Cisco has done here is not a surprise, given their history of branching off and adding extensions to current and pending standards. Cisco, of course, would argue, as they do here, that they are merely “evolving the standard to solve customer problems.” Without making a value judgment about that statement, let’s take a look at what Cisco has done with their Nexus 1000v enhancements and the questions those enhancements raise.
The Nexus 1000v was initially released as Cisco’s implementation of a Virtual Distributed Switch for vSphere, using VMware’s virtual Distributed Switch (vDS) API. It can be likened to a virtual version of Cisco’s Catalyst 6500 switch, designed to provide a centralized switch for vSphere environments; a single Nexus 1000v can provide virtual network connectivity for VMs hosted by up to 64 ESXi hosts. Since this initial release, Cisco has announced support for Hyper-V as well as for hypervisors that support Open vSwitch.
In a VXLAN deployment, the Nexus 1000v performs the MAC-in-IP encapsulation and forwarding of packets from one VM to another VM in the same VXLAN network. As per the IETF standard, the Nexus 1000v currently uses IP Multicast to discover the IP address to which a given packet should be sent. To get around the IP Multicast requirement and the overhead it incurs, Cisco is adding two enhancements to the Nexus 1000v.
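To make the baseline behavior concrete, here is a minimal sketch, in Python, of how a VXLAN tunnel endpoint (VTEP) such as the Nexus 1000v chooses the outer IP destination under the standard multicast-based approach. This is an illustrative assumption of the logic, not Cisco’s or VMware’s actual implementation; all class and field names are hypothetical.

```python
# Illustrative sketch only: simplified VTEP forwarding decision under the
# baseline IETF-draft behavior. Names and data structures are assumptions.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Vtep:
    ip: str                                  # this VTEP's own IP address
    vni_to_mcast_group: Dict[int, str]       # VXLAN segment (VNI) -> multicast group
    mac_to_vtep: Dict[str, str] = field(default_factory=dict)  # learned mappings

    def outer_destination(self, vni: int, dst_mac: str) -> str:
        """Pick the outer IP destination for a frame on VXLAN segment `vni`."""
        if dst_mac in self.mac_to_vtep:
            # Known MAC: unicast the encapsulated frame to the remote VTEP.
            return self.mac_to_vtep[dst_mac]
        # Unknown or broadcast MAC: flood to the segment's IP multicast group.
        # This flooding step is what makes IP Multicast a prerequisite.
        return self.vni_to_mcast_group[vni]

    def learn(self, src_mac: str, remote_vtep_ip: str) -> None:
        """Learn a MAC-to-VTEP mapping from a decapsulated frame (data-plane learning)."""
        self.mac_to_vtep[src_mac] = remote_vtep_ip
```

The point of the sketch is the fallback path: whenever the destination MAC has not yet been learned, the frame must be flooded via the segment’s multicast group, which is exactly the dependency Cisco’s enhancements are designed to remove.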
According to the blog post I referenced earlier, Cisco is providing two solutions that allow VXLAN to be implemented without IP Multicast. I suggest reading that post for more details, but here I will briefly describe the two solutions:
- The first solution has the Nexus 1000v replicating packets and sending them to the multiple destination IP addresses where a target MAC address may be found. Since the replicated packets are sent only to a subset of all possible addresses, this can be done via unicast instead of IP Multicast. This reduces the overhead associated with multicast and allows VXLAN to scale to support a larger number of devices.
- The second solution relies on the Nexus 1000v Virtual Supervisor Module (VSM) as a control plane that distributes VM MAC addresses to all Virtual Ethernet Modules (VEMs), so that packets can be sent using unicast only. This increases the scalability of VXLAN even more than the first option, but it also deviates more explicitly from the IETF standard, which does not define a control plane for determining to which IP address a particular packet should be sent. (A simplified sketch contrasting both approaches follows this list.)
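The following sketch contrasts the two unicast-only approaches described above. It is a rough approximation under stated assumptions, not Cisco’s code; the function names, the `send_unicast` helper, and the example addresses are all hypothetical.

```python
# Illustrative sketch only: the two unicast-only approaches, simplified.
from typing import Dict, List


def send_unicast(frame: bytes, vtep_ip: str) -> None:
    """Stand-in for VXLAN encapsulation plus UDP unicast delivery to a remote VTEP."""
    print(f"unicast {len(frame)}-byte frame to VTEP {vtep_ip}")


def headend_replicate(frame: bytes, member_vteps: List[str]) -> None:
    """Solution 1: replicate the frame and unicast a copy to each VTEP known to
    host members of the segment, instead of sending one copy to a multicast group."""
    for vtep_ip in member_vteps:
        send_unicast(frame, vtep_ip)


def control_plane_forward(frame: bytes, dst_mac: str, mac_table: Dict[str, str]) -> None:
    """Solution 2: a central control plane (the VSM, per Cisco's description)
    distributes MAC-to-VTEP mappings to every VEM, so forwarding is a single
    unicast lookup and flooding is avoided entirely."""
    send_unicast(frame, mac_table[dst_mac])


# Example usage with hypothetical addresses:
headend_replicate(b"\x00" * 64, ["10.0.0.2", "10.0.0.3"])
control_plane_forward(b"\x00" * 64, "00:50:56:aa:bb:cc", {"00:50:56:aa:bb:cc": "10.0.0.3"})
```

The design trade-off is visible even at this level of simplification: head-end replication still sends multiple copies per flooded frame, while the control-plane approach removes flooding altogether, at the cost of stepping further outside the IETF draft, which defines no such control plane.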
So what prompted Cisco to go beyond the standard they themselves helped to author? According to Cisco, the primary motivation is to overcome the scalability challenges that are perceived to be preventing wide adoption of VXLAN. While addressing the IP Multicast limitation certainly goes a long way towards that goal, it is important to note that VXLAN has other challenges that need to be considered. In particular, users should not assume that all scalability issues with VXLAN are resolved, or that VXLAN can now be used as a datacenter-interconnect solution.
- Extending the Fault Domain – As the number of Layer 2 devices in a network increases, the size of the fault domain increases as well. This is compounded if someone attempts to stretch the network across a WAN link to extend a data center. VXLAN still does not have all the tools necessary to reduce the impact of a failure within a single large domain.
- The Traffic Trombone Problem – If a VXLAN segment is extended between two sites and a VM is moved between those sites, traffic in and out of that network still has to go through a vShield Edge instance, even if that Edge instance resides at the data center on the other side of the WAN links from the source or target VM. This makes the WAN links a potential chokepoint.
To be clear, Cisco is claiming greater scalability but has not claimed that these enhancements make VXLAN a viable datacenter-interconnect technology. However, there is enough confusion on the matter to justify a warning.
The other question that needs to be raised is what impact Cisco’s going beyond the IETF standard may have. It seems to me that there are two likely scenarios that may play out:
- The enhancements Cisco is releasing, or an alternate version of them, will eventually be added to the standard. This is the best-case scenario, since it would help prevent splintering of the VXLAN standard.
- Other networking vendors respond with their own proprietary solutions to differentiate from and compete with Cisco. This could cause the standard to splinter into competing solutions that do not interoperate. Cisco has already indicated that other solutions will be released which, if they also deviate from the standard, may further splinter the coalition behind VXLAN.
To be fair, Cisco has indicated that VXLAN can be implemented with the Nexus 1000v in such a way that these enhancements are not used. This gives users the option of implementing VXLAN with the IP Multicast requirement if they want to ensure interoperability. My hope is that Cisco will continue to work with the IETF and the other networking vendors who support the standard to ensure that these enhancements become part of the approved standard.
Related articles
- Cisco VXLAN Innovations Overcoming IP Multicast Challenges (blogs.cisco.com)
- Word Of Caution About Overextending The Use Of VXLAN (varchitectmusings.com)
- Long-Distance vMotion, Stretched HA Clusters and Business Needs (ioshints.info)