
Virtual networking always plays a significant role in any VMware vSphere design, even more so if you are leveraging IP-based storage like NAS or iSCSI. If you are using VMware's vSAN product, it "turns the dial to 11", as internode communication becomes that much more important than host-to-target communication. A few months back (based on the date of this post), VMware released an updated vSAN Network Design document that I strongly encourage everyone to read if you are looking to run, or are already running, vSAN. For this post, however, I am going to dive into what I have used in the field for customer deployments around NIC teaming and redundancy, as well as Network I/O Control (NIOC) on the vSphere Distributed Switch (vDS).

Example Scenario

To start, let's put together a sample scenario to create context around the "how" and "why". As suggested in the vSAN Network Design document, all the customer designs I have been involved with have incorporated a single pair of ten-gigabit Ethernet (10GbE) interfaces for host-uplink connectivity to a Top of Rack (ToR) or core switch, using either TwinAX or 10GBase-T for the physical layer. This is accomplished using a pair of dual-port Intel X520- or X540-based cards, which also allows for future growth if network needs arise down the road. The uplink ports are configured as trunk ports (if using Cisco) or tagged ports (if using Brocade/Dell/HP/etc.), and the required VLANs for the environment are passed down to the hosts. On the virtual side, a single vDS is created, and each host in the vSAN cluster is added to it. The required port groups are created and configured with the relevant VLAN ID and NIC Teaming and Failover policy (more on that later). The following figure provides a visual representation:

Figure 1 – Logical vDS Design



NIC Teaming – Keep it Simple Stupid

I like simple things, especially in my infrastructure. Simple things work, and when they don't, they are easier to troubleshoot. My vDS layout for vSAN is no different: it eases the burden on you and means you don't have to engage your networking individual/team for higher-level configuration [i.e., Link Aggregation Control Protocol (LACP)]. So how do we accomplish this? Easy: on each vDS port group, set Load Balancing to Use explicit failover order. Set the Active Uplinks for all the non-vSAN port groups to vmnic0 (based on the figure above) and the vSAN port group's Active Uplinks to vmnic1. Next, set the Standby Uplinks for the non-vSAN port groups to vmnic1, and the Standby Uplinks for the vSAN port group to vmnic0. The table below provides an example based on our scenario.

Table 1 – Active/Standby Settings



| Portgroup | Load Balancing | Active Uplink | Standby Uplink |
|---|---|---|---|
| MGMT-100 | Use Explicit Failover | vmnic0 | vmnic1 |
| VSAN-110 | Use Explicit Failover | vmnic1 | vmnic0 |
| VMOTION-120 | Use Explicit Failover | vmnic0 | vmnic1 |
| SERVERS-200 | Use Explicit Failover | vmnic0 | vmnic1 |

For the remainder of a given port group’s settings, accept the default setting of Link status only for Network Failure Detection, and Yes for Notify Switches and Failback options.

Figure 2 – Teaming and Failover Settings



With this configuration, vSAN gets the full use of one 10GbE link (with redundancy), while all other traffic in your environment "shares" the other 10GbE link (also with redundancy).
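To make the behavior of this policy concrete, here is a minimal Python sketch (a toy model, not a vSphere API call) of how "Use explicit failover order" selects an uplink: traffic stays on the first healthy Active uplink and only moves to a Standby uplink when the Active ones are down.

```python
# Toy model of the "Use explicit failover order" teaming policy.
# A port group uses the first Active uplink whose link is up, and
# falls back to its Standby uplinks only when the Active ones fail.

def select_uplink(active, standby, link_up):
    """Return the uplink a port group would use, given link states."""
    for nic in active + standby:
        if link_up.get(nic, False):
            return nic
    return None  # no healthy uplink available

# Port group settings from Table 1
portgroups = {
    "MGMT-100":    (["vmnic0"], ["vmnic1"]),
    "VSAN-110":    (["vmnic1"], ["vmnic0"]),
    "VMOTION-120": (["vmnic0"], ["vmnic1"]),
    "SERVERS-200": (["vmnic0"], ["vmnic1"]),
}

# Normal operation: both links up, so vSAN sits alone on vmnic1
links = {"vmnic0": True, "vmnic1": True}
for pg, (active, standby) in portgroups.items():
    print(pg, "->", select_uplink(active, standby, links))

# If vmnic1 fails, vSAN moves to its standby, vmnic0
links["vmnic1"] = False
print("VSAN-110 after vmnic1 failure ->",
      select_uplink(*portgroups["VSAN-110"], links))
```

This is why no switch-side configuration beyond the trunk/tagged ports is needed: failover is decided entirely by link state on the host.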

Network I/O Control or “Trust in the Software”

Now that we have the vDS port group layout taken care of, it is time to tackle Network I/O Control (NIOC). NIOC provides the ability to properly balance network traffic on shared 10GbE interfaces; it accomplishes this by utilizing Network Resource Pools (NRPs) and shares to determine the bandwidth provided to the different traffic types leaving the vDS. Each NRP is assigned a physical adapter share value that determines the total available bandwidth guaranteed to that traffic type. These shares only apply when a physical adapter is saturated, and ensure a guaranteed floor of network bandwidth for each NRP. NRPs are based on individual physical uplinks, not an aggregate of the uplinks assigned to the vSphere Distributed Switch, and only the traffic types actively using the uplink at that moment are counted when determining the usable network bandwidth.

Note – The above is a brief description of how NIOC functions and works in a vSphere environment. For a deeper dive see Frank Denneman’s blog and have a look at his post A Primer on Network IO Control.
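The point that only *active* traffic types count toward the share total is worth illustrating. The following is a small Python sketch of that arithmetic (illustrative values only, not queried from a live vDS):

```python
# Toy model of the NIOC rule that only traffic types currently using
# an uplink count toward the share total (values are illustrative).

LINK_SPEED_MBPS = 10_000  # a single 10GbE uplink

def bandwidth_floor(shares, active, traffic):
    """Guaranteed bandwidth for `traffic` when `active` types contend."""
    total = sum(shares[t] for t in active)
    return LINK_SPEED_MBPS * shares[traffic] / total

shares = {"vSAN": 100, "Virtual Machine": 50, "vMotion": 25}

# All three types contending on the same uplink:
print(bandwidth_floor(shares, ["vSAN", "Virtual Machine", "vMotion"], "vSAN"))

# With vMotion idle, vSAN's floor rises, because only active types count:
print(bandwidth_floor(shares, ["vSAN", "Virtual Machine"], "vSAN"))
```

In other words, the floors in any given moment depend on who is actually on the wire, not on every NRP defined on the switch.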

Just having NIOC enabled only gets us partway there. As mentioned above, we want to leverage NRPs to assign share values that take effect when traffic is constrained. By default, VMware preconfigures values for the standard traffic types you see in a vSphere environment (see Table 2 – Default NRP Shares). Under normal circumstances and deployments, these defaults are usually acceptable and no tweaking/tuning need be considered.

Table 2 – Default NRP Shares



| Traffic Type | Shares | Shares Value |
|---|---|---|
| Fault Tolerance (FT) Traffic | Normal | 50 |
| Management Traffic | Normal | 50 |
| NFS Traffic | Normal | 50 |
| Virtual Machine Traffic | High | 100 |
| iSCSI Traffic | Normal | 50 |
| vMotion Traffic | Normal | 50 |
| vSAN Traffic | Normal | 50 |
| vSphere Data Protection Traffic | Normal | 50 |
| vSphere Replication (VR) Traffic | Normal | 50 |

When adding vSAN into the mix, this stance changes a bit. Per VMware's own vSAN Network Design white paper, "vSAN should always have the highest priority compared to any other protocol". So, with that in mind—and taking our example scenario into consideration—we only need to account for Management, vMotion, virtual machine, and vSAN traffic going across the wire. The new NRP shares would be configured to look like the following:

Table 3 – vSAN Enabled NRP Shares



| Traffic Type | Shares | Shares Value |
|---|---|---|
| Management Traffic | Low | 25 |
| Virtual Machine Traffic | Normal | 50 |
| vMotion Traffic | Low | 25 |
| vSAN Traffic | High | 100 |

And now some math… 😊

With the NRP shares in place, what does our worst-case scenario look like if a host had to fail over to a single 10GbE adapter with our NIOC configuration? Table 4 provides the answer.
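The worst-case arithmetic is simple enough to check with a few lines of Python (values are the ones from this post's scenario, not pulled from a live vDS):

```python
# Worst-case NIOC math: all four traffic types contending on a single
# 10GbE uplink after a failover, using this post's share values.

LINK_SPEED_MBPS = 10_000

shares = {
    "Management Traffic":      25,
    "Virtual Machine Traffic": 50,
    "vSAN Traffic":           100,
    "vMotion Traffic":         25,
}

total = sum(shares.values())  # 200
floors = {t: LINK_SPEED_MBPS * v / total for t, v in shares.items()}

for traffic, mbps in floors.items():
    print(f"{traffic}: {shares[traffic]}/{total} "
          f"({shares[traffic] / total:.1%}) -> {mbps:,.0f} Mbps")
```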

Table 4 – NIOC Minimums



| NRP | Share (Raw Value) | Minimum Bandwidth |
|---|---|---|
| Management Traffic (25) | 25/200 (12.5%) | 1,250 Mbps |
| Virtual Machine Traffic (50) | 50/200 (25%) | 2,500 Mbps |
| vSAN Traffic (100) | 100/200 (50%) | 5,000 Mbps |
| vMotion Traffic (25) | 25/200 (12.5%) | 1,250 Mbps |
| Total | 100% | 10,000 Mbps |

Wrapping Up

Keeping things simple tends to make one's life easier, especially when dealing with IT infrastructure. What has been described above is a good starting foundation for a vSAN deployment as it relates to the virtual networking layer. Depending on the size of your environment, this configuration could be more than adequate, but this isn't meant to be a one-size-fits-all post. Make sure to read through and understand the vSAN Network Design white paper, especially if you plan to leverage multiple vSAN VMkernel ports or wish to use LACP in your environment. These options might add a higher level of configuration and/or complexity, but your design requirements and justifications might validate the need for them.

Thanks for reading!

-Jason
