iSCSI MultiPathing with VMware vSphere
Every now and then I come across an iSCSI configuration which does not conform to best practices. There are several great posts that cover this, but I thought I'd try to briefly cover some of the basics and FAQs in this post.
There are some unique problems which could be lurking under the hood of some environments if they do not conform to best practices – let’s take a look.
This is something most VMware administrators are familiar with. You might have four 1Gb NIC ports bonded together into a single vSwitch and available for use by virtual machines. Does this mean that you essentially have 4Gb of bandwidth available for your VMs? Well... yes and no.
Yes, because you have a total pool of 4Gb available. No, because a single VM (or conversation) will only use one pNIC (physical NIC) at a time. Stated another way, you have a 4Gb logical pool of bandwidth, but a single VM/session cannot use more than what is available on one pNIC at a time (1Gb in this case). For a bit more detail on this, see an earlier post on Load Balancing in vSphere.
Getting to iSCSI, we have more or less the same thing. Let's say you created an iSCSI port group (VMkernel) and gave it access to two active NICs within the vSwitch, such that the NIC Teaming for your iSCSI port group looks like this:
Does this mean you have 2Gb available for iSCSI? Absolutely not, and you have no multipathing either.
In ESX 3.5 only a single iSCSI session / TCP connection to a target is supported, as noted in the iSCSI Configuration Guide, which explains that "storage systems with a single target containing multiple LUNs have all LUN traffic on that one connection." ESX 4.0 was improved to allow multiple iSCSI sessions, but you can't get to where you want to be just by aggregating the NICs. Here is what the iSCSI Multipathing whitepaper (ESX 4 and 5) says:
In case of simple network adapter teaming, traffic will be redirected at the network layer to the second network adapter during connectivity failure through the first network card, but failover at the path level will not be possible, nor will load balancing between multiple paths.
So we do have fault tolerance (at the NIC/port level), but we have no load balancing or multipathing. What you really want is to have two iSCSI port groups (each with its own IP), and each port group with one active NIC. If you'll forgive my graphic skills, I've attempted to visualise this below:
Above we have two iSCSI port groups, each with their own vmkernel IP. Each port group has only one active pNIC assigned, and ideally each one going to a different physical switch on the network. With this configuration we have true multipathing being done by VMware's iSCSI Software Initiator within ESX. To make sure an iSCSI port group is only using one NIC, you should modify the NIC Teaming to look like this:
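If you prefer the command line, the same setup can be sketched with esxcli (ESXi 5.x syntax). All names here are assumptions for illustration: a vSwitch1 carrying uplinks vmnic1 and vmnic2, port groups iSCSI-1 and iSCSI-2, and example IPs — substitute your own.

```shell
# Create two iSCSI port groups on an existing vSwitch (names are illustrative)
esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-1 --vswitch-name=vSwitch1
esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-2 --vswitch-name=vSwitch1

# Give each port group its own VMkernel interface and IP
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI-1
esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=10.0.0.11 --netmask=255.255.255.0 --type=static
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=iSCSI-2
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=10.0.0.12 --netmask=255.255.255.0 --type=static

# Override the vSwitch teaming policy so each port group has exactly one active uplink
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-1 --active-uplinks=vmnic1
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-2 --active-uplinks=vmnic2
```

The key step is the per-port-group failover override at the end — without it, both port groups inherit the vSwitch teaming policy and you're back to the single-session situation described above.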
And of course for a final step we need to bind our vmknics to the Software iSCSI adapter. All of this is well detailed in the Multipath Configuration for Software iSCSI whitepaper.
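The binding step can also be done from esxcli (again ESXi 5.x syntax); vmhba33 is an assumed adapter name here — the first list command shows the real one on your host.

```shell
# Find the software iSCSI adapter name (often vmhba3x, but check your host)
esxcli iscsi adapter list

# Bind each iSCSI VMkernel NIC to the software iSCSI adapter
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2

# Verify the bindings, then rescan so the new paths are discovered
esxcli iscsi networkportal list --adapter=vmhba33
esxcli storage core adapter rescan --adapter=vmhba33
```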
One more quick note: if you are running vSphere 5.0, please make sure you have Update 1 installed. It corrects a bug in ESX 5.0 in which an All Paths Down (APD) condition can occur due to iSCSI traffic taking the wrong path, even with a correct iSCSI configuration, which can severely cripple the affected ESX host(s).
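A quick sanity check rounds things out — confirming the host version and that the storage stack actually sees multiple paths per device. This is a sketch; device and session counts will obviously depend on your array.

```shell
# Show the ESXi version and build, so you can confirm Update 1 is applied
vmware -vl

# One iSCSI session should be listed per bound vmknic/target combination
esxcli iscsi session list

# Each iSCSI device should now show multiple paths (one per vmknic)
esxcli storage core path list
```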