How to balance VMware ESX hosts paths on HP EVA arrays
Here at 64k, in our smaller cube near the vending machines, we storage-oriented folks like to mull over ideas big and small, 4k at a time. We also deal in a great number of puns, so consider yourself warned. Today, in our maiden voyage, I'd like to talk about some of my experience with HP's line of EVA storage arrays. As many of our readers know, the EVA line is a middle-tier offering from HP. Though likely to be usurped in the near future by 3PAR's goodies, I am not here to begin that debate. Rather, let us delve into a few common gotchas that are easily overlooked in environments where EVAs live.
The tightrope act begins with the storage array, our bright and shiny EVA. At a fundamental level, an EVA consists of two controllers. The EVA's operating environment can, in a semi-intelligent fashion, manage vdisk ownership between the two controllers itself. By default, vdisks are created with their failover/mode setting at "no preference," which means the EVA decides which controller gets which vdisks when it (the EVA itself) boots. Every vdisk is assigned to one controller, and only one. If the non-owning controller is receiving the IO for the server(s) talking to a vdisk, the EVA will, after a period of time, transfer ownership of that vdisk to the controller doing the work, reducing the load crossing the mirror ports. While the EVA can run in this fashion, it is sub-optimal.
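If you would rather not click through Command View to see where each vdisk currently lives, HP's Storage System Scripting Utility (SSSU) can report it. This is just a sketch: the manager hostname, credentials, system name, and vdisk name below are made-up examples, and the exact output fields vary with your Command View and XCS versions.

```shell
# Query a vdisk's properties via SSSU (all names here are examples).
# SSSU accepts its commands directly as command-line arguments.
sssu "SELECT MANAGER cvserver USERNAME=admin PASSWORD=secret" \
     "SELECT SYSTEM EVA8400" \
     "LS VDISK \"\Virtual Disks\vd01\""
# In the output, look for the preferred path/mode setting and the
# controller currently serving the vdisk.
```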
The other side of this balancing act is the hosts. IO can walk many paths from host to array, some optimal and others not. The path starts at the host's adapter: if the host has a dual-port HBA (or multiple single-port HBAs), you have all the more paths to choose from, and even a single-port host can take multiple paths to arrive at the vdisk. Choosing the proper path is the job of multipathing software. For Microsoft operating systems, HP provides a Device Specific Module (DSM) that builds on Microsoft's MPIO stack, with a specific DSM for each of its array lines. Without the MPIO stack, the host sees a drive presented once for each host port; on an 8×00 series array, that is eight copies of the same disk! So clearly the MPIO stack and HP's DSM are needed for correct operation. Note that the default install does not enable Adaptive Load Balance (ALB), which hampers read operations because reads are not passed through the owning controller for a vdisk. Non-Microsoft operating systems (like VMware) have their own multipathing stacks. In VMware ESX(i) 3.x, the options are Fixed and MRU; vSphere adds Round Robin to the mix. In pre-vSphere environments, the Fixed policy does not balance load across the host ports by default. You can end up with all your VM traffic running over one host port! Yikes!
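Before tuning anything on the VMware side, it helps to see how many paths the ESX host actually has to each LUN. On classic ESX 3.x/4.x service consoles, esxcfg-mpath lists them (the device identifier below is a made-up example, not a real one):

```shell
# List every path the host sees, grouped by device/LUN
esxcfg-mpath -l

# On ESX 4.x you can narrow the listing to a single device by its
# NAA identifier (this ID is an illustrative example)
esxcfg-mpath -l -d naa.600508b4000b0dd70000a00001230000
```

On an 8×00 EVA with both fabrics cabled correctly, each LUN should show paths through host ports on both controllers; if every path runs through one controller or one fabric, you have found your imbalance.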
Now, to balance things out, let me start with the array. A good habit to get into is understanding your environment from an IO perspective. You need to understand the profile, or workload, of your IO so that you can balance it between the controllers (among other things!). Capture your performance data using evaperf (or other tools) so you can view each controller's current load. As you add new vdisks, balance them by setting each vdisk's failover/mode setting to a specific controller with failover + failback. Failback ensures the balance is restored should you lose and then regain a controller, and explicitly assigning a controller clearly defines mastership of the vdisk, which helps the host side know which controller to talk through. Keep in mind that one controller must be able to accept the entire load should the other fail; your performance data will tell you whether it can. A good rule of thumb, at least in my experience, is to keep each controller at no more than 30% utilization. And as always, run the latest Command View and XCS code. One other thing to check for balance is that the host ports are set to their top speed (4Gb/s, except on the very old EVA models) and are properly balanced on the fabric (equal ports on both sides). One customer I came across had all ports from controller A on fabric A and all ports of controller B on fabric B! Definitely a big problem there!
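Setting the failover/mode on each vdisk can also be scripted through SSSU rather than clicked through Command View. Treat this as a hedged sketch: the hostnames, credentials, and vdisk names are examples, and the exact PREFERRED_PATH keyword for "failover + failback" differs between SSSU versions, so confirm it with HELP SET VDISK on yours before running anything.

```shell
# Alternate vdisks between controllers with failover+failback so the
# balance survives (and returns after) a controller outage.
# PATH_A_FAILBACK / PATH_B_FAILBACK are assumed keywords; verify with
# "HELP SET VDISK" on your SSSU version.
sssu "SELECT MANAGER cvserver USERNAME=admin PASSWORD=secret" \
     "SELECT SYSTEM EVA8400" \
     "SET VDISK \"\Virtual Disks\vd01\" PREFERRED_PATH=PATH_A_FAILBACK" \
     "SET VDISK \"\Virtual Disks\vd02\" PREFERRED_PATH=PATH_B_FAILBACK"
```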
For the host side, there is a bit more that can be done (along with some work on the array, which I will address). The hosts should have the latest firmware, drivers, and software for their HBAs. Additionally, make sure you have the latest HP DSM software. Within the DSM software, you will want to enable Adaptive Load Balance (ALB); as I stated before, this is not enabled by default. To enable it, just right-click each LUN (listed by WWN) and choose Enable ALB.
So, as a quick explanation: write requests from hosts will hit the controller that owns the vdisk in question, but the write propagates over the mirror link into both controllers' cache, so that if a controller is lost, the write can still be committed. Read requests hit whichever controller the host sends them to, and if that is the non-owning controller, they must travel over the mirror ports to the correct one. This is sub-optimal, but enabling ALB alleviates it: ALB communicates with the array and always routes read requests through the owning controller. Very handy!
Now, from a VMware standpoint, let's talk about Fixed and then Round Robin (the two most common multipathing situations found today). For Fixed, you will need to balance IO to your datastores across the host ports of the controllers, keeping in mind which controller you selected at the array. As an example, if I have 8 datastores of average IO (no virtualized heavy apps), I would want 4 datastores on each controller. To balance further, I would have each datastore talking over a different host port on its controller (4 ports per controller × 2 controllers), so the IO is evenly spread. To set this, simply go into each datastore's properties (via the VI Client) and pick the path whose WWN corresponds to the desired host port. Under heavy IO, the move to a different host port may not take effect; just try again at a later date. Round Robin works a bit differently: it sends IO down each path in turn, switching after a certain number of IO operations. HP's best practices for vSphere on the EVA say to change this value to 1, thus pushing IO evenly over every visible host port. Early on there was a bug that would reset this value to a very high number after a reboot of the ESX(i) host, and in my experience leaving the default as-is works fairly well (I would guess HP had good reason for recommending 1, though). At this point, with vSphere 4.1, I suspect you can set it without issue.
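On vSphere 4.x, the path policy and the Round Robin IO operation limit can be set from the CLI as well as the client. A sketch, using a made-up device identifier; run esxcli nmp device list first to find your real NAA IDs:

```shell
# Put the device on the Round Robin path selection policy
esxcli nmp device setpolicy \
      --device naa.600508b4000b0dd70000a00001230000 --psp VMW_PSP_RR

# Per HP's EVA best practice, rotate to the next path after every IO
# instead of the much higher default
esxcli nmp roundrobin setconfig \
      --device naa.600508b4000b0dd70000a00001230000 --type iops --iops 1

# Verify what is currently configured for the device
esxcli nmp roundrobin getconfig \
      --device naa.600508b4000b0dd70000a00001230000
```

Given the reset-on-reboot bug mentioned above, it is worth re-running the getconfig step after patching or rebooting pre-4.1 hosts.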
Presented here are some of the findings I have come across in working with different customers. I figure these kinds of storage discussions can make for a very engaging conversation. Let me know what you think (and if I make any errors, which, being human, I am prone to!).