High availability is one of the first questions people ask about any new or unfamiliar hypervisor: what happens to the VMs running on a host that crashes? Acropolis hypervisor (AHV) is built on CentOS KVM, which does not offer HA natively. Nutanix has built virtual machine High Availability (HA) into AHV to keep virtual machines available in the event of a host or block outage. When a host fails, the VMs that were running on it are restarted on other healthy nodes in the cluster. There are three HA configuration options available to account for different cluster configuration scenarios.
By default, all AHV clusters provide a best effort level of HA, even when the cluster has not been specifically configured for HA. Best effort HA works without reserving any resources. Admission control is not enforced, so there may not be sufficient capacity available to start all the VMs from the failed host. This means that AHV clusters are still protected even when HA has not been configured: depending on how much compute capacity is available within the cluster, all or a subset of the affected VMs are restarted.
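To make the "all or a subset" behavior concrete, here is a minimal sketch of a best effort restart. It is not the Acropolis scheduler: the VM class, the best_effort_restart function, the largest-first first-fit placement, and the use of memory as the only resource are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # Hypothetical VM record; only memory is modeled in this sketch.
    name: str
    memory_gb: int


def best_effort_restart(failed_vms, free_memory_gb):
    """Place VMs from a failed host on surviving hosts without any reservation.

    failed_vms: list of VM objects that were running on the failed host.
    free_memory_gb: dict of surviving host name -> free memory in GB.
    Returns (restarted, not_restarted): a dict of VM name -> host, and a
    list of VM names that did not fit anywhere.
    """
    free = dict(free_memory_gb)  # copy so the caller's data is untouched
    restarted, not_restarted = {}, []
    # Assumed policy: try the largest VMs first, take the first host that fits.
    for vm in sorted(failed_vms, key=lambda v: v.memory_gb, reverse=True):
        target = next((h for h, gb in free.items() if gb >= vm.memory_gb), None)
        if target is None:
            # No admission control was enforced, so capacity can run out.
            not_restarted.append(vm.name)
        else:
            free[target] -= vm.memory_gb
            restarted[vm.name] = target
    return restarted, not_restarted


if __name__ == "__main__":
    vms = [VM("a", 16), VM("b", 8), VM("c", 32)]
    print(best_effort_restart(vms, {"host2": 24, "host3": 10}))
```

In this example the 32 GB VM cannot be placed on either surviving host, which is exactly the case that the reservation methods described next are meant to prevent.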
Configuring HA on an Acropolis cluster is done through Prism and is enabled with a single click. Prism examines the cluster and configures the cluster reservation either as host reservations for a specific number of host failures or as segment reservations. The reservation method is chosen based on the uniformity of the host configurations within the cluster, and Prism selects the method with the least amount of overhead.
Host reservations – With this method an entire host is reserved for failover protection. The least used host in the cluster is selected as a reserve node, and all VMs on that node are migrated off to other nodes in the cluster so that the full capacity of that node is available for VM failover. This is the default HA method when all hosts within the cluster have the same amount of RAM. Prism will configure the number of failover hosts to match the number of failures the cluster will tolerate for the configured Replication Factor (RF).
Segment reservations – This method divides the cluster into fixed size segments of CPU and memory. Each segment corresponds to the largest VM that is guaranteed to be restarted if a failure occurs. The other input is the number of host failures that can be tolerated. Using these inputs, the scheduler implements admission control so that enough resources are always reserved to restart VMs upon failure of any host in the cluster. This is the default method used when hosts in the cluster have different amounts of RAM.
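The choice between the two reservation methods can be sketched in a few lines. This is only an illustration of the rules described above, not Prism's implementation: the function names are hypothetical, host usage is reduced to a single number per host, and the failures tolerated are assumed to be RF minus one (one failure for RF2, two for RF3).

```python
def choose_reservation_method(host_ram_gb):
    """Return 'host' when every host has the same RAM, otherwise 'segment'."""
    if not host_ram_gb:
        raise ValueError("cluster has no hosts")
    return "host" if len(set(host_ram_gb)) == 1 else "segment"


def failures_tolerated(replication_factor):
    """Assumption for this sketch: RF2 tolerates 1 failure, RF3 tolerates 2."""
    return replication_factor - 1


def pick_reserve_hosts(host_usage, replication_factor):
    """Pick the least used hosts as reserve nodes for host reservations.

    host_usage: dict of host name -> utilization (any comparable number).
    """
    count = failures_tolerated(replication_factor)
    return sorted(host_usage, key=host_usage.get)[:count]


if __name__ == "__main__":
    print(choose_reservation_method([256, 256, 256]))  # uniform RAM
    print(choose_reservation_method([256, 512, 256]))  # mixed RAM
    print(pick_reserve_hosts({"a": 0.6, "b": 0.2, "c": 0.4}, 2))
```

With uniform RAM the sketch picks host reservations and sets aside the least used node; with mixed RAM it falls back to segment reservations, where the reserved capacity is spread across the cluster instead of being held on one host.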
As you add more blocks and nodes to your AHV cluster with the expand cluster function, Prism configures the new AHV hosts with the same profile settings as the other hosts in the cluster. This ensures that the new resources are included in the HA calculations for the cluster and that any changes are applied to all hosts.
How to configure HA via Prism
Configuring HA on AHV in Prism is one-click, just like many other functions that Nutanix has built so far. Once you are logged into Prism, click the gear in the upper right, find the Manage VM High Availability option, and select it.
About Brian Suhr
Brian is a VCDX5-DCV and a Sr. Tech Marketing Engineer at Nutanix and owner of this website. He is active in the VMware community and helps lead the Chicago VMUG group. He specializes in VDI and Cloud project designs. Awarded VMware vExpert status 6 years, 2011 - 2016. VCP3, VCP5, VCP5-Iaas, VCP-Cloud, VCAP-DTD, VCAP5-DCD, VCAP5-DCA, VCA-DT, VCP5-DT, Cisco UCS Design