What is vSphere High Availability?
vSphere HA is a high availability solution from VMware that reduces downtime for virtual machines. It provides availability at the virtual machine, guest operating system, and application levels.
High availability is achieved in different ways for the various components of the virtual infrastructure:
1. Networking: NIC teaming, multiple VMkernel port groups
3. Virtual machines: vMotion, High Availability, FT
vSphere HA leverages multiple ESXi hosts configured as a cluster to provide rapid recovery from outages and cost-effective high availability for applications running in virtual machines.
vSphere High Availability protects application availability in the following ways:
- It protects against a server failure by restarting the virtual machines on other hosts within the cluster.
- It protects against application failure by continuously monitoring a virtual machine and restarting it in the event that a failure is detected.
Advantages of using VMware High Availability over traditional failover solutions:
- Minimal setup
- Reduced hardware costs
- Increased application availability
- DRS and vMotion integration
1. Requirements for enabling High Availability:
- vCenter Server
- A cluster created in vCenter Server
- An isolation address (usually the IP address of a router or switch) for the isolation check
- A minimum of two hosts with static IP addresses
- The required license
- At least one management network for sending the High Availability heartbeats
- Shared storage and correct network configuration on all hosts
- For virtual machine monitoring, VMware Tools must be installed on all the virtual machines
Types of heartbeating used with High Availability:
- Network (VMkernel Port Group)
- Datastore (Minimum 2 shared datastores)
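The requirements listed above lend themselves to a simple pre-flight check. The following is a minimal Python sketch, not a VMware API: the `ClusterPlan` class and its field names are hypothetical, and the thresholds (two hosts, one management network, two shared datastores for datastore heartbeating) are taken from the lists above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned HA cluster (not a VMware object)."""
    host_count: int
    hosts_use_static_ip: bool
    management_networks: int
    shared_datastores: int
    has_vcenter: bool
    has_license: bool


def missing_requirements(plan: ClusterPlan) -> List[str]:
    """Return the requirements from the list above that the plan does not meet."""
    problems = []
    if not plan.has_vcenter:
        problems.append("vCenter Server is required")
    if not plan.has_license:
        problems.append("the required license is missing")
    if plan.host_count < 2:
        problems.append("at least two hosts are required")
    if not plan.hosts_use_static_ip:
        problems.append("hosts need static IP addresses")
    if plan.management_networks < 1:
        problems.append("at least one management network is required")
    if plan.shared_datastores < 2:
        problems.append("datastore heartbeating needs at least two shared datastores")
    return problems


if __name__ == "__main__":
    # Example plan with a single host, which violates the two-host minimum.
    plan = ClusterPlan(host_count=1, hosts_use_static_ip=True,
                       management_networks=1, shared_datastores=2,
                       has_vcenter=True, has_license=True)
    print(missing_requirements(plan))
```

An empty list means the plan satisfies every item checked here. The sketch does not inspect a real environment, so the values have to be filled in by hand or by separate tooling.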
2. Working of High Availability
Once VMware High Availability is enabled on a cluster, the hosts in the cluster and the virtual machines running on them are protected by High Availability. After the hosts are added, one of them is elected as the master host and the rest become slaves. The master host monitors the slave hosts and the virtual machines. If a host or network failure occurs, the affected virtual machines are restarted on another host; if a guest operating system fails, the virtual machine is restarted.
The host with the highest number of mounted datastores is elected as master. The master host has the following roles and responsibilities:
- Monitoring the state of slave hosts. If a slave host fails or becomes unreachable, the master host identifies which virtual machines need to be restarted.
- Monitoring the power state of all protected virtual machines. If a virtual machine fails, the master host ensures that it is restarted. Using a local placement engine, the master host also determines where the restart should be done.
- Managing the lists of cluster hosts and protected virtual machines.
- Acting as the vCenter Server management interface to the cluster and reporting the cluster health state.
- Orchestrating restarts of protected virtual machines.
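The election rule stated above (the host with the most datastores becomes master) can be sketched in a few lines of Python. This is an illustration only, not the HA agent's actual code: the `Host` class and its fields are hypothetical, and because the text does not say how ties are resolved, the sketch breaks ties by host name purely to keep the result deterministic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Host:
    """Hypothetical view of a cluster host for election purposes."""
    name: str
    mounted_datastores: int


def elect_master(hosts: Sequence[Host]) -> Host:
    """Pick the host with the most datastores.

    Tie-break by name is an assumption made for this sketch only.
    """
    if not hosts:
        raise ValueError("cannot elect a master from an empty cluster")
    return max(hosts, key=lambda h: (h.mounted_datastores, h.name))


if __name__ == "__main__":
    cluster = [Host("esx01", 3), Host("esx02", 5), Host("esx03", 4)]
    master = elect_master(cluster)
    slaves = [h.name for h in cluster if h != master]
    print("master:", master.name, "slaves:", slaves)
```

All hosts that are not elected act as slaves, as described earlier in this section.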
Once HA is enabled, an automatically generated file listing the protected virtual machines (the powered-on virtual machines in the High Availability enabled cluster) is created on the selected datastores, so that the master host knows which virtual machines need to be restarted when High Availability is triggered.
3. Types of host failures:
There are three types of host failures:
- Host stops functioning (freeze/hung state)
- Host becomes network isolated
- Host loses network connectivity with the master host
The master host monitors the liveness of the slave hosts in the cluster. This communication is done through the exchange of network heartbeats every second. When the master host stops receiving these heartbeats from a slave host, it checks for host liveness before declaring the host failed. The liveness check determines whether the slave host is exchanging heartbeats with one of the datastores. The master host also checks whether the host responds to ICMP pings sent to its management IP addresses.
If the master host is unable to communicate directly with the agent on a slave host, the slave host does not respond to ICMP pings, and the agent is not issuing heartbeats, the host is considered to have failed, and its virtual machines are restarted on alternate hosts. If such a slave host is still exchanging heartbeats with a datastore, the master host assumes that it is in a network partition or is network isolated, and so continues to monitor the host and its virtual machines.
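The master's decision described in the two paragraphs above can be written as a small decision function. This is a hedged Python sketch of the logic as stated in the text, not VMware code: the `HostState` names and the three boolean inputs are hypothetical, and the case where a host answers pings but sends no heartbeats of either kind is not covered by the text, so the sketch reports it as undetermined.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    PARTITIONED_OR_ISOLATED = "partitioned or isolated"
    FAILED = "failed"
    UNDETERMINED = "undetermined"  # case not described in the text


def classify_slave(network_heartbeat: bool,
                   datastore_heartbeat: bool,
                   responds_to_ping: bool) -> HostState:
    """Classify a slave host from the master's point of view."""
    if network_heartbeat:
        # Agent heartbeats arrive over the management network: nothing to do.
        return HostState.LIVE
    if datastore_heartbeat:
        # No network heartbeats, but the host still updates a datastore:
        # the master keeps monitoring the host and its virtual machines.
        return HostState.PARTITIONED_OR_ISOLATED
    if not responds_to_ping:
        # No heartbeats of either kind and no ICMP reply: declared failed,
        # and its virtual machines are restarted on alternate hosts.
        return HostState.FAILED
    return HostState.UNDETERMINED


if __name__ == "__main__":
    print(classify_slave(False, True, False).value)
    print(classify_slave(False, False, False).value)
```

Only the `FAILED` outcome leads to virtual machines being restarted on other hosts; in the partitioned or isolated case the master continues to watch the host.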
Host network isolation occurs when a host is still running but can no longer observe traffic from vSphere High Availability agents on the management network. If a host stops observing this traffic, it attempts to ping the cluster isolation addresses. If this also fails, the host declares itself isolated from the network.
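Seen from the host itself, the isolation check above is a two-step test. The following Python sketch mirrors that description under stated assumptions: the function and parameter names are hypothetical, and the ping results are passed in as booleans rather than produced by real ICMP calls.

```python
from typing import Sequence


def declares_itself_isolated(sees_ha_agent_traffic: bool,
                             isolation_ping_results: Sequence[bool]) -> bool:
    """Return True if the host should declare itself network isolated.

    isolation_ping_results holds one entry per cluster isolation address,
    True when that address answered the ping.
    """
    if sees_ha_agent_traffic:
        # Traffic from other HA agents is still visible on the management network.
        return False
    # Isolated only if no isolation address answered.
    # Note: with no addresses configured, any() is False, so the host counts as isolated.
    return not any(isolation_ping_results)


if __name__ == "__main__":
    print(declares_itself_isolated(False, [False, False]))
    print(declares_itself_isolated(False, [False, True]))
```

A host that can still reach at least one isolation address does not declare itself isolated, even though it no longer sees HA agent traffic.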
The master host monitors the virtual machines that are running on an isolated host. If it observes that they power off, and the master host is responsible for those virtual machines, it restarts them.
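The restart rule for an isolated host can be illustrated as a simple filter. This is a minimal Python sketch under assumptions: the `ProtectedVm` class and its `owned_by_master` flag are hypothetical stand-ins for "the master host is responsible for the virtual machine", and where the restart happens is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProtectedVm:
    """Hypothetical record the master keeps for a VM on an isolated host."""
    name: str
    powered_on: bool
    owned_by_master: bool  # assumption: True when this master is responsible for the VM


def vms_to_restart(vms_on_isolated_host: List[ProtectedVm]) -> List[str]:
    """Names of VMs the master should restart: powered off and owned by it."""
    return [vm.name for vm in vms_on_isolated_host
            if not vm.powered_on and vm.owned_by_master]


if __name__ == "__main__":
    vms = [
        ProtectedVm("web01", powered_on=False, owned_by_master=True),
        ProtectedVm("db01", powered_on=True, owned_by_master=True),
        ProtectedVm("app01", powered_on=False, owned_by_master=False),
    ]
    print(vms_to_restart(vms))
```

Virtual machines that are still powered on are left alone, which matches the statement that the master only acts once it observes a power-off.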