These are the notes I took while reading the vSphere 6.x HA Deepdive book, together with my own understanding of the material.
Here we focus on the files stored on the shared datastores, a.k.a. remote files. (Each host also stores some configuration files in a directly accessible datastore, a.k.a. local files.)
- Protectedlist file
- Naming: protectedlist
- Owner: The master locks this protectedlist file
- The master uses this file to claim “ownership” of the datastores that store the VM configuration files. When a host is isolated and can still access the datastore, it checks whether a master owns the datastore. If no master owns it, the isolation response is not triggered and restarts are not initiated (see pages 29 and 30 of the book on isolation response).
- The master uses this protectedlist file to track the VMs protected by HA and their power state (powered on / off)
- The master distributes this protectedlist file to all datastores in use by the VMs in the cluster
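
The isolation check described in the protectedlist notes above can be sketched as a small decision function. This is only an illustration of the rule as written in these notes, not how HA implements it: the function name and both parameters are hypothetical, and the branch for an inaccessible datastore is an assumption, since the notes only cover the case where the host can still reach the datastore.

```python
def should_trigger_isolation_response(datastore_accessible: bool,
                                      master_owns_datastore: bool) -> bool:
    """Illustrative decision rule for an isolated host (all names hypothetical)."""
    if not datastore_accessible:
        # ASSUMPTION: not covered by the notes above. Here we assume the host
        # goes ahead with the isolation response, because it cannot verify
        # whether a master owns the datastore.
        return True
    # Datastore is reachable: act only if a master has claimed ownership.
    # With no owner, the notes say the response is not triggered.
    return master_owns_datastore


if __name__ == "__main__":
    print(should_trigger_isolation_response(True, True))
    print(should_trigger_isolation_response(True, False))
```

The point of the check is that powering off VMs only makes sense when a master exists that could restart them elsewhere.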
- "poweron" file
- Naming: host-<number>-poweron
- Owner: per-host (master & slaves)
- Each host uses its "poweron" file to track the virtual machines that are powered on on that host
- A slave also uses its "poweron" file to inform the master that it is isolated from the management network
- If there is no datastore heartbeat, the master determines that the host has failed
- The top line of the "poweron" file is 1 if the host is isolated and 0 if it is not
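
To make the "top line" convention concrete, here is a minimal sketch that reads the isolation flag from the contents of a "poweron" file. It assumes the file is plain text with the flag alone on the first line; the function name and the sample content are hypothetical, and the format of the remaining lines is not described in these notes, so the sketch ignores them.

```python
def parse_isolation_flag(poweron_text: str) -> bool:
    """Return True if the first line is '1' (isolated), False if it is '0'.

    ASSUMPTION: the file is plain text and the flag sits alone on line 1.
    """
    lines = poweron_text.splitlines()
    first_line = lines[0].strip() if lines else ""
    if first_line == "1":
        return True
    if first_line == "0":
        return False
    raise ValueError(f"unexpected isolation flag: {first_line!r}")


if __name__ == "__main__":
    # Hypothetical file content: the flag, then lines this sketch ignores.
    sample = "1\n(remaining lines not interpreted here)\n"
    print(parse_isolation_flag(sample))
```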
- Heartbeat file
- Naming: host-<number>-hb
- Owner: per-host
- Each host creates a heartbeat file on the designated heartbeat datastores
- On a VMFS datastore, the "heartbeat region" is used to check for heartbeat updates
- On an NFS datastore, the time-stamp of the file is checked (each host writes to its heartbeat file once every 5 seconds)
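
The NFS time-stamp check can be illustrated with a short sketch that compares a file's modification time against the 5-second write interval mentioned above. This is not HA's actual logic: the function names and the `max_missed` tolerance (how many missed writes are allowed before the heartbeat counts as stale) are assumptions made for the example.

```python
import os
import time

HEARTBEAT_INTERVAL_S = 5  # from the notes: one write every 5 seconds


def is_heartbeat_fresh(last_write: float, now: float, max_missed: int = 3) -> bool:
    """Return True if the last write is recent enough.

    ASSUMPTION: `max_missed` is a hypothetical tolerance, not an HA setting.
    """
    return (now - last_write) <= HEARTBEAT_INTERVAL_S * max_missed


def check_heartbeat_file(path: str, max_missed: int = 3) -> bool:
    """Check a heartbeat file (e.g. host-<number>-hb) by its modification time."""
    return is_heartbeat_fresh(os.path.getmtime(path), time.time(), max_missed)


if __name__ == "__main__":
    now = time.time()
    print(is_heartbeat_fresh(now - 4, now))   # written 4 s ago
    print(is_heartbeat_fresh(now - 60, now))  # written 60 s ago
```

Keeping the comparison in a pure function makes the threshold easy to reason about separately from the file system access.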