Recover Microsoft Cluster VMs That Do Not Power On After Migration

A lesson to remember if you do not have time to read this entire post: do not migrate cluster VMs without fully understanding the impact.

Here is our story.

We had Microsoft SQL 2008 cluster VMs in a CIB (cluster-in-a-box) configuration (see my previous post about the various Microsoft cluster VM configurations). The shared disks of the cluster VMs were on an EMC SAN. When free space on the EMC SAN was running low, an engineer migrated the cluster VMs (powered off during the migration) to the VSAN v.6.1 hosts and storage. The migration completed successfully, but the VMs would not power on, failing with the error message “Cannot use non-thick disks with clustering enabled (sharedBus='physical'). The disk for scsi1:0 is of the type thin.”

Because VSAN does not support Microsoft clusters with shared disks (non-shared-disk clusters, e.g. SQL AlwaysOn Availability Groups, are supported), there was no option but to migrate the VMs back to the original hosts and SAN storage.

PS: In this case, the new target storage was VSAN. I think the cluster would have broken even if the target had been a traditional SAN, because the cluster VMs no longer shared their disks after the migration (see below). But in that case you could probably recover the cluster by reconfiguring the VMs to share the disks again, without migrating the VMs back to the original storage.

When we reviewed the disks of the migrated VMs on the VSAN storage, each VM had its own copy of the shared disks, so the cluster VMs no longer shared them. This meant we could not simply migrate the VMs back to the original hosts and SAN storage.
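One way to spot this condition is to compare, slot by slot, which VMDK each node's disks point to. The sketch below is only an illustration in Python: the function name, the dictionary layout, and the datastore paths are hypothetical, and in practice you would fill the dictionaries by hand from each VM's settings.

```python
def find_unshared_slots(node_a, node_b):
    """Return SCSI slots whose backing VMDK differs between two cluster nodes.

    node_a / node_b map a SCSI slot such as "scsi1:0" to the datastore
    path of the VMDK attached there (hypothetical layout for illustration).
    """
    unshared = []
    for slot in sorted(set(node_a) | set(node_b)):
        # A truly shared disk is the same VMDK path on both nodes.
        if node_a.get(slot) != node_b.get(slot):
            unshared.append(slot)
    return unshared


# Hypothetical example: after the migration each node has a private copy.
node1 = {"scsi1:0": "[vsanDatastore] sql-node1/data.vmdk"}
node2 = {"scsi1:0": "[vsanDatastore] sql-node2/data.vmdk"}
print(find_unshared_slots(node1, node2))
```

Only the shared-disk slots should be passed in; the OS disks are expected to differ between the nodes.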

When we reviewed the original EMC SAN storage, the VMDK files of the shared disks were still there; only the non-shared disk (e.g. the OS’s C drive) had been completely migrated to the VSAN storage.

Recovery Procedure:

  1. Document the SCSI controller ID (e.g. SCSI (1:0)) of each shared disk on the migrated VMs. This may not be strictly necessary, but we are going to use the same SCSI ID for each corresponding disk when re-adding the shared disks.
  2. Since the VMDK files of the shared disks were still on the original SAN storage, we can speed up the recovery by migrating only the non-shared disks of each VM. In this case, we only migrate hard disk 1 of each VM (the OS drive) back to the original SAN.
  3. How do you migrate only the OS drive back to the original host and storage? We used VMware vCenter Converter and selected only hard disk 1. This worked beautifully.
  4. PS: In this case the VMs had been migrated to VSAN storage, so we could not use scp to copy the VMDK files manually between the hosts. To use scp, we would first need to migrate the VMDK files to non-VSAN storage. This is why I think vCenter Converter is the best tool in this case.
  5. Now the non-shared disks of each VM are back on the original host and SAN storage. Make sure both VMs are registered on the same ESXi host.
  6. If the VMs are not on the same ESXi host, use Migrate, Change host, and check the checkbox “Allow host selection within this cluster” (this option is not selected by default) to put both VMs on the same ESXi host.
  7. Re-add the SCSI controller(s) to the first VM and set the SCSI Bus Sharing to Virtual.
  8. Re-add the shared disks to the first VM using the existing VMDK files; match the SCSI IDs documented in the first step. We also made sure the order of the hard drives matched the original VM’s configuration.
  9. Power on the first VM.
  10. Log in to Windows and verify that the shared drives’ drive letter assignments are correct.
  11. Launch Failover Cluster Manager to verify that the cluster services and applications are online.
  12. Re-add the SCSI controller(s) to the second VM and set the SCSI Bus Sharing to Virtual.
  13. Re-add the shared disks to the second VM using the existing VMDK files; match the SCSI IDs documented in the first step.
  14. Power on the second VM.
  15. Log in to Windows and verify that no shared drive is shown in Windows Explorer; the shared disks should be shown as “Reserved” in Disk Management.
  16. Launch Failover Cluster Manager to verify that the second node is online.
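For step 1, we documented the SCSI IDs by hand, but the same information can also be read from each VM's .vmx file. The following is a minimal sketch in Python, not the method we used: it assumes you have the .vmx text available and that it contains the usual `scsiN.sharedBus` and `scsiN:M.fileName` entries. The function name and the sample file contents are hypothetical.

```python
import re

# Matches keys such as scsi1.sharedBus or scsi1:0.fileName in a .vmx file.
_LINE = re.compile(r'^\s*(scsi\d+(?::\d+)?)\.(\w+)\s*=\s*"(.*)"\s*$')


def scsi_layout(vmx_text):
    """Collect bus-sharing mode per controller and VMDK path per disk slot."""
    controllers, disks = {}, {}
    for line in vmx_text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        device, key, value = m.groups()
        if ":" in device and key == "fileName":
            disks[device] = value          # e.g. scsi1:0 -> path of the VMDK
        elif ":" not in device and key == "sharedBus":
            controllers[device] = value    # e.g. scsi1 -> virtual
    return controllers, disks


# Hypothetical .vmx excerpt for illustration only.
sample = '''
scsi0.present = "TRUE"
scsi0:0.fileName = "sql-node1.vmdk"
scsi1.present = "TRUE"
scsi1.sharedBus = "virtual"
scsi1:0.fileName = "/vmfs/volumes/san-ds/sql-shared/data.vmdk"
'''
controllers, disks = scsi_layout(sample)
print(controllers)
print(disks)
```

Running this against both nodes before and after the recovery gives a record of the slot assignments to match in steps 8 and 13.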
