
Do Not Upgrade Dell Server with H730 and FD332-PERC Controller to VSAN 6.2

VMware released VSAN 6.2 on March 15, 2016. However, if your VSAN is running on a Dell server with H730 or FD332-PERC controller, do not upgrade to VSAN 6.2.

See KB2144614 for more information.

My IOPS Calculator

I was looking for an IOPS calculator for the EMC VNX2 5400 storage that I am working on. Many are available on the Internet, but none of them gave me exactly what I wanted. More frustrating, different calculators produced different results. So I decided to build one myself. Here is the result.

My IOPS calculator concept:

 

My IOPS Calculator Download (save the spreadsheet and open in Excel)
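The spreadsheet's own formulas are not reproduced in this post, so the following is only a minimal sketch of the commonly used "functional IOPS" calculation, not necessarily what the spreadsheet does. The RAID write penalties (2, 4, 6) are the usual rule-of-thumb values; the function name, the disk count, and the per-disk IOPS figure are assumptions chosen for illustration.

```python
# Rule-of-thumb back-end writes generated per front-end write (assumed values).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def functional_iops(disk_count, iops_per_disk, read_ratio, raid_level):
    """Estimate front-end IOPS a RAID group can serve.

    raw = functional * read_ratio + functional * write_ratio * penalty,
    solved for functional.
    """
    raw = disk_count * iops_per_disk
    write_ratio = 1 - read_ratio
    penalty = WRITE_PENALTY[raid_level]
    return raw / (read_ratio + write_ratio * penalty)


# Hypothetical example: 8 disks at 180 IOPS each, 70% reads, RAID 5.
estimate = functional_iops(8, 180, 0.7, "RAID5")
print(f"{estimate:.1f} IOPS")
```

Different calculators often disagree because they assume different per-disk IOPS figures or write penalties, so it is worth making both explicit inputs.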

Fix “Deprecated VMFS volume(s) found on the host” in vSphere 6.x

An ESXi 6.x host shows a warning message “Deprecated VMFS volume(s) found on the host. Please consider upgrading volume(s) to the latest version.”

vsphere.6.deprecated.vmfs.warning

After verifying all the datastores mounted on the host are VMFS5, I restarted the management agent on the host. That cleared the warning.

This is a known issue on vSphere 6 (KB2109735).

VSAN Free Storage Catches

VSAN is a hot topic nowadays. Once it is set up, it’s easy to manage and use. No more creating LUNs and zoning.

We recently ran into some catches with its free storage that we had not thought about or been told about beforehand; or maybe our expectations of VSAN were too optimistic.

Our VSAN hardware disk configuration:

  • 3 x Dell PowerEdge R730 nodes
  • 2 x 400 GB SSD per node (372.61 GB is shown in VSAN Disk Management)
  • 14 x 1 TB SATA per node (931.51 GB is shown in VSAN Disk Management)
  • Two disk groups (7 SATA + 1 SSD) per node

Calculation of each node storage capacity (RAW):

931.51 x 14 = 13,041.14 GB = 12.73549 TB

Total storage capacity (RAW):

931.51 x 14 x 3 = 39,123.42 GB = 38.20646 TB

This calculation matches the storage capacity shown in the VSAN Cluster’s Summary.

vsan.total.storage.capacity
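The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in this post: it counts the 14 SATA disks per node (the SSDs are left out, as in the figures above) and converts GB to TB by dividing by 1024. The function name is my own.

```python
GB_PER_TB = 1024  # the figures above use binary units


def raw_capacity_tb(disk_gb, disks_per_node, nodes=1):
    """Raw capacity in TB, counting only the SATA disks as in the post."""
    return disk_gb * disks_per_node * nodes / GB_PER_TB


per_node = raw_capacity_tb(931.51, 14)
cluster = raw_capacity_tb(931.51, 14, nodes=3)

print(f"Per node: {per_node:.5f} TB")  # Per node: 12.73549 TB
print(f"Cluster:  {cluster:.5f} TB")   # Cluster:  38.20646 TB
```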

We are adding more VMs to the VSAN. Once the free storage drops below about 12 TB (about one node’s RAW capacity), the VSAN health check starts showing the critical alert “Limits Health - After 1 additional host failure” (KB2108743).

vsan.health.alert

Component resyncing also starts more frequently.

vsan.resyncing.components

My takeaways:

  • I understand there is an overhead for VSAN (or any storage product) to offer redundancy. But the way VSAN displays free storage is quite different from traditional SAN storage, and it can be confusing. The free storage shown in VSAN does not mean you should use all of it. Otherwise, the VMs may go down when a host fails or is taken down for maintenance.
  • The used storage in the Summary tab is the provisioned storage, not the actual space in use.
  • Frequent component resyncing can potentially impact the overall VSAN storage performance.
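As a rough planning aid, the observation above (the alert appeared once free space fell below about one node’s RAW capacity) can be written as a simple check. This is only a rule of thumb taken from what we saw on our three-node cluster, not the actual logic of the VSAN health check; the function name and the sample free-space values are hypothetical.

```python
NODE_RAW_TB = 12.73549  # one node's RAW capacity from the calculation above


def has_host_failure_headroom(free_tb, node_raw_tb=NODE_RAW_TB):
    """Rule of thumb: keep at least one node's RAW capacity free."""
    return free_tb >= node_raw_tb


# Hypothetical free-space readings from the cluster Summary tab.
for free_tb in (20.0, 13.0, 11.5):
    status = "OK" if has_host_failure_headroom(free_tb) else "below one node's capacity"
    print(f"{free_tb:5.1f} TB free: {status}")
```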

Recover Microsoft Cluster VMs That Do Not Power On After Migration

A lesson to remember if you do not have the time to read this entire post: do not migrate the cluster VMs without fully understanding the impact.

Here is our story.

We had Microsoft SQL 2008 cluster VMs in a CIB (cluster-in-a-box) configuration (see my previous post about various Microsoft Cluster VMs configuration). The shared disks of the cluster VMs were on an EMC SAN. When free space on the EMC SAN was running low, an engineer migrated the cluster VMs (the VMs were powered off during the migration) to the VSAN 6.1 hosts and storage. The migration completed successfully, but the VMs would not power on, failing with the error message “Cannot use non-thick disks with clustering enabled (sharedBus='physical'). The disk for scsi1:0 is of the type thin.”

Because VSAN does not support Microsoft clusters with shared disks (clusters without shared disks, e.g. SQL AlwaysOn Availability Groups, are supported), there was no option but to migrate the VMs back to the original hosts and SAN storage.

PS: In this case, the new target storage was VSAN. I think the cluster would have broken even if the target had been traditional SAN storage, because the cluster VMs no longer shared their disks after the migration (see below). But in that case you could probably recover the cluster by reconfiguring the VMs to share the disks again, without migrating the VMs back to the original storage.

When we reviewed the disks of the migrated VMs on the VSAN storage, each VM had its own copy of the shared disks, so the cluster VMs no longer shared them. We could not simply migrate the VMs back to the original hosts and SAN storage.

When we reviewed the original EMC SAN storage, the VMDK files of the shared disks were still there; only the non-shared disk (e.g. the OS’s C drive) had been completely migrated to the VSAN storage.

vmdk.files.left.on.the.san

Recovery Procedure:

  1. Document the SCSI controller ID (e.g. SCSI (1:0)) of each shared disk on the migrated VMs. This may not be critical, but we are going to use the same SCSI controller ID for each corresponding disk when re-adding the shared disks.
  2. Since the VMDK files of the shared disks were still on the original SAN storage, we can speed up the recovery by migrating only the non-shared disks of each VM. In this case, we only migrate hard disk 1 of each VM (the OS drive) back to the original SAN.
  3. How do we migrate only the OS drive back to the original host and storage? We used VMware vCenter Converter and selected only hard disk 1. This worked beautifully.
    • vmware.converter.select.os.drive.only
  4. PS. In this case the VMs had been migrated to VSAN storage, so we could not use scp to copy the VMDK files manually between the hosts. To use scp, we would need to migrate the VMDK files to non-VSAN storage first. This is why I think vCenter Converter is the best tool in this case.
  5. Now the non-shared disks of each VM are back on the original host and SAN storage. Make sure both VMs are registered on the same ESXi host.
  6. If the VMs are not on the same ESXi host, use Migrate, Change host, and check the checkbox “Allow host selection within this cluster” (this option is not selected by default) to put both VMs on the same ESXi host.
    • vm.migrate.allow.host.selection
  7. Re-add the SCSI controller(s) to the first VM and set the SCSI Bus Sharing to Virtual
  8. Re-add the shared disks to the first VM using the existing VMDK files; match the SCSI ID documented in the first step. We also made sure the order of the hard drives matched the original VM’s configuration
    • re-add.hard.drive.with.existing.vmdk 
  9. Power on the first VM
  10. Log in to Windows and verify the shared drives’ drive assignments are correct
  11. Launch Failover Cluster Manager to verify the cluster services and applications are online
  12. Re-add the SCSI controller(s) to the second VM and set the SCSI Bus Sharing to Virtual
  13. Re-add the shared disks using the existing VMDK files to the second VM; match the SCSI ID documented in the first step
  14. Power on the second VM
  15. Log in to Windows and verify that no shared drive is shown in Windows Explorer; the shared disks should be shown as “Reserved” in Disk Management
  16. Launch Failover Cluster Manager to verify the second node is online
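Step 1 (documenting the SCSI IDs) could also be done from a copy of each VM’s .vmx file rather than by clicking through the VM settings. The sketch below is one possible way, not what we actually did: it assumes the usual .vmx key names (scsiX:Y.fileName and scsiX.sharedBus), and the sample content, paths, and file names are made up for illustration.

```python
import re

# Assumed .vmx key formats: scsi1:0.fileName = "..." and scsi1.sharedBus = "..."
DISK_RE = re.compile(r'^(scsi\d+:\d+)\.fileName\s*=\s*"(.*)"$')
BUS_RE = re.compile(r'^(scsi\d+)\.sharedBus\s*=\s*"(.*)"$')


def scsi_layout(vmx_text):
    """Return ({scsi_id: vmdk_path}, {controller: bus_sharing}) from .vmx text."""
    disks, sharing = {}, {}
    for raw_line in vmx_text.splitlines():
        line = raw_line.strip()
        disk = DISK_RE.match(line)
        if disk:
            disks[disk.group(1)] = disk.group(2)
            continue
        bus = BUS_RE.match(line)
        if bus:
            sharing[bus.group(1)] = bus.group(2)
    return disks, sharing


# Made-up sample; a real .vmx file contains many more keys.
SAMPLE_VMX = '''
scsi0:0.fileName = "sqlnode1.vmdk"
scsi1.sharedBus = "virtual"
scsi1:0.fileName = "/vmfs/volumes/san-ds/shared/quorum.vmdk"
scsi1:1.fileName = "/vmfs/volumes/san-ds/shared/data.vmdk"
'''

disks, sharing = scsi_layout(SAMPLE_VMX)
for scsi_id, vmdk in sorted(disks.items()):
    print(scsi_id, "->", vmdk)
print("Bus sharing:", sharing)
```

Keeping this output for both VMs gives a record to compare against when re-adding the disks in steps 8 and 13.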

Fix A SAN Datastore Inaccessible On An ESXi Host

A SAN datastore is shown as inaccessible on one of the ESXi hosts in the cluster. The other ESXi hosts can access that datastore without a problem.

esxi.host.datastore.inaccessible

Solution: restart the ESXi management agents on the ESXi host

There are a few ways to restart the management agents (KB1003490).

  • From the Direct Console User Interface (DCUI)
    • Press F2 to customize the system
    • Log in as root
    • Under Troubleshooting Options, select Restart Management Agents
  • From the Local Console (Alt + F2) or SSH
    • Log in as root
    • Run these commands
      • /etc/init.d/hostd restart
      • /etc/init.d/vpxa restart
      • If hostd does not restart, use KB1005566 to find and kill the hostd process ID (PID), then start it again (/etc/init.d/hostd start)
  • Alternatively
    • To reset the management network on a specific VMkernel interface, by default vmk0
      • esxcli network ip interface set -e false -i vmk0; esxcli network ip interface set -e true -i vmk0
      • Note: run the above commands together, using a semicolon (;) between the two commands
    • To restart all management agents on the host
      • services.sh restart
      • Caution:
        • check if LACP is enabled on the VDS’s Uplink Port Group
        • If LACP is not configured, the services.sh script can be safely executed
        • If LACP is enabled and configured, do not restart management services using services.sh. Instead, restart the individual services using /etc/init.d/hostd restart and /etc/init.d/vpxa restart.
        • If the issue is not resolved, schedule downtime before restarting all services with services.sh
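The caution list above boils down to a small decision, sketched here in Python. It only returns the commands quoted in this post and does not run anything on the host; the function and parameter names are my own, and checking whether LACP is enabled is left to you.

```python
AGENT_RESTART = ["/etc/init.d/hostd restart", "/etc/init.d/vpxa restart"]
FULL_RESTART = ["services.sh restart"]


def restart_plan(lacp_enabled, downtime_scheduled=False):
    """Pick the restart commands according to the caution list above."""
    if not lacp_enabled or downtime_scheduled:
        # services.sh is acceptable without LACP, or during planned downtime.
        return FULL_RESTART
    # LACP in use and no downtime: restart only the individual agents.
    return AGENT_RESTART


print(restart_plan(lacp_enabled=True))
print(restart_plan(lacp_enabled=False))
print(restart_plan(lacp_enabled=True, downtime_scheduled=True))
```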

Backup Consistency Types

This post summarizes the various backup consistency types:

  • Inconsistent Backup
    • If any file changes after it was backed up but before the job completes, the result is an inconsistent backup
    • e.g. with File A and File B: File A is backed up, then both File A and File B change, then File B is backed up. The backup of Files A and B is now inconsistent
    • The content in memory or pending I/O is not backed up
  • Crash-Consistent Backup
    • All data is backed up as of exactly the same point in time, using techniques like Volume Shadow Copy Service (VSS) to take a block-level snapshot; the backup software then pulls its backup from that snapshot
    • The backed-up data is in the same state it would have been in if the system had crashed
    • The content in memory or pending I/O is not backed up
    • Many applications, like Active Directory, have an automated recovery mechanism and will attempt to handle the inconsistency without administrator intervention. If these automated mechanisms aren’t successful, a manual process is needed. For Microsoft SQL, you may need to know how to replay logs into a database file.
  • Application-Consistent Backup
    • For Windows applications, the application manufacturer provides a VSS writer. When the VSS service is triggered, it notifies these writers that a backup is occurring. It is then up to each VSS writer how to handle it.
    • A proper VSS writer makes the application flush all of its memory and I/O operations to disk, as it would if the application were properly closed
    • When the VSS snapshot is complete, VSS signals the writers, the application resumes normal operation, and the backup software pulls its backup from that snapshot
    • If an application does not provide or properly register a VSS writer but its data resides on a volume with VSS enabled, the data is backed up in a crash-consistent state
  • Image-Level Backup
    • The other backup types are taken while the machine is actively running
    • An image-level backup is taken while the machine is shut down

Source: VSS Crash-Consistent vs Application-Consistent VSS Backups
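The File A and File B example above can be simulated in a few lines of Python. This is only a toy model (a dictionary stands in for the file system and the function names are made up); it shows why copying files one after another can produce an inconsistent backup, while copying from a point-in-time snapshot cannot.

```python
def sequential_backup(files, change_during_job):
    """Back up the first file, let the files change, then back up the rest."""
    names = list(files)
    backup = {names[0]: files[names[0]]}
    change_during_job(files)  # files change while the job is running
    for name in names[1:]:
        backup[name] = files[name]
    return backup


def snapshot_backup(files, change_during_job):
    """Freeze a point-in-time copy first, then back up from that copy."""
    snapshot = dict(files)
    change_during_job(files)  # later changes do not affect the snapshot
    return dict(snapshot)


def update_both(files):
    files["A"] = "A-v2"
    files["B"] = "B-v2"


inconsistent = sequential_backup({"A": "A-v1", "B": "B-v1"}, update_both)
consistent = snapshot_backup({"A": "A-v1", "B": "B-v1"}, update_both)

print(inconsistent)  # {'A': 'A-v1', 'B': 'B-v2'}
print(consistent)    # {'A': 'A-v1', 'B': 'B-v1'}
```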

Use WinSCP to Transfer Files in vCSA 6.7

This is a quick update on my previous post “Use WinSCP to Transfer Files in vCSA 6.5”. When I try the same SFTP server setting in vCSA 6.7...