Configuring VCSA 6.5 Backup Lessons Learned

vCenter Server Appliance (vCSA) 6.5 comes with built-in backup functionality. Starting a backup is quite easy: log in to the vCSA web console and click the Backup button on the Summary page (see this post for the step-by-step screenshots).
Even though it looks like a very simple task, I learned a few lessons while configuring the vCSA backup.
Lesson #1: vCSA backup location is <host_name>/<folder_name>
If using the FTP protocol, the backup location is not just the FTP server host name or IP address; it MUST include the folder name. There is a “/” between the host name and the folder name.
Otherwise, the error message is “FTP location is invalid”.
vCSA.Backup.FTP.Location.Is.Invalid
Lesson #2: vCSA backup supports the FTP virtual host name if the username is entered correctly - <ftp virtual hostname>|<ftp username>
See my Lesson #2 in “Setting Up IIS 8 FTP Server Lessons Learned” about the FTP virtual host name login. There is a “|” between the hostname and username.
Otherwise, the error message is “Access to the remote server is denied. Check your credentials and permissions”.
vCSA.Backup.Access.to.The.Remote.Server.Is.Denied
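Lessons #1 and #2 boil down to two string formats. Here is a minimal Python sketch of them; the helper names (`backup_location`, `ftp_login_name`) and the example values in the usage note are hypothetical and only illustrate the formats described above.

```python
def backup_location(host: str, folder: str) -> str:
    """Return the vCSA backup location as <host_name>/<folder_name>."""
    host = host.strip().rstrip("/")
    folder = folder.strip().strip("/")
    if not host or not folder:
        # A bare host name is what triggers "FTP location is invalid".
        raise ValueError("both host name and folder name are required")
    return f"{host}/{folder}"


def ftp_login_name(username: str, virtual_host: str = "") -> str:
    """Return <ftp virtual hostname>|<ftp username>, or the bare username
    when the FTP site has no virtual host name configured."""
    if not username:
        raise ValueError("username is required")
    return f"{virtual_host}|{username}" if virtual_host else username
```

For example, `backup_location("ftp.example.com", "vcsa")` gives `ftp.example.com/vcsa`, and `ftp_login_name("backupuser", "ftp.example.com")` gives `ftp.example.com|backupuser`.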
Lesson #3: Use curl to troubleshoot vCSA backup error
After I enter the correct settings, the vCSA backup wizard validates the settings and starts the backup. The backup fails with “BackupManager encountered an exception. Please check logs for details”, but it does not provide many details or the location of the log file.
vCSA.Backup.BackupManager.Encountered.An.Exception
After some digging, I found the backup log file in /var/log/vmware/applmgmt/backup.log. In the log file, there is a curl error “Connection time-out”.
vCSA.Backup.Backup.log
This gave me a hint that vCSA backup uses curl to transfer the backup file from the vCSA to the FTP location. I have recently been learning to use curl for file transfers, so I’m a little familiar with it. (I will publish what I learned about curl in a future post.)
From the vCSA console, enter “curl -u <ftp user>:<password> -l <ftp server>”. It should list the files and directories on the FTP server, but I got the timeout error. I also tried running curl on a Windows computer and got the timeout error too. This led me to think the problem was on the FTP server. In the end, the fix was to restart the FTP service (see Lesson #1 in “Setting Up IIS 8 FTP Server Lessons Learned”).
I am not sure why the wizard was able to validate the FTP server settings successfully while the FTP server connection was blocked by the Windows Firewall. While troubleshooting the Windows Firewall, I thought I could connect to the FTP site with the FTP command while curl failed, but I’m not 100% sure about this, since I can’t reproduce the issue. After restarting the Microsoft FTP Service, everything worked.
Anyway, curl is the best tool I found for troubleshooting a vCSA backup failure.
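The curl check can also be scripted. Below is a minimal Python sketch that builds the same kind of curl command and runs it. The function names are hypothetical, and I am assuming an explicit ftp:// scheme and a connect timeout so the test fails fast; neither was part of the command I typed on the console.

```python
import subprocess


def curl_list_command(server, user, password, folder="", timeout_s=15):
    """Build the curl argv that lists the names in an FTP folder."""
    url = f"ftp://{server}/{folder.strip('/')}"
    if not url.endswith("/"):
        url += "/"  # trailing slash: treat the URL as a directory
    return [
        "curl",
        "--connect-timeout", str(timeout_s),  # give up instead of hanging
        "-u", f"{user}:{password}",
        "-l",  # list names only, same as in the manual test
        url,
    ]


def ftp_listing_works(server, user, password, folder=""):
    """Run curl and return (ok, stderr). Requires curl on the PATH."""
    result = subprocess.run(
        curl_list_command(server, user, password, folder),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()
```

If the FTP site uses a virtual host name, pass the user as `<ftp virtual hostname>|<ftp username>`, the same way as in the backup wizard.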
Lesson #4: vCSA backup location must be an empty folder
After successfully running a backup, I tried running the backup one more time with the same settings and got the following error. (PS. In the screenshot below, I had removed the virtual hostname from the FTP site, so I could just use the username.)
vCSA.Backup.Location.Folder.Is.Not.Empty

Setting Up IIS 8 FTP Server Lessons Learned

To test the vCSA 6.5 built-in backup, I need an FTP server. Since I already have a Windows Server 2012 R2 machine running IIS 8 with the web service, adding the FTP server feature takes just a few clicks.

Even though I have not used the Microsoft FTP server since IIS 6, and there are a lot of changes between IIS 6 and IIS 8, I thought setting up the FTP server would be a piece of cake. I was wrong! The following is what I learned from setting up the FTP server in IIS 8.

Lesson #1: Windows Firewall

After installing the FTP service and creating a new FTP site in IIS Manager, I can’t connect to the FTP site from a remote computer, although FTP from the server to itself is okay. It must be a Windows Firewall issue.

  • I check the Windows Firewall’s Inbound Rules and find three FTP rules created and enabled; under Outbound Rules, two FTP rules are created and enabled. I guess they are created automatically by the FTP service installation. These rules look right, but I still can’t connect from a remote computer.

Windows.Firewall.Inbound.Rule.FTP

Windows.Firewall.Outbound.Rule.FTP

  • After disabling the Windows Firewall on the server, I can connect. This confirms that the Windows Firewall is causing the issue, but what is the problem? I don’t want to disable the Windows Firewall.
  • The default FTP rules allow the program “%windir%\system32\svchost.exe”. I’m not sure which executable runs the FTP service. (Later, I find it via Microsoft FTP Service, General, Path to executable: “C:\Windows\system32\svchost.exe -k ftpsvc”.)
  • I create my own FTP rules for my case - two inbound rules and one outbound rule (highlighted in the pictures above) with the same protocol and port numbers, except that I allow any program. This works! I can connect to the FTP site from a remote computer. (Actually, see Lesson #2 below - it’s not fully working yet. I get another error after entering the login name.)
  • I thought the default FTP rules did not work, until I found this post.
  • I delete the FTP rules I created and restart the “Microsoft FTP Service”. The FTP connection still works.

Summary:

  • When troubleshooting issues related to the Windows Firewall, restart the application service or the server after adding or changing the rules.
  • Restarting the FTP site in IIS Manager does not work, and neither does disabling and re-enabling the firewall or the rule. Restarting the FTP service is required.

Lesson #2: FTP site virtual host name

After the connection problem is resolved (see lesson #1), I continue further on the FTP login. However, after entering the user name, I get the error message “530 Valid hostname is expected. Login failed”.

FTP.Valid.Hostname.Is.Expected

After searching for the error message, I learned about the FTP virtual host name.

In the past I had used the IIS web site virtual hostname to handle multiple web sites on a single IP address and port number. But I don’t recall if the FTP service in IIS 6 has the host name option. When creating the FTP site, I entered the DNS name of the FTP site as the host name.

FTP.Host.Name

Summary:

  • Use <ftp virtual hostname>|<ftp username> as the login name when the FTP site uses a virtual host name
  • FTP.Virtual.Hostname.Login
  • If you are not going to run multiple FTP sites on the same IP address and port number, leave the host name blank.

VUM 6.5 “Cannot download patch definitions” via UMDS 6.5 Work Around

My nested vSphere lab environment does not have access to the Internet (there is no physical network adapter as the uplink on the lab port group). To update and patch the ESXi hosts using vSphere Update Manager (VUM), I installed Update Manager Download Service (UMDS) on a Windows Server VM with dual NICs - one for the Internet, the other for the lab port group. I use UMDS to download the updates, configure IIS as the web server for the update repository, and configure VUM to use the HTTP shared repository. It worked fine in vSphere 6.0 and 6.2.

Recently I upgraded the lab environment to vSphere 6.5. The vCenter Server Appliance and ESXi hosts are upgraded to 6.5 successfully.

vCenter Server Appliance 6.5 bundles VUM, so VUM no longer requires a Windows Server. UMDS can be installed on a Windows or Linux server. Since I already have a Windows server for UMDS, I continue using it instead of setting up a Linux server. However, UMDS can’t be upgraded from the previous version to 6.5.

I uninstalled UMDS 6.0 and SQL Server 2012 Express on the server, and installed UMDS 6.5 with SQL Server 2012 Express from the ISO. I used the UMDS 6.0 repository folder for UMDS 6.5 and configured UMDS to download the host update only. Since IIS is already set up and I used the same repository folder, no change is needed in IIS. UMDS 6.5 successfully downloaded the update files from VMware.

I configured VUM 6.5 to use the IIS shared repository. VUM validated the URL successfully. However, when I clicked the “Download Now” button in VUM, I got the “Cannot download patch definitions” error.

Troubleshooting

umds.missing.6.5.metadata

Solution

  • A similar issue happened in vSphere 5.5 (KB2061622).
  • I can’t find a patch for ESXi 6.5 yet, but I can apply a similar workaround by removing the <metadata> reference to vmw-ESXi-6.5.0-metadata.zip in
    hostupdate\vmw\__hostupdate20-consolidated-metadata-index__.xml
  • Now the patch definition download succeeds when I click the “Download Now” button in VUM.
  • However, running UMDS again re-adds the vmw-ESXi-6.5.0-metadata.zip reference, and VUM fails to download the patch definitions again.
  • This may be a bug in vSphere 6.5 (like the one in vSphere 5.5). I hope VMware fixes it in the next update.
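Since UMDS re-adds the reference on every run, the edit is worth scripting. The following is a minimal Python sketch, not a tested tool: it assumes the index is well-formed XML in which each <metadata> element mentions the zip file name somewhere in its text or attributes. I have not documented the real layout of the file here, so treat the matching rule and the function name as assumptions, and back up the file first.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Strip an XML namespace prefix such as '{ns}metadata'."""
    return tag.rsplit("}", 1)[-1]


def remove_metadata_reference(xml_text, zip_name="vmw-ESXi-6.5.0-metadata.zip"):
    """Return (new_xml_text, removed_count) after dropping every
    <metadata> element that mentions zip_name."""
    root = ET.fromstring(xml_text)
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if _local_name(child.tag) != "metadata":
                continue
            # Look in both the element text and its attribute values.
            mentions = "".join(child.itertext()) + " ".join(child.attrib.values())
            if zip_name in mentions:
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

To use it, read the index file into a string, call `remove_metadata_reference`, and write the result back after each UMDS download.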

Stop A Task Stuck in vCenter Server Appliance

  1. Find out the name of the vSphere host running the stuck task, if possible
  2. SSH to the vCenter Server Appliance
  3. service vmware-vpxd restart
    • After restarting the vmware-vpxd service on the vCSA, the stuck task should disappear from the vSphere Web Client
    • However, the task may still be running on the vSphere host
    • Use the following steps to stop the stuck task on the vSphere host
  4. SSH to the vSphere host running the task
  5. /etc/init.d/hostd restart
  6. /etc/init.d/vpxa restart

VSAN 6.2 On-disk Format Upgrade Fails at 5%

I am working on upgrading our VSAN from 6.1 to 6.2. See this for an overview of the upgrade steps.

After upgrading each VSAN host to ESXi 6.0U2 (the latest build 4510822 as of 11/01/2016), the last step is to upgrade the on-disk format from v2 to v3.

In our case, the on-disk format upgrade fails at 5% with the error message “General Virtual SAN error. Disk Format conversion failed due to unexpected error”.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.01

However, when I check the disk format under VSAN cluster, Manage, Settings, Virtual SAN / Disk Management, I see that one disk group is upgraded to the interim version 2.5 each time I run the on-disk format upgrade. In the screenshots below, I ran the on-disk format upgrade twice, and two of the disk groups are upgraded to v2.5.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.02

I keep running the on-disk format upgrade. In our VSAN, we have 4 hosts with 2 disk groups on each host. The on-disk format upgrade failed six times. On the seventh run, all disk groups are upgraded to v2.5.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.03

Then the upgrade moves on to the next step - it starts removing disks from one of the VSAN hosts.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.04

I have not figured out the cause of the failure. Re-running the upgrade until all the disk groups are upgraded to format v2.5 keeps the process moving forward.

VMware Tools Stuck in “Upgrade in progress” Fix

I notice the VMware Tools status on some VMs (mostly Linux) is “Upgrade in progress”.

vmware.tools.upgrade.in.progress.01

I cannot vMotion these VMs to another host, and on some of them “Edit Settings” and “Edit Resource Settings” are grayed out.

Solution:

  • Find the ESXi host running the VM
  • Use the vSphere C# Client to connect to the ESXi host directly; do not connect to the vCenter Server.
  • Locate the VM in the vSphere C# Client
  • Right-click on the VM, Guest, End VMware Tools Install

vmware.tools.upgrade.in.progress.02

  • VMware Tools status changes back to running.

vmware.tools.upgrade.in.progress.03

Then I can vMotion the VM or run the VMware Tools installation again.

Fix App Stuck on Updating in iOS

The Google Keep app on my iPhone 5s with iOS 9.3.3 is stuck on updating. I could neither launch the app nor delete it normally by holding down the icon (the X icon showed up, but nothing happened when I tapped it). I tried the following fixes from the web, but no luck.

  • cancel the update and retry updating
  • reboot the phone and delete the app or retry updating
  • log out of the App Store on the phone (Settings, iTunes & App Store, Apple ID, Sign Out), log back in, and retry updating

Some other suggestions are to reset the phone, or to connect the iPhone to a computer with iTunes and delete the stuck app from iTunes. I don’t want to do either of these.

Luckily I found the following solution to delete the stuck app and re-download it from the App store.

  • Navigate to Settings, General, Storage & iCloud Usage, Manage Storage
  • Find the stuck app and click on it
  • Click Delete App
  • Then go back to App Store and re-download it under Updates, Purchased

vCSA “Syslog endpoint servername:514 is unreachable” Error

I am configuring the vCSA syslog to a third-party syslog server (e.g. a Splunk forwarder) via UDP port 514 (see the instruction in http://www.virtuallyghetto.com/2015/03/a-preview-of-native-syslog-support-in-vcsa-6-0.html). The syslog server receives the log from the vCSA. However, the VMware Syslog Service Health Messages reports a “Syslog endpoint servername:514 is unreachable” critical error.

It turns out the vCSA syslog uses TCP port 514 for the syslog server health check. Since my syslog server (like many syslog servers) only listens on UDP port 514, the vCSA health check reports that the syslog server is not reachable.

Solution

  • Find a TCP port that the syslog server is listening on. Any listening TCP port should work; it does not have to be related to syslog.
  • SSH to vCSA
  • cd /etc/vmware-syslog
  • vi vmware-syslog-health.properties
  • Change the “cls.strata.ping.port” setting to the TCP port the syslog server is listening on (the default is 514)
  • Save the setting
  • Restart the VMware Syslog Service
  • Check the VMware Syslog Service Health Messages, it should show “Syslog endpoint <servername>:<tcp port> reachable”
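The two manual parts of this fix (finding a listening TCP port and changing the setting) can be sketched in Python. This is only an illustration: I am assuming the properties file uses plain `key=value` lines, and the function names are hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def set_ping_port(properties_text, port, key="cls.strata.ping.port"):
    """Return properties_text with key set to port (appended if absent)."""
    if not 0 < int(port) < 65536:
        raise ValueError("port must be between 1 and 65535")
    lines, found = [], False
    for line in properties_text.splitlines():
        # Compare only the part before the first "=" so other keys are kept.
        if line.split("=", 1)[0].strip() == key:
            lines.append(f"{key}={port}")
            found = True
        else:
            lines.append(line)
    if not found:
        lines.append(f"{key}={port}")
    return "\n".join(lines) + "\n"
```

A reasonable order is to confirm the port with `tcp_port_open` first, then write the output of `set_ping_port` back to vmware-syslog-health.properties and restart the VMware Syslog Service.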

One More Reason Not to Disable IPv6

Almost every current operating system supports IPv6 and enables it by default. Some system admins still like to disable IPv6 because they do not expect to deploy IPv6 in the near future. However, disabling IPv6 can go against the software vendor’s recommendation or expose you to unexpected bugs.

For example, Microsoft does not recommend disabling IPv6 in Windows. See “IPv6 for Microsoft Windows FAQ” and “How to disable IPv6 or its components in Windows”.

More recently, VMware ESXi 6.0.x has a known issue when IPv6 is disabled. See “Provisioning the TCP/IP stack does not work when IPv6 support is disabled on the host (2146023)”.

To avoid unexpected issues, we should leave IPv6 enabled (the default).

Connect to vCSA using WinSCP

The default shell of the vCSA is the Appliance Shell (/bin/appliancesh), which doesn’t work with WinSCP.

There are two solutions to work around this issue:

  1. Change the default shell of the root account to the Bash shell (/bin/bash)
  2. Configure WinSCP to use the SFTP protocol (yes, SFTP; not SCP) with the shell setting “shell /usr/lib64/ssh/sftp-server”

PS. Both of these solutions require enabling SSH login and Bash shell on the appliance.

  • For solution #1, here are the commands (see VMware KB2107727 for the full instructions).
    • SSH to vCSA
    • shell.set --enable True
    • shell
    • chsh -s /bin/bash root
    • change back: chsh -s /bin/appliancesh root
  • For solution #2 (credit to http://www.v-front.de/2015/03/vcsa-60-tricks-shell-access-password.html), the linked site does not provide details on configuring WinSCP. I had an issue when setting it up the first time, and someone also commented that the shell trick does not work anymore, so I document the step-by-step instructions below.
  • Personally, I prefer solution #2, since I don’t need to mess with the default shell of the root account.

The following instructions were tested on vCSA 6.0 Update 2 (6.0.0.20000) with WinSCP version 5.9 (build 6786).

  • Log in to the vCSA web console (https://<vcsa-server>:5480)
  • Under Access, click Edit, select the checkboxes for “Enable ssh login” and “Enable bash shell”
    • vcsa.web.console.01
  • Change the Timeout value if necessary
    • vcsa.web.console.02
  • Create a new site in WinSCP
  • Select “SFTP” under File protocol, type the vCSA host name, root and its password.
    • winscp.config.01
  • Click the Advanced dropdown to edit the Advanced Site Settings
    • winscp.config.02
  • Under Environment, SFTP, SFTP server, enter “shell /usr/lib64/ssh/sftp-server” (without quotes)
    • winscp.config.03
  • Click OK and Save the setting, and click Login

Event ID 36886 “No suitable default server credential exists on this system” Fix

Recently, we created a new child domain in the existing AD forest with two new Windows Server 2012 R2 domain controllers. The AD authentication and AD replication between DCs are working fine.

Today, we are trying to set up a third-party app (Splunk) with secure LDAP authentication against the child domain AD. The child domain DCs are hardened: the “LDAP server signing requirements” policy is set to require signing.

However, we get an error “the connection reset by the peer” in the third party app’s LDAP connection test. On the DC server, there is a warning in System event - Event ID 36886 “No suitable default server credential exists on this system. …”

ldaps.authentication.error.01

Troubleshooting

  • According to MS KB321051, “The LDAPS certificate is located in the Local Computer’s Personal certificate store.”
    • Open the Certificates MMC for the Local Computer on the child domain controller. There is a server cert for this domain controller.
    • However, this cert is not fully trusted because the root CA cert is not trusted; the root CA cert is not in the Trusted Root Certification Authorities store.
    • ldaps.authentication.error.03
    • I guess this server cert is created / issued when promoting the server to DC, and the cert is issued by the internal Windows Server 2008 enterprise intermediate CAs. (There is a Windows Server 2008 enterprise root CA and intermediate CA in the AD forest.)
    • I find the root CA cert is in the Intermediate Certification Authorities store. I’m not sure why. (The enterprise root CA and intermediate CA are set up by someone else.) This is the cause of the issue.
    • ldaps.authentication.error.04

Solution

  • There are two ways to put the root CA cert back to the trusted root CA store.
    • I can copy and paste the root CA cert from the intermediate cert store to the trusted root CA store.
    • I download the CA certificate chain, open the root CA cert, and install it on the child DC server. Make sure to specify Local Machine as the store location and Trusted Root Certification Authorities as the store. The default automatic selection will place the root CA cert in the intermediate CA store again.
  • Once the root CA cert is in the right store location, the child DC’s cert shows trusted. The LDAPS connection test in the third party app is successful.
  • ldaps.authentication.error.05

Configuring Windows Firewall Settings in Two GPOs May Corrupt the GPO

In my group policy implementation, in addition to learning about security setting persistence in GPOs, I also observed that a “corrupted” GPO breaks the GPO replication between two Windows Server 2012 R2 domain controllers.

Here are the group policies applied to the domain controller OU.

DC.GPO.01 

  • Default Domain Policy and Default Domain Controller Policy are untouched as the Windows default
  • CDE Domain Policy is applied at the domain level with some firewall rules to allow the antivirus server to communicate with the antivirus client on the servers; and the Windows firewall state is left to “Not Configured” (the default). So the administrator on the server can turn off the firewall when it’s needed.
  • PCI Win2012 R2 Hardening - Domain Controller is applied at the Domain Controllers OU level with the PCI compliance settings, including the Windows firewall state to “On”. So the Windows firewall cannot be turned off.

One of the new features in Windows Server 2012 R2 is the ability to check the GPO replication status in Group Policy Management. After configuring and applying the new GPO, I check its status; it shows that this GPO is not replicated from the first DC to the second DC, with the error “SysVol Inaccessible”.

DC.GPO.02

Troubleshooting

I unlink the new GPO from the domain controller OU, but it still shows the same error. As I learned from the previous post, unlinking the GPO does not roll back all the security settings applied to the server.

From the first DC, I can browse to the SysVol share on the second DC via Windows Explorer. Other GPOs are still in sync between the two DCs. So the communication between the two DCs should be fine.

Searching the web for this error, I don’t find anything directly related to it. Some posts do give me the idea that the GPO may be corrupted.

I back up and delete the GPO, and verify the replication between the two DCs in sync - selecting the domain name and clicking the status tab in Group Policy Management.

After deleting the GPO, the GPO folder is still left on the drive. This prevents restoring the GPO. I have to manually delete the GPO folder from C:\Windows\SYSVOL\domain\Policies. I also get the access denied error when deleting the folder. Just wait and give enough time for the deletion replication to complete, then I can delete the GPO folder. This also removes this GPO folder from other DCs.

Then I restore the GPO from the backup. When reviewing the settings in the restored GPO, I notice the Windows firewall settings, under Computer Configuration, Policies, Windows Settings, Security Settings, Windows Firewall with Advanced Security, are reverted back to the default (not configured). All my customized Windows firewall settings are gone, but other group policy settings are not impacted.

This leads me to think the corrupted GPO issue (if it is truly corrupted and that is why the replication fails) is caused by the Windows firewall settings being configured in two GPOs, even though there is no conflict between their settings.

To test this theory, I reconfigure the firewall settings in the new GPO again. Bang! This new GPO is out of sync between the two DCs again!

Then I do the fix again - deleting the GPO and restoring the GPO. This time, I configure the firewall setting in the “CDE Domain Policy” only. All GPOs are in sync between the DCs.

DC.GPO.03

Conclusion

  • I am able to duplicate the GPO sync issue between the two DCs if the Windows firewall settings are configured in more than one GPO.
  • Configuring all the firewall settings in one GPO does not have the GPO sync issue.
  • The GPO sync issue may be caused by a corrupted GPO. I don’t get any corruption or sync error (other than “SysVol Inaccessible”), unlike other posts on the internet, so I cannot prove the GPO is corrupted.
  • I don’t find any KB or post against configuring firewall settings in multiple GPOs. The closest one I found is this post. It just says merging the firewall setting under the older “Windows Firewall” section and “Windows Firewall with Advanced Security” section may have unpredictable results. In my setup, I only configure the settings in “Windows Firewall with Advanced Security”.
  • I don’t know the root cause of the GPO sync issue. For now, I will configure all the Windows firewall settings in only one GPO.

GPO Security Setting Persistence

For security compliance purposes, I want to enforce some Windows settings on the Windows servers via group policy. Some GPOs already exist in the domain. Instead of modifying the existing GPOs, I create a new GPO and link it to the OU where the servers are located. I thought I could easily roll back / undo the new settings if they caused any issue, by moving the server out of the OU or unlinking the GPO from the OU.

It turns out my original thought is not 100% correct. Some security settings still persist even if the setting is no longer defined in the policy.

For example, on a new Windows Server 2012 R2 server, the local security policy setting “Network Security: LAN Manager authentication level” is “Not Defined”. In my new GPO, I configure this setting to “Send NTLMv2 response only; Refuse LM & NTLM”. After linking this new GPO to the OU, I cannot RDP to the servers with a domain login (the error message is “The logon attempt failed”, even though my username and password are correct), but the local login is okay.

I unlink the new GPO from the OU for troubleshooting and expect the security settings to roll back to their original configuration. However, I get the same error, and the local security policy on the server is still set to “Send NTLMv2 response only; Refuse LM & NTLM”. I reset this setting in the local security policy back to the default “Not Defined”. That solves the problem.

Some posts give me the idea on the fix: post 1, post 2.

Now I learn the “tattooing” behavior on the security settings in the GPO.

Persistence in security settings

Security settings may still persist even if a setting is no longer defined in the policy that originally applied it.

Persistence in security settings occurs when:

  • The setting has not been previously defined for the computer.

  • The setting is for a registry object.

  • The setting is for a file system object.

All settings applied through local policy or a Group Policy Object are stored in a local database on your computer. Whenever a security setting is modified, the computer saves the security setting value to the local database, which retains a history of all the settings that have been applied to the computer. If a policy first defines a security setting and then no longer defines that setting, then the setting takes on the previous value in the database. If a previous value does not exist in the database, then the setting does not revert to anything and remains defined as is. This behavior is sometimes called "tattooing."

Registry and file settings will maintain the values applied through policy until that setting is set to other values

Reference: Administer Security Policy Settings, “Persistence in security settings”
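To make the quoted behavior concrete, here is a toy Python model of the local database. It is only my reading of the paragraph above, not how Windows actually stores the settings; the class and method names are made up for illustration.

```python
class SecurityDatabase:
    """Toy model of the local security database described in the quote."""

    def __init__(self, local_defaults=None):
        defaults = dict(local_defaults or {})
        # History of every value applied to each setting, oldest first.
        self.history = {name: [value] for name, value in defaults.items()}
        self.effective = defaults

    def apply_policy(self, setting, value):
        """A policy defines the setting: record the value and apply it."""
        self.history.setdefault(setting, []).append(value)
        self.effective[setting] = value

    def remove_policy(self, setting):
        """The policy no longer defines the setting; return what remains."""
        values = self.history.get(setting, [])
        if len(values) >= 2:
            values.pop()  # forget the policy value
            self.effective[setting] = values[-1]  # revert to the previous one
        # With no previous value, nothing reverts: the setting is "tattooed".
        return self.effective.get(setting)
```

In my case the LAN Manager authentication level had never been defined on the server, so there was no previous value to fall back to, and the GPO value stayed in place after I unlinked the GPO.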

How to Convert Eager Zeroed Thick Disk to Lazy Zeroed Thick Disk

According to VMware KB2145183, an Eager Zeroed Thick disk cannot be directly converted / migrated / cloned to a Lazy Zeroed Thick disk. The workaround is:

  1. Convert the Eager Zeroed Thick disk to Thin
  2. Convert the Thin disk to Lazy Zeroed

vmkfstools can be used for the conversion (see my previous post). Here are the commands:

  1. vmkfstools -K <vmdkFile>
  2. vmkfstools -j <vmdkFile>

Error “Idm client exception: Error trying to join AD, error code [11]” when joining a VCSA to AD domain

On a newly created VCSA appliance, I got the following error when joining it to the Active Directory domain.

vcsa.joining.ad.error

I used the domain’s netbios name\user-name as the user name.

Fix: use the User Principal Name (UPN), user-name@fqdn-domain-name, as the user name. After joining the domain, reboot the VMware Platform Service Controller (PSC).

Troubleshoot vMotion Error 195887167

Updated on 07/13/2016. See the update in this post: I might have found the ultimate solution, even though I am still not sure what causes the issue.

Recently I had an issue vMotioning some VMs between the hosts of a vSphere 6.x cluster. Long story short, here are the symptoms:

  • I consistently got the error “Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout” when vMotioning (host only, no storage vMotion) some VMs, particularly the vCenter Server Appliance VM. I had two vCSA VMs, and both had the same issue. But the vCSA is not the only VM on which I got the error.
  • I successfully vMotioned some VMs between the hosts, so the vMotion network configuration should be okay.
  • In other words, some VMs are okay; some are not. The size (CPU, RAM, storage) of the VMs does not seem to be the problem. The successfully vMotioned VMs can have more or less CPU, RAM, and storage than the failed VMs.
  • The failed VM has more disks than other VMs. The vCSA VM is created with 11 disks by default.
  • When the VMs were powered off, vMotion succeeded.
  • Restarted the hosts and restarted the VMs. No difference.
  • Verified no IP address conflicts.
  • Tried one VMkernel adapter for both management and vMotion or a dedicated VMkernel adapter for vMotion. No difference.
  • Tested vmkping successfully between the hosts.
  • In the vmkernel.log of the hosts, the error is “2016-05-18T22:47:34.959Z cpu15:39089)WARNING: Migrate: 270: 1463611379229538 D: Failed: Connection closed by remote host, possibly due to timeout (0xbad003f) @0x418000e149ee” on the destination host or “2016-05-19T18:29:23.930Z cpu1:130133)WARNING: Migrate: 270: 1463682486286991 S: Failed: Migration determined a failure by the VMX (0xbad0092) @0x41803a7f6993” on the source host.
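When comparing many failed attempts, it helps to pull these Migrate failures out of vmkernel.log automatically. The Python sketch below is based only on the two log lines quoted above, so the regular expression is an assumption about the format rather than a documented one. One thing it makes visible: the hex code 0xbad003f in the destination log is the same number as the decimal “Error 195887167” shown in the client.

```python
import re

# Pattern derived from the two sample lines above, nothing more.
MIGRATE_FAILURE = re.compile(
    r"(?P<timestamp>\S+)\s+cpu\d+:\d+\)WARNING: Migrate: \d+: "
    r"(?P<migration_id>\d+) (?P<side>[SD]): Failed: "
    r"(?P<message>.+?) \((?P<code>0x[0-9a-fA-F]+)\)"
)


def parse_migrate_failure(line):
    """Return a dict describing a Migrate failure line, or None."""
    match = MIGRATE_FAILURE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # "S" lines come from the source host, "D" lines from the destination.
    info["side"] = "source" if info["side"] == "S" else "destination"
    info["code_decimal"] = int(info["code"], 16)
    return info
```

Matching the `migration_id` between the source and destination logs should show which pair of lines belongs to the same vMotion attempt.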

Possible solutions

  • Remove the snapshot on the VM if it has one
    • After removing the snapshot on one of the vCSA VMs, vMotion worked fine. But the other vCSA had no snapshot, and it still failed.
  • Try using the vSphere Client instead of the vSphere Web Client. This worked on some VMs, but not always.
  • Assign the VM’s network adapter to a different port group, then change it back to its original port group
    • This seems to be the ultimate fix. After doing this, the vCSA VMs, which had failed vMotion consistently, are vMotioned successfully.

Conclusion

07/13/2016 update

  • I run into the same error when migrating some VMs, including the vCenter 6.x appliance (vCSA), between hosts in the VMware cluster. I am able to migrate other VMs. This leads me to believe the problem is on the VM, rather than the VM infrastructure.
  • Among the VMs having this error, those I can power off are migrated successfully. However, I cannot power down the vCSA VM, because I cannot perform the migration without vCenter available.
  • I try assigning the NIC of the vCSA VM to a different port group and changing it back. However, I cannot do that, because this VM is the vCenter server and is configured with a distributed switch. If I change the vCSA to a different port group (configured with a different VLAN), the vCenter server will be down (because its NIC is assigned to the wrong port group with the wrong VLAN), and I cannot change it back to the original port group with the right VLAN.
  • I try connecting directly to the ESXi host of the vCSA VM via the vSphere C# client, then assigning the NIC of the vCSA VM to a different port group. However, I cannot do that either. Because the host is only configured with distributed switches, there is no other port group to select in this situation.
    • vmotion.error.01
  • I try creating a new port group with ephemeral port binding on the distributed switch. This new ephemeral port group is available in the vSphere C# client when connecting to the host directly. Then I assign the VM to the ephemeral port group and change it back. However, the migration still fails with the same error.
  • Since I could fix this issue last time by changing the port group and changing back, I guess that somehow reset the NIC on the VM or the virtual switch port to which the VM is connected. That gives me an idea to manually assign the VM to another virtual switch port.
    • vmotion.error.02

Solutions

  • In the screenshot above, the NIC of the vCSA is assigned to port 214 on the vSwitch.
  • Log back in to vCenter via the vSphere C# client or Web Client (I can neither see the ports on the distributed switch nor change the port assigned to the VM’s NIC when connecting to the ESXi host via the vSphere C# client)
  • I find an unused port on the same port group (e.g. port 423 in my case).
    • vmotion.error.03
  • Edit the vCSA VM setting and assign its NIC to the unused port.
  • Then I can successfully migrate the vCSA VM to another host.

Conclusion

  • Changing the NIC to an unused port will be my first attempt when this issue happens again (I bet it will happen).
  • I still don’t know the cause of this issue.

10/15/2019 update

  • I got the exact same error again when performing a Storage vMotion of a VM across two vCenter servers (from vCSA 6.0 to vCSA 6.5). I deleted the VM snapshot and reran the vMotion, and it completed successfully.

Change Windows Server 2008 or 2012 Network Profile

Sometimes a Windows server is assigned to the incorrect network profile, which can cause the wrong Windows Firewall rules to be applied. Here is how to change its network profile.

For standalone server

  • Can change the profile to public or private; but cannot set to domain
  • For Windows Server 2012
    • Open PowerShell as administrator
    • Get-NetConnectionProfile | Set-NetConnectionProfile -NetworkCategory [Private | Public]
  • For Windows Server 2008 or 2012
    • gpedit.msc, Computer Configuration, Windows Settings, Security Settings, Network List Manager Profiles
    • Select the network name, Properties, Network Location
    • Under Location Type, select Private or Public

For domain joined server

Extend Microsoft Cluster Shared Disk in VMware

A VM shared disk on Microsoft Cluster Service (MSCS) is running out of disk space. The VMs are on a single host (aka cluster in a box - CIB). I can think of two ways to expand the disk storage.

  • create a new big shared disk for the cluster, migrate the data, then change the new disk to the same drive letter as the original disk
  • extend the size of the existing shared disk

Obviously the latter seems simpler, but it requires special attention. The shared disk in MSCS VMs must be in eager zeroed thick format. However, when extending an eagerzeroedthick VMDK, the extended chunk is in lazy zeroed thick format by default (reference “Extending an EagerZeroedThick Disk”; in my test, vSphere 6 has the same behavior).

Here is how I extend the MSCS shared disk

  • Power off both servers in the cluster
  • Increase the VMDK disk size. There are two ways:
    • GUI: edit the VM settings, increase the shared disk size
    • CLI: use vmkfstools -X <newsize> -d eagerzeroedthick <vmdkfile>
  • Using the GUI, the extended chunk will be in lazy zeroed thick format. The VM will fail to power on with the error “VMware ESX cannot open the virtual disk for clustering…”

cluster.vm.power.on.error

  • There are two ways to convert the extended chunk to eagerzeroedthick format
    • Migrate the VM to another storage, and specify the eager zero thick format for the disk
    • Use vmkfstools -k <vmdkfile>
      vmkfstools.convert.eagerzeroedthick
  • Once the entire shared disk is in eager zeroed thick format, the VM will be able to power on.
  • Extend the Windows partition as described in KB304736
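
The choice between the two paths above can be sketched as a small dry-run helper. This is only an illustration: the function name and the datastore path are hypothetical, and the helper prints nothing and runs nothing; it only builds the vmkfstools command line (using the -X, -d and -k options mentioned in this post) that you would then review and run yourself on the ESXi host while both cluster nodes are powered off.

```python
import shlex


def mscs_extend_command(vmdk_path, new_size=None):
    """Build the vmkfstools command for an MSCS shared disk (dry run only).

    new_size given  -> the disk has not been extended yet: extend it and
                       keep the new chunk eager zeroed in one step.
    new_size absent -> the disk was already extended in the GUI, so only
                       the eager-zero conversion (-k) is needed.
    """
    quoted = shlex.quote(vmdk_path)  # protect paths that contain spaces
    if new_size is not None:
        return f"vmkfstools -X {new_size} -d eagerzeroedthick {quoted}"
    return f"vmkfstools -k {quoted}"


# Hypothetical path, for illustration only.
shared_disk = "/vmfs/volumes/datastore1/node1/shared.vmdk"
cli_path_cmd = mscs_extend_command(shared_disk, "200G")
gui_path_cmd = mscs_extend_command(shared_disk)
```

Here cli_path_cmd covers the CLI route and gui_path_cmd covers the repair step after a GUI extension.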

vmkfstools Examples

While researching an issue with expanding a shared disk on Microsoft clustering VMs (CIB), I learned more about the vmkfstools command.

The vmkfstools --help output displays many options but lacks explanations, so I document them here. (reference: vSphere Storage, Using vmkfstools)

# vmkfstools --help

OPTIONS FOR FILE SYSTEMS:

vmkfstools -C --createfs [vmfs3|vmfs5]
               -b --blocksize #[mMkK]
               -S --setfsname fsName
           -Z --spanfs span-partition
           -G --growfs grown-partition
   deviceName

           -P --queryfs -h --humanreadable
           -T --upgradevmfs
   vmfsPath
           -y --reclaimBlocks vmfsPath [--reclaimBlocksUnit #blocks]

OPTIONS FOR VIRTUAL DISKS:

vmkfstools -c --createvirtualdisk #[gGmMkK]
               -d --diskformat [zeroedthick
                               |thin
                               |eagerzeroedthick
                               ]
               -a --adaptertype [buslogic|lsilogic|ide
                                |lsisas|pvscsi]
               -W --objecttype [file|vsan]
               --policyFile <fileName>
           -w --writezeros
           -j --inflatedisk
           -k --eagerzero
           -K --punchzero
           -U --deletevirtualdisk
           -E --renamevirtualdisk srcDisk
           -i --clonevirtualdisk srcDisk
               -d --diskformat [zeroedthick
                               |thin
                               |eagerzeroedthick
                               |rdm:<device>|rdmp:<device>
                               |2gbsparse]
               -W --object [file|vsan]
               --policyFile <fileName>
               -N --avoidnativeclone
           -X --extendvirtualdisk #[gGmMkK]
               [-d --diskformat eagerzeroedthick]
           -M --migratevirtualdisk
           -r --createrdm /vmfs/devices/disks/...
           -q --queryrdm
           -z --createrdmpassthru /vmfs/devices/disks/...
           -v --verbose #
           -g --geometry
           -x --fix [check|repair]
           -e --chainConsistent
           -Q --objecttype name/value pair
           --uniqueblocks childDisk
   vmfsPath

OPTIONS FOR DEVICES:

           -L --lock [reserve|release|lunreset|targetreset|busreset|readkeys|readresv
                     ] /vmfs/devices/disks/...
           -B --breaklock /vmfs/devices/disks/...

vmkfstools -H --help

vmkfstools Command Syntax

vmkfstools options target

Options: separated into three types - File System Options, Virtual Disk Options, and Storage Device Options.
Target: partition, device, or path

File System Options

  • Listing Attributes of a VMFS Volume
    The listed attributes include the file system label, if any, the number of extents comprising the specified VMFS volume, the UUID, and a listing of the device names where each extent resides.
    vmkfstools -P -h <vmfsVolumePath>
    vmkfstools -P -h /vmfs/volumes/netapp_sata_nfs1/
  • Creating a VMFS Datastore
    vmkfstools -C vmfs5 -b <blocksize> -S <datastoreName> <partitionName>
    vmkfstools -C vmfs5 -b 1m -S my_vmfs /vmfs/devices/disks/naa.ID:1
  • Extending an Existing VMFS Volume
    vmkfstools -Z <span_partition> <head_partition>
    vmkfstools -Z /vmfs/devices/disks/naa.disk_ID_2:1 /vmfs/devices/disks/naa.disk_ID_1:1
    Caution: When you run this option, you lose all data that previously existed on the SCSI device you specified in span_partition.
  • Growing an Existing Extent
    vmkfstools -G device device
    vmkfstools --growfs /vmfs/devices/disks/disk_ID:1 /vmfs/devices/disks/disk_ID:1

Virtual Disk Options

  • Creating a Virtual Disk
    vmkfstools -c <size> -d <diskformat> <vmdkFile>
    vmkfstools -c 2048m testdisk1.vmdk
  • Initializing a Virtual Disk
    vmkfstools -w <vmdkFile>
    This option cleans the virtual disk by writing zeros over all its data. Depending on the size of your virtual disk and the I/O bandwidth to the device hosting the virtual disk, completing this command might take a long time.
    Caution: When you use this command, you lose any existing data on the virtual disk.
  • Inflating a Thin Virtual Disk
    vmkfstools -j <vmdkFile>
    This option converts a thin virtual disk to eagerzeroedthick, preserving all existing data. The option allocates and zeroes out any blocks that are not already allocated.
  • Removing Zeroed Blocks (Converting a virtual disk to a thin disk)
    vmkfstools -K <vmdkFile>
    Use the vmkfstools command to convert any thin, zeroedthick, or eagerzeroedthick virtual disk to a thin disk with zeroed blocks removed.
    This option deallocates all zeroed out blocks and leaves only those blocks that were allocated previously and contain valid data. The resulting virtual disk is in thin format.
  • Converting a Zeroedthick Virtual Disk to an Eagerzeroedthick Disk
    vmkfstools -k <vmdkFile>
    Use the vmkfstools command to convert any zeroedthick virtual disk to an eagerzeroedthick disk. While performing the conversion, this option preserves any data on the virtual disk.
  • Deleting a Virtual Disk
    vmkfstools -U <vmdkFile>
    This option deletes files associated with the virtual disk listed at the specified path on the VMFS volume.
  • Renaming a Virtual Disk
    vmkfstools -E <oldName> <newName>
  • Cloning or Converting a Virtual Disk or Raw Disk
    cloning:
    vmkfstools -i <sourceVmdkFile> <targetVmdkFile>
    vmkfstools -i /vmfs/volumes/templates/gold-master.vmdk /vmfs/volumes/myVMFS/myOS.vmdk
    converting: vmkfstools -i <sourceVmdkFile> -d <diskformat> <targetVmdkFile>
  • Extending a Virtual Disk
    vmkfstools -X <newSize> [-d eagerzeroedthick] <vmdkFile>
    Use -d eagerzeroedthick to ensure the extended disk is in eagerzeroedthick format.
    Caution: do not extend the base disk of a virtual machine that has snapshots associated with it. If you do, you can no longer commit the snapshot or revert the base disk to its original size.
  • Displaying Virtual Disk Geometry
    vmkfstools -g <vmdkFile>
    The output is in the form: Geometry information C/H/S, where C represents the number of cylinders, H represents the number of heads, and S represents the number of sectors.
  • Checking and Repairing Virtual Disks
    vmkfstools -x [check|repair] <vmdkFile>
    Use this option to check or repair a virtual disk after an unclean shutdown.

Storage Device Options

  • Managing SCSI Reservation of LUNs
    Caution: Using the -L option can interrupt the operations of other servers on a SAN. Use the -L option only when troubleshooting clustering setups.
    • vmkfstools -L reserve <deviceName>
      Reserves the specified LUN. After the reservation, only the server that reserved that LUN can access it. If other servers attempt to access that LUN, a reservation error results
    • vmkfstools -L release <deviceName>
      Releases the reservation on the specified LUN. Other servers can access the LUN again
    • vmkfstools -L lunreset <deviceName>
      Resets the specified LUN by clearing any reservation on the LUN and making the LUN available to all servers again. The reset does not affect any of the other LUNs on the device. If another LUN on the device is reserved, it remains reserved
    • vmkfstools -L targetreset <deviceName>
      Resets the entire target. The reset clears any reservations on all the LUNs associated with that target and makes the LUNs available to all servers again.
    • vmkfstools -L busreset <deviceName>
      Resets all accessible targets on the bus. The reset clears any reservation on all the LUNs accessible through the bus and makes them available to all servers again.
    • When entering the device parameter, use the following format:
      /vmfs/devices/disks/vml.vml_ID:P

Hidden Options (reference: “Some useful vmkfstools ‘hidden’ options”)

  • VMDK Block Mappings
    vmkfstools -t0 <vmdkFile>
    Displays the format of each chunk in a VMDK file.
    • VMFS -- = eager zeroed thick
    • VMFS Z- = lazy zeroed thick
    • NOMP -- = thin
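
As a rough illustration of how the three markers above could be tallied, here is a minimal Python sketch. It assumes only that each mapping line of the -t0 output contains one of the marker strings listed above; the sample lines are made up for the example and are not real vmkfstools output.

```python
def classify_mapping(line):
    """Map one block-mapping line to a disk format, based on its marker."""
    if "VMFS Z-" in line:
        return "lazy zeroed thick"
    if "VMFS --" in line:
        return "eager zeroed thick"
    if "NOMP --" in line:
        return "thin"
    return "unknown"


def summarize(output):
    """Count how many mapping lines fall into each format."""
    counts = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        kind = classify_mapping(line)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Made-up sample lines: only the marker text matters here.
sample = """
[chunk 0] --> [VMFS -- ...]
[chunk 1] --> [VMFS -- ...]
[chunk 2] --> [VMFS Z- ...]
"""
summary = summarize(sample)
```

A disk that reports any "lazy zeroed thick" chunk after an extension is the case where the clustered VM would refuse to power on.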

VSAN v6 Provision Thick Disk

I always thought that when creating or migrating a VM on a VSAN datastore, its disk would be thin provisioned. However, I discovered that some VM disks in our VSAN datastore are “thick” provisioned even though all the VM storage policies are set to 0% object space reservation. How is that possible? After some digging, here is what I learned.

Thick Disk Format on VSAN

VSAN defines the disk type (thin or thick) via the Object Space Reservation setting in the VM Storage Policies. By default, this value is 0%, implying the disk is deployed as thin.

If the value is set to 100%, the space for the disk is fully reserved, which can be thought of as fully thick provisioned. This behaves similarly to thick provision lazy zeroed. There is no eager-zeroed thick format on VSAN. (reference: Virtual SAN 6.2 Design and Sizing Guide, page 65)

Benefit to Provision Thick Disk on VSAN

Based on my understanding of VSAN disk I/O operation (VSAN mirrors write I/Os to all active mirrors, and these are acknowledged when they hit the flash buffer!), there is typically no performance difference between thin and lazy zeroed thick provisioning on VSAN. Remember, there is no eager-zeroed thick format on VSAN (see above). Also see the Yellow-Bricks post. (PS: Duncan’s post may misspeak about VSAN eager zero thick provision.)

Provision Thick Disk on VSAN (Intentionally or By Accident)

There are several possible ways to provision a thick disk on VSAN.

  • Possibility #1
    • Define a thick VM Storage Policy
    • Set the Object Space Reservation to 100%
    • Use vSphere Web Client (cannot use vSphere C# Client)
    • Select the thick VM storage policy
  • Possibility #2
    • Use vSphere C# Client
    • Select “Thick Provision Lazy Zeroed” or “Thick Provision Eager Zeroed” on the disk type
    • I don’t know what the actual impact on VSAN is when selecting eager zeroed. In my test, the VM disk is still created correctly. I will do more research and post an update.
  • Possibility #3
    • P2V a physical server to VM
    • By default, P2V uses thick provision on the disk
    • Change the destination disk to thin provision by selecting Advanced, Destination layout, Type, Thin
    • p2v.data.copy.advanced
    • p2v.data.copy.destination.layout
  • For VSAN 5.5, there is one more method, see here.

Change Thick Provisioned Disk to Thin on VSAN

Unfortunately, there is not a simple way to change a thick provisioned disk to thin on VSAN. Simply changing the VM storage policy on the disk has no impact.

To convert a thick disk to thin provisioned, do a storage migration of the disk to SAN / NFS / local storage, then migrate it back to the VSAN datastore. Make sure to select the thin provision storage policy during the migration.

Brocade FC Switch FOS v7.2.0a WebTools Access in Windows Server 2012 R2 with IE 11

I got some errors (listed at the end of this post) when setting up a brand new Brocade Fibre Channel switch running FOS v7.2.0a from a Windows Server 2012 R2 server with IE 11. The following steps fixed the errors.

  • Install Oracle JRE 1.7.0 update 25 Windows x86 version
    • According to its release notes, FOS v7.2 is qualified and supported only with Oracle JRE 1.7.0 update 25.
    • Install the JRE Windows x86 version (32-bit) instead of the Windows x64 version (64-bit), even though Windows Server 2012 R2 is a 64-bit OS
  • Launch “Java (32-bit)” in Control Panel
    • Security tab, lower Security Level to Medium
    • java.security
    • (optional) Advanced tab, set “Perform certificate revocation checks on” to “Do not check”. This speeds up the “Verifying application” process if the server does not have Internet access.
    • java.advanced
  • Launch Internet Explorer
    • Click Tools, “Compatibility View settings” to add the Brocade switch IP address to the compatibility view list
    • ie.compatibility.setting.01
    • ie.compatibility.setting.02
  • Enter the IP address of the Brocade switch in Internet Explorer
    • brocade.fc.webtool.01
    • brocade.fc.webtool.02
    • brocade.fc.webtool.03

The error messages I experienced and possible solutions

  • “The version of Java plugin needed to run the application is not installed. The page from where the plugin can be downloaded will be opened in a new window.”
    • Install the supported JRE version. See the FOS release notes for the supported JRE version
    • Install the 32-bit version of JRE instead of the 64-bit version
    • Verify Java Plug-In is enabled in IE
    • Add the FC switch IP to IE’s compatibility list
  • “Unable to launch the application” or “Unable to load resource: http://<switch-ip>/loc_res.jar”
    • Install the supported JRE version. See the FOS release notes for the supported JRE version
  • “Application Blocked by Security Settings”
    • Lower the Java Security Level to Medium in Java 1.7 Update 25. For newer versions of Java, add the FC switch URL to the Java Security Exception Site List.

Create CD/DVD ISO File in Windows 10

Windows 8 or later has the built-in feature to open / mount ISO files. However, creating an ISO file requires other tools. There are many free utilities available.

Today, I tried ISO Recorder v.3.1.3 64-bit. The last update on its web support says it supports Windows Vista and 7. In my test, it works in Windows 10 as well.

Create an ISO file

There is no icon or shortcut to launch ISO Recorder after the installation. To create an ISO file, right-click on the CD / DVD drive, and select “Create image from  CD/DVD”.

iso.recorder.create

Mount an ISO file

Before installing ISO Recorder, Windows 10 will automatically mount the ISO to a virtual CD when double-clicking an ISO file. After installing ISO Recorder, double clicking an ISO file will launch ISO Recorder to write the image to a CD.

To mount the ISO file, right-click on the ISO file, and select “Open with” and “Windows Explorer”.

iso.recorder.mount

PS. I almost forgot another free tool - ImgBurn. This was my go-to CD ISO creation and writing tool. The current version is v.2.5.8.0, released on June 16, 2013. It may still work in Windows 10, but I have not tried it yet.

Other free CD tools: MagicISO Virtual CD, WinISO 5.3, and CDBurnerXP.

Do Not Upgrade Dell Server with H730 and FD332-PERC Controller to VSAN 6.2

VMware released VSAN 6.2 on March 15, 2016. However, if your VSAN is running on a Dell server with H730 or FD332-PERC controller, do not upgrade to VSAN 6.2.

See KB2144614 for more information.

My IOPS Calculator

I was looking for an IOPS calculator for the EMC VNX2 5400 storage that I am working on. There are many available on the Internet, but none of them gives me exactly what I want. More frustrating, different calculators produce different results. So I decided to build one myself. Here is what I came up with.

My IOPS calculator concept:

 

My IOPS Calculator Download (save the spreadsheet and open in Excel)

Fix “Deprecated VMFS volume(s) found on the host” in vSphere 6.x

An ESXi 6.x host shows a warning message “Deprecated VMFS volume(s) found on the host. Please consider upgrading volume(s) to the latest version.”

vsphere.6.deprecated.vmfs.warning

After verifying all the datastores mounted on the host are VMFS5, I restarted the management agent on the host. That cleared the warning.

This is a known issue on vSphere 6 (KB2109735).

VSAN Free Storage Catches

VSAN is a hot topic nowadays. Once it is set up, it’s easy to manage and use. No more creating LUNs and zoning.

We recently ran into some catches regarding its free storage that we had not thought about or been told about beforehand; or maybe our expectations of VSAN were too optimistic.

Our VSAN hardware disk configuration:

  • 3 x Dell PowerEdge R730 nodes
  • 2 x 400 GB SSD per node (372.61 GB is shown in VSAN Disk Management)
  • 14 x 1 TB SATA per node (931.51 GB is shown in VSAN Disk Management)
  • Two disk groups (7 SATA + 1 SSD) per node

Calculation of each node storage capacity (RAW):

931.51 x 14 = 13,041.14 GB = 12.73549 TB

Total storage capacity (RAW)

931.51 x 14 x 3 = 39,123.42 GB = 38.20646 TB

This calculation matches the storage capacity shown in the VSAN Cluster’s Summary.

vsan.total.storage.capacity
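
The arithmetic above can be reproduced with a few lines of Python. This is just a sketch of the same calculation: the function names are my own, the per-disk size is the 931.51 GB value shown in VSAN Disk Management, and the GB to TB conversion divides by 1024 as in the figures above. Only the capacity (SATA) disks are counted; the SSDs are not part of the raw capacity.

```python
DISK_GB = 931.51        # capacity disk size as shown in VSAN Disk Management
DISKS_PER_NODE = 14     # SATA capacity disks per node
NODES = 3


def raw_capacity_gb(disk_gb, disks_per_node, nodes=1):
    """Raw capacity in GB; only capacity disks count, not the SSDs."""
    return disk_gb * disks_per_node * nodes


def gb_to_tb(gb):
    """Convert GB to TB using 1 TB = 1024 GB."""
    return gb / 1024


node_gb = raw_capacity_gb(DISK_GB, DISKS_PER_NODE)
total_gb = raw_capacity_gb(DISK_GB, DISKS_PER_NODE, NODES)

print(f"per node: {node_gb:,.2f} GB = {gb_to_tb(node_gb):.5f} TB")
print(f"total:    {total_gb:,.2f} GB = {gb_to_tb(total_gb):.5f} TB")
```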

We are adding more VMs to the VSAN. Once the free storage drops below about 12 TB (about one node’s RAW capacity), the VSAN health check starts showing the critical alert “Limits Health - After 1 additional host failure” (KB2108743).

vsan.health.alert

And the component resyncing starts more frequently.

vsan.resyncing.components

My take away:

  • I understand there is an overhead for VSAN (or any storage product) to offer redundancy. But the way VSAN displays the free storage is quite different from traditional SAN storage, and it can be confusing. The free storage shown in VSAN does not mean you should use it. Otherwise, the VMs may go down when a host fails or is taken down for maintenance.
  • The used storage in the Summary tab is the provisioned storage, not the actual space in use.
  • Frequent component resyncing can potentially impact the overall VSAN storage performance.
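
The first takeaway can be turned into a rough rule-of-thumb check. The sketch below is based only on what we observed in our cluster (the alert appeared once free space fell below roughly one node’s raw capacity); it is not the actual algorithm behind the VSAN health check, and the function names are hypothetical.

```python
NODE_RAW_TB = 12.73549   # one node's raw capacity from the calculation above


def headroom_tb(free_tb, node_raw_tb=NODE_RAW_TB):
    """Free space left after setting aside one node's raw capacity."""
    return free_tb - node_raw_tb


def can_absorb_host_failure(free_tb, node_raw_tb=NODE_RAW_TB):
    """Rule of thumb: keep at least one node's raw capacity free."""
    return headroom_tb(free_tb, node_raw_tb) >= 0


for free in (15.0, 11.5):
    status = "OK" if can_absorb_host_failure(free) else "below one node of free space"
    print(f"{free} TB free: {status} (headroom {headroom_tb(free):.2f} TB)")
```

In other words, the number to watch is the headroom, not the free storage shown in the Summary tab.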

Use WinSCP to Transfer Files in vCSA 6.7

This is a quick update on my previous post “ Use WinSCP to Transfer Files in vCSA 6.5 ”. When I try the same SFTP server setting in vCSA 6.7...