
VSAN 6.2 On-disk Format Upgrade Fails at 5%

I am working on upgrading our VSAN from 6.1 to 6.2. See this for an overview of the upgrade steps.

After upgrading each VSAN host to ESXi 6.0U2 (the latest build 4510822 as of 11/01/2016), the last step is to upgrade the on-disk format from v2 to v3.

In our case, the on-disk format upgrade fails at 5% with the error message “General Virtual SAN error. Disk Format conversion failed due to unexpected error”.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.01

However, when I check the disk format under the VSAN cluster, Manage, Settings, Virtual SAN / Disk Management, one disk group has been upgraded to the interim version 2.5 each time I run the on-disk format upgrade. In the screenshots below, I had run the on-disk format upgrade twice, so two of the disk groups were upgraded to v2.5.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.02

I kept re-running the on-disk format upgrade. Our VSAN has 4 hosts with 2 disk groups on each host (8 disk groups in total). The on-disk format upgrade failed six times; on the seventh run, all disk groups were upgraded to v2.5.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.03

Then the upgrade moved on to the next phase: removing disks from one of the VSAN hosts.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.04

I have not figured out the cause of the failure, but re-running the upgrade until all the disk groups are at format v2.5 keeps the process moving forward.
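
If you end up re-running the upgrade many times, the workaround boils down to a retry loop. The Python sketch below is illustrative only: `run_upgrade` and `disk_group_versions` are hypothetical callables standing in for however you trigger the upgrade and read the disk format versions; they are not VMware APIs. It retries until the upgrade reports success and records how many disk groups have reached v2.5 after each failed run, so you can confirm that every retry makes progress.

```python
from typing import Callable, Dict, List


def rerun_until_success(
    run_upgrade: Callable[[], bool],
    disk_group_versions: Callable[[], Dict[str, float]],
    interim_version: float = 2.5,
    max_attempts: int = 10,
) -> List[int]:
    """Call the hypothetical run_upgrade() until it returns True.

    After each failed run, count the disk groups that already report the
    interim format version. Returns that per-failure progress list, or
    raises RuntimeError (including the progress) once max_attempts is hit.
    """
    progress: List[int] = []
    for _ in range(max_attempts):
        if run_upgrade():
            return progress
        # Count disk groups that reached the interim on-disk format.
        upgraded = sum(
            1 for v in disk_group_versions().values() if v >= interim_version
        )
        progress.append(upgraded)
    raise RuntimeError(
        f"upgrade still failing after {max_attempts} attempts; progress: {progress}"
    )
```

The attempt cap avoids looping forever if a run fails without upgrading any disk group; in that case the progress count stops growing, which would suggest a different cause than the one described here.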

VMware Tools Stuck in “Upgrade in progress” Fix

I noticed that the VMware Tools status on some VMs (mostly Linux) was stuck at “Upgrade in progress”.

vmware.tools.upgrade.in.progress.01

I cannot vMotion these VMs to another host, and for some of them, “Edit Settings” and “Edit Resource Settings” are grayed out.

Solution:

  • Find the ESXi host running the VM
  • Use the vSphere C# Client to connect to the ESXi host directly; do not connect to the vCenter Server.
  • Locate the VM in the vSphere C# Client
  • Right-click on the VM, Guest, End VMware Tools Install

vmware.tools.upgrade.in.progress.02

  • The VMware Tools status changes back to running.

vmware.tools.upgrade.in.progress.03

Then I can vMotion the VM or run the VMware Tools installation again.

Fix App Stuck on Updating in iOS

The Google Keep app on my iPhone 5s with iOS 9.3.3 was stuck on updating. I could neither launch the app nor delete it normally by holding down the icon (the X icon showed up, but nothing happened when I tapped it). I tried the following fixes I found on the web, with no luck.

  • cancel the update and retry updating
  • reboot the phone and delete the app or retry updating
  • sign out of the App Store on the phone (Settings, iTunes & App Store, Apple ID, Sign Out), sign back in, and retry updating

Other suggestions are to reset the phone, or to connect the iPhone to a computer with iTunes and delete the stuck app from iTunes. I didn’t want to do either of these.

Luckily, I found the following way to delete the stuck app and re-download it from the App Store.

  • Navigate to Settings, General, Storage & iCloud Usage, Manage Storage
  • Find the stuck app and tap it
  • Tap Delete App
  • Then go back to the App Store and re-download it under Updates, Purchased

vCSA “Syslog endpoint servername:514 is unreachable” Error

I am configuring the vCSA to forward syslog to a third-party syslog server (e.g., a Splunk forwarder) via UDP port 514 (see the instructions at http://www.virtuallyghetto.com/2015/03/a-preview-of-native-syslog-support-in-vcsa-6-0.html). The syslog server receives the logs from the vCSA. However, the VMware Syslog Service Health Messages report a critical error: “Syslog endpoint servername:514 is unreachable”.

It turns out that the vCSA syslog health check tests TCP port 514 on the syslog server. Since my syslog server (like many typical syslog servers) listens only on UDP port 514, the vCSA health check reports that the syslog server is unreachable.
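
The mismatch is easy to reproduce from another machine: a UDP-only syslog server receives the logs fine, yet a plain TCP connection to port 514 is refused. A minimal Python sketch using only the standard `socket` module; it approximates a TCP reachability check and is not the vCSA health check's actual code.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This only tests TCP. It says nothing about whether the same port
    number accepts UDP syslog traffic.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, name resolution failure, etc.
        return False
```

For example, `tcp_port_reachable("syslog.example.com", 514)` (placeholder host name) would be expected to return False for a server that listens only on UDP 514, while a TCP port the server does listen on returns True. That also makes it a quick way to pick a port for the `cls.strata.ping.port` setting.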

Solution

  • Find a TCP port that the syslog server is listening on. Any listening TCP port should work; it does not have to be related to syslog.
  • SSH to vCSA
  • cd /etc/vmware-syslog
  • vi vmware-syslog-health.properties
  • Change the “cls.strata.ping.port” setting to the TCP port the syslog server is listening on (the default is 514)
  • Save the setting
  • Restart the VMware Syslog Service
  • Check the VMware Syslog Service Health Messages; they should show “Syslog endpoint <servername>:<tcp port> reachable”
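
If you would rather script the edit than use vi, here is a small Python sketch. It assumes the properties file uses simple `key=value` lines (I have not verified every syntax the file might contain), so keep a backup and check the result before restarting the service. `set_property` is a hypothetical helper, not a VMware tool; it works on the file contents as a string.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return text with every `key=value` line for `key` set to `value`.

    Assumes simple Java-style `key=value` lines. Only spaces/tabs are
    allowed around '=', so an empty value never swallows the next line.
    Raises KeyError if the key is not present.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    # A function replacement avoids backslash escaping issues in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError(f"{key} not found")
    return new_text
```

For example, read `/etc/vmware-syslog/vmware-syslog-health.properties`, pass its contents through `set_property(text, "cls.strata.ping.port", "<tcp port>")`, write the result back, and then restart the VMware Syslog Service as in the steps above.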

One More Reason Not to Disable IPv6

Nearly every current operating system supports IPv6 and enables it by default. Some system admins still disable IPv6 because they don't plan to deploy it in the near future. However, disabling IPv6 can go against the software vendor's recommendations or expose you to unexpected bugs.

For example, Microsoft does not recommend disabling IPv6 in Windows. See “IPv6 for Microsoft Windows FAQ” and “How to disable IPv6 or its components in Windows”.

Recently, VMware ESXi 6.0.x has a known issue when IPv6 is disabled. See “Provisioning the TCP/IP stack does not work when IPv6 support is disabled on the host (2146023)”.

To avoid unexpected issues, we should leave IPv6 enabled (the default).

Connect to vCSA using WinSCP

The default shell of the vCSA is the Appliance Shell (/bin/appliancesh), which doesn’t work with WinSCP.

There are two solutions to work around this issue:

  1. Change the default shell of the root account to the Bash shell (/bin/bash)
  2. Configure WinSCP to use the SFTP protocol (yes, SFTP; not SCP) with the shell setting “shell /usr/lib64/ssh/sftp-server”

PS. Both solutions require enabling SSH login and the Bash shell on the appliance.

  • For solution #1, here are the commands (see VMware KB2107727 for the full instructions).
    • SSH to vCSA
    • shell.set --enable True
    • shell
    • chsh -s /bin/bash root
    • to change it back: chsh -s /bin/appliancesh root
  • For solution #2 (credit to http://www.v-front.de/2015/03/vcsa-60-tricks-shell-access-password.html), that site does not provide details on configuring WinSCP. I had an issue the first time I set it up, and a commenter also reported that the shell trick no longer works, so I document the step-by-step instructions below.
  • Personally, I prefer solution #2, since I don’t need to mess up the default shell of the root account.

The following instructions were tested on vCSA 6.0 Update 2 (6.0.0.20000) with WinSCP version 5.9 (build 6786).

  • Log in to the vCSA web console (https://<vcsa-server>:5480)
  • Under Access, click Edit, select the checkboxes for “Enable ssh login” and “Enable bash shell”
    • vcsa.web.console.01
  • Change the Timeout value if necessary
    • vcsa.web.console.02
  • Create a new site in WinSCP
  • Select “SFTP” under File protocol, and enter the vCSA host name, the user name root, and the root password.
    • winscp.config.01
  • Click the Advanced dropdown to edit the Advanced Site Settings
    • winscp.config.02
  • Under Environment, SFTP, SFTP server, enter “shell /usr/lib64/ssh/sftp-server” (without quotes)
    • winscp.config.03
  • Click OK, save the site, and click Login
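
WinSCP can also be driven from a script. As far as I know, the GUI field Environment, SFTP, SFTP server corresponds to the raw site setting `SftpServer`, which the scripting `open` command accepts via `-rawsettings`; treat that mapping as an assumption and confirm it in WinSCP's raw site settings documentation. The Python helper below only builds the command line, and `vcsa.example.com` is a placeholder host name.

```python
def winscp_open_command(
    host: str,
    user: str = "root",
    sftp_server: str = "shell /usr/lib64/ssh/sftp-server",
) -> str:
    """Build a WinSCP script `open` line for an SFTP session to a vCSA.

    Assumes the raw site setting `SftpServer` matches the GUI field
    Environment > SFTP > SFTP server. The password is intentionally left
    out so it is not stored in a script file.
    """
    # Double quotes keep the space-separated value as a single argument.
    return f'open sftp://{user}@{host}/ -rawsettings SftpServer="{sftp_server}"'
```

For example, `winscp_open_command("vcsa.example.com")` yields `open sftp://root@vcsa.example.com/ -rawsettings SftpServer="shell /usr/lib64/ssh/sftp-server"`. In an unattended script, WinSCP may also need the server's host key (the `-hostkey` switch) before it will connect.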

Event ID 36886 “No suitable default server credential exists on this system” Fix

Recently, we created a new child domain in the existing AD forest with two new Windows Server 2012 R2 domain controllers. AD authentication and AD replication between the DCs are working fine.

Today, we are trying to set up a third-party app (Splunk) to use secure LDAP (LDAPS) authentication against the child domain AD. The child domain DCs are hardened with the LDAP server signing requirements policy set to require signing.

However, the third-party app’s LDAP connection test fails with the error “the connection reset by the peer”. On the DC, there is a warning in the System event log: Event ID 36886 “No suitable default server credential exists on this system. …”

ldaps.authentication.error.01

Troubleshooting

  • According to MS KB321051, “The LDAPS certificate is located in the Local Computer’s Personal certificate store.”
    • Open the Certificates MMC for the Local Computer on the child domain controller. There is a server cert for this domain controller.
    • However, this cert is not fully trusted because the root CA cert is not trusted, since it is not in the Trusted Root Certification Authorities store.
    • ldaps.authentication.error.03
    • I guess this server cert was issued when the server was promoted to a DC, by the internal Windows Server 2008 enterprise intermediate CA. (There is a Windows Server 2008 enterprise root CA and intermediate CA in the AD forest.)
    • I found the root CA cert in the Intermediate Certification Authorities store instead. I’m not sure why (the enterprise root CA and intermediate CA were set up by someone else). This is the cause of the issue.
    • ldaps.authentication.error.04

Solution

  • There are two ways to put the root CA cert back into the Trusted Root Certification Authorities store.
    • Copy and paste the root CA cert from the Intermediate Certification Authorities store to the Trusted Root Certification Authorities store.
    • Download the CA certificate chain, open the root CA cert, and install it on the child DC. Make sure to specify Local Machine as the store location and Trusted Root Certification Authorities as the store. The default automatic store selection places the root CA cert in the intermediate CA store again.
  • Once the root CA cert is in the right store, the child DC’s cert shows as trusted, and the LDAPS connection test in the third-party app succeeds.
  • ldaps.authentication.error.05
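
To confirm the fix from the client side, you can attempt a TLS handshake against the DC's LDAPS port (636) and validate the server cert against the root CA. A minimal sketch using Python's standard `socket` and `ssl` modules (Python 3.7+ for `ssl.SSLCertVerificationError`); the DC host name and the root CA file, exported as Base-64 encoded X.509 (PEM), are placeholders. It checks only the TLS layer, not an LDAP bind.

```python
import socket
import ssl
from typing import Optional, Tuple


def check_ldaps(
    host: str,
    cafile: Optional[str] = None,
    port: int = 636,
    timeout: float = 5.0,
) -> Tuple[bool, str]:
    """Try a TLS handshake with an LDAPS endpoint and verify its certificate.

    cafile: PEM file with the exported root CA cert; if None, the
    system's default trust store is used. Returns (ok, detail).
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                subject = dict(item[0] for item in tls.getpeercert()["subject"])
                return True, f"handshake OK, subject CN={subject.get('commonName')}"
    except ssl.SSLCertVerificationError as exc:
        # The server presented a cert, but this client does not trust its chain.
        return False, f"certificate not trusted: {exc.verify_message}"
    except OSError as exc:
        # Refused connections and handshake resets (ssl.SSLError is an OSError).
        return False, f"connection/handshake failed: {exc}"
```

A DC without a usable server certificate would typically reset the handshake, which lands in the last branch. After the fix, a call such as `check_ldaps("dc01.child.example.com", cafile="rootca.pem")` (placeholder names) should return True, provided the client trusts the same root CA.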

Use WinSCP to Transfer Files in vCSA 6.7

This is a quick update on my previous post “Use WinSCP to Transfer Files in vCSA 6.5”. When I try the same SFTP server setting in vCSA 6.7...