Showing posts with label vmware. Show all posts

Use WinSCP to Transfer Files in vCSA 6.7

This is a quick update to my previous post, “Use WinSCP to Transfer Files in vCSA 6.5”. When I tried the same SFTP server setting in vCSA 6.7, it didn’t work. It did work after I removed “shell” from the setting.

So for vCSA 6.7, the SFTP server setting in WinSCP is “/usr/libexec/sftp-server”.

Since I upgraded my vCSA 6.5 to 6.7, I cannot test whether the setting without “shell” works in vCSA 6.5. Please test and comment if you have vCSA 6.5 available.

09/15/2021 Update:

I discovered the above WinSCP setting in May 2020 and had not transferred files to vCSA using WinSCP again until today. The above setting doesn't work anymore. My vCSA has been updated multiple times with 6.7 patches.

My current vCSA version is 6.7 Update 3n (6.7.0.48000). The WinSCP SFTP server setting needs to be "shell /usr/libexec/sftp-server", as in vCSA 6.5.

So if you need to transfer files to vCSA 6.7 via WinSCP, try one of the following settings:

  • shell /usr/libexec/sftp-server
  • /usr/libexec/sftp-server
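
If you prefer WinSCP scripting over the GUI, the two candidate settings can be tried from a script. The sketch below only builds the `open` command text; it assumes that WinSCP's `SftpServer` raw site setting corresponds to the Advanced "SFTP server" field, and the host name vcsa.example.com is a placeholder, not something from my environment.

```python
# Sketch: build WinSCP scripting "open" commands for each candidate setting.
# Assumption: the raw site setting "SftpServer" maps to the GUI field
# Advanced > Environment > SFTP > SFTP server.

CANDIDATE_SFTP_SERVERS = [
    "shell /usr/libexec/sftp-server",
    "/usr/libexec/sftp-server",
]


def winscp_open_command(host: str, sftp_server: str, user: str = "root") -> str:
    """Return one WinSCP script line that opens an SFTP session to the vCSA."""
    return f'open sftp://{user}@{host}/ -rawsettings SftpServer="{sftp_server}"'


if __name__ == "__main__":
    # vcsa.example.com is a hypothetical host name used for illustration.
    for setting in CANDIDATE_SFTP_SERVERS:
        print(winscp_open_command("vcsa.example.com", setting))
```

Paste one of the printed lines into a WinSCP script file or the winscp.com console; if the first setting fails to start the SFTP session, try the second.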

Installing vSphere Client fails with the error “VMInstallHcmon - Failed to install hcmon driver”

When installing the latest vSphere Client 6.0 on my Windows 10 computer, I got the following error “VMInstallHcmon – Failed to install hcmon driver”

Troubleshooting

  • Tried KB2006486, but I don’t see Non-Plug and Play Drivers or VMware hcmon on my Windows 10 computer
  • Tried renaming the C:\Windows\System32\drivers\hcmon.sys file, but still got the same error

Solution

  • My laptop had vSphere Client 5.5 and 6.0 (an older build) installed, along with their respective Update Manager plug-ins
  • Remove these older clients and plug-ins
  • The vSphere Client 6.0 installation then completes successfully

Use WinSCP to Transfer Files in vCSA 6.5

To use WinSCP to transfer files with vCSA, VMware KB2107727’s solution is temporarily changing the default shell from appliancesh to bash, then changing back after the transfer. This works in vCSA 6.0 and vCSA 6.5.

In vCSA 6.0, there is a trick: change WinSCP’s Advanced, SFTP server setting to "shell /usr/lib64/ssh/sftp-server" (without the quotes) to transfer files without changing the default shell. See “Connect to vCSA using WinSCP”.

VMware changed the OS from SLES to Photon in vCSA 6.5, so the above setting doesn’t work anymore: there is no ssh directory under /usr/lib64/. However, a quick search shows that sftp-server has moved to /usr/libexec/. Using the setting “shell /usr/libexec/sftp-server” in vCSA 6.5 works fine.

Here are the detailed instructions.

  • Log in to the vCSA VAMI UI (https://vcsa-ip:5480)
  • Under Access, enable SSH Login. (PS: enabling Bash Shell is not necessary)
  • Open WinSCP, select File protocol: SFTP
  • Enter the vCSA hostname, port number 22, root, and root’s password
  • Click Advanced
  • Under Environment, SFTP, Protocol options
  • Set SFTP server to “shell /usr/libexec/sftp-server” (without the quotes)
    winscp.vcsa.sftp.server.setting

PS: I tested the setting in WinSCP v5.9.6 build 7601 and vCSA v6.5.0.5600 build 4951144.

VCSA 6.5 “The appliance management service is not running” Fix

Scenario

In vSphere Web Client 6.5, under Home, Administration, Deployment/System Configuration, Nodes, the vCenter Server node shows an error message “The appliance management service is not running”. An error message “HTTP response with status code 503, 503 Service Unavailable (Failed to connect to endpoint: _serverNamespace = /vmonapi action =Allow _port = 8900)” also appears in the web client.

Troubleshooting

  • Log in to the VMware Appliance Management UI (https://psc:5480 or https://vc:5480). All health statuses are good.
  • SSH to the VC appliance. Check the service status (KB2109887)
    • # service-control --list
    • # service-control --status
    • applmgmt (VMware Appliance Management Service) is running
    • vmonapi (VMware Service Lifecycle Manager API) is not running

Solution

  • Restart the vmonapi service or restart all services
    • # service-control --start vmonapi
    • # service-control --start --all
  • PS: if restarting all the services, it may take some time before all services turn back to Good (green) in the node’s Summary page. For example, the VMware Performance Charts service took more than 30 minutes to change from Warning to Unknown and then to Good.
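
To spot stopped services such as vmonapi without reading the whole status listing, the output can be parsed. This is a minimal sketch that assumes `service-control --status` prints a "Running:" heading and a "Stopped:" heading, each followed by whitespace-separated service names; the sample text is made up for illustration and is not a capture from my appliance.

```python
from typing import List


def stopped_services(status_output: str) -> List[str]:
    """Return the service names listed under the 'Stopped:' heading."""
    stopped: List[str] = []
    in_stopped = False
    for line in status_output.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # A heading line such as "Running:" or "Stopped:" switches sections.
            in_stopped = stripped.lower().startswith("stopped")
            continue
        if in_stopped:
            stopped.extend(stripped.split())
    return stopped


# Illustrative sample only; the real list of services is much longer.
SAMPLE = """Running:
 applmgmt vmware-vpxd
Stopped:
 vmonapi vmcam
"""

if __name__ == "__main__":
    for name in stopped_services(SAMPLE):
        print(f"service-control --start {name}")
```

Not every stopped service needs to run (some are disabled by default), so treat the printed commands as a checklist rather than something to run blindly.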

VCSA 6.5 Syslog vs vRLI’s vSphere Integration

I wrote this post after reading William Lam’s “What logs do I get when I enable syslog in VCSA 6.5?” and running some experiments of my own on my VCSA 6.5 and vRLI 4.5 setup.

Background

Recently I completed a fresh VCSA 6.5 (external PSC and VC) deployment with vRealize Operations Manager (vROPS) 6.6 and vRealize Log Insight 4.5 installation. In vROPS, I configured vSphere and vRLI solutions; in vRLI, I configured vSphere and vROPS integration. I thought I had completed all the setup until I read William’s blog post.

Confusion

There is a lot of information in his blog post. I was a little lost at the beginning and wondered: should I configure VCSA syslog to send to vRLI? Is it the same as vRLI’s vSphere integration? Had I read his blog carefully, I would have found the answer there, but I didn’t fully understand it until I ran my own experiment. Here is the quote. I highlighted a few key points.

I personally think the vSphere Integration is a nice solution if you have both Windows vCenter Server and the VCSA and to be able to get data consistency between the two platforms from a logging standpoint. It is definitely useful if you need to quickly enable all ESXi hosts connected to the vCenter Server and have them remotely syslog to the vRLI instance. If you only have the VCSA, you would get more information by configuring the remote syslog capability in VCSA rather than using the vSphere integration feature of vRLI. This especially true if you need the vpxd.log which is generally required for troubleshooting and debugging vCenter Server issues when calling into VMware Support. The other added benefit to using the VCSA option is that structure log entries are processed directly on the VCSA rather than having to be remotely queried via the vSphere APIs, processed and then store in vRLI which would add additional load onto vRLI, especially if you need to configure additional vCenter Server instances.

Summary

Here is my summary based on my understanding of this topic. Please refer to his blog for the full details.

  • VCSA 6.5 has a new remote syslog function compared to VCSA 6.0. This function is not available in Windows vCenter Server 6.5
  • VCSA 6.5’s remote syslog configuration is in the VAMI UI (https://[VCSA]:5480). This setting is available on both the PSC and the VC in an external deployment. See “Logs forwarded by VCSA Deployment Type” in William’s post for the logs forwarded by each VCSA deployment type
  • VCSA 6.0’s remote syslog configuration is in the vCenter via vSphere Web Client
  • VCSA 6.5 has a new Enhanced Logging feature (see William’s blog for what the enhanced means; see my screen shots in this post for a better example)
  • After completing vRLI’s vSphere integration, “enable streaming of events to syslog” is enabled (vSphere Web Client, vCenter, Configure, Advanced Settings, vpxd.event.syslog.enabled). This setting is mentioned in another person’s blog. I am not sure what the default VCSA setting is; I list it here for reference only
  • VCSA 6.5 remote syslog is not configured even after completing the vSphere integration in vRLI
  • VCSA 6.5 remote syslog is “pushing” the logs to vRLI
  • vRLI’s vSphere integration is “pulling” the logs from VCSA (via vSphere API). This supports both a Windows vCenter Server and VCSA.
  • vRLI’s vSphere integration can also automatically configure the ESXi hosts connected to the vCenter Server and have them remotely syslog to vRLI. (vSphere Web Client, ESXi host, Configure, System/Advanced System Settings, Syslog.global.logHost)
  • By default, the vCenter Server log (vpxd.log) is not forwarded to a remote syslog server. Enabling it is recommended for troubleshooting purposes. (vSphere Web Client, vCenter, Configure, Advanced Settings, config.log.outputToSyslog; then restart the vCenter Server service in System Configuration, Services, VMware vCenter Server)
  • Other VCSA 6.5 logs can be forwarded to a remote syslog server, but this is not supported by VMware. See the link at the end of William’s post for more details
  • This is the most important and useful point I have learned: VCSA 6.5 remote syslog sends more information to vRLI than vRLI’s vSphere integration does. I think this is what Enhanced Logging means. See my screenshots below. For example, I modified the Tools Upgrades option on a VM.
    • Without VCSA remote syslog configured, vRLI has one entry in the log. It shows that the toolsUpgradePolicy of the VM (name highlighted in yellow) changed from “manual” to “upgradeAtPowerCycle”
      vRLI.log.without.VCSA.remote.log.enabled
    • With VCSA remote syslog configured, vRLI has two entries in the log. In addition to the regular entry, the second entry shows the name of the user who made the change (highlighted in the red box).
      vRLI.log.with.VCSA.remote.log.enabled
  • My recommendation is to configure both vRLI’s vSphere integration (to automatically configure the ESXi log host) and VCSA remote syslog (for the enhanced logging). This duplicates some log entries in vRLI and consumes more vRLI log storage, but it is well worth it!

vSphere 6.5 New Feature – VMware Orchestrated Restart

Let me go back to the old ESXi 3 days, when I was using only standalone ESXi hosts or vCenter without HA and DRS. In case of a power outage or air conditioning failure in the data center, all the ESXi hosts were powered down. Once the environmental problem was resolved, I could manage the VM startup sequence by configuring the switched PDU to start the hosts in order and configuring the VM startup order at the host level.

However, once I deployed vCenter Server with HA and DRS, I lost control of the VM startup order, because the VMs could be running on any host in the cluster. Someone said that I should not worry about the VM startup order in the cluster, because the ESXi cluster would never go down if I had designed the infrastructure with enough redundancy. As we all know, we never have enough redundancy in a small ESXi deployment.

I had long been curious why VMware did not “fix” this issue. Now vSphere 6.5 introduces the VMware Orchestrated Restart feature. At a high level, Orchestrated Restart, like the VM affinity and anti-affinity rules, puts the VMs in different VM groups and sets the startup dependencies among the VM groups. To learn more about this, please go to “What is VMware Orchestrated Restart?”.

I am so glad to know about this new vSphere 6.5 feature: one more reason to upgrade to vSphere 6.5.

vSAN Performance Service “Hosts Not Contributing Stats” Fix

I have a four-host vSAN cluster running vSAN 6.2. Recently the vSAN health’s Performance service check shows two of the hosts not contributing stats.

vsan.host.not.contrubting.stats.01

The following are all the steps that I tried while troubleshooting and ultimately fixing the issue in my environment. Some of the steps did not fix my issue; however, they may be applicable to your situation. PS. I opened a VMware support case on this issue. The support engineer did not directly solve my issue, but he did give a hint about the cause that led me to discover the solution.

  1. Turn off and turn on the Performance Services in vSphere web client, vSAN cluster, Manage, Settings, Health and Performance.
  2. Turn off the Performance Services, restart the vSAN management agent “/etc/init.d/vsanmgmtd restart”, then turn the Performance Services back on.
  3. Place the vSAN host in the maintenance mode and restart the host.
  4. SSH to the vCenter server appliance, restart the vmware-vpxd service “service vmware-vpxd restart”.
  5. Verify the vSAN storage provider status of each vSAN host is online in vSphere web client, vCenter server, Manage, Storage Providers. If the host’s vSAN provider is offline, unregister the host’s storage provider and synchronize all vSAN storage providers. This brings the host’s vSAN storage provider back online.
    Caution: doing this can cause the VMs on the host to fail over to other hosts in the cluster.
    vsan.host.not.contrubting.stats.02
  6. (I think this is where I started closing in on the ultimate fix) Check the certificate info of each vSAN host in Storage Providers. They should all be issued by the same Platform Services Controller (my vCenter is a vCSA with an external PSC, not an embedded PSC). In my case, the certificates of the two “problem” vSAN hosts were issued by the VC host; the certificates of the “good” vSAN hosts were issued by the PSC host. I don’t know why these hosts have different certificate issuers, since I don’t have the history of how the PSC and VC were deployed.
    vsan.host.not.contrubting.stats.03
    vsan.host.not.contrubting.stats.04
  7. To further confirm the ESXi host certificate is the problem
    1. Log in to vCenter Server as “administrator@vsphere.local”
    2. Home, Administration, Deployment, System Configuration, Nodes, PSC node, Manage, Certificate Authority (if selecting VC node, there is no Certificate Authority tab under Manage)
    3. Enter the password of “administrator@vsphere.local” again
    4. Active Certificate, all the ESXi hosts are listed, except the two “problem” vSAN hosts
    5. It makes sense why the certificates of the two “problem” vSAN hosts are missing here, because they are issued by the VC host, not the PSC host. But it does not make sense how they received the “problem” certificate since there is no Certificate Authority on the VC host.
      vsan.host.not.contrubting.stats.05
  8. Once the cause is identified, the fix is to re-issue the certificate to the two “problem” vSAN hosts.
  9. In vSphere web client, the “problem” vSAN host, Manage, Settings, Certificate
    1. This page also shows that the host certificate was issued by the wrong host (the VC host)
    2. Click Renew to request a new certificate
    3. Caution: Once I clicked the Renew button, the host HA agent was restarted. Some VMs on the host failed over to the remaining hosts, although the VMs seemed to have no downtime.
      Before renewing the certificate
      vsan.host.not.contrubting.stats.06
      After renewing the certificate
      vsan.host.not.contrubting.stats.07
  10. Once the host certificates are re-issued by the PSC, the vSAN Performance service status shows “Passed”
    vsan.host.not.contrubting.stats.08

Conclusion

  • The cause of the vSAN Performance service “Hosts Not Contributing Stats” error in my case was the “problem” vSAN hosts having the wrong host certificate.
  • I don’t know how these “problem” hosts received the wrong host certificate.
  • When the vCSA uses an external PSC, the host certificate is issued by the PSC host.
  • Re-issuing or renewing the host certificate restarts the host HA agent. This can cause the VMs on the host to fail over to other hosts.

vCenter Server 6.5 Native High Availability Feature Summary

  • Available exclusively for vCenter Server Appliance (vCSA)
  • Consists of three nodes: active, passive, and witness
    • Passive and Witness nodes are cloned from the existing vCSA (active node)
  • vCenter HA cluster can be enabled, disabled, or destroyed at any time
  • There is a maintenance mode to prevent planned maintenance from causing an unwanted failover
  • Uses two types of replication between the active and passive nodes
    • Native PostgreSQL synchronous replication for the vCenter Server database
    • A separate asynchronous file system replication for key data outside the database
  • Two vCenter HA deployment workflows
    • Basic: all vCenter HA nodes are deployed within the same cluster
    • Advanced: the active, passive, and witness nodes are deployed to different clusters
  • There is little benefit to using vCenter HA without also providing high availability at the Platform Services Controller layer
    • An external Platform Services Controller instance is required when there are multiple vCenter Server instances in an Enhanced Linked Mode configuration.
  • Failover can occur when a host fails or when certain key services fail
  • For the initial release of vCenter HA, the recovery time objective (RTO) is about 5 minutes

I already knew some of this information from testing vCenter HA in my lab. I highlighted the items I learned from this white paper.

Source: “What’s New in VMware vSphere 6.5” technical white paper

Configuring VCSA 6.5 Backup Lessons Learned

vCenter Server Appliance (vCSA) 6.5 comes with built-in backup functionality. Starting a backup is quite easy: log in to the vCSA web console and click the Backup button on the Summary page (see this post for the step-by-step screenshots).

Even though it looks like a very simple task, I learned a few lessons when configuring the vCSA backup.

Lesson #1: vCSA backup location is <host_name>/<folder_name>

If using the FTP protocol, the backup location is not just the FTP server host name or IP address; it MUST include the folder name. There is a “/” between the host name and the folder name. Otherwise, the error message is “FTP location is invalid”.

vCSA.Backup.FTP.Location.Is.Invalid

Lesson #2: vCSA backup supports the FTP virtual host name if the username is entered correctly - <ftp virtual hostname>|<ftp username>

See my Lesson #2 in “Setting Up IIS 8 FTP Server Lessons Learned” about the FTP virtual host name login. There is a “|” between the hostname and the username. Otherwise, the error message is “Access to the remote server is denied. Check your credentials and permissions”.

vCSA.Backup.Access.to.The.Remote.Server.Is.Denied
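
The two formats from Lessons #1 and #2 are easy to get wrong when typing, so here is a small sketch that composes them. The function names and the example host, folder, and user values are hypothetical; only the “/” and “|” separators come from the lessons above.

```python
def backup_location(host: str, folder: str) -> str:
    """Compose the backup location as <host_name>/<folder_name>."""
    host = host.strip().rstrip("/")
    folder = folder.strip().strip("/")
    if not host or not folder:
        # The wizard rejects a location without a folder ("FTP location is invalid").
        raise ValueError("both the host name and the folder name are required")
    return f"{host}/{folder}"


def ftp_username(user: str, virtual_host: str = "") -> str:
    """Compose the login as <ftp virtual hostname>|<ftp username> when a virtual host is used."""
    return f"{virtual_host}|{user}" if virtual_host else user


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(backup_location("ftp01.example.com", "vcsa-backup"))
    print(ftp_username("backupuser", "ftp.example.com"))
```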

Lesson #3: Use curl to troubleshoot vCSA backup errors

After I entered the correct settings, the vCSA backup wizard validated the settings and started the backup. The backup failed with “BackupManager encountered an exception. Please check logs for details”, but the message does not provide much detail or the location of the log file.

vCSA.Backup.BackupManager.Encountered.An.Exception

After some digging, I found the backup log file at /var/log/vmware/applmgmt/backup.log. In the log file, there is a curl error “Connection time-out”.

vCSA.Backup.Backup.log

This gave me a hint that vCSA backup uses curl to transfer the backup file from the vCSA to the FTP location. I have also been learning to use curl to transfer files recently, so I’m a little familiar with it. (I will publish what I learn about curl in a future post.)

From the vCSA console, enter “curl -u <ftp user>:<password> -l <ftp server>”. It should list the files and directories on the FTP server, but I got the timeout error. I also tried running curl on a Windows computer and got the timeout error too. This led me to think the problem was on the FTP server. In the end, the fix was to restart the FTP service (see Lesson #1 in “Setting Up IIS 8 FTP Server Lessons Learned”).

I am not sure why the wizard was able to validate the FTP server setting successfully when the FTP server connection was blocked by the Windows Firewall. While troubleshooting the Windows Firewall, I believe I could connect to the FTP site with the FTP command while curl failed. I’m not 100% sure about this, since I can’t reproduce the issue. After restarting the Microsoft FTP service, everything works fine.

Anyway, curl is the best tool for troubleshooting a vCSA backup failure.

Lesson #4: vCSA backup location must be an empty folder

After successfully running a backup, I tried running the backup one more time with the same settings and got the following error. (PS. In the screenshot below, I removed the virtual hostname on the FTP site, so I can use just the username.)

vCSA.Backup.Location.Folder.Is.Not.Empty

VUM 6.5 “Cannot download patch definitions” via UMDS 6.5 Work Around

My nested vSphere lab environment does not have access to the Internet (there is no physical network adapter as the uplink on the lab port group). To update and patch the ESXi hosts using vSphere Update Manager (VUM), I installed Update Manager Download Service (UMDS) on a Windows Server VM with dual NICs: one for the Internet, another for the lab port group. I use UMDS to download the updates, IIS as the web server for the update repository, and VUM configured to use the HTTP shared repository. It worked fine in vSphere 6.0 and 6.2.

Recently I upgraded the lab environment to vSphere 6.5. The vCenter Server Appliance and ESXi hosts were upgraded to 6.5 successfully.

vCenter Server Appliance 6.5 comes bundled with VUM, so VUM no longer requires a Windows Server. UMDS can be installed on a Windows or Linux server. Since I already have a Windows server for UMDS, I continue using it instead of setting up a Linux server. However, UMDS can’t be upgraded from the previous version to 6.5.

I uninstalled UMDS 6.0 and SQL Server 2012 Express on the server, and installed UMDS 6.5 with SQL Server 2012 Express from the ISO. I used the UMDS 6.0 repository folder for UMDS 6.5 and configured UMDS to download the host update only. Since IIS is already set up and I used the same repository folder, no change is needed in IIS. UMDS 6.5 successfully downloaded the update files from VMware.

I configured the VUM 6.5 to use the IIS shared repository. VUM validated the URL successfully. However, when I clicked “Download Now” button in VUM, I got the “Cannot download patch definitions” error.

Troubleshooting

umds.missing.6.5.metadata

Solution

  • A similar issue happened in vSphere 5.5 (KB2061622)
  • I can’t find a patch available for ESXi 6.5 yet, but I can apply a similar workaround by removing the <metadata> reference for vmw-ESXi-6.5.0-metadata.zip in
    hostupdate\vmw\__hostupdate20-consolidated-metadata-index__.xml
  • Now downloading the patch definitions succeeds when I click the “Download Now” button in VUM
  • However, running UMDS again re-adds the vmw-ESXi-6.5.0-metadata.zip reference, and VUM again fails to download the patch definitions
  • This may be a bug in vSphere 6.5 (as in vSphere 5.5). I hope VMware fixes it in the next update
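
Because UMDS re-adds the reference on every run, the manual edit has to be repeated. The sketch below shows one way to script it. It assumes the index file contains `<metadata>` elements whose text mentions the zip file name; the sample XML is a guess at the shape, not a copy of the real file, so check your own index file and keep a backup before overwriting it.

```python
import xml.etree.ElementTree as ET


def drop_metadata_entry(index_xml: str, zip_name: str) -> str:
    """Remove every <metadata> element whose text mentions zip_name."""
    root = ET.fromstring(index_xml)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "metadata" and any(zip_name in text for text in child.itertext()):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical structure for illustration; the real index file may differ.
SAMPLE = """<metaList>
  <metadata><url>vmw-ESXi-6.0.0-metadata.zip</url></metadata>
  <metadata><url>vmw-ESXi-6.5.0-metadata.zip</url></metadata>
</metaList>"""

if __name__ == "__main__":
    print(drop_metadata_entry(SAMPLE, "vmw-ESXi-6.5.0-metadata.zip"))
```

To apply it, read the index file into a string, pass it through the function, and write the result back after each UMDS download.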

Stop A Task Stuck in vCenter Server Appliance

  1. Find out the name of the vSphere host running the stuck task, if possible
  2. SSH to the vCenter Server Appliance
  3. service vmware-vpxd restart
    • After restarting the vmware-vpxd service on the vCSA, the stuck task should disappear from the vSphere web client
    • However, the task may be still running on the vSphere host
    • Use the following steps to stop the stuck task on the vSphere host
  4. SSH to the vSphere host running the task
  5. /etc/init.d/hostd restart
  6. /etc/init.d/vpxa restart

VSAN 6.2 On-disk Format Upgrade Fails at 5%

I am working on upgrading our VSAN from 6.1 to 6.2. See this for the upgrade step overview.

After upgrading each VSAN host to ESXi 6.0U2 (the latest build 4510822 as of 11/01/2016), the last step is to upgrade the on-disk format from v2 to v3.

In our case, the on-disk format upgrade fails at 5% with the error message “General Virtual SAN error. Disk Format conversion failed due to unexpected error”.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.01

However, check the disk format in VSAN cluster, Manage, Settings, Virtual SAN / Disk Management. A disk group is upgraded to the interim version 2.5 each time I run the on-disk format upgrade. In the screen shots below, I ran the on-disk format upgrade twice. Two of the disk groups are upgraded to v2.5.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.02

I kept running the on-disk format upgrade. In our VSAN, we have 4 hosts with 2 disk groups on each host. The on-disk format upgrade failed six times. On the seventh run, all disk groups were upgraded to v2.5.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.03

Then the upgrade moved on to the next step: it started removing disks from one of the VSAN hosts.

vsan.6.2.on-disk.format.upgrade.fail.at.5%.04

I have not figured out the cause of the failure. Re-running the upgrade until all the disk groups are upgraded to format v2.5 keeps the process moving forward.

VMware Tools Stuck in “Upgrade in progress” Fix

I notice the VMware Tools status on some VMs (mostly Linux) is “Upgrade in progress”.

vmware.tools.upgrade.in.progress.01

I cannot vMotion these VMs to another host, and on some of them the “Edit Settings” and “Edit Resource Settings” options are grayed out.

Solution:

  • Find the ESXi host running the VM
  • Use the vSphere C# Client to connect to the ESXi host directly; do not connect to the vCenter Server.
  • Locate the VM in the vSphere C# Client
  • Right-click on the VM, Guest, End VMware Tools Install

vmware.tools.upgrade.in.progress.02

  • VMware Tools status changes back to running.

vmware.tools.upgrade.in.progress.03

Then I can vMotion the VM or run the VMware Tools installation again.

vCSA “Syslog endpoint servername:514 is unreachable” Error

I am configuring the vCSA syslog to a third-party syslog server (e.g. a Splunk forwarder) via UDP port 514 (see the instruction in http://www.virtuallyghetto.com/2015/03/a-preview-of-native-syslog-support-in-vcsa-6-0.html). The syslog server receives the log from the vCSA. However, the VMware Syslog Service Health Messages reports a “Syslog endpoint servername:514 is unreachable” critical error.

It turns out the vCSA syslog uses TCP port 514 for the syslog server health check. Since my syslog server (like many typical syslog servers) only listens on UDP port 514, the vCSA health check reports that the syslog server is not reachable.

Solution

  • Find a TCP port that the syslog server is listening on. Any listening TCP port should work; it does not have to be related to syslog.
  • SSH to vCSA
  • cd /etc/vmware-syslog
  • vi vmware-syslog-health.properties
  • Change the “cls.strata.ping.port” setting to the TCP port the syslog server is listening on (the default is 514)
  • Save the setting
  • Restart the VMware Syslog Service
  • Check the VMware Syslog Service Health Messages; they should show “Syslog endpoint <servername>:<tcp port> reachable”
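
The edit to vmware-syslog-health.properties can also be scripted instead of done in vi. This is a minimal sketch that assumes the file uses plain `key=value` lines; the function name and the sample content are made up for illustration, and port 22 is just an example of a TCP port a server might be listening on.

```python
def set_ping_port(properties_text: str, port: int, key: str = "cls.strata.ping.port") -> str:
    """Return the properties text with the health-check port set to `port`."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the key part; commented-out lines start with '#' and do not match.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={port}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={port}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up file content for illustration.
    print(set_ping_port("cls.strata.ping.port=514\n", 22), end="")
```

After writing the result back to /etc/vmware-syslog/vmware-syslog-health.properties, restart the VMware Syslog Service as described in the steps above.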

One More Reason Not to Disable IPv6

Almost every current operating system supports IPv6 and enables it by default. Some system admins still like to disable IPv6 because they do not expect to deploy IPv6 in the near future. However, disabling IPv6 can go against the software vendor’s recommendation or trigger unexpected bugs.

For example, Microsoft does not recommend disabling IPv6 in Windows. See “IPv6 for Microsoft Windows FAQ” and “How to disable IPv6 or its components in Windows”.

More recently, VMware ESXi 6.0.x has a known issue when IPv6 is disabled. See “Provisioning the TCP/IP stack does not work when IPv6 support is disabled on the host (2146023)”.

To avoid unexpected issues, we should leave IPv6 enabled (the default).

Connect to vCSA using WinSCP

The default shell of the vCSA is the Appliance Shell (/bin/appliancesh), which doesn’t work with WinSCP.

There are two solutions to work around this issue:

  1. Change the default shell of the root account to the Bash shell (/bin/bash)
  2. Configure WinSCP to use the SFTP protocol (yes, SFTP; not SCP) with the shell setting “shell /usr/lib64/ssh/sftp-server”

PS. Both of these solutions require enabling SSH login and Bash shell on the appliance.

  • For the solution #1, here are the commands (see VMware KB2107727 for the full instruction).
    • SSH to vCSA
    • shell.set --enable True
    • shell
    • chsh -s /bin/bash root
    • change back: chsh -s /bin/appliancesh root
  • For solution #2 (credit to http://www.v-front.de/2015/03/vcsa-60-tricks-shell-access-password.html), the original site does not provide the details on configuring WinSCP. I had an issue when setting it up the first time, and someone also commented that the shell trick does not work anymore. So I document the step-by-step instructions below.
  • Personally, I prefer solution #2, since I don’t need to mess with the default shell of the root account.

The following instructions were tested on vCSA 6.0 Update 2 (6.0.0.20000) with WinSCP version 5.9 (build 6786).

  • Log in to the vCSA web console (https://<vcsa-server>:5480)
  • Under Access, click Edit, select the checkboxes for “Enable ssh login” and “Enable bash shell”
    • vcsa.web.console.01
  • Change the Timeout value if necessary
    • vcsa.web.console.02
  • Create a new site in WinSCP
  • Select “SFTP” under File protocol, type the vCSA host name, root and its password.
    • winscp.config.01
  • Click the Advanced dropdown to edit the Advanced Site Settings
    • winscp.config.02
  • Under Environment, SFTP, SFTP server, enter “shell /usr/lib64/ssh/sftp-server” (without quotes)
    • winscp.config.03
  • Click OK and Save the setting, and click Login

How to Convert Eager Zeroed Thick Disk to Lazy Zeroed Thick Disk

According to VMware KB2145183, an Eager Zeroed Thick disk cannot be directly converted / migrated / cloned to a Lazy Zeroed Thick disk. The workaround is:

  1. Convert the Eager Zeroed Thick disk to Thin
  2. Convert the Thin disk to Lazy Zeroed

vmkfstools can be used for the conversion (see my previous post). Here are the commands:

  1. vmkfstools -K <vmdkFile>
  2. vmkfstools -j <vmdkFile>

Error “Idm client exception: Error trying to join AD, error code [11]” when joining a VCSA to AD domain

On a newly created VCSA appliance, I got the following error when joining it to the Active Directory domain

vcsa.joining.ad.error

I used the domain’s netbios name\user-name as the user name.

Fix: use the User Principal Name (UPN), user-name@fqdn-domain-name, as the user name. After joining the domain, reboot the VMware Platform Service Controller (PSC).
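
For reference, the change from the down-level format to the UPN format looks like this. It is only a sketch: the function name and the example account are hypothetical, and the domain FQDN must be supplied by you, because it cannot be derived from the NetBIOS name.

```python
def to_upn(down_level_name: str, domain_fqdn: str) -> str:
    """Convert NETBIOS\\user-name (or a bare user-name) to user-name@fqdn-domain-name."""
    user = down_level_name.rsplit("\\", 1)[-1].strip()
    domain = domain_fqdn.strip()
    if not user or not domain:
        raise ValueError("both a user name and a domain FQDN are required")
    return f"{user}@{domain}"


if __name__ == "__main__":
    # Placeholder account and domain for illustration only.
    print(to_upn("CORP\\jdoe", "corp.example.com"))
```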

Troubleshoot vMotion Error 195887167

Updated on 07/13/2016. See the update at the end of this post. I may have found the ultimate solution, even though I am still not sure what causes the issue.

Recently I had an issue vMotioning some VMs between the hosts in a vSphere 6.x cluster. Long story short, here are the symptoms:

  • I consistently got the error “Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout” when vMotioning (host only, no storage vMotion) some VMs, particularly the vCenter Server Appliance VM. I had two vCSA VMs, and both had the same issue. But the vCSA is not the only VM on which I got the error.
  • I successfully vMotioned some VMs between the hosts, so the vMotion network configuration should be okay.
  • In other words, some VMs are okay; some are not. The size (CPU, RAM, storage) of the VMs does not seem to be the problem. The successfully vMotioned VMs can have more or less CPU, RAM, and storage than the failed VMs.
  • The failed VMs have more disks than the other VMs. The vCSA VM is created with 11 disks by default.
  • When the VMs were powered off, vMotion succeeded.
  • Restarted the hosts and restarted the VMs. No difference.
  • Verified no IP address conflicts.
  • Tried one VMkernel adapter for both management and vMotion or a dedicated VMkernel adapter for vMotion. No difference.
  • Tested vmkping successfully between the hosts.
  • In the vmkernel.log of the hosts, the error is “2016-05-18T22:47:34.959Z cpu15:39089)WARNING: Migrate: 270: 1463611379229538 D: Failed: Connection closed by remote host, possibly due to timeout (0xbad003f) @0x418000e149ee” on the destination host or “2016-05-19T18:29:23.930Z cpu1:130133)WARNING: Migrate: 270: 1463682486286991 S: Failed: Migration determined a failure by the VMX (0xbad0092) @0x41803a7f6993” on the source host.

Possible solutions

  • Remove the snapshot on the VM if it has one
    • After removing the snapshot on one of the vCSA VMs, vMotion worked fine. But the other vCSA had no snapshot, and it still failed.
  • Try using the vSphere Client instead of the vSphere Web Client. This worked on some VMs, but not always.
  • Assign the VM’s network adapter to a different port group, then change it back to its original port group
    • This seems to be the ultimate fix. After doing this, the vCSA VMs, which had consistently failed vMotion, were vMotioned successfully.

Conclusion

07/13/2016 update

  • I run into the same error when migrating some VMs, including the vCenter 6.x appliance (vCSA), between hosts in the VMware cluster. I am able to migrate other VMs. This leads me to believe the problem is on the VM, not the infrastructure.
  • Among the VMs having this error, those that I can power off are migrated successfully. However, I cannot power down the vCSA VM, because I cannot perform the migration without vCenter available.
  • I try assigning the NIC of the vCSA VM to a different port group and changing it back. However, I cannot do that, because this VM is the vCenter server and is configured with a distributed switch. If I move the vCSA to a different port group (configured with a different VLAN), the vCenter server goes down (because its NIC is assigned to the wrong port group with the wrong VLAN), and I cannot change it back to the original port group with the right VLAN.
  • I try connecting directly to the ESXi host of the vCSA VM via the vSphere C# client and then assigning the NIC of the vCSA VM to a different port group. However, I cannot do that either: because the host is configured only with distributed switches, no other port group is available for selection in this situation.
    • vmotion.error.01
  • I try creating a new port group with ephemeral port binding on the distributed switch. This new ephemeral port group is available in the vSphere C# client when connecting to the host directly. Then I assign the VM to the ephemeral port group and change it back. However, the migration still fails with the same error.
  • Since I could fix this issue last time by changing the port group and changing it back, I guess that doing so somehow resets the NIC on the VM or the virtual switch port to which the VM is connected. That gives me the idea to manually assign the VM to another virtual switch port.
    • vmotion.error.02

Solutions

  • In the screenshot above, the NIC of the vCSA is assigned to port 214 on the vSwitch.
  • Log back in to vCenter via the vSphere C# client or Web client (when connecting directly to the ESXi host via the vSphere C# client, I cannot see the ports on the distributed switch, nor change the port assigned to the VM’s NIC)
  • I find an unused port on the same port group (e.g. port 423 in my case).
    • vmotion.error.03
  • Edit the vCSA VM settings and assign its NIC to the unused port.
  • Then I can successfully migrate the vCSA VM to another host.

Conclusion

  • Changing the NIC to an unused port will be my first attempt when this issue happens again (I bet it will happen).
  • I still don’t know the cause of this issue.

10/15/2019 update

  • I got the exact same error again when performing a Storage vMotion of a VM across two vCenters (from vCSA 6.0 to vCSA 6.5). I deleted the VM snapshot and reran the vMotion, and it completed successfully.

Extend Microsoft Cluster Shared Disk in VMware

A VM shared disk on Microsoft Cluster Service (MSCS) is running out of disk space. The VMs are on a single host (aka cluster in a box - CIB). I can think of two ways to expand the disk storage.

  • create a new big shared disk for the cluster, migrate the data, then change the new disk to the same drive letter as the original disk
  • extend the size of the existing shared disk

Obviously the latter seems simpler, but it requires special attention. The shared disk in MSCS VMs must be in eager zeroed thick format. However, when extending an eagerzeroedthick VMDK, the extended chunk is in lazy zeroed thick format by default (see “Extending an EagerZeroedThick Disk”; in my test, vSphere 6 has the same behavior).

Here is how I extend the MSCS shared disk

  • Power off both servers in the cluster
  • Increase the VMDK disk size. There are two ways:
    • GUI: edit the VM settings, increase the shared disk size
    • CLI: use vmkfstools -X <newsize> -d eagerzeroedthick <vmdkfile>
  • If you use the GUI, the extended chunk will be in lazy zeroed thick format, and the VM will fail to power on with the error “VMware ESX cannot open the virtual disk for clustering…”

cluster.vm.power.on.error

  • There are two ways to convert the extended chunk to eagerzeroedthick format
    • Migrate the VM to another datastore, and specify the eager zeroed thick format for the disk
    • Use vmkfstools -k <vmdkfile>
      vmkfstools.convert.eagerzeroedthick
  • Once the entire shared disk is in eager zeroed thick format, the VM will be able to power on.
  • Extend the Windows partition as described in KB304736
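
To keep the two vmkfstools invocations straight, here is a small Python sketch that only builds and prints the command lines from the steps above; it does not run anything. The datastore path and the 200G size are hypothetical placeholders, and the function names are my own. The commands themselves would be run in the ESXi shell with both cluster nodes powered off.

```python
import shlex
from typing import List


def extend_eagerzeroed_cmd(vmdk_path: str, new_size: str) -> List[str]:
    """Build: vmkfstools -X <newsize> -d eagerzeroedthick <vmdkfile>"""
    return ["vmkfstools", "-X", new_size, "-d", "eagerzeroedthick", vmdk_path]


def eagerzero_existing_cmd(vmdk_path: str) -> List[str]:
    """Build: vmkfstools -k <vmdkfile> (eager-zero a disk already extended via the GUI)."""
    return ["vmkfstools", "-k", vmdk_path]


def render(args: List[str]) -> str:
    """Quote each argument so the printed line is safe to paste into a shell."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Hypothetical path and size, for illustration only.
    shared_disk = "/vmfs/volumes/datastore1/node1/shared.vmdk"
    print(render(extend_eagerzeroed_cmd(shared_disk, "200G")))
    print(render(eagerzero_existing_cmd(shared_disk)))
```

The first command covers the CLI path (extend and keep the eager zeroed thick format in one step); the second covers the case where the disk was already extended in the GUI and only the new chunk needs converting.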
