vCSA “Syslog endpoint servername:514 is unreachable” Error

I am configuring the vCSA to forward syslog to a third-party syslog server (e.g. a Splunk forwarder) over UDP port 514 (see the instructions at http://www.virtuallyghetto.com/2015/03/a-preview-of-native-syslog-support-in-vcsa-6-0.html). The syslog server receives the logs from the vCSA. However, the VMware Syslog Service Health Messages report a “Syslog endpoint servername:514 is unreachable” critical error.

It turns out the vCSA syslog uses TCP port 514 for the syslog server health check. Since my syslog server (like many normal syslog servers) only listens on UDP port 514, the vCSA health check reports that the syslog server is not reachable.
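The TCP/UDP mismatch can be confirmed from any machine that can reach the syslog server. Below is a minimal Python sketch, assuming the health check is essentially a TCP connection attempt (the exact probe the vCSA performs is not shown in this post). The function name `tcp_port_open` is made up for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs a full TCP handshake; a server that
        # only listens on UDP 514 will refuse or ignore this attempt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For example, `tcp_port_open("syslog.example.com", 514)` (a placeholder host name) returning False while the logs still arrive over UDP would match the behavior described above.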

Solution

  • Find a TCP port that the syslog server is listening on. Any listening TCP port should work; it does not have to be related to syslog.
  • SSH to vCSA
  • cd /etc/vmware-syslog
  • vi vmware-syslog-health.properties
  • Change the “cls.strata.ping.port” setting to the TCP port that the syslog server is listening on (the default is 514)
  • Save the setting
  • Restart the VMware Syslog Service
  • Check the VMware Syslog Service Health Messages; they should show “Syslog endpoint <servername>:<tcp port> reachable”
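The edit in vmware-syslog-health.properties can also be scripted. The sketch below assumes the file uses plain `key=value` lines, which the .properties extension suggests but this post does not confirm. The helper name `set_property` and the port 8089 in the usage note are illustrative only.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return the properties text with key set to value.

    Assumes simple "key=value" lines; comment lines starting with "#"
    and unrelated keys are passed through unchanged.
    """
    out = []
    replaced = False
    for line in text.splitlines():
        stripped = line.strip()
        name = stripped.split("=", 1)[0].strip()
        is_setting = bool(stripped) and not stripped.startswith("#") and "=" in stripped
        if is_setting and name == key:
            out.append(f"{key}={value}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        # Key was not present: append it at the end.
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

A call such as `set_property(old_text, "cls.strata.ping.port", "8089")` would produce the new file content; back up the original file first, and restart the VMware Syslog Service afterwards as in the steps above.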

One More Reason Not to Disable IPv6

Almost every current operating system supports IPv6 and enables it by default. Some system admins still like to disable IPv6 because they do not expect to deploy it in the near future. However, disabling IPv6 can go against the software vendor's recommendation or trigger unexpected bugs.

For example, Microsoft does not recommend disabling IPv6 in Windows. See “IPv6 for Microsoft Windows FAQ” and “How to disable IPv6 or its components in Windows”.

VMware ESXi 6.0.x also has a known issue when IPv6 is disabled. See “Provisioning the TCP/IP stack does not work when IPv6 support is disabled on the host” (2146023).

To avoid unexpected issues, we should leave IPv6 enabled (the default).

Connect to vCSA using WinSCP

The default shell of the vCSA is the Appliance Shell (/bin/appliancesh), which doesn’t work with WinSCP.

There are two solutions to work around this issue:

  1. Change the default shell of the root account to the Bash shell (/bin/bash)
  2. Configure WinSCP to use the SFTP protocol (yes, SFTP; not SCP) with the shell setting “shell /usr/lib64/ssh/sftp-server”

PS. Both of these solutions require enabling SSH login and Bash shell on the appliance.

  • For solution #1, here are the commands (see VMware KB2107727 for the full instructions).
    • SSH to vCSA
    • shell.set --enable True
    • shell
    • chsh -s /bin/bash root
    • To change back: chsh -s /bin/appliancesh root
  • For solution #2 (credit to http://www.v-front.de/2015/03/vcsa-60-tricks-shell-access-password.html), the linked site does not provide the details on configuring WinSCP. I had an issue when setting it up the first time, and someone also commented that the shell trick does not work anymore, so I document the step-by-step instructions below.
  • Personally, I prefer solution #2, since I don’t need to mess with the default shell of the root account.

The following instructions were tested on vCSA 6.0 Update 2 (6.0.0.20000) with WinSCP version 5.9 (build 6786).

  • Log in to the vCSA web console (https://<vcsa-server>:5480)
  • Under Access, click Edit, select the checkboxes for “Enable ssh login” and “Enable bash shell”
    • vcsa.web.console.01
  • Change the Timeout value if necessary
    • vcsa.web.console.02
  • Create a new site in WinSCP
  • Select “SFTP” under File protocol, then enter the vCSA host name, the user name root, and its password.
    • winscp.config.01
  • Click the Advanced dropdown to edit the Advanced Site Settings
    • winscp.config.02
  • Under Environment, SFTP, SFTP server, enter “shell /usr/lib64/ssh/sftp-server” (without quotes)
    • winscp.config.03
  • Click OK and Save the setting, and click Login
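The same site setting can probably be expressed in a WinSCP script instead of the GUI. The sketch below builds the `open` line using WinSCP's `-rawsettings` switch with the `SftpServer` raw setting, which I believe corresponds to the “SFTP server” box used above; I have not tested it against the vCSA, so verify it against the WinSCP documentation for your version. The function name and the host `vcsa.example.com` are placeholders.

```python
def winscp_open_command(host: str, user: str = "root",
                        sftp_server: str = "shell /usr/lib64/ssh/sftp-server") -> str:
    """Build a WinSCP scripting 'open' line that overrides the SFTP server.

    SftpServer is assumed to be the raw-setting name behind the GUI field
    Environment > SFTP > SFTP server.
    """
    return f'open sftp://{user}@{host}/ -rawsettings SftpServer="{sftp_server}"'


def winscp_script(host: str) -> str:
    """Return a two-line script: open the session, then exit."""
    return "\n".join([winscp_open_command(host), "exit"]) + "\n"
```

Saving the output of `winscp_script("vcsa.example.com")` to a text file and running it with `winscp.com /script=<file>` should prompt for the password. A real script would normally also pin the server host key, which is left out here.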

Event ID 36886 “No suitable default server credential exists on this system” Fix

Recently, we created a new child domain in the existing AD forest with two new Windows Server 2012 R2 domain controllers. The AD authentication and AD replication between DCs are working fine.

Today, we are trying to set up a third-party app (Splunk) with secure LDAP authentication against the child domain AD. The child domain DCs are hardened with the “LDAP server signing requirements” policy set to require signing.

However, we get a “connection reset by the peer” error in the third-party app’s LDAP connection test. On the DC, there is a warning in the System event log: Event ID 36886 “No suitable default server credential exists on this system. …”

ldaps.authentication.error.01

Troubleshooting

  • According to MS KB321051, “The LDAPS certificate is located in the Local Computer’s Personal certificate store.”
    • Open the Certificates MMC for the Local Computer on the child domain controller. There is a server cert for this domain controller.
    • However, this cert is not fully trusted because the root CA cert is not trusted: it is missing from the Trusted Root Certification Authorities store.
    • ldaps.authentication.error.03
    • I guess this server cert is created / issued when promoting the server to DC, and the cert is issued by the internal Windows Server 2008 enterprise intermediate CAs. (There is a Windows Server 2008 enterprise root CA and intermediate CA in the AD forest.)
    • I find the root CA cert is in the Intermediate Certification Authorities store. I’m not sure why. (The enterprise root CA and intermediate CA are set up by someone else.) This is the cause of the issue.
    • ldaps.authentication.error.04

Solution

  • There are two ways to put the root CA cert back into the trusted root CA store.
    • Copy and paste the root CA cert from the intermediate cert store to the trusted root CA store.
    • Download the CA certificate chain, open the root CA cert, and install it on the child DC. Make sure to specify Local Machine as the store location and Trusted Root Certification Authorities as the store. The default automatic selection will place the root CA cert in the intermediate CA store again.
  • Once the root CA cert is in the right store location, the child DC’s cert shows trusted. The LDAPS connection test in the third party app is successful.
  • ldaps.authentication.error.05
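The LDAPS endpoint can also be tested independently of the third-party app. Below is a minimal Python sketch using only the standard library. The function name `check_ldaps_cert` is made up, and the CA file in the usage note is a hypothetical export of the root CA cert. Note that this checks trust from the client side; the problem in this post was on the DC itself, where it showed up as a reset connection.

```python
import socket
import ssl
from typing import Optional, Tuple


def check_ldaps_cert(host: str, port: int = 636, cafile: Optional[str] = None,
                     timeout: float = 5.0) -> Tuple[bool, str]:
    """Try a TLS handshake with an LDAPS endpoint and report the result.

    With cafile=None the system trust store is used; otherwise only the
    CA certificates in the given PEM file are trusted.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, f"handshake ok, negotiated {tls.version()}"
    except ssl.SSLCertVerificationError as exc:
        # The server presented a cert, but the chain or name did not verify.
        return False, f"certificate not trusted: {exc.verify_message}"
    except OSError as exc:
        # Refused, reset or timed-out connections and other TLS errors.
        return False, f"connection failed: {exc}"
```

For example, `check_ldaps_cert("dc1.child.example.com", cafile="root-ca.pem")` (both names are placeholders) should report a failed connection while the DC has no usable server cert, and a successful handshake after the fix.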

Configuring Windows Firewall Settings in Two GPOs May Corrupt the GPO

While implementing group policy, in addition to learning about security setting persistence in GPOs, I also observe that a “corrupted” GPO breaks GPO replication between two Windows Server 2012 R2 domain controllers.

Here are the group policies applied to the domain controller OU.

DC.GPO.01 

  • Default Domain Policy and Default Domain Controller Policy are untouched as the Windows default
  • CDE Domain Policy is applied at the domain level with some firewall rules to allow the antivirus server to communicate with the antivirus client on the servers. The Windows firewall state is left at “Not Configured” (the default), so the administrator on the server can turn off the firewall when needed.
  • PCI Win2012 R2 Hardening - Domain Controller is applied at the Domain Controllers OU level with the PCI compliance settings, including the Windows firewall state set to “On”, so the Windows firewall cannot be turned off.

One of the new features in Windows Server 2012 R2 is the ability to check GPO replication status in Group Policy Management. After configuring and applying the new GPO, I check its status. It shows that this GPO is not replicated from the first DC to the second DC, with the error “SysVol Inaccessible”.

DC.GPO.02

Troubleshooting

I unlink the new GPO from the domain controller OU, but it still shows the same error. As I learned from the previous post, unlinking the GPO does not roll back all the security settings applied to the server.

From the first DC, I can browse to the SysVol share on the second DC via Windows Explorer. Other GPOs are still in sync between the two DCs. So the communication between the two DCs should be fine.
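One narrower check that can be done by hand is to compare the GPO's GPT.INI file on each DC's SysVol share. The Python sketch below parses the `Version` value under `[General]` and splits it into user and computer versions (upper and lower 16 bits). It is only an illustration: the function names are made up, and the status tab in Group Policy Management looks at more than this one number.

```python
import configparser
from typing import Tuple


def parse_gpt_version(gpt_ini_text: str) -> Tuple[int, int]:
    """Return (user_version, computer_version) from GPT.INI content."""
    parser = configparser.ConfigParser()
    parser.read_string(gpt_ini_text)
    version = parser.getint("General", "Version")
    # Upper 16 bits: user settings version; lower 16 bits: computer settings.
    return version >> 16, version & 0xFFFF


def gpt_versions_match(text_dc1: str, text_dc2: str) -> bool:
    """True if both copies of GPT.INI carry the same version numbers."""
    return parse_gpt_version(text_dc1) == parse_gpt_version(text_dc2)
```

The two inputs would be the contents of Policies\{GPO GUID}\GPT.INI read from each DC's SysVol share. A mismatch that does not clear after a while suggests the GPO files are not replicating.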

Searching for this error on the web, I don’t find anything directly related to it. Some posts do give me the idea that the GPO may be corrupted.

I back up and delete the GPO, and verify that the two DCs are in sync by selecting the domain name and clicking the Status tab in Group Policy Management.

After deleting the GPO, the GPO folder is still left on the drive. This prevents restoring the GPO, so I have to manually delete the GPO folder from C:\Windows\SYSVOL\domain\Policies. At first I get an access denied error when deleting the folder. After waiting long enough for the deletion to replicate, I can delete the GPO folder. This also removes the GPO folder from the other DCs.

Then I restore the GPO from the backup. When reviewing the settings in the restored GPO, I notice the Windows firewall settings, under Computer Configuration, Policies, Windows Settings, Security Settings, Windows Firewall with Advanced Security, are reverted back to the default (not configured). All my customized Windows firewall settings are gone, but other group policy settings are not impacted.

This leads me to think the corrupted GPO issue (if it is truly corrupted and that is why the replication fails) is caused by the Windows firewall settings being configured in two GPOs, even though there is no conflict between their settings.

To test my theory, I reconfigure the firewall settings in the new GPO again. Bang! This new GPO is out of sync between the two DCs again!

Then I do the fix again - deleting the GPO and restoring the GPO. This time, I configure the firewall setting in the “CDE Domain Policy” only. All GPOs are in sync between the DCs.

DC.GPO.03

Conclusion

  • I am able to duplicate the GPO sync issue between the two DCs if the Windows firewall settings are configured in more than one GPO.
  • Configuring all the firewall settings in one GPO does not have the GPO sync issue.
  • The GPO sync issue may be caused by a corrupted GPO. Unlike other posts on the internet, I don’t get any corruption or sync error other than “SysVol Inaccessible”, so I cannot prove the GPO is corrupted.
  • I don’t find any KB or post advising against configuring firewall settings in multiple GPOs. The closest one I found is this post. It just says merging the firewall settings under the older “Windows Firewall” section and the “Windows Firewall with Advanced Security” section may have unpredictable results. In my setup, I only configure the settings in “Windows Firewall with Advanced Security”.
  • I don’t know the root cause of the GPO sync issue. For now, I will configure all the Windows firewall settings in only one GPO.

GPO Security Setting Persistence

For security compliance purposes, I want to enforce some Windows settings on the Windows servers via group policy. Some GPOs already exist in the domain. Instead of modifying the existing GPOs, I create a new GPO and link it to the OU where the servers are located. I think I can easily roll back / undo the new settings if they cause any issue, by moving the server out of the OU or unlinking the GPO from the OU.

It turns out my original thought is not 100% correct. Some security settings still persist even if the setting is no longer defined in the policy.

For example, on a new Windows Server 2012 R2 server, the local security policy setting of “Network Security: LAN Manager authentication level” is “Not Defined”. In my new GPO, I configure this setting to “Send NTLMv2 response only; Refuse LM & NTLM”. After linking this new GPO to the OU, I cannot RDP to the servers with the domain login (the error message is “The logon attempt failed”, even though my username and password are correct), but the local login is okay.

I unlink the new GPO from the OU for troubleshooting and expect the security settings to roll back to their original configuration. However, I get the same error, and the local security policy on the server still shows “Send NTLMv2 response only; Refuse LM & NTLM”. I reset this setting in the local security policy back to the default “Not Defined”. That solves the problem.

Some posts give me the idea on the fix: post 1, post 2.

Now I have learned about the “tattooing” behavior of security settings in GPOs.

Persistence in security settings

Security settings may still persist even if a setting is no longer defined in the policy that originally applied it.

Persistence in security settings occurs when:

  • The setting has not been previously defined for the computer.

  • The setting is for a registry object.

  • The setting is for a file system object.

All settings applied through local policy or a Group Policy Object are stored in a local database on your computer. Whenever a security setting is modified, the computer saves the security setting value to the local database, which retains a history of all the settings that have been applied to the computer. If a policy first defines a security setting and then no longer defines that setting, then the setting takes on the previous value in the database. If a previous value does not exist in the database, then the setting does not revert to anything and remains defined as is. This behavior is sometimes called "tattooing."

Registry and file settings will maintain the values applied through policy until that setting is set to other values

Reference: Administer Security Policy Settings, “Persistence in security settings”
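The rule quoted above can be pictured with a toy model. The Python sketch below is purely illustrative and is not how Windows implements its local security database; the class and method names are invented. It keeps a history per setting and, when a policy stops defining a setting, falls back to the previous value only if one exists.

```python
class SecuritySettingStore:
    """Toy model of the local database described in the quoted text."""

    def __init__(self) -> None:
        self._history = {}   # setting name -> list of values, oldest first
        self.effective = {}  # setting name -> value currently in force

    def record(self, name: str, value: str) -> None:
        """A local policy or GPO defines the setting."""
        self._history.setdefault(name, []).append(value)
        self.effective[name] = value

    def policy_removed(self, name: str):
        """The policy that last defined the setting no longer defines it."""
        history = self._history.get(name, [])
        if len(history) >= 2:
            # A previous value exists: revert to it.
            history.pop()
            self.effective[name] = history[-1]
        # Otherwise there is nothing to revert to, so the value stays
        # as it is ("tattooing").
        return self.effective.get(name)
```

In this model, a setting that was “Not Defined” before the GPO was linked has only one entry in its history, so `policy_removed` leaves the GPO's value in place. That matches what happened with the LAN Manager authentication level above.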

Use WinSCP to Transfer Files in vCSA 6.7

This is a quick update on my previous post “Use WinSCP to Transfer Files in vCSA 6.5”. When I try the same SFTP server setting in vCSA 6.7...