Eddie's Blog

Configuring Windows Firewall Settings in Two GPOs May Corrupt the GPO

During my group policy implementation, in addition to learning about security setting persistence in GPOs, I also observe that a “corrupted” GPO breaks GPO replication between two Windows Server 2012 R2 domain controllers.

Here are the group policies applied to the domain controller OU.

Default Domain Policy and Default Domain Controller Policy are left untouched at the Windows defaults.
CDE Domain Policy is applied at the domain level with some firewall rules that allow the antivirus server to communicate with the antivirus client on the servers. The Windows firewall state is left as “Not Configured” (the default), so the administrator on the server can turn off the firewall when needed.
PCI Win2012 R2 Hardening - Domain Controller is applied at the Domain Controllers OU level with the PCI compliance settings, including setting the Windows firewall state to “On”, so the Windows firewall cannot be turned off.

One of the new features in Windows Server 2012 R2 is the ability to check GPO replication status in Group Policy Management. After configuring and applying the new GPO, I check its status. It shows that this GPO is not replicated from the first DC to the second DC, with the error “SysVol Inaccessible”.

Troubleshooting

I unlink the new GPO from the domain controller OU, but it still shows the same error. As I learned from the previous post, unlinking the GPO does not roll back all the security settings applied to the server.

From the first DC, I can browse to the SysVol share on the second DC via Windows Explorer. Other GPOs are still in sync between the two DCs. So the communication between the two DCs should be fine.

Searching for this error on the web, I don’t find anything directly related to it. Some posts do give me the idea that the GPO may be corrupted.

I back up and delete the GPO, then verify that replication between the two DCs is in sync by selecting the domain name and clicking the Status tab in Group Policy Management.
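As a rough cross-check of the Status tab, the version number stored in a GPO's GPT.INI file can be compared across DCs. Here is a minimal Python sketch, assuming the GPT.INI text from each DC's SYSVOL copy of the GPO has already been read into a string. The function names are hypothetical, and this compares only the SYSVOL version number, which is just one of the things Group Policy Management looks at.

```python
import configparser


def parse_gpt_version(gpt_ini_text):
    """Return (user_version, computer_version) from the text of a GPT.INI file."""
    # Interpolation is disabled so a '%' in other keys cannot break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gpt_ini_text)
    version = parser.getint("General", "Version")
    # Upper 16 bits: user settings version. Lower 16 bits: computer settings version.
    return version >> 16, version & 0xFFFF


def versions_in_sync(gpt_ini_by_dc):
    """True when every DC reports the same GPT.INI version for the GPO."""
    versions = {parse_gpt_version(text) for text in gpt_ini_by_dc.values()}
    return len(versions) <= 1
```

If the versions differ between the DCs for longer than the normal replication interval, that points at the same problem the Status tab reports.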

After deleting the GPO, the GPO folder is still left on the drive, which prevents restoring the GPO. I have to manually delete the GPO folder from C:\Windows\SYSVOL\domain\Policies. At first I get an access denied error when deleting the folder. After waiting long enough for the deletion to replicate, I can delete the GPO folder. This also removes the GPO folder from the other DCs.

Then I restore the GPO from the backup. When reviewing the settings in the restored GPO, I notice that the Windows firewall settings, under Computer Configuration, Policies, Windows Settings, Security Settings, Windows Firewall with Advanced Security, have reverted to the default (not configured). All my customized Windows firewall settings are gone, but other group policy settings are not impacted.

This leads me to think the corrupted GPO issue (if the GPO is truly corrupted and that is why replication fails) is caused by the Windows firewall settings being configured in two GPOs, even though there is no conflict between their settings.

To test my theory, I reconfigure the firewall settings in the new GPO. Bang! This new GPO is out of sync between the two DCs again!

Then I apply the fix again: delete the GPO and restore it from the backup. This time, I configure the firewall settings in the “CDE Domain Policy” only. All GPOs are in sync between the DCs.

Conclusion

I am able to reproduce the GPO sync issue between the two DCs if the Windows firewall settings are configured in more than one GPO.
Configuring all the firewall settings in one GPO does not cause the GPO sync issue.
The GPO sync issue may be caused by a corrupted GPO. Unlike other posts on the internet, I don’t get any corruption or sync error other than “SysVol Inaccessible”, so I cannot prove the GPO is corrupted.
I don’t find any KB or post that advises against configuring firewall settings in multiple GPOs. The closest one I found is this post. It just says that merging the firewall settings under the older “Windows Firewall” section and the “Windows Firewall with Advanced Security” section may have unpredictable results. In my setup, I only configure the settings in “Windows Firewall with Advanced Security”.
I don’t know the root cause of the GPO sync issue. For now, I will configure all the Windows firewall settings in only one GPO.

GPO Security Setting Persistence

For security compliance purposes, I want to enforce some Windows settings on the Windows servers via group policy. Some GPOs already exist in the domain. Instead of modifying the existing GPOs, I create a new GPO and link it to the OU where the servers are located. I think I can easily roll back the new settings if they cause any issue, by moving the server out of the OU or unlinking the GPO from the OU.

It turns out my original thought is not 100% correct. Some security settings still persist even if the setting is no longer defined in the policy.

For example, on a new Windows Server 2012 R2 server, the local security policy setting “Network Security: LAN Manager authentication level” is “Not Defined”. In my new GPO, I configure this setting to “Send NTLMv2 response only; Refuse LM & NTLM”. After linking this new GPO to the OU, I cannot RDP to the servers with a domain login (the error message is “The logon attempt failed”, even though my username and password are correct), but the local login is okay.

I unlink the new GPO from the OU for troubleshooting and expect the security settings to roll back to their original configuration. However, I get the same error, and the local security policy on the server is still set to “Send NTLMv2 response only; Refuse LM & NTLM”. I reset this setting in the local security policy back to the default “Not Defined”. That solves the problem.

Some posts gave me the idea for the fix: post 1, post 2.

Now I have learned about the “tattooing” behavior of security settings in GPOs.

Persistence in security settings

Security settings may still persist even if a setting is no longer defined in the policy that originally applied it.

Persistence in security settings occurs when:

The setting has not been previously defined for the computer.
The setting is for a registry object.
The setting is for a file system object.

All settings applied through local policy or a Group Policy Object are stored in a local database on your computer. Whenever a security setting is modified, the computer saves the security setting value to the local database, which retains a history of all the settings that have been applied to the computer. If a policy first defines a security setting and then no longer defines that setting, then the setting takes on the previous value in the database. If a previous value does not exist in the database, then the setting does not revert to anything and remains defined as is. This behavior is sometimes called "tattooing."

Registry and file settings will maintain the values applied through policy until that setting is set to other values

Reference: Administer Security Policy Settings, “Persistence in security settings”
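To make the quoted behavior concrete, here is a toy Python model of the local database. It is only a sketch of the rule as described above, not how Windows implements it, and the class and method names are made up for illustration.

```python
class LocalSecurityDatabase:
    """Toy model: keeps a history of values applied to each security setting."""

    def __init__(self):
        self._history = {}  # setting name -> list of values, oldest first

    def define(self, setting, value):
        # A policy (local or GPO) defines the setting: record the new value.
        self._history.setdefault(setting, []).append(value)

    def undefine(self, setting):
        # The policy no longer defines the setting. Revert only when an
        # earlier value exists; otherwise the current value stays ("tattooing").
        values = self._history.get(setting, [])
        if len(values) > 1:
            values.pop()

    def effective(self, setting):
        values = self._history.get(setting)
        return values[-1] if values else None


db = LocalSecurityDatabase()
db.define("LAN Manager authentication level", "Send NTLMv2 response only")
db.undefine("LAN Manager authentication level")  # e.g. the GPO is unlinked
print(db.effective("LAN Manager authentication level"))
```

In this model the setting was never defined before the GPO applied it, so removing the GPO leaves the value in place, which matches what I saw on my servers.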

How to Convert Eager Zeroed Thick Disk to Lazy Zeroed Thick Disk

According to VMware KB2145183, an Eager Zeroed Thick disk cannot be directly converted, migrated, or cloned to a Lazy Zeroed Thick disk. The workaround is:

Convert the Eager Zeroed Thick disk to Thin
Convert the Thin disk to Lazy Zeroed

The vmkfstools command can be used for the conversion (see my previous post). Here are the commands:

vmkfstools -K <vmdkFile>
vmkfstools -j <vmdkFile>

Error “Idm client exception: Error trying to join AD, error code [11]” when joining a VCSA to AD domain

On a newly created VCSA appliance, I got the error “Idm client exception: Error trying to join AD, error code [11]” when joining it to the Active Directory domain.

I used the domain’s netbios name\user-name as the user name.

Fix: use the User Principal Name (UPN), user-name@fqdn-domain-name, as the user name. After joining the domain, reboot the VMware Platform Services Controller (PSC).
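For clarity, this is the difference between the two name formats, shown as a small Python sketch. The helper name and the example account are hypothetical, and it assumes the UPN suffix is the same as the domain's FQDN, as in the fix above.

```python
def to_upn(down_level_name, domain_fqdn):
    """Turn 'NETBIOS\\user-name' (or a bare 'user-name') into 'user-name@domain_fqdn'."""
    # Keep only the part after the last backslash, if there is one.
    user = down_level_name.rsplit("\\", 1)[-1]
    if not user or "@" in user:
        raise ValueError("expected NETBIOS\\user-name or a bare user name")
    return f"{user}@{domain_fqdn}"


print(to_upn("CORP\\eddie", "corp.example.com"))
```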

Troubleshoot vMotion Error 195887167

Updated on 07/13/2016. See the update in this post. I may have found the ultimate solution, even though I am still not sure what causes the issue.

Recently I had an issue vMotioning some VMs between hosts in a vSphere 6.x cluster. Long story short, here are the symptoms:

I consistently got the error “Failed waiting for data. Error 195887167. Connection closed by remote host, possibly due to timeout” when vMotioning (host only, no storage vMotion) some VMs, particularly the vCenter Server Appliance VM. I had two vCSA VMs, and both had the same issue. But the vCSA was not the only VM on which I got the error.
I successfully vMotioned some VMs between the hosts, so the vMotion network configuration should be okay.
In other words, some VMs are okay and some are not. The size (CPU, RAM, storage) of the VMs does not seem to be the problem. The successfully vMotioned VMs can have more or less CPU, RAM, and storage than the failed VMs.
The failed VMs have more disks than the other VMs. The vCSA VM is created with 11 disks by default.
When the VMs were powered off, the migration succeeded.
Restarted the hosts and restarted the VMs. No difference.
Verified no IP address conflicts.
Tried one VMkernel adapter for both management and vMotion or a dedicated VMkernel adapter for vMotion. No difference.
Tested vmkping successfully between the hosts.
In the vmkernel.log of the hosts, the error is “2016-05-18T22:47:34.959Z cpu15:39089)WARNING: Migrate: 270: 1463611379229538 D: Failed: Connection closed by remote host, possibly due to timeout (0xbad003f) @0x418000e149ee” on the destination host or “2016-05-19T18:29:23.930Z cpu1:130133)WARNING: Migrate: 270: 1463682486286991 S: Failed: Migration determined a failure by the VMX (0xbad0092) @0x41803a7f6993” on the source host.
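When comparing the source and destination hosts, it helps to pull just the migration ID, the side (S or D), and the status code out of vmkernel.log. The following Python sketch does that for lines shaped like the two samples above. The regular expression is my own assumption based only on those two lines, so other builds may log a different format.

```python
import re

# Pattern derived from the two sample lines above; not an official log format.
MIGRATE_FAILURE = re.compile(
    r"^(?P<timestamp>\S+)\s+\S+WARNING: Migrate: \d+: (?P<migration_id>\d+) "
    r"(?P<side>[SD]): Failed: (?P<reason>.+?) \((?P<status>0x[0-9a-fA-F]+)\)"
)

SIDES = {"S": "source", "D": "destination"}


def parse_migrate_failure(line):
    """Return a dict describing a Migrate failure line, or None if it does not match."""
    match = MIGRATE_FAILURE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["side"] = SIDES[info["side"]]
    return info


sample = (
    "2016-05-18T22:47:34.959Z cpu15:39089)WARNING: Migrate: 270: "
    "1463611379229538 D: Failed: Connection closed by remote host, "
    "possibly due to timeout (0xbad003f) @0x418000e149ee"
)
print(parse_migrate_failure(sample))
```

Running it over the log from both hosts makes it easy to line up the S and D entries of one migration attempt.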

Possible solutions

Remove the snapshot on the VM if it has one
- After removing the snapshot on one of the vCSA VMs, vMotion worked fine. But the other vCSA had no snapshot, and it still failed.
Try using the vSphere Client instead of the vSphere Web Client. This worked on some VMs, but not always.
Assign the VM’s network adapter to a different port group, then change it back to its original port group
- This seems to be the ultimate fix. After doing this, the vCSA VMs, which had failed vMotion consistently, were vMotioned successfully.

Conclusion

I’m not sure of the root cause of this issue, but it may relate to the network settings on the vSwitch or port group. Some hits about this: vMotion fails with the general system error: 0xbad003f (KB2008394)
These KBs are not the solution in my case:

07/13/2016 update

I run into the same error when migrating some VMs, including the vCenter 6.x appliance (vCSA), between hosts in the VMware cluster. I am able to migrate other VMs. This leads me to believe the problem is on the VM rather than the VM infrastructure.
Among the VMs with this error, the ones I can power off are migrated successfully. However, I cannot power down the vCSA VM, because I cannot perform the migration without vCenter available.
I try assigning the NIC of the vCSA VM to a different port group and changing it back. However, I cannot do that, because this VM is the vCenter server and is configured with a distributed switch. If I change the vCSA to a different port group (configured with a different VLAN), the vCenter server will be down (because its NIC is assigned to the wrong port group with the wrong VLAN), and I cannot change it back to the original port group with the right VLAN.
I try connecting directly to the ESXi host of the vCSA VM via the vSphere C# client, then assigning the NIC of the vCSA VM to a different port group. However, I cannot do that either. Because the host is only configured with distributed switches, there is no other port group to select in this situation.

I try creating a new port group with ephemeral port binding on the distributed switch. This new ephemeral port group is available in the vSphere C# client when connecting to the host directly. Then I assign the VM to the ephemeral port group and change it back. However, the migration still fails with the same error.
Since I could fix this issue last time by changing the port group and changing it back, I guess that this somehow reset the NIC on the VM or the virtual switch port to which the VM is connected. That gives me the idea to manually assign the VM to another virtual switch port.

Solutions

In the screenshot above, the NIC of the vCSA is assigned to port 214 on the vSwitch.
Log back in to vCenter via the vSphere C# client or Web client (I cannot see the ports on the distributed switch, nor change the port assigned to the VM’s NIC, when connecting to the ESXi host via the vSphere C# client)
I find an unused port on the same port group (e.g. port 423 in my case).

Edit the vCSA VM setting and assign its NIC to the unused port.
Then I can successfully migrate the vCSA VM to another host.

Conclusion

Changing the NIC to an unused port will be my first attempt when this issue happens again (I bet it will happen).
I still don’t know the cause of this issue.

10/15/2019 update

I got the exact same error again when performing a Storage vMotion of a VM across two vCenter servers (from vCSA 6.0 to vCSA 6.5). I deleted the VM snapshot and reran the vMotion, and it completed successfully.

Change Windows Server 2008 or 2012 Network Profile

Sometimes a Windows server is assigned to the incorrect network profile. This can cause the wrong Windows Firewall rules to be applied. Here is how to change its network profile.

For standalone server

The profile can be changed to Public or Private, but cannot be set to Domain
For Windows Server 2012

Open PowerShell as administrator
Get-NetConnectionProfile | Set-NetConnectionProfile -NetworkCategory [Private | Public]

For Windows Server 2008 or 2012

gpedit.msc, Computer Configuration, Windows Settings, Security Settings, Network List Manager Policies
Select the network name, Properties, Network Location
Under Location Type, select Private or Public

For domain joined server

Cannot change the profile manually
It’s determined by Network Location Awareness (NLA) service (see “Network Location Awareness (NLA) and how it relates to Windows Firewall Profiles”)
If the network profile is incorrect,

verify the server can contact a DC via UDP port 389
restart the Network Location Awareness service on the server

Extend Microsoft Cluster Shared Disk in VMware

A VM shared disk on Microsoft Cluster Service (MSCS) is running out of disk space. The VMs are on a single host (aka cluster in a box - CIB). I can think of two ways to expand the disk storage.

create a new big shared disk for the cluster, migrate the data, then change the new disk to the same drive letter as the original disk
extend the size of the existing shared disk

Obviously the latter seems simpler, but it requires special attention. The shared disk in MSCS VMs must be in eager zeroed thick format. However, when extending an eagerzeroedthick VMDK, the extended chunk is in lazy zeroed thick format by default (see “Extending an EagerZeroedThick Disk”; in my test, vSphere 6 has the same behavior).

Here is how I extend the MSCS shared disk

Power off both servers in the cluster
Increase the VMDK disk size. There are two ways:

GUI: edit the VM settings, increase the shared disk size
CLI: use vmkfstools -X <newsize> -d eagerzeroedthick <vmdkfile>

For more info about vmkfstools, see my vmkfstools Examples post

Using the GUI, the extended chunk will be in lazy zeroed thick format. The VM will fail to power on with the error “VMware ESX cannot open the virtual disk for clustering…”

There are two ways to convert the extended chunk to eagerzeroedthick format

Migrate the VM to another storage, and specify the eager zeroed thick format for the disk
Use vmkfstools -k <vmdkfile>

Once the entire shared disk is in eager zeroed thick format, the VM will be able to power on.
Extend the Windows partition as described in KB304736
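The two vmkfstools routes above can be summarized in a short Python sketch that only builds the command lines and does not run them. The function name, the datastore path, and the size value are hypothetical placeholders. Run the resulting command on the ESXi host only while both cluster nodes are powered off.

```python
def vmkfstools_steps(vmdk_file, new_size=None):
    """Return the vmkfstools argument lists for the shared disk.

    new_size given: CLI route, extend and keep eager zeroed thick in one step.
    new_size None:  GUI route, the disk was already extended in the VM
                    settings, so only the lazy zeroed chunk needs zeroing.
    """
    if new_size is not None:
        return [["vmkfstools", "-X", new_size, "-d", "eagerzeroedthick", vmdk_file]]
    return [["vmkfstools", "-k", vmdk_file]]


# Hypothetical path and size, for illustration only.
for argv in vmkfstools_steps("/vmfs/volumes/datastore1/node1/shared.vmdk", "60G"):
    print(" ".join(argv))
```

I keep the two routes separate on purpose: running the -X form avoids the failed power-on, while the -k form is the repair step after a GUI resize.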
