Friday, January 24, 2020

Getting Reports from Long Running Performance Monitor Data Collector Sets

When trying to track down an elusive Active Directory performance problem, gathering stats using the Active Directory Diagnostics Data Collector Set is the best method for insight as to what the Domain Controller is doing. However, having super busy DCs and not knowing exactly when the problem is going to occur can make capturing the data and generating a useful report harder.
First, let’s make sure we can capture all the data on a busy DC. We’ll need to create a custom version of the default Active Directory Diagnostics DCS to be able to make the necessary changes.

Start by creating a new User Defined Data Collector Set. Give it a name, choose to create it from a template, and select Active Directory Diagnostics as the template. Then finish the wizard.

Now we’ll make a small change that allows the DCS to continuously capture data without fear of impacting performance or disk capacity too much. Open the Data Manager Settings, set the Resource Policy to Delete oldest, and adjust the size or folder count if needed.

Next, we make the DCS long running but still have a manageable size. Open the DCS Properties and switch to the Stop Condition tab. Turn off the Overall Duration checkbox, turn on Restart the data collector set at limits, then choose a limit based on either duration or size. For relatively busy DCs I’d suggest a duration time limit of 1 hour. Adjust as needed for your DCs. This change to the stop conditions causes the DCS to run until you turn it off, but it will switch to a new set of files in a new folder once the configured limit is reached. This helps keep each capture from growing too large and doesn’t risk missing any events.

Start your DCS, then wait. Once the event you wanted to capture occurs, stop the DCS. The event and the conditions around it will be somewhere in one of the collection folders. Hopefully you know when the event occurred so you can isolate it to a single set of capture files.

When the DCS stops, it will automatically create the report for the final set of files but will not create a report for any prior sets, so we’ll have to generate the report for any previous sets manually. In each DCS collection folder, you should see four files: Active Directory.etl, AD Registry.xml, NtKernel.etl and Performance Counter.blg. There is a fifth file that we need to generate the report. We will manually create it. Create a new text file in the folder called reportdefinition.txt. In that text file, add the following XML and save it.
<Report name="wpdcAdvisor" version="1" threshold="9999">
  <Import file="%systemroot%\pla\reports\Report.System.Common.xml"/>
  <Import file="%systemroot%\pla\reports\Report.System.Summary.xml"/>
  <Import file="%systemroot%\pla\reports\Report.System.Performance.xml"/>
  <Import file="%systemroot%\pla\reports\Report.System.CPU.xml"/>
  <Import file="%systemroot%\pla\reports\Report.System.Network.xml"/>
  <Import file="%systemroot%\pla\reports\Report.System.Disk.xml"/>
  <Import file="%systemroot%\pla\reports\Report.System.Memory.xml"/>
  <Import file="%systemroot%\pla\reports\Report.System.Configuration.xml"/>
  <Import file="%systemroot%\pla\Reports\Report.AD.xml"/>
</Report>
You may notice these are the same files that show up in the Data Manager Settings, Rules section for the DCS.

Finally, execute the following command line from within the capture directory you want to use for the report.
tracerpt.exe *.blg *.etl -df reportdefinition.txt -report report.html -f html
If everything went right, you should end up with a normal DCS diagnostic report that you can review which covers the time period from when the event occurred.
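
If you have many capture folders to process, the manual steps above can be scripted. The following is a minimal Python sketch, not a tested tool: the function name prepare_report_commands is hypothetical, and it assumes each capture lives in its own subfolder of one root folder and that an already generated report is named report.html. It writes reportdefinition.txt into every capture folder that has no report yet and returns the tracerpt command line quoted above for each one.

```python
from pathlib import Path

# Rule files quoted from the report definition in this post.
REPORT_FILES = [
    r"%systemroot%\pla\reports\Report.System.Common.xml",
    r"%systemroot%\pla\reports\Report.System.Summary.xml",
    r"%systemroot%\pla\reports\Report.System.Performance.xml",
    r"%systemroot%\pla\reports\Report.System.CPU.xml",
    r"%systemroot%\pla\reports\Report.System.Network.xml",
    r"%systemroot%\pla\reports\Report.System.Disk.xml",
    r"%systemroot%\pla\reports\Report.System.Memory.xml",
    r"%systemroot%\pla\reports\Report.System.Configuration.xml",
    r"%systemroot%\pla\Reports\Report.AD.xml",
]

REPORT_DEFINITION = (
    '<Report name="wpdcAdvisor" version="1" threshold="9999">'
    + "".join('<Import file="{}"/>'.format(name) for name in REPORT_FILES)
    + "</Report>"
)


def prepare_report_commands(dcs_root, report_name="report.html"):
    """Write reportdefinition.txt where a report is missing.

    Returns a list of (folder, argv) pairs; nothing is executed here.
    Assumption: an existing report in a folder is called report_name.
    """
    commands = []
    for folder in sorted(p for p in Path(dcs_root).iterdir() if p.is_dir()):
        if (folder / report_name).exists():
            continue  # this capture already has its report
        has_data = any(folder.glob("*.blg")) or any(folder.glob("*.etl"))
        if not has_data:
            continue  # not a capture folder
        (folder / "reportdefinition.txt").write_text(
            REPORT_DEFINITION, encoding="utf-8"
        )
        argv = [
            "tracerpt.exe", "*.blg", "*.etl",
            "-df", "reportdefinition.txt",
            "-report", report_name,
            "-f", "html",
        ]
        commands.append((folder, argv))
    return commands
```

On the DC itself, each returned pair could then be run with subprocess.run(argv, cwd=folder, check=True), so that tracerpt executes from within the capture directory as described above.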

As a neat trick, if you need to see more than the top 25 items that the report defaults to, you can run the following command to get full XML output:
tracerpt.exe -lr "Active Directory.etl"
For additional reading on similar but not identical issues that led me to this solution, I offer up the Canberra PFE team blog Issues with Perfmon reporting - Turning ETL into HTML, the Directory Services Team blog Are your DCs too busy to be monitored?: AD Data Collector Set solutions for long report compile times or report data deletion and the Core Infrastructure and Security blog Taming Perfmon: Data Collector Sets.

Tuesday, November 12, 2019

Disconnecting Objects with AADConnect Default Filtering

If you're familiar with MIM, you know there exists the capability to disconnect an object from the metaverse to force it to go through the join/provision process again. This is useful when the object was joined to the wrong metaverse object for some reason (like a bad join ruleset or incorrect data at the time of joining) and you want to have it be reassessed like it was a new object. In AADConnect, the disconnect function has been removed.

If you have the ability to change (or get changed) the original AD data, you can leverage the default filtering rules to temporarily disconnect an object. This is the main topic for this blog post.  If you can’t get the original AD data changed, you can follow the process in my original Disconnecting Objects with AADConnect post that shows an AADC-only method.

This feature is kind of hidden, not well documented, and not obvious when you see it.

If you look at the default filtering rules for the In from AD - User Join or In from AD - User Common rules, you’ll see these default scoping filters:


The filter we’re concerned with is
adminDescription NOTSTARTSWITH User_
For a source object to attach to an inbound rule it must satisfy the conditions in the scoping filter.  In this case, so long as adminDescription does not start with “User_” it will pass the filter and attach to the rule.  AdminDescription is blank on all objects by default so the normal projections and data flows happen.

So if you put a value of “User_<something>” on a user object, it will no longer attach to this rule.  And because In from AD - User Join is our sole default provisioning rule, once an object loses that rule, it is no longer allowed to project into the MetaVerse and becomes a disconnector!

Once disconnected, you can make any other data changes that are needed to retry a join or re-provision.  When ready, clear the adminDescription and the disconnector object will be reevaluated at the next delta sync run like any other new object.

Groups have a similar default filter of adminDescription NOTSTARTSWITH Group_ that can be used to disconnect groups.
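
To make the filter behavior concrete, here is a small Python model of just this one scoping clause. It is an illustration, not how the sync engine is implemented: the function name is hypothetical, the comparison is assumed to be case-sensitive, and the other clauses of the default scoping filters are ignored.

```python
# Prefixes taken from the default filters described above.
PREFIXES = {"user": "User_", "group": "Group_"}


def passes_admin_description_clause(admin_description, object_type="user"):
    """Return True when the object still satisfies the NOTSTARTSWITH clause.

    Assumption: the match is case-sensitive. A blank or missing
    adminDescription never starts with the prefix, so it passes.
    """
    prefix = PREFIXES[object_type]
    value = admin_description or ""
    return not value.startswith(prefix)
```

With this model, a user stamped with User_Transfer fails the clause and would become a disconnector, while clearing the value lets the object pass again.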

I have a customer with a few scenarios where users need to be disconnected, so they enacted workflows to stamp User_Transfer or User_Disable on objects at specific points in their lifecycle.

Now you can easily disconnect objects and reevaluate them, and hopefully not miss the lack of a disconnect button anymore.

Thursday, April 18, 2019

Changes to Ticket-Granting Ticket (TGT) Delegation Across Trusts in Windows Server (PFE edition)

I helped with some content referencing the upcoming May and July 2019 patches that change the default behavior for cross-forest unconstrained delegation. The full post is available at the new TechCommunity home of the AskPFE blog.

Monday, November 12, 2018

AADConnect Resilient Operations

In my job as an Identity-focused Premier Field Engineer, I get to help many companies on their journey from being solely on-premises to becoming a cloud-first enterprise. But the one thing all of them have in common is that they will be in a hybrid setup for the foreseeable future. Hybrid is the new IT reality. For most Microsoft customers that means Azure AD Connect is your Identity Bridge between on-premises and the Azure Cloud. I have found that not all the customers with which I work have a full understanding of how AADConnect is designed to function or how to ensure their AADConnect infrastructure is running smoothly and is resilient to failure.

Resilient operations for AADConnect involves three main topics that I wanted to cover today: Health, High Availability (HA) and Upgrades.

Synchronization Health

The ideal operational state for AADConnect has two characteristics for which we look. The first is that no errors occur during any stage of the sync process. This snapshot from my lab shows entire sync cycles running without error. All the import, delta syncs and exports have a status of success.


Any current version of AADConnect will have the health agent installed. The health agent will report into your Azure AD portal any data sync issues that are occurring. It’s more likely that you are working in the portal than looking at the Sync Manager, so surfacing the errors in the portal should give more visibility to any errors that exist. You need to work on clearing out any errors that exist. I’ve seen too many environments where errors are allowed to persist, and the companies wonder why they have account problems in Azure AD. Having zero errors will also make future upgrades and changes easier. We’ll see why later.

The second characteristic we want to see is AADConnect having zero disconnectors in each of the connectors. Those of you not familiar with how the sync engine works in AADConnect (or in MIM) may not know what a disconnector is. The simple answer is that a disconnector is an object in the connected directory that is in scope, yet not being managed by the sync engine. The more complete answer is described in the architecture document.

Having zero disconnectors on your Azure AD connector means that every object in Azure AD is being actively managed by the sync engine.

Disconnectors are reported during the Delta Sync phase for the connector.


This shows that I currently have one disconnector in Azure AD. Disconnectors in Azure AD are especially troublesome as it means nothing is managing that object in Azure AD. It will never get changed or deleted by AADConnect.

To figure out what the disconnector is we need to run the command line tool CSExport to export the disconnectors.

The syntax to use for exporting disconnectors is csexport.exe MAName ExportFileName /f:s /o:d

As an example in my lab, to get the disconnectors for the Azure AD connector I run

C:\>"C:\Program Files\Microsoft Azure AD Sync\Bin\csexport.exe" " - AAD" c:\temp\aaddisconnectors.xml /f:s /o:d

Microsoft Identity Integration Server Connector Space Export Utility v1.1.882.0

c 2015 Microsoft Corporation. All rights reserved


Successfully exported connector space to file 'c:\temp\aaddisconnectors.xml'.

This will output an XML file with the details of every disconnector. In my case, I had a leftover device from testing hybrid Azure AD join. After deleting the device from Azure AD, my disconnector count returned to zero.

You need to work your way through the XML document and figure out why each object is disconnected. Should it have been connected to something in Active Directory and isn’t? Is it an orphaned object in Azure AD that needs to be deleted? The goal is to have zero disconnectors for the Azure AD connector.
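
When the export contains more than a handful of objects, a quick summary helps decide where to start. The Python sketch below counts disconnectors by object type. It assumes each cs-object element carries cs-dn and object-type attributes; check those names against your own export file before relying on it. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_disconnectors(xml_data):
    """Count exported connector space objects per object type.

    xml_data is the content of the csexport file (str or bytes).
    Assumption: cs-object elements expose 'object-type' and 'cs-dn'.
    """
    root = ET.fromstring(xml_data)
    counts = Counter()
    dns = []
    for obj in root.iter("cs-object"):
        counts[obj.get("object-type", "unknown")] += 1
        dns.append(obj.get("cs-dn", ""))
    return counts, dns
```

Reading the file in binary mode, for example open(r"c:\temp\aaddisconnectors.xml", "rb").read(), lets the XML parser pick up the file's own encoding.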

For any Active Directory connectors you have, the goal is also to have zero disconnectors, but it’s more difficult to achieve a value that is absolutely zero. This is because any container-type objects (such as the OUs that contain the users) get reported as disconnectors. So realistically, we’re looking to achieve a count of disconnectors that is low and static so that we can tell if the number of disconnectors has changed.

To minimize the number of disconnectors that occur because of the OUs, you can run through the configuration wizard and only select the OUs with users and computers that you care about, unchecking the OUs that aren’t needed.

A word of caution here. Don’t just unselect random OUs without being absolutely certain they aren’t needed. If an OU is deselected and it contained objects that were being synced into Azure AD, those objects will be deprovisioned from Azure AD. You can mitigate this risk by using a staging server to test a new configuration change before the change goes live. We’ll talk more about staging mode later.

If your AD setup is such that you have a small and static number of OUs selected, you can ideally end up with a disconnector count around 10 or so. Know what that number is for your environment, so that if it changes that means you have a new disconnector that should be reviewed and remediated. If your AD setup has lots of OUs that need to be included, and the number of OUs keeps changing (maybe you have an OU per site or department and those change frequently), you can create a custom inbound rule that will project the OUs into the metaverse. This changes the OUs into connectors and returns you to a state where the number of disconnectors should be almost zero. See Appendix A for how to create the necessary rule for connecting OUs.

In summary for this section, an AADConnect server with zero errors and zero disconnectors means we are running a well-managed environment that has no data problems affecting the sync operations of AADConnect.

High Availability: Using a Staging Server

The good news for AADConnect is that the sync engine itself is not involved with any run-time workloads for your users, reducing our HA requirements. You could shut off the AADConnect sync service and your users would still have access to all their Azure and Office 365 resources. Changes to user data (adds, deletes, changes) won’t happen, but that doesn’t affect availability in the short-term. However, depending on which sign-on option you are using, there may be additional considerations. If you are performing Password Hash Sync, that sync process runs on its own every two minutes. Users could be impacted if they change their AD password and AADConnect isn’t running; there will be a mismatch between their cloud password and their on-premise AD password. If you are performing Pass-through Authentication the first agent is installed on the AADConnect server; you need to install additional agents on other servers to provide HA for that feature. If you have configured Password Writeback then the AADConnect service needs to be running for the Service Bus channel to be open. Finally, if you use ADFS, HA designs for federated sign-on are out-of-scope for this AADConnect discussion.

Accordingly, we need some measure of HA to keep the Azure AD data from becoming too stale. “Too stale” is a relative term. For a small environment with few changes that may mean you can run for weeks without AADConnect running and not experience any issues. For larger environments, you may not want to have AADConnect be down for more than a few hours.

The major HA design model for AADConnect is to have a second, fully independent installation of AADConnect that runs in staging mode. This server will see all the same data changes that happen in AD, is configured with the same rule sets as the active server and validates that the changes it expects to see in Azure AD were actually made in Azure AD by the active server. The general concept is that if both servers see the same input, and have the same rules, they will both independently arrive at the same output result. The Operation tasks topic goes into detail as to how to set up the staging server, but it neglects to cover how to realistically get the same input, the same rules and the same output on both sides. All three are problems I’ve seen in the field for implementations that have some measure of customization in the rules and, shall we say, less than pristine data sources.

Let’s start with making sure we have the same rules on both the active and staging server. To get your customizations on the staging server, you can either create them all by hand via the GUI or leverage the export capability from the active server. When you select one or more rules in the Rules Editor and select Export, you’ll get a PowerShell script generated that can almost be used to create the same rule on the staging server. The main problem with the generated PowerShell script is that it uses GUIDs to represent the connectors, and those GUIDs are unique to the AADConnect server on which they were created. The same connector will have a different GUID between the active and staging servers. But if you manually adjust the Connector GUID you’ll be able to run the script to recreate the custom rule. The PowerShell script is designed to create a new rule, not apply customizations to an existing rule. This is a good reason to follow the GUI guidance when customizing a built-in rule: disable the built-in rule and create a new one to hold the customizations.
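
If there are many exported rules, the manual GUID adjustment can be done with a simple text substitution. This Python sketch is one possible approach, with hypothetical names: you supply a mapping from each connector GUID on the active server to the GUID of the same connector on the staging server, and only those GUIDs are rewritten. Every other GUID in the script, such as a rule's own identifier, is left alone.

```python
import re

# Matches GUIDs in the usual 8-4-4-4-12 hexadecimal form.
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def remap_connector_guids(script_text, guid_map):
    """Replace active-server connector GUIDs with staging-server GUIDs.

    guid_map: {active_guid: staging_guid}. You have to build this map
    yourself by looking up the connectors on both servers.
    """
    lookup = {old.lower(): new for old, new in guid_map.items()}

    def swap(match):
        found = match.group(0)
        return lookup.get(found.lower(), found)

    return GUID_RE.sub(swap, script_text)
```

Matching is case-insensitive on purpose, since the same GUID may be written in upper or lower case in different places.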

Now that we’ve attempted to configure the same rules on both sides, how do we confirm that we successfully accomplished that task? I have a PowerShell script on the TechNet Gallery called Reformat AADConnect Sync Rule Export for Easy Comparisons that will use the exported PowerShell scripts and an export of the connector configuration to generate a version of the creation scripts that is designed for comparison using WinDiff or similar comparison tool. Each rule is rewritten into its own file, making it much easier to perform a rule-by-rule comparison without needing to export each rule one at a time.

When looking at the results of the comparison, you will generally find two categories of differences even when all the rules appear to have been duplicated appropriately. The first category is made up of the internal Identifier GUIDs of the rules themselves. Like the Connector GUIDs, each rule has an internal GUID that is unique per server. This difference can be ignored. The second category you’re likely to find is due to newer versions of AADConnect having different default rules than earlier versions. Newer rules usually add in new attribute flows, change calculations or introduce entirely new rules. When this occurs the precedence number for each rule can also change, resulting in more differences showing up when comparing the two different rule sets. But, after you’ve looked at a few of the difference details, you’ll notice the pattern and can quickly complete the validation that no other unexpected differences show up in the report.

Here’s an example WinDiff comparison between an active server and a staging server that was upgraded with a more recent build. We’re looking specifically at the In from AD – Group Common rule.


You can see the differences in the GUIDs (lines 3 and 11). Because this upgrade also has a new data flow, you can see additional changes to the ImmutableTag (line 14) and the new data flows (new section starting at line 170).

Please keep in mind that this comparison task is really designed to help make sure your own customizations were successfully duplicated. If there has been a change to one of the default rules in a newer version of AADConnect, like in this example, do not attempt to revert the rule back to a prior version. The newest default rules are almost always correct and represent the best default data flow for Azure AD that is currently available. But knowing that a default rule has changed is important for the following steps of getting a new staging server ready.

Next, we will want to confirm that the input data on the staging server is the same as on the active server. Going through the configuration wizard you can manually validate that the same OUs have been selected on both sides. Or you can use an export of the Connector configurations and compare the saved-ma-configuration/ma-data/ma-partition-data/partition/filter/containers sections of the XML documents. To export a Connector configuration, open the Synchronization Service GUI on the AADConnect server, navigate to the Connectors tab, highlight a connector and select Export Connector from the Actions menu. Provide a File name and click Save.
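
Comparing the containers sections by eye gets tedious with many OUs. The following Python sketch reads two exported connector configurations and lists the container entries that appear on only one side. It uses the element path given above and treats whatever child elements with text it finds under containers as entries, so it makes no assumption about their exact names; comparing DNs case-insensitively is my assumption. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Path below the saved-ma-configuration root element, as described above.
CONTAINERS_PATH = "ma-data/ma-partition-data/partition/filter/containers"


def selected_containers(connector_xml):
    """Return a set of (element name, lower-cased text) under containers."""
    root = ET.fromstring(connector_xml)
    found = set()
    for containers in root.findall(CONTAINERS_PATH):
        for node in containers.iter():
            text = (node.text or "").strip()
            if node is not containers and text:
                found.add((node.tag, text.lower()))
    return found


def container_differences(active_xml, staging_xml):
    """Return (only_on_active, only_on_staging) as sorted lists."""
    active = selected_containers(active_xml)
    staging = selected_containers(staging_xml)
    return sorted(active - staging), sorted(staging - active)
```

Two empty lists mean both servers have the same container selection for that connector.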

Finally, and most importantly, we need to validate that the same output results are occurring on both the active and staging servers. This is done by examining the export queues on the staging server. When in staging mode the one and only difference from active mode is that no export steps are executed.

Let’s have a quick discussion on how export data is created. In AADConnect, when a synchronization step is executed, the inbound rules from that specific connector and all outbound rules for all connectors are executed against each object being synced. The results of the sync task are compared to AADConnect’s current view of the connected data source (from the last time the object was imported) and any differences in data flows due to changed input values or different rules are computed and stored in the export queue. When the export step happens, the pending exports are written to the connected data source. However, because we’re in staging mode, the export step doesn’t happen and all the changes that AADConnect wants to make are nicely kept in the export queue for us to review.

Now that we know how the exports are constructed, we can see what happens in the ideal case. In the ideal case, the staging server has all the same input data from AD as the active server; it has the exact same rules; and it calculated the exact same final state of the objects as they already exist in Azure AD. Therefore, it has no changes that it wants to apply to Azure AD, so the export queue is empty. When the export queue is empty, the staging server is ideally situated to take over for the active server as it will immediately pick up from where the active server left off.
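
As a toy illustration of that idea, the pending export for one object can be thought of as the difference between the state the rules calculate and the state last imported from the connected directory. This Python sketch models only that comparison with plain dictionaries; it is not how the sync engine stores or computes its export queue, and the function name is hypothetical.

```python
def pending_export(desired, last_imported):
    """Return {attribute: (old, new)} for every attribute that differs.

    desired: attribute values the sync rules calculated for the object.
    last_imported: attribute values seen at the last import.
    An empty result corresponds to nothing queued for this object.
    """
    changes = {}
    for attr in sorted(set(desired) | set(last_imported)):
        new = desired.get(attr)
        old = last_imported.get(attr)
        if new != old:
            changes[attr] = (old, new)
    return changes
```

When the staging server has the same inputs and rules as the active server, desired and last_imported match for every object and nothing is queued.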

Unfortunately, the production systems I usually see are far from ideal. They have errors that are present, they have disconnectors that are present, and they have high volumes of changes being processed through them. This means the export queue on the staging server is almost never empty. Therefore, we need to analyze the export report to figure out why the staging server thinks that it has data that it wants to change.

To improve our analysis ability, we need to make sure we’ve accomplished the health status items we talked about earlier. We need to eliminate errors and minimize the disconnectors in the active server. Then we can go about trying to remove the noise that is due to transient data changes that are flowing through the active server and eventually seen by the staging server. To do this we can compare the pending exports between the active server and the staging server at two different points in time. By putting a delay in the comparison, we can ensure that the staging server has seen the changes that the active server was in the process of making. The first pass generates a list of potential changes that we care about. If those same changes are still present a few hours later, after several sync cycles have run, then we can be sure the changes are not in the queue simply due to an in-process change.

We end up needing a total of four exports for each Connector to run this comparison. That’s one each from the active and staging servers to start with, then a second set of exports from both a few hours later after some sync cycles have run. Generating an export file is a two-step process. We first look at the pending exports by returning to csexport.exe with syntax of csexport.exe MAName ExportFileName /f:x. After generating the XML document of the pending exports, we convert that to CSV using the CSExportAnalyzer with syntax like CSExportAnalyzer ExportFileName.xml > %temp%\export.csv

In my lab, one instance of those commands looks like this:

C:\Program Files\Microsoft Azure AD Sync\Bin>csexport.exe " - AAD" c:\temp\aadexport.xml /f:x

Microsoft Identity Integration Server Connector Space Export Utility v1.2.65.0

c 2015 Microsoft Corporation. All rights reserved


Successfully exported connector space to file 'c:\temp\aadexport.xml'.

C:\Program Files\Microsoft Azure AD Sync\Bin>CSExportAnalyzer.exe c:\temp\aadexport.xml > c:\temp\aadexport.csv

Then I have a second script on the TechNet Gallery called Compare AADConnect Pending Exports Between Active and Staging Servers that compares these pending export CSV files looking for entries that never successfully get exported. On the active server, exports that never complete are usually due to errors that should be showing up in the error conditions we previously discussed. On the staging server, exports that are present in both sets of reports show the final set of data that the staging server will change when it is reconfigured to become the active server.
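
The core idea of the two-point-in-time comparison can be sketched in a few lines of Python. This is not the TechNet Gallery script, just a minimal illustration with hypothetical function names: it keeps only the rows that appear in both CSV exports from the same server, which filters out transient changes. It assumes the first row of each CSV is a header row and that a persistent change produces an identical row in both files.

```python
import csv
import io


def read_export(csv_text):
    """Return (header, set of data rows) from a pending export CSV."""
    rows = [tuple(r) for r in csv.reader(io.StringIO(csv_text)) if r]
    if not rows:
        return (), set()
    return rows[0], set(rows[1:])  # assumption: first row is the header


def persistent_exports(first_csv, second_csv):
    """Return the rows present in both exports, sorted for stable output."""
    _, first_rows = read_export(first_csv)
    _, second_rows = read_export(second_csv)
    return sorted(first_rows & second_rows)
```

Run it once for the two active server exports and once for the two staging server exports; whatever survives on the staging side is what that server would still write after cutover.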

From this final report, if there are unexpected exports sitting in the export queue of the staging server, we know they are due to one of the two things already discussed. Changes to the rules can result in a significant amount of new data being staged for export to Azure AD, especially when the updated rule is flowing new default data that wasn’t being exported in previous versions of AADConnect. Changes to the input data can result in different sets of objects being prepared for export.

While the report will tell us what is different, it doesn’t directly tell us why. Fully analyzing the system to understand why the data is different is beyond the scope of this blog post. But hopefully we’ve produced enough useful artifacts to help you discover the source.

In summary for this section, a staging server is used to provide HA capabilities to AADConnect but building a staging server and getting it properly configured can be difficult. I’ve highlighted the main data elements we need to be managing to ensure the staging server has good data on which to operate. And I’ve laid out a methodology for comparing a staging server to an active server, so we can be aware of where differences occur. Finally, we have a way to validate the changes the staging server wants to make to the connected systems before cutting over and before any changes are made.


Upgrades

There are two ways to manage AADConnect upgrades. The first is to simply perform an in-place upgrade on the existing active server by running the new installer. It will make whatever changes it wants to make to the rules and will export the results to Azure AD. You can even automate the upgrade process by enabling the Automatic Upgrade feature. My recommendation for automatic upgrades is to use this method only if you’ve done no customizations to the rules.

The second way is to make use of the staging server for a swing migration. Operationally, it will be identical to the process we already went through to build out the staging server in the first place. Run the upgrade first on the staging server. Perform the comparisons to ensure no unexpected changes were introduced in the upgrade process. If you’re comfortable with the data sitting in the export queue, proceed with putting the current active server into staging mode, then put the new staging server into active mode and it will start exporting the changes it was holding in its export queue. If all the changes processed without causing any errors, then you can upgrade the first server to bring it up to the same level as the newly active server. Now you’ve fully swapped between the active and staging servers.

I hope you’ve found this discussion useful and have a better understanding of how to manage your AADConnect infrastructure to provide the best possible foundation for your hybrid Azure experience.

Appendix A: Connecting OUs

First, we need an object type in the metaverse to represent OUs. In the Sync Manager, go to the Metaverse Designer, create a new object and call it organizationalUnit. For the attributes, include displayName and sourceAnchorBinary.


In the Rules Editor, create a new Inbound rule. The connector is your AD forest, and the precedence must be a unique, non-zero number (I picked 90). The object type is organizationalUnit. The Link Type is Provision. For the Join rules, include objectGUID = sourceAnchorBinary. For the transformations, include direct objectGUID to sourceAnchorBinary and direct dn to displayName.




Wednesday, November 7, 2018

Disconnecting Objects with AADConnect

If you're familiar with MIM, you know there exists the capability to disconnect an object from the metaverse to force it to go through the join/provision process again. This is useful when the object was joined to the wrong metaverse object for some reason (like a bad join ruleset or incorrect data at the time of joining) and you want to have it be reassessed like it was a new object. In AADConnect, the disconnect function has been removed. The only supported way to cause an object to be reevaluated is to delete all objects from the connector space, and run a Full Import and Full Sync against all objects. That's a bit heavy handed when you have a large connector space and only want to reevaluate a single object.

Update 2019.11.12: If you can edit the source AD object you can leverage the default filtering rules to disconnect the object.

Fortunately, there's a round-about way, that clearly says it is for testing only, to disconnect an object in AADConnect.

Note: performing this procedure incorrectly can introduce incorrect data into AADConnect, which may require deleting all the connector space objects and running a Full Import and Full Sync to correct. Make sure you understand how this works before you make changes to a production system.

First, we need to save the connector space view of the object as we'll need it later. The syntax is csexport.exe ma_name filename.xml /f:d="DN" /o:b

As an example from my lab:

C:\Program Files\Microsoft Azure AD Sync\Bin>csexport.exe "" c:\temp\contact001.xml /f:d="CN=Contact001,OU=userids,DC=logon,DC=loderdom,DC=com" /o:b 
Microsoft Identity Integration Server Connector Space Export Utility v1.2.65.0
c 2015 Microsoft Corporation. All rights reserved
Successfully exported connector space to file 'c:\temp\contact001.xml'.

Second, we delete the single object from the connector space. The syntax is csdelete.exe ConnectorName ObjectDN

As an example from my lab:

C:\Program Files\Microsoft Azure AD Sync\Bin>csdelete.exe "" "CN=Contact001,OU=userids,DC=logon,DC=loderdom,DC=com"

Next, we need some template data for how to structure the import. You can create the template yourself by creating a new delta import Run Profile. Use the Set Log File Options to select the Create a log file setting and provide a filename. Make a small change to an in-scope AD object, and run the new delta import step. This should cause an XML file to be created in the C:\Program Files\Microsoft Azure AD Sync\MaData\MAName folder. This log file is the basis for the template to import data. It should look something like this:

<?xml version="1.0" encoding="UTF-16"?>
<mmsml xmlns="" step-type="delta-import">
<delta operation="replace" dn="CN=Contact001,OU=userids,DC=logon,DC=loderdom,DC=com">
 <anchor encoding="base64">NhtjCw4HbUuNLrUso4zsyw==</anchor>
 <parent-anchor encoding="base64">NH+Z2J4tEkuLRC42kBckew==</parent-anchor>
 <attr name="cn" type="string" multivalued="false">
 <attr name="displayName" type="string" multivalued="false">
 <attr name="givenName" type="string" multivalued="false">
 <attr name="objectGUID" type="binary" multivalued="false">
  <value encoding="base64">NhtjCw4HbUuNLrUso4zsyw==</value>
 <attr name="sn" type="string" multivalued="false">

The XML declaration, the mmsml element and the delta operation are the template portion; the dn value and the child elements (anchors and attributes) are the replaceable object-specific content.

Take the XML data from the cs-objects/cs-object/synchronized-hologram/entry section of the export file from the first step of this procedure and use it to replace the data from the import template. Take care to make sure the XML is still structured properly. The DN is part of the entry element, with the other data being children.

The structure should be identical to the import template from above, just containing data from the object you exported.

When saving the template XML file with notepad, be sure to use Unicode, not ANSI. Copy the input file to the C:\Program Files\Microsoft Azure AD Sync\MaData\MAName folder.

Finally, we can import this object into the connector space. Edit the custom delta import Run Profile (or create a new one if you just used the XML from above). Use the Set Log File Options to select the Resume run from existing log file and stage to connector space (test only) setting and provide the filename.

If you get a parsing error, validate that the XML structure matches the template example and that the file is encoded with Unicode.
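Both checks can be done before handing the file to the import step. The Python sketch below is an illustration only. It assumes that "Unicode" means what Notepad writes for that option (UTF-16 with a byte order mark) and that the root element is mmsml with at least one delta element, as in the sample above. The function name check_import_file is hypothetical.

```python
from typing import List
import xml.etree.ElementTree as ET

UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")  # little-endian and big-endian marks


def check_import_file(data: bytes) -> List[str]:
    """Return a list of problems found in the raw bytes of an import file."""
    if not data.startswith(UTF16_BOMS):
        return ["no UTF-16 byte order mark; the file may have been saved as ANSI or UTF-8"]
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"XML is not well formed: {exc}"]

    problems = []
    if root.tag != "mmsml":
        problems.append(f"root element is <{root.tag}>, expected <mmsml>")
    if root.find(".//delta") is None:
        problems.append("no <delta> element found")
    return problems
```

An empty list means only that the file is UTF-16 and well formed with the expected outer elements. It does not prove the import will succeed.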

Thursday, October 11, 2018

Compare AADConnect Pending Exports Between Active and Staging Servers

On the TechNet Gallery I posted a script for comparing the export queues between an active AADConnect server and a staging server so you can verify the staging server is not planning on exporting unexpected or unwanted data.  The script helps remove unnecessary noise from the standard export CSV by making use of exports at two different points in time.  This removes any transient data from the report so that only data present in the export queue both times we looked is in the final report.
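The noise-filtering idea is simple to show on its own. This Python sketch is not the published script. It only illustrates keeping the rows that appear in both of two export CSVs taken at different times, and the column names in the usage example are invented.

```python
import csv
import io
from typing import List


def persistent_rows(first_csv: str, second_csv: str) -> List[List[str]]:
    """Keep only rows present in both exports; transient rows drop out."""
    first = list(csv.reader(io.StringIO(first_csv)))
    second = list(csv.reader(io.StringIO(second_csv)))
    if not first or not second or first[0] != second[0]:
        raise ValueError("both exports need the same header row")

    seen = {tuple(row) for row in first[1:]}
    # Return the header followed by the rows that survived both looks.
    return [second[0]] + [row for row in second[1:] if tuple(row) in seen]


# Hypothetical columns, for illustration only.
earlier = "DN,Operation\nCN=a,Update\nCN=b,Add\n"
later = "DN,Operation\nCN=b,Add\nCN=c,Delete\n"
for row in persistent_rows(earlier, later):
    print(row)
```

Matching on the whole row is the strictest choice. If a pending export changes between the two looks, it is treated as transient and dropped.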


Tuesday, September 19, 2017

AdminSDHolder in the news again

The Enterprise Mobility blog just published a post on Active Directory Access Control List – Attacks and Defense.

One of the concerns they call out is modifying AdminSDHolder, with a reference to Exchange.

Long-time readers might remember that I am the one who first brought this concern to the forefront, with a Release Candidate of Exchange.

Securing Privileged Access for the AD Admin – Part 2

Cross Post from

Hello everyone, my name is still David Loder, and I’m still a PFE out of Detroit, Michigan.  Hopefully you’ve read Securing Privileged Access for the AD Admin – Part 1.  If not, go ahead.  We’ll wait for you.  Now that you’ve started implementing the roadmap, and you’re reading this with your normal user account (which no longer has Domain Admin rights), we’ll continue the journey to a more secure environment.  Recall the overarching goal is to create an environment that minimizes tier-0 and in doing so establishes a clear tier-0 boundary.  This requires understanding the tier-0 equivalencies that currently exist in the environment and planning either to keep them in tier-0 or to move them out to a different tier.

Privileged Access Workstations (PAWs) for AD Admins

You’ve (hopefully) gone through the small effort to have a credential whose only purpose is to manage AD.  Let’s assume you now need to go do some actual administering.  The only implementation that prevents expansion of your tier-0 equivalencies would be to physically enter your data center and directly log on to the console of a Domain Controller.  But that’s not very practical for any number of obvious reasons, and I think everyone would agree that an AD Admin being able to perform their admin tasks remotely, rather than at a DC console, is a huge productivity gain.  Therefore, you now need a workstation.

I’m going to guess that most of you use the one workstation that was handed out by your IT department.  That workstation which uses the same base image for every employee in the organization.  That workstation which is designed to be managed by your IT department for ease of support.  Yes, that workstation.

Recall last time we spent almost all our time talking about tier-0 equivalencies.  Guess what?  I’m going to sound like a broken record.  Item #3 from our elevator speech in part one stated “Anywhere that tier-0 credentials are used is a tier-0 system.” What is the new system we just added to tier-0?  That workstation.  Now, any process that has administrative control over that workstation is a tier-0 equivalency.  Consider patching, anti-virus, inventory and event log consolidation.  Is each of those running as local system on your workstation and managed by a service external to the laptop?  Check, check, check and check.  Does it have Helpdesk personnel as local admins? Check. I’ll ask again: how big is your tier-0?

I hear some of you starting to argue ‘I don’t actually log on to my workstation with my AD admin credential, I use [X].’  What if you use RunAs?  That workstation is still a tier-0 system.  What if you use it to RDP into a jump-box?  That workstation is still a tier-0 system.  What if you have smartcard logons?  Still a tier-0 system.  Some of the supplemental material goes into the details of the various logon types, but the simple concept is ‘secure the keyboard.’  Whatever keyboard you’re using to perform tier-0 administration is a tier-0 system.

Now that we’ve established that your workstation really is a tier-0 system, let’s treat it as such.  Start acting like your workstation is a portable Domain Controller.  Think of all those plans, procedures and systems you have in place to manage the DCs.  You need to start using them to manage your workstation.  My fellow PFE Jerry Devore has an in-depth look at creating a PAW to be your admin workstation.

Should your PAW be a separate piece of hardware?  Preferably, yes.  That way it is only online when it needs to be used, helping to reduce the expansion of tier-0 to the minimum necessary.  If your organization can’t afford separate hardware you can virtualize on one piece of hardware.  But the virtualization needs to occur in the opposite direction from what you might ordinarily expect.  The PAW image will still need to run as the host image, and your corporate desktop would be virtualized inside.  This keeps any compromise of your unprivileged desktop from elevating access into your PAW.

This is another big step/small step decision.  PAWs will be a change for your organization.  If you can start small by implementing it for a few AD Admins, you can show your enterprise that using PAWs can be a sustainable model.  At later phases in the roadmap you can expand PAWs to more users.

With a PAW in place you now have a tier-0 workstation for your tier-0 credential to manage your tier-0 asset.  Congratulations, by implementing the first two steps down the SPA roadmap, you now have the beginnings of a true tier-0 boundary.

Unique Local Admin Passwords for Workstations

So far, we’ve been talking about protecting your personal AD Admin accounts.  But everyone knows AD has its own built-in Administrator account that is shared across all DCs.  Ensure you have some process in place to manage that specific “break in case of fire” account.  Maybe two Domain Admins each manage half of the password, and those halves are securely stored.  The point is: have a procedure for managing this one account.  Be careful if you decide to implement an external system to manage that password.  Do you want that external system to become tier-0 just to manage one AD Admin account?  I can’t answer that question for you, but I can point out that it is a tier-0 boundary decision.  Your new PAWs, on the other hand, will have one built-in Administrator account per PAW.  How do we practically secure those multiple Administrator accounts without increasing the size of tier-0?

The answer is to implement Microsoft’s Local Administrator Password Solution (LAPS).  Simply put, LAPS is a new Group Policy Client Side Extension (CSE), available for you to deploy at no additional cost.  It will automatically randomize the local Administrator account on your tier-0 PAWs on an ongoing basis, store that password in AD and allow you to securely manage its release to authorized personnel (which should only be the tier-0 admins).  Since the PAW and AD are both already tier-0 systems, using one to manage the other does not increase the size of tier-0.  That fits our goal of minimizing the size of tier-0.

These new PAWs that you just introduced into the environment also become the perfect place to begin a pilot deployment of LAPS.  Install the CSE on the PAWs, create a separate OU to hold the PAW computer objects, create the LAPS GPO and link it to the PAW OU.  You’ll never have to worry about the local admin password on your PAW again.  As another big step/small step decision, using LAPS to manage the new PAWs should be an easier step than starting out using LAPS for all your workstations.

If you’re interested in how LAPS allows us to help combat Pass the Hash attacks, here are a few additional resources you can review.

Unique Local Admin Password for Servers

Building on your previous work of where you want your tier-0 boundary to be, start running LAPS on those member servers that are going to remain part of tier-0.  Again, a smaller step than LAPS everywhere, and not much else to say on the subject.  By this point you should be familiar with LAPS and are just expanding its usage.

End of Stage 1 and the Roads Ahead

If you expand LAPS to cover all workstations and all servers, congratulations, you have now followed the roadmap to the end of Stage 1.

Stage 2 and Stage 3 of the roadmap involve expanding the use of the PAWs to all administrators, implementing advanced credential management that begins to move you away from password-only credentials, minimizing the amount of standing, always-on, admin access, implementing the tier-0 boundary you already decided upon, and increasing your ability to detect attacks against AD.  You can also start looking at implementing Windows Server 2016 and taking advantage of some of our newest security features.

In these stages, we’re looking at implementing new capabilities that defend against more persistent attackers.  As such, these will take longer to implement than Stage 1.  But if you’ve already gotten people familiar with the tiering model and talking about your tier-0 boundary you’ll have an easier time implementing this guidance, with less resistance, as all the implementations are aligned to the singular goal of minimizing your tier-0 surface area.

2.1. PAW Phases 2 and 3: all admins and additional hardening

Get a PAW into the hands of everyone with admin rights to separate their Internet-using personal productivity end user account from their admin credentials.  Even if they’re still crossing tiers at this point in time, there is now some separation from the most common compromise channel.

2.2. Time-bound privileges (no permanent administrators)

If an account has no admin rights, is it still an admin credential?  The least vulnerable administrators are those with admin access to nothing.  We provide tooling in current versions of both AD and Microsoft Identity Manager to deliver this functionality.

2.3. Multi-factor for time-bound elevation

Passwords are no longer a sufficient authentication mechanism for administrative access.  Having to breach a secondary channel significantly increases the attackers’ costs.

Also have a look at some of our current password guidance.

2.4. Just Enough Admin (JEA) for DC Maintenance

Allowing junior or delegated Admins to perform approved tasks, instead of having to make them full admins, further reduces the tier-0 surface area.  You can even consider delegating access to yourself for common actions you perform all the time, fully eliminating work tasks that require the use of a tier-0 credential.

2.5. Lower attack surface of Domain and DCs

This is where all the up-front work of understanding and defining your tier boundaries pays off in spades.  When you reach this step, no one should be surprised about what you intend to do.  If you’ve decided to keep tier-0 small and are isolating the security infrastructure management from the general Enterprise management, everyone has already agreed to that.  If you’ve decided that you must keep some of those systems as tier-0, you’ve hardened them like they are DCs and have elevated the maturity of those admins to treat their credentials like the tier-0 assets they are.

2.6. Attack Detection

Seeing Advanced Threat Analytics (ATA) in action, and providing visibility into exactly what your DCs are doing, will likely be an eye-opening revelation for most environments.  Consider this your purpose-built Identity SIEM instead of simply being a dumping ground for events in general.

And, while not officially on the roadmap at this time, if you have SCOM, take a look at the great work some of our engineers have put into the Security Monitoring Management Pack.

3.1. Modernize Roles and Delegation Model

This goes together with lowering the attack surface of the Domain and DCs.  You can’t accomplish that reduction without providing alternate roles and delegations that don’t require tier-0 credentials.  You should be trying to scope tier-0 AD admin activity to actions like patching the OS and promoting new DCs.  If someone isn’t performing a task along those lines, they likely are not tier-0 admins and should instead be delegated rights to perform the activity and not be Domain Admin.

3.2. Smartcard or Passport Authentication for all admins

More of the same advice that you need to start eliminating passwords from your admins.

3.3. Admin Forest for Active Directory administrators

I’m sure your AD environment is perfectly managed.  All the legacy protocols have been disabled, and you have control over every account (human or service) that has admin rights on any DC.  In essence, you’ve already been doing everything in the roadmap.


Your environment doesn’t look like that?

Sometimes it’s easier to admit that it’s going to be too difficult to regain administrative control over the current Enterprise forest.  Instead, you can implement a new, pristine environment right out of the box and shift your administrative control to this forest.  Your current Enterprise forest is left mostly alone due to all the app-compat concerns that go along with everything that’s been tied to AD.  We have lots of guidance and implementation services to help make sure you build this new forest right and ensure it’s only used for administration purposes.  That way you can turn on all the new security features to protect your admins without fear of breaking the old app running in some forgotten closet.

3.4. Code Integrity Policy for DCs (Server 2016)

Your DCs should be your most controlled, purpose-built servers in your environment.  Creating a policy that locks them down to exactly what you intended helps keep tier-0 from expanding as your DCs can’t just start running new code that isn’t already part of their manifest.

3.5. Shielded VMs for virtual DCs (Server 2016 Hyper-V Fabric)

I remember the first time I saw a VM POST and realized what a game-changer virtualization was going to be.  Unfortunately, it also made walking out the door with a fully running DC as easy as copy/paste.  With Shielded VMs you can now enforce boundaries between your Virtualization Admins and your AD Admins.  You can allow your virtualization services to operate at tier-1 while being able to securely host tier-0 assets without violating the integrity of the tier boundary.  Can you say “Game changer”?

Don’t Neglect the Other Tiers

While this series focused on tier-0, the methodology of tackling the problem extends to the other tiers as well.  This exercise was fundamentally about segmentation of administrative control.  What we’ve seen is that, over the years, unintentional administrative control gets granted and then becomes an avenue for attack.  Be especially on the lookout for service accounts that are local admin on lots of systems.  Understand how those credentials are used and whether they are present on those endpoints in a manner that allows them to be reused for lateral movement.  If you’ve gone through the effort to secure tier-0 but you have vulnerable credentials with standing admin access to all of tier-1, where your business-critical data is stored, you probably haven’t moved the needle as much as you need to.  Ideally you get to the point where the compromise of a single workstation or a single server is contained to that system and doesn’t escalate into a compromise of most of the environment.

I know this has been a lot of guidance over these two posts.  Even if you can’t do everything, I know you can do something to improve your environment.  Hopefully I provided some new insight into how you can make your environment more secure than it is currently and exposed you to the volumes of guidance in the SPA roadmap.  Now get out there and start figuring out where your tier-0 boundary is and where you want it to be!

Thanks for spending a little bit of your time with me.



Tuesday, September 12, 2017

Securing Privileged Access for the AD Admin – Part 1

Cross Post from

Hello again, my name is still David Loder, and I’m still a PFE out of Detroit, Michigan.  I have a new confession to make.  I like cat videos.  Your end users like cat videos.  You may like cat videos yourself.  Microsoft will even help you find cat videos.  Unfortunately, cat videos may have it out for you and your environment.  How do you keep your environment secure when malicious cat videos are out there, waiting to pounce?

Microsoft has a significant amount of published guidance around Securing Privileged Access (SPA), Privileged Access Workstations and the Administrative Tier Model.  My fellow PFEs have also contributed their own great thoughts around these topics.  Go browse through our Security tagged posts to get easy access to them.  As for myself, I was staff IT in the security department for a large, global corporation, prior to joining Microsoft, where we operated in a tiered administrative model and had implemented many, though not all, of the defenses highlighted in the SPA roadmap.  So I’d like to share my perspective on the items in the roadmap and the practical implications from an Active Directory Administrator point of view.

But first a caveat for this series of articles.  I love the SPA roadmap.  I espouse its virtues to all my customers and anyone else who will listen.  But there are times where the SPA roadmap takes a big step, and I know it can sometimes be difficult to get the people in charge to agree to a big step.  In all the cases where I point this out it is possible to take a smaller step by limiting the scope by focusing solely on AD.  I have a different purpose for this series of articles than the SPA roadmap itself.  I want you to actually implement the guidance.  That’s a shocking statement, I know.  Despite all the guidance, I still walk into environments that haven’t implemented a single piece of this guidance.  Maybe they don’t know this guidance exists.  Maybe they think they aren’t a target.  Maybe they think the guidance doesn’t apply to them.  My hope with this series is that a few more people know about the guidance, understand why they should care and have an easier time convincing others in their organization that the roadmap guidance should be implemented.  Security is a journey.  Through no fault of your own, the rules have changed.  What you did to secure your environment yesterday is no longer sufficient for today’s reality.  So, let’s get started.

Separate Admin Account for Admin Tasks

This is an easy one, right?  Nothing about this guidance is new.  It ranks right up there with not browsing the Internet from a server.  But I am constantly seeing environments where normal user accounts, which have a mailbox and browse the Internet for cat videos, are also in the Domain Admins group.  Stop this.  Stop this now.  You need a separate credential for administrative tasks.  Come up with a naming convention and a process to get an admin account for anyone who does admin work.

I know some of you are smiling and thinking to yourself ‘of course we do this; the admins get their ADM_username accounts for performing admin work’ (or their $username or their username.admin or whatever convention you use).  But, have you made the correlation between tiering and admin accounts?  To fully implement the guidance, a user with admin rights must have a separate admin account per tier!

Let that sink in for a minute. In a three-tier model, the AD Admins may require four separate credentials: user (non-privileged), tier-2 (workstation) admin, tier-1 (server) admin and tier-0 (security infrastructure) admin. This guidance is designed to avoid having a credential that has admin rights in multiple tiers. This helps prevent a pass-the-hash attack from elevating from a lower tier to a higher tier.
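Here is one way to picture the rule, as a small Python sketch. It is an illustration only. The account names and the mapping of accounts to tiers are made up, and in practice you would build that mapping from your own group memberships and local admin assignments.

```python
from typing import Dict, List, Set


def accounts_crossing_tiers(admin_rights: Dict[str, Set[int]]) -> List[str]:
    """Return the accounts that hold admin rights in more than one tier."""
    return sorted(name for name, tiers in admin_rights.items() if len(tiers) > 1)


# Hypothetical accounts: one person with four credentials, plus a legacy account.
example_rights = {
    "jdoe": set(),         # normal user, no admin rights
    "adm2_jdoe": {2},      # workstation admin
    "adm1_jdoe": {1},      # server admin
    "adm0_jdoe": {0},      # security infrastructure admin
    "adm_legacy": {0, 1},  # crosses tiers; this is what the guidance wants gone
}
print(accounts_crossing_tiers(example_rights))  # prints ['adm_legacy']
```

Any account the function returns is a credential that, if captured in the lower tier, hands over the higher tier as well.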

Now for the practical part.  Yes, this gets hard to do.  You may have processes in place that will get a second credential to admin users, but they weren’t designed to get them four.  Maybe you have clear separation between server admins and workstation admins, so no one will need all four.  We want the guidance to be actionable and most importantly to protect tier-0.  Guidance that isn’t followed because it is too burdensome isn’t valuable. At a minimum, your AD Admins should have three accounts: user, admin, tier-0 admin.  And your goal is to minimize the scope of tier-0.  Tier-0 admin accounts should only be managed by other tier-0 admin accounts and not by a tier-1 system.  Please don’t have your normal Identity Management (IdM) system try to manage AD Admin accounts, because you’ll either fight with AdminSDHolder or you’ll have to grant your IdM system Domain Admin rights, and neither of those is a good choice.

Let’s discuss the tiers for a moment.  What does tier-0 really mean?  The definition from the Administrative Tier Model is:

Tier 0 – Direct Control of enterprise identities in the environment. Tier 0 includes accounts, groups, and other assets that have direct or indirect administrative control of the Active Directory forest, domains, or domain controllers, and all the assets in it. The security sensitivity of all tier 0 assets is equivalent as they are all effectively in control of each other.

Control of a tier-0 system means control of the entire environment.  The very nature of Active Directory means there should be at least two tiers in the environment: AD itself, and everything else.  Splitting between tiers isn’t a hard and fast line.  The tiering is there to provide a security boundary that is supposed to be difficult to cross.  You can certainly have user workstations that might need to be treated more like a tier-1 system because of the value they hold.  The point is that your organization must decide which security boundaries should exist that define the tiers and the systems contained within those tiers.  This is especially true of tier-0.

At a minimum tier-0 will contain Active Directory; specifically, the writeable Domain Controllers and the AD Admin credentials.  Those credentials are any account that is a member of Domain Admins, Enterprise Admins, Builtin Administrators, etc.  These groups are all equivalent.  Don’t think being Builtin Administrator is somehow more secure or different than being Domain Admin.

What else is tier-0? Look in your AD Admin groups.  Every account in them is a tier-0 credential.  Ideally, they are credentials only for people and they are unique to the management of AD infrastructure, following a naming convention that distinguishes them from your normal tier-1 admin accounts.  In other words, the tier-0 credentials that are members of the AD Admin groups must be used for the sole purpose of managing AD infrastructure and for nothing else.

If you have service accounts in your AD Admin groups, those service accounts are tier-0 credentials.  The servers where those service accounts are used are tier-0 systems.  Anyone who is administrator on those servers has access to tier-0 credentials.  Do you see how quickly this grows?  While you may normally think of just AD as being tier-0, your tier-0 equivalency may be immense.  In fact, you may not have a tier-1 or tier-2 layer at all.  It is possible that you are operating an environment where everything is tier-0.

I will state again; the goal is to minimize tier-0.  Your AD Admin groups should only have people in them, not service accounts.  Use the delegation abilities within AD to grant those service accounts only the rights they need.  Yes, it may be hard work to figure out what and where those rights are needed, but it’s the job that needs to be done to keep things that should be tier-1 from being tier-0.

What else is tier-0? Are your DCs virtualized? If so your VM admins are tier-0 admins.  Your VM platform is a tier-0 system. Your VM storage is a tier-0 system. Your storage admins are tier-0 admins. Do you see how quickly this grows? Hyper-V in Windows Server 2016 offers Shielded VMs to mitigate this risk.

What else is tier-0? What additional services run on your DCs? Which of those services are listening on the network and running as Local System? Which of them report into some kind of management console to receive instructions on what to do? Does that describe your SIEM agent, your anti-virus agent, your asset management agent, your configuration management agent? Your SIEM team has control over a tier-0 system.  Your SIEM is a tier-0 system.  Your AV platform is a tier-0 system. Your configuration platform is a tier-0 system. Do you have a standard corporate image that you use for all servers, including the servers that you will promote to become Domain Controllers? Everything added to that image has the possibility of being a tier-0 system. Do you see how quickly this grows?

What else is tier-0?  Is your IdM system tier-0? Maybe. By our definition it should be since it has direct control of the enterprise identities.  What if it is only delegated rights to a specific set of OUs and it doesn’t use an AD Admin account to manage the users?  If that system is compromised is tier-0 compromised?  The integrity of the AD infrastructure is still intact.  It may no longer contain the user data you wanted it to contain but you still have administrative control over AD and can more easily recover.  Is that a bad day? Absolutely. But you can still point to a security boundary that wasn’t crossed.  A defense-in-depth mindset would have more boundaries to cross when possible.

Control over your tier-0 equivalencies is likely the hardest part of the roadmap; which is why it practically shows up later in the roadmap.  But I wanted to discuss it up front, as understanding the true nature of your own tier-0 definition is paramount to being able to have successfully implemented the roadmap at the end of the journey.

Now that we all understand the impact of tier-0 equivalencies, how many credentials in your enterprise (from both humans and service accounts) are tier-0 admins?  Is it 5 or 100?  How many do you want at that tier?  5 or 100?  Personally, I’d vote for 5.  Keep in mind that we’re focusing on credentials.  This shouldn’t be a discussion that we trust, for example, the VM Admin team less than the AD one.  It’s that the more credentials and systems that exist at tier-0, the more surface area we have to consider in an assumed-compromised state.

As a personal story from my previous life many years ago, the first time we had to integrate a non-AD workload into tier-0, we thought the sky was falling and it was the destruction of our security posture because a different team was suddenly involved.  It took me a while to recognize that tier-0 doesn’t exclusively mean AD.  Every organization will have a unique combination of workloads and roles that will be their tier-0, and that’s OK.  What’s important is to define the boundary, then make every workload and every person in tier-0 operate to the same standard.

Once you and your organization have made your decision about defining your intended tier-0 boundary, go make totally separate admin accounts for those that you want to end up operating at tier-0.  Yes, managing three or four credentials is more difficult than one.  But you’re the AD admin for your enterprise and if you aren’t taking the lead in enabling this change, no one else will do so.  If you already have a separate admin account, but it’s crossing tier boundaries (existing or planned), go get your third credential.  Making use of the third is almost no additional effort beyond a second credential.  Here’s where you have one of those big step/small step decisions to make.  If having separate admin accounts for everyone who does administration in your organization is too big of a change to make all at once, start small with only those admins who manage AD.  Show everyone that the world doesn’t end if you have to manage separate credentials for AD Admin purposes.

Ensure you have proper procedures for creating and managing the new tier-0 admin credentials.  My first preference is to manage them manually, outside of the scope of any IdM platform you have in place, with proper, proactive scheduled reviews. Hopefully you’ve caught on to the hints that managing tier-0 will be easier when it’s small.  That allows manual management of tier-0 credentials to be successful.  If you’re in a more mature organization, then you can look to a dedicated tier-0 IdM system that can manage these credentials.

To summarize this post into a 30 second elevator speech:

1.       Active Directory Domain Controllers are tier-0 systems.

2.       AD Admin credentials are tier-0 credentials.

3.       Anywhere that tier-0 credentials are used is a tier-0 system.

4.       Anything or anyone that has administrative control over any part of 1, 2 or 3 is also a tier-0 credential/system.

5.       Keeping 1, 2, 3 and 4 small makes tier-0 easier to manage and more secure.
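Item 4 is what makes tier-0 grow: it applies to its own results. One way to see that is to treat "has administrative control over" as a graph and walk it backwards from the Domain Controllers. The Python sketch below is only an illustration with made-up names. It also models item 3 as a control edge, on the assumption that whoever controls a system where a credential is used can capture that credential.

```python
from collections import deque
from typing import Dict, Iterable, Set


def tier0_equivalents(controls: Dict[str, Set[str]], seeds: Iterable[str]) -> Set[str]:
    """Return the seeds plus everything that directly or indirectly controls them.

    controls[x] is the set of things x has administrative control over.
    """
    # Invert the graph so we can ask "who controls this?"
    controlled_by: Dict[str, Set[str]] = {}
    for controller, targets in controls.items():
        for target in targets:
            controlled_by.setdefault(target, set()).add(controller)

    tier0 = set(seeds)
    queue = deque(tier0)
    while queue:
        node = queue.popleft()
        for controller in controlled_by.get(node, ()):
            if controller not in tier0:
                tier0.add(controller)
                queue.append(controller)
    return tier0


# Hypothetical environment, for illustration only.
example_controls = {
    "helpdesk-group": {"admin-workstation"},  # local admins on the workstation
    "admin-workstation": {"adm0_jdoe"},       # the tier-0 credential is used here
    "adm0_jdoe": {"DC01"},                    # AD Admin credential
    "av-console": {"DC01"},                   # agent runs as Local System on the DC
    "fs-admins": {"FS01"},                    # unrelated file server admins
}
print(sorted(tier0_equivalents(example_controls, {"DC01"})))
```

In this made-up environment the Helpdesk group ends up in tier-0 without ever touching a DC, while the file server admins stay out. Removing an edge, for example by moving the credential to a PAW that Helpdesk does not administer, is how the set shrinks.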

That’s it for now.  The first step down the roadmap is both incredibly simple and incredibly hard at the same time.  I want to give you a break to allow the full impact of the guidance to soak in.  Check back in next week, where we’ll continue our discussion of the roadmap.  But please, go create your separate AD Admin account right now.  I shudder to think you’ve been reading this with a browser running under AD Admin credentials, with your cat videos playing in another tab.