Monday, May 24, 2021

Multi-Tenant B2B Sync with MIM Graph Connector

Cross Post from https://techcommunity.microsoft.com/t5/core-infrastructure-and-security/multi-tenant-b2b-sync-with-mim-graph-connector/ba-p/2381682

Hello everyone, this is David Loder again, sporting Microsoft’s new Customer Engineer title, but still a Hybrid Identity engineer from Detroit. Over the past year I’ve seen an uptick in requests from customers looking to modernize their GALSync solution. Either they’re wanting to control use of SharePoint and Teams B2B capabilities or looking to enable a GALSync with a cloud-only organization. And they’re asking for assistance and guidance that I hope to provide today.

Before I get started on how Microsoft Identity Manger (MIM) 2016 can help provide the basis for a supported GALSync solution, I want to ensure everyone knows that Microsoft provides a managed service SaaS offering to help with any multi-tenant syncing scenarios. It’s called Active Directory Synchronization Service (ADSS) and does all the back-end tenant syncing automagically. The great benefit with ADSS is that it’s a fully supported solution.  MIM as a product is fully supported, but as has always been the case, any customizations put into it are best-effort support. Reach out to your account team if you want more information about ADSS. Now on with the MIM discussion.

Historically, Microsoft has provided a supported GALSync solution in our on-premises sync engine, MIM. Documentation on the GALSync configuration was first provided back in the Microsoft Identity Integration Server (MIIS) 2003 timeframe. Additional documentation is available at the GALSync Resources Wiki. Despite its age, that guidance still holds true today. But it is limited to an Active Directory to Active Directory user to contact sync design.

Today, we offer the Microsoft Graph Connector. It provides the ability to connect to an Azure AD tenant, and to manage B2B invitations. However, it is not a drop-in replacement for GALSync. We can get there, but we need to fill in some missing components.

There are many scenarios that Graph Connector could satisfy, and they can get increasingly difficult. For this blog I’ll focus on the simplest scenario and then discuss what considerations one would have in order to move to more complex scenarios.

In a classic GALSync solution, we sync users from a partner AD to become contacts in our home AD. For this Azure AD replacement, we want to sync users from a partner tenant, and make them B2B users in our home tenant. This assumes that you’ve moved past the point where you need identical GALs in both on-premises Exchange and Exchange Online. Most of the customers I work with have gotten to that point. Maybe they still have some service account mailboxes left to move, but all humans who need to view a GAL have been moved to Exchange Online. That simplifies the requirements as there’s no longer a need for creating on-premises objects for Exchange to use.

The first component we need is MIM 2016. If you are an Azure AD Premium customer, MIM is still fully supported and still available for download. Otherwise, MIM is in extended support through 2026. To keep our infrastructure footprint small, this solution will use only a Sync Service install, and will not use the Portal or any declarative provisioning.

With a base MIM install in place we’re almost ready to make a Graph connection to our first tenant. But before we do so, we need to have a discussion of scope, because scope is the major factor in determining the complexity of solution. When talking about scope, I’m going to be very exact in terms of objects and attributes as much of our terminology in this space is vague and subject to perspective to understand meaning.

“We want users from a partner tenant….” Starting with this phrase, we need to break it down to a scoped object and attribute definition. The Graph Connector exposes user and contact (technically orgContact) object classes. We will only want to bring in users from the partner tenant. Except the user object class covers both internal and external (a.k.a. B2B) users. Typically, we only want to bring in the internal users from the partner tenant. We can tell the difference between them because external users have a creationType attribute equal to ‘Invitation’, whereas internal users have a null creationType. The other possible choice one might consider is the userType attribute with values of either ‘Guest’ or ‘Member’. But I think that is a poor choice. Internal users are Member by default and external users are Guest by default, but userType can be changed for both. Guest vs. Member only controls one’s default visibility to certain workloads such as SharePoint or Azure AD itself. Guest is sometimes incorrectly used interchangeably with B2B, but those two terms are not equivalent.

“Make B2B users in our home tenant….” Given the previous discussion, this phrase is now rather easy to scope. We’ll be looking at user objects with creationType=’Invitation’.

With our purposefully simplistic scope defined, let’s build the first Graph connection to the partner tenant. Install the Graph Connector on the MIM system. There have been lots of fixes recently so be sure to use the current version.

Start with creating a new Graph (Microsoft) connector.


Provide registered app credentials to connect to the partner tenant. The app registration needs at least User.Read.All and Directory.Read.All, with Admin consent. This is an example from one of my temporary demo tenants.



On the Schema 1 page, keep the Add objects filter unchecked. Unfortunately, we cannot use the filtering capability to return only the internal users where creationType is null. The Graph API provides more advanced filtering capabilities, but it requires a Header value to be set, which the Graph Connector does not currently expose as a configurable setting.

On the Select Object Types page check user.

On the Select Attributes page, let’s select a minimum number of attributes to enable decent GAL functionality as part of the B2B sync. Additional attributes can be added if the GAL needs to be more fully populated. Select creationType, displayName, givenName, id, mail, showInAddressList, and surname.



The anchor attribute on the Configure Anchors page will automatically be set to id.

On the Configure Connector Filter page, I will keep this example simple by using a declared filter of creationType Is present. This will filter out any external users that may happen to already exist in the partner tenant. But this filtering will come at the expense of increased Delta Sync times due to having to process each filtered disconnector every sync cycle.



For Configure Join and Projection Rules, we’ll join on id first, mail second, otherwise project as a person.



This is the inbound partner tenant user flow, so provide a direct inbound flow for each attribute. Several of the selected attributes are not default metaverse attributes, so the metaverse schema will need to be extended to account for these attributes.



Leave the Configure Deprovisioning page at the default of ‘Make them disconnectors’ and Configure Extensions page will also be left at its default of empty.

Create the Full Import and Full Sync run profiles. Execute them to confirm that the partner tenant users are projected into the metaverse. Also create the Delta Import and Delta Sync run profiles. We won’t use them now, but will need them later. I’ve gotten spoiled from AADC creating run profiles by default.

Now that the inbound side from the partner tenant is complete, let’s create the outbound side for the home tenant. The setup will be similar to the inbound side, but with some minor changes.

The App Registration in the home tenant will require the Directory.Read.All and User.ReadWrite.All permissions. There is a User.Invite.All permission, but since we need to sync GAL attributes after the invite, that permission does not provide enough access for this scenario.

For the Schema 1 page, we’ll need to leave the Add objects filter checkbox uncheck again. Even though we could technically set a graph filter of creationType eq 'Invitation', using a filter breaks Delta Imports for the Graph Connector (with a no-start-ma error). We will have to continue to use MIM filtering the keep the scope correct since Delta Imports are very important for most of my customers.

On the Global Parameters page set the Invite redirect URL to https://myapps.microsoft.com/?tenantid=GUIDValue. Leave the send mail checkbox unchecked unless you want to start automatically spamming all your invitees.



On the Select Attributes page, include userPrincipalName and userType in addition to the list of attributes from the inbound side. We’re selecting UPN just so we can see the full results of the invitation process, not because we’ll be doing any syncing of that attribute.

For the Configure Connector Filter page, we reverse it from the inbound partner tenant setting and use a filter of creationType Is not present.

On the Configure Join and Projection Rules page, only add the Join Rule for mail. There should be no Projection Rule as we want all the external users to project into the metaverse from the inbound partner tenant.

For the Configure Attribute Flow page, add a direct export (allowing nulls) for displayName, givenName, mail, showInAddressList and surname. Add a constant export of Guest for userType. While an external user is typically Guest by default, the Graph Connector defaults to Member, so we need to override that. Also add a constant export of Invitation for creationType. For the creationType, we’re flowing that just to satisfy the MA filter, not that it affects the invitation process.



On the Configure Deprovisioning page, change the selection to Stage a delete on the object for the next export run.

Create and run the Full Import and Full Sync run profiles. If there are any matching mail values for existing external users those should join. Otherwise, the existing external users will show up as disconnectors. Also create the Delta Import, Delta Sync and Export run profiles. We won’t use them now, but will need them later.

Finally, we need a small amount of provisioning code to provision the external users from the metaverse into the home tenant MA. From the Tools > Options… menu check the Enable metaverse rules extension checkbox. Then click the Create Rules Extension Project… button. I’ll provide sample code for Visual C#, so choose that selection and the version of Visual Studio to use to compile the project.

This is a sample implementation for the IMVSynchronization.Provision method.

void IMVSynchronization.Provision (MVEntry mventry)

{

    string container = "OBJECT=user";

    string rdn = "CN=" + Guid.NewGuid().ToString();

    ConnectedMA HomeTenantMA = mventry.ConnectedMAs["HomeTenant"];

    ReferenceValue dn = HomeTenantMA.EscapeDNComponent(rdn).Concat(container);

    int numConnectors = HomeTenantMA.Connectors.Count;

 

    // If there is no connector present, create a new connector.

    if (numConnectors == 0)

    {

        CSEntry csentry = HomeTenantMA.Connectors.StartNewConnector("user");

        csentry.DN = dn;

        csentry["id"].StringValue = Guid.NewGuid().ToString();

        csentry.CommitNewConnector();

    }

    else if (numConnectors == 1)

    {

        //Do nothing, no rename is needed

    }

    else

    {

        throw (new UnexpectedDataException("multiple connectors:" + numConnectors.ToString()));

    }

}     

 

A few things to note in this code. We need the name of the home tenant MA as the connected MA we are managing. We also set a random GUID-based DN and id in order to successfully export the invitation, but those values will be replaced by the real Azure AD values during the first confirming import.

Build the solution in Visual Studio and make sure the extension DLL gets copied to the Microsoft Forefront Identity Manager\2010\Synchronization Service\Extensions folder. Back in the Options dialog, ensure the DLL that was just created is selected for the Rules extension name, and check the Enable Provisioning Rules Extension checkbox.



To begin with a small test, pick a sample user from the partner tenant MA and commit a Full Sync Preview against them. That should generate a pending export in the home tenant MA.



The small piece of magic with the Graph Connector is that if a user has a pending add with mail but no UPN, they will go through the invitation process to make them an external user, rather than being created as an internal user. We can see the pending export with the temporary DN and id, the GALSync attributes we wired up, and our constant userType of Guest. This test user has an OnMicrosoft.com mail address in the partner tenant as I have not added a custom domain to that tenant. The actual mail value is ultimately immaterial so long as it doesn’t already belong to the destination tenant.

Run the Export, followed by a confirming Delta Import.



We see that the user got successfully invited, got its real DN and id and has all the attributes we set. Notice the UPN got automatically set by AAD in the expected format of mail#EXT#@tenant. It also was given a default setting of showInAddressList = false. By default, invited external users are hidden from the GAL.

Complete a second delta sync cycle (Delta Import, Delta Sync, Export) and showInAddressList should get set to its synced value. For this example user, that would be a null value.


After exporting the updated showInAddressList value, we can confirm that our GALSync is functional. Log in to Outlook on the web in the home tenant, open the People app and select the All Users GAL. We should see our newly synced user present in the GAL.


Finally, to complete the deprovisioning aspect of the GALSync, configure the Object Deletion Rule for the person object class to delete the metaverse object when the partner tenant connector is disconnected, and set the MA's Deprovisioning action to Stage a delete. This way, a deletion of the user from the partner tenant will cascade a delete of the external B2B user to our home tenant.

That’s the end of the setup for GALSync from a single source to a single destination tenant.

As I alluded to at the beginning, more complex setups are possible. Consider a bi-directional GALSync where the partner tenant also needs the users from our home tenant. One way to keep the architecture simple is to maintain one MIM instance per tenant; we simply duplicate this setup in the opposite direction. This is identical to the AADC architecture where one AADC is needed for each tenant. It allows the provisioning code to know the tenant for which it is responsible, cleanly separates inbound from outbound flows and causes no precedence problems. It also allows the partner to control the app registration which possesses write access into their tenant.

Or consider a full-mesh setup where the tenants are all peers in one org that decided to segment their tenants for some reason. We could design a single MIM solution that manages every tenant. We could do two connectors to each tenant to allow us to separate internal user from external and continue to manage the flows separately. We’d only have to prevent same-tenant provisioning in the provisioning code. I could also see a solution that uses only one connector to each tenant. We could come up with a mechanism to track authority of address spaces, so we know which source tenant is responsible for each user and use that knowledge to then create the external B2B users in the other tenants.

For larger deployments where we might have concern about the number of disconnectors and corresponding delta sync times, there are a few advanced techniques we could implement to alleviate that concern. We could project and terminate objects in the metaverse instead of keeping them as disconnectors. Or we could replace the Graph Connector with a PowerShell Connector and take care of all the Graph logic ourselves, avoiding the scenarios where the Graph Connector has limitations.

Hopefully, this has shed some light on considerations for a modern GALSync solution.

Thanks for spending a little bit of your time with me.

-Dave

 

Disclaimer: The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.


Tuesday, February 16, 2021

MIM account deprovisioning déjà vu

I was doing some research for one of my customers that was having an AD account provisioning problem with MIM. I wanted to make sure that I wasn't introducing a potential deprovisioning problem with my change for them, so I was revisiting the deprovisioning scenarios in my lab.  I hit on this support forum article: FIM 2010 R2 Deprovisioning with Outbound System Scoping Filters (microsoft.com). Reading through that and get to the bottom and see that it was asked and summarized by me. About a decade ago. Totally forgot about that work. No deprovisioning improvements ever came to FIM/MIM. At least AADConnect does a nice job of declarative deprovisioning when falling out of scope of a provisioning rule.

Friday, April 3, 2020

Migrated Code/Scripts From Technet Script Gallery To GitHub

If you did not know already Microsoft will be retiring Technet Script Gallery.  More information here: https://docs.microsoft.com/en-us/teamblog/technet-gallery-retirement



Currently it is in read-only mode, which means you can edit existing contributions as an owner, but you will not be able to add new contributions. In June 2020 Microsoft will shut it down. Therefore if you have your code stored there, make sure to get it and migrate it to GitHub. If you need scripts from it, go and get those scripts before the shutdown.



My scripts are still available in Technet Script Gallery, but I also already migrated them to my new GitHub public repository at https://github.com/dloder0/AAD-Connect-MIM-FIM-Scripts



Friday, January 24, 2020

Getting Reports from Long Running Performance Monitor Data Collector Sets


When trying to track down an elusive Active Directory performance problem, gathering stats using the Active Directory Diagnostics Data Collector Set is the best method for insight as to what the Domain Controller is doing. However, having super busy DCs and not knowing exactly when the problem is going to occur can make capturing the data and generating a useful report harder.  There are also limitations with the Data Collector Sets that we have to take into consideration to come up with a solution that really works.

Let me start with explaining how, and more importantly when, the various features of the Data Collector Sets work.

The first thing to know is that you cannot change the properties of the built-in System Data Collector Sets.  You must create a custom User Defined Data Collector Set to be able to change its behavior. Create a new Data Collector Set by right-clicking the User Defined node and select New > Data Collector Set. Give it a name, create it from an existing template. Use the Active Directory Diagnostics as the template. Finish the wizard.



The DCS has settings that can be used to keep the collection from consuming too much disk capacity. Right-click the new DCS and select Data Manager Settings.  The Resource Policy of Delete Oldest causes older folders to be deleted to keep from growing too large. You can adjust the size or folder count as needed. However, two things have to occur to have the data purging trigger.  First, the “Enable data management and report generation” property has to be selected.  It should be selected by default.  But second, these Data Manager rules don’t trigger until the DCS stops.  That’s an important distinction for the next set of DCS properties.



Now right-click the DCS, select Properties and switch to the Stop Condition tab. The Overall duration, which defaults to 5 minutes, controls when the DCS will automatically stop. Overall duration is the only setting that causes the DCS to stop on its own. The Limits section can be used to force the DCS to start using a new folder with new files for its collection, but setting a limit does not stop the DCS and therefore does not trigger the data purging configuration that is set in the Data Manager section.



The final feature of a DCS to explain is the report feature. Like data purging, report generation also only occurs when the DCS stops. Only the collection folder that was active when the DCS was stopped is used as the input source for the report. If you configure a limit and end up with multiple source folders for a single execution, only the last folder is used. Also, the size of the collected data in the collection folder determines how long the report generation will take.  Larger source data results in longer report generation times. While the report is being generated, all data collection has stopped and that same DCS cannot be restarted. Other than the report name in the Data Manager section, there is no GUI for managing report definitions.

With this explanation for how a DCS works complete, I have a problem I had been trying to solve for a customer of mine. They had an elusive Active Directory performance problem that they couldn’t predict when it would happen, couldn’t cause it to happen on demand and it could be many days before it reoccurred. But they could tell when it had happened and wanted more diagnostic data for what the Domain Controller was doing at that time.

So we needed our DCS to behave with the following characteristics:
  • Continuously capture data over several days without gaps
  • Do not save more data than the DC capacity
  • Generate a report against an identified collection
The first problem to solve is how to collect data without gaps yet still allow purging to run. The solution is to allow the DCS to stop, so purging can happen, but not run a report when it does stop so we don’t waste capture time running a report against a time period that likely didn’t include the event we were trying to capture.  There’s no GUI for the report definition.  It’s located in the XML of the DCS itself. Right-click your custom DCS and select Save Template. Open the XML in notepad and find the ReportSchema node. You’ll see there are nine report definitions that are included. Delete all the Imports except one and change the file to a non-existent filename. Having one invalid Import for the report causes the smallest possible report to be generated, which finishes in a fraction of a second, minimizing the amount of time we’re not collecting data. Having zero Report Imports causes a default set of reports to run, which we want to avoid since they take time to finish. Save the edited content to a new XML file.

That section of XML should now look something like this:
<ReportSchema>
     <Report name="wpdcAdvisor" version="1" threshold="9999">
         <Import file="%systemroot%\pla\reports\NoReport.xml">
         </Import>
     </Report>
</ReportSchema>
With this change to the XML complete, create a new DCS from a template, but this time browse to the edited XML file instead of selecting from the list.

For this new DCS, set the Data Manager purging rules as needed and clear the Overall duration checkbox for the Stop Condition. Now we have a DCS that we can start and when we stop it, it quickly stops, creates a very small, useless report and purges the oldest data. By clearing the Overall duration checkbox, this DCS will run until stopped by another method (manual or scripted). It will not stop on its own.

On my customer’s busy DCs we can’t let the collection run too long, otherwise there’s a chance a report can never be generated so our plan is to stop and start the DCS once an hour. The built in scheduler on the DCS properties isn’t that great so we’ll use Task Manager instead.  We created a small batch file with these commands:
logman.exe stop AD_DCS_NoReport
logman.exe start AD_DCS_NoReport
Then we created a task that ran under the SYSTEM account that ran that batch file once an hour.

After the event reoccurred, we knew what collection set the data should be in, so we needed to generate a report for that particular 1 hour block, which we have to do manually. In each DCS collection folder, you should see four files: Active Directory.etl, AD Registry.xml, NtKernel.etl and Performance Counter.blg. There is a fifth file that we need to generate the report. We will manually create it. Create a new text file in the folder called reportdefinition.txt. In that text file, add the following XML and save it.
<Report name="wpdcAdvisor" version="1" threshold="9999"><Import file="%systemroot%\pla\reports\Report.System.Common.xml"/><Import file="%systemroot%\pla\reports\Report.System.Summary.xml"/><Import file="%systemroot%\pla\reports\Report.System.Performance.xml"/><Import file="%systemroot%\pla\reports\Report.System.CPU.xml"/><Import file="%systemroot%\pla\reports\Report.System.Network.xml"/><Import file="%systemroot%\pla\reports\Report.System.Disk.xml"/><Import file="%systemroot%\pla\reports\Report.System.Memory.xml"/><Import file="%systemroot%\pla\reports\Report.System.Configuration.xml"/><Import file="%systemroot%\pla\Reports\Report.AD.xml"/></Report>
You may recognize that these are the same files that show up in the XML that we edited.

Finally, execute the following command line from within the capture directory you want to use for the report.
tracerpt.exe *.blg *.etl -df reportdefinition.txt -report report.html -f html
If everything went right, you should end up with a normal DCS diagnostic report that you can review which covers the time period from when the event occurred.

As a neat trick, if you need to see more than the top 25 items that the report defaults to, you can run the following command to get full XML output:
tracerpt.exe –lr "Active Directory.etl"
For additional reading on similar issues that led me to this solution, I offer up the Canberry PFE team blog Issues with Perfmon reporting - Turning ETL into HTML, the Directory Services Team blog Are your DCs too busy to be monitored?: AD Data Collector Set solutions for long report compile times or report data deletion and the Core Infrastructure and Security blog Taming Perfmon: Data Collector Sets.



Tuesday, November 12, 2019

Disconnecting Objects with AADConnect Default Filtering


If you're familiar with MIM, you know there exists the capability to disconnect an object from the metaverse to force it to go through the join/provision process again. This is useful when the object was joined to the wrong metaverse object for some reason (like a bad join ruleset or incorrect data at the time of joining) and you want to have it be reassessed like it was a new object. In AADConnect, the disconnect function has been removed.

If you have the ability to change (or get changed) the original AD data, you can leverage the default filtering rules to temporarily disconnect an object. This is the main topic for this blog post.  If you can’t get the original AD data changed, you can follow the process in my original Disconnecting Objects with AADConnect post that shows an AADC-only method.

This feature is kind of hidden, not well documented, and not obvious when you see it.

If you look at the default filtering rules for the In from AD - User Join or In from AD - User Common rules, you’ll see these default scoping filters:


defaultfilter

The filter we’re concerned with is
adminDescription NOTSTARTSWITH User_
For a source object to attach to an inbound rule it must satisfy the conditions in the scoping filter.  In this case, so long as adminDescription does not start with “User_” it will pass the filter and attach to the rule.  AdminDescription is blank on all objects by default so the normal projections and data flows happen.

So if you put a value of “User_<something>” on a user object, it will no longer attach to this rule.  And because In from AD - User Join is our sole default provisioning rule, once an object loses that rule, it is no longer allowed to project into the MetaVerse and becomes a disconnector!

Once disconnected, you can make any other data changes that are needed to retry a join or re-provision.  When ready, clear the adminDescription and the disconnector object will be reevaluated at the next delta sync run like any other new object.

Groups have a similar default filter of adminDescription NOTSTARTSWITH Group_ that can be used to disconnect groups.

I have a customer with a few scenarios where users need to be disconnected, so they enacted workflows to stamp User_Transfer or User_Disable on objects at specific points in their lifecycle.

Now you can easily disconnect objects and reevalute and hopefully not miss the lack of a disconnect button anymore.

Thursday, April 18, 2019

Changes to Ticket-Granting Ticket (TGT) Delegation Across Trusts in Windows Server (PFE edition)

I helped with some content referencing the upcoming May and July 2019 patches that change the default behavior for cross-forest unconstrained delegation. The full post is available at the new TechCommunity home of the AskPFE Core Infrastructure and Security blog.

https://techcommunity.microsoft.com/t5/core-infrastructure-and-security/changes-to-ticket-granting-ticket-tgt-delegation-across-trusts/ba-p/440261

Monday, November 12, 2018

AADConnect Resilient Operations

Content Update Notice: This blog was written before the introduction of the Config Import feature.  Follow that process to easily copy over the custom rules and OU selections from an existing AADC installation.

In my job as an Identity-focused Premier Field Engineer, I get to help many companies on their journey from being solely on-premise to becoming a cloud-first enterprise. But the one thing all of them have in common is that they will be in a hybrid setup for the foreseeable future. Hybrid is the new IT reality. For most Microsoft customers that means Azure AD Connect is your Identity Bridge between on-premise and the Azure Cloud. I have found that not all the customers with which I work have a full understanding of how AADConnect is designed to function or how to ensure their AADConnect infrastructure is running smooth and is resilient to failure.
Resilient operations for AADConnect involves three main topics that I wanted to cover today: Health, High Availability (HA) and Upgrades.

Synchronization Health

The ideal operational state for AADConnect has two characteristics for which we look. The first is that no errors occur during any stage of the sync process. This snapshot from my lab shows entire sync cycles running without error. All the import, delta syncs and exports have a status of success.
clip_image002
Any current version of AADConnect will have the health agent installed. The health agent will report into your Azure AD portal any data sync issues that are occurring. It’s more likely that you are working in the portal than looking at the Sync Manager, so surfacing the errors in the portal should give more visibility to any errors that exist. You need to work on clearing out any errors that exist. I’ve seen too many environments where errors are allowed to persist, and the companies wonder why they have account problems in Azure AD. Having zero errors will also make future upgrades and changes easier. We’ll see why later.
The second characteristic we want to see is AADConnect having zero disconnectors in each of the connectors. Those of you not familiar with how the sync engine works in AADConnect (or in MIM) may not know what a disconnector is. The simple answer is that a disconnector is an object in the connected directory that is in scope, yet not being managed by the sync engine. The more complete answer is described in the architecture document.
Having zero disconnectors on your Azure AD connector means that every object in Azure AD is being actively managed by the sync engine.
Disconnectors are reported during the Delta Sync phase for the connector.
clip_image004
This shows that I currently have one disconnector in Azure AD. Disconnectors in Azure AD are especially troublesome as it means nothing is managing that object in Azure AD. It will never get changed or deleted by AADConnect.
To figure out what the disconnector is we need to run the command line tool CSExport to export the disconnectors.
The syntax to use for exporting disconnectors is csexport.exe MAName ExportFileName /f:s /o:d
As an example in my lab, to get the disconnectors for the Azure AD connector I run
C:\>"C:\Program Files\Microsoft Azure AD Sync\Bin\csexport.exe" "loderdom.onmicrosoft.com - AAD" c:\temp\aaddisconnectors.xml /f:s /o:d

Microsoft Identity Integration Server Connector Space Export Utility v1.1.882.0

c 2015 Microsoft Corporation. All rights reserved

[1/1]

Successfully exported connector space to file 'c:\temp\aaddisconnectors.xml'.

This will output an XML file with the details of every disconnector. In my case, I had a leftover device from testing hybrid Azure AD join. After deleting the device from Azure AD, my disconnector count returned to zero.
You need to work your way through the XML document and figure out why each object is disconnected. Should it have been connected to something in Active Directory and isn’t? Is it an orphaned object in Azure AD that needs to be deleted? The goal is to have zero disconnectors for the Azure AD connector.
For any Active Directory connectors you have, the goal is also to have zero disconnectors, but it’s more difficult to achieve a value that is absolutely zero. This is because any container-type objects (such as the OUs that contain the users) get reported as disconnectors. So realistically, we’re looking to achieve a count of disconnectors that is low and static so that we can tell if the number of disconnectors has changed.
To minimize the number of disconnectors that occur because of the OUs, you can run through the configuration wizard and only select the OUs with users and computers that you care about, unchecking the OUs that aren’t needed.
A word of caution here. Don’t just unselect random OUs without being absolutely certain they aren’t needed. If an OU is deselected and it contained objects that were being synched into Azure AD, those objects will be deprovisioned from Azure AD. You can mitigate this risk by using a staging server to test a new configuration change before the change goes live. We’ll talk more about staging mode later.
If your AD setup is such that you have a small and static number of OUs selected, you can ideally end up with a disconnector count around 10 or so. Know what that number is for your environment, so that if it changes that means you have a new disconnector that should be reviewed and remediated. If your AD setup has lots of OUs that need to be included, and the number of OUs keeps changing (maybe you have an OU per site or department and those change frequently), you can create a custom inbound rule that will project the OUs into the metaverse. This changes the OUs into connectors and returns you to a state where the number of disconnectors should be almost zero. See Appendix A for how to create the necessary rule for connecting OUs.
In summary for this section, an AADConnect server with zero errors and zero disconnectors means we are running a well-managed environment that has no data problems affecting the sync operations of AADConnect.

High Availability: Using a Staging Server

The good news for AADConnect is that the sync engine itself is not involved with any run-time workloads for your users, reducing our HA requirements. You could shut off the AADConnect sync service and your users would still have access to all their Azure and Office 365 resources. Changes to user data (adds, deletes, changes) won’t happen, but that doesn’t affect availability in the short-term. However, depending on which sign-on option you are using, there may be additional considerations. If you are performing Password Hash Sync, that sync process runs on its own every two minutes. Users could be impacted if they change their AD password and AADConnect isn’t running; there will be a mismatch between their cloud password and their on-premise AD password. If you are performing Pass-through Authentication the first agent is installed on the AADConnect server; you need to install additional agents on other servers to provide HA for that feature. If you have configured Password Writeback then the AADConnect service needs to be running for the Service Bus channel to be open. Finally, if you use ADFS, HA designs for federated sign-on are out-of-scope for this AADConnect discussion.
Accordingly, we need some measure of HA to keep the Azure AD data from becoming too stale. “Too stale” is a relative term. For a small environment with few changes that may mean you can run for weeks without AADConnect running and not experience any issues. For larger environments, you may not want to have AADConnect be down for more than a few hours.
The major HA design model for AADConnect is to have a second, fully independent installation of AADConnect that runs in staging mode. This server will see all the same data changes that happen in AD, is configured with the same rule sets as the active server and validates that the changes it expects to see in Azure AD were actually made in Azure AD by the active server. The general concept is that if both servers see the same input, and have the same rules, they will both independently arrive at the same output result. The Operation tasks topic goes into detail as to how to set up the staging server, but it neglects to cover how to realistically get the same input, the same rules and the same output on both sides. All three are problems I’ve seen in the field for implementations that have some measure of customization in the rules and, shall we say, less than pristine data sources.
Let’s start with making sure we have the same rules on both the active and staging server. To get your customizations on the staging server, you can either create them all by hand via the GUI or leverage the export capability from the active server. When you select one or more rules in the Rules Editor and select Export, you’ll get a PowerShell script generated that can almost be used to create the same rule on the staging server. The main problem with the generated PowerShell script is that it uses GUIDs to represent the connectors, and those GUIDs are unique to the AADConnect server on which they were created. The same connector will have a different GUID between the active and staging servers. But if you manually adjust the Connector GUID you’ll be able to run the script to recreate the custom rule. The PowerShell script is designed to create a new rule, not apply customizations to an existing rule. This means it’s a good reason to follow the GUI guidance when customizing a built-in rule to disable the built-in rule and create a new one to hold the customizations.
Now that we’ve attempted to configure the same rules on both sides, how do we confirm that we successfully accomplished that task? I have a PowerShell script in GitHub called Reformat AADConnect Sync Rule Export for Easy Comparisons that will use the exported PowerShell scripts and an export of the connector configuration to generate a version of the creation scripts that is designed for comparison using WinDiff or similar comparison tool. Each rule is rewritten into its own file, making it much easier to perform a rule-by-rule comparison without needing to export each rule one at a time.
When looking at the results of the comparison, you will generally find two categories of differences even when all the rules appear to have been duplicated appropriately. The first category is made up of the internal Identifier GUIDs of the rules themselves. Like the Connector GUIDs, each rule has an internal GUID that is unique per server. This difference can be ignored. The second category you’re likely to find is due to newer versions of AADConnect having different default rules than earlier versions. Newer rules usually add in new attribute flows, change calculations or introduce entirely new rules. When this occurs the precedence number for each rule can also change, resulting in more differences showing up when comparing the two different rule sets. But, after you’ve looked at a few of the difference details, you’ll notice the pattern and can quickly complete the validation that no other unexpected differences show up in the report.
Here’s an example WinDiff comparison between an active server and a staging server that was upgraded with a more recent build. We’re looking specifically at the In from AD – Group Common rule.
clip_image006
You can see the differences in the GUIDs (lines 3 and 11). Because this upgrade also has a new data flow, you can see additional changes to the ImmutableTag (line 14) and the new data flows (new section starting at line 170).
Please keep in mind that this comparison task is really designed to help make sure your own customizations were successfully duplicated. If there has been a change to one of the default rules in a newer version of AADConnect, like in this example, do not attempt to revert the rule back to a prior version. The newest default rules are almost always correct and represent the best default data flow for Azure AD that is currently available. But knowing that a default rule has changed is important for the following steps of getting a new staging server ready.
Next, we will want to confirm that the input data on the staging server is the same as on the active server. Going through the configuration wizard you can manually validate that the same OUs have been selected on both sides. Or you can use an export of the Connector configurations and compare the saved-ma-configuration/ma-data/ma-partition-data/partition/filter/containers sections of the XML documents. To export a Connector configuration, open the Synchronization Service GUI on the AADConnect server, navigate to the Connectors tab, highlight a connector and select Export Connector from the Actions menu. Provide a File name and click Save.
Finally, and most importantly, is to validate that the same output results are occurring on both the active and staging servers. This is done by examining the export queues on the staging server. When in staging mode the one and only difference from active mode is that no export steps are executed.
Let’s have a quick discussion on how export data is created. In AADConnect, when a synchronization step is executed, the inbound rules from that specific connector and all outbound rules for all connectors are executed against each object being synced. The results of the sync task are compared to AADConnect’s current view of the connected data source (from the last time the object was imported) and any differences in data flows due to changed input values or different rules are computed and stored in the export queue. When the export step happens, the pending exports are written to the connected data source. However, because we’re in staging mode, the export step doesn’t happen and all the changes that AADConnect wants to make are nicely kept in the export queue for us to review.
Now that we know how the exports are constructed, we can see what happens in the ideal case. In the ideal case, the staging server has all the same input data from AD as the active server; it has the exact same rules; and it calculated the exact same final state of the objects as they already exist in Azure AD. Therefore, it has no changes that it wants to apply to Azure AD, so the export queue is empty. When the export queue is empty, the staging server is ideally situated to take over for the active server as it will immediately pick up from where the active server left off.
Unfortunately, the production systems I usually see are far from ideal. They have errors that are present, they have disconnectors that are present, and they have high volumes of changes being processed through them. This means the export queue on the staging server is almost never empty. Therefore, we need to analyze the export report to figure out why the staging server thinks that it has data that it wants to change.
To improve our analysis ability, we need to make sure we’ve accomplished the health status items we talked about earlier. We need to eliminate errors and minimize the disconnectors in the active server. Then we can go about trying to remove the noise that is due to transient data changes that are flowing though the active server and eventually seen by the staging server. To do this we can compare the pending exports between the active server and the staging server at two different points in time. By putting a delay in the comparison, we can ensure that the staging server has seen the changes that the active server was in process of making. The first pass generates a list of potential changes that we care about. If those same changes are still present a few hours later, after several sync cycles have run, then we can be sure the changes are not in the queue simply due to an in-process change.
We end up needing a total of four exports for each Connector to run this comparison. That’s one each from the active and staging servers to start with, then a second set of exports from both a few hours later after some sync cycles have run. Generating an export file is a two-step process. We first look at the pending exports by returning to csexport.exe with syntax of csexport.exe MAName ExportFileName /f:x. After generating the XML document of the pending exports, we convert that to CSV using the CSExportAnalyzer with syntax like CSExportAnalyzer ExportFileName.xml > %temp%\export.csv
In my lab, one instance of those commands looks like this:
C:\Program Files\Microsoft Azure AD Sync\Bin>csexport.exe "loderdom.onmicrosoft.com - AAD" c:\temp\aadexport.xml /f:x

Microsoft Identity Integration Server Connector Space Export Utility v1.2.65.0

c 2015 Microsoft Corporation. All rights reserved

[1/1]

Successfully exported connector space to file 'c:\temp\aadexport.xml'.

C:\Program Files\Microsoft Azure AD Sync\Bin>CSExportAnalyzer.exe c:\temp\aadexport.xml > c:\temp\aadexport.csv

Then I have a second script in GitHub called Compare AADConnect Pending Exports Between Active and Staging Servers that compares these pending export CSV files looking for entries that never successfully get exported. On the active server, exports that never complete are usually due to errors that should be showing up in the error conditions we previously discussed. On the staging server, exports that are present in both sets of reports show the final set of data that the staging server will change when it is reconfigured to become the active server.
From this final report, if there are unexpected exports sitting in the export queue of the staging server, we know they are due to one of the two things already discussed. Changes to the rules can result in a significant amount of new data being staged for export to Azure AD, especially when the updated rule is flowing new default data that wasn’t being exported in previous versions of AADConnect. Changes to the input data can result in different sets of objects being prepared for export.
While the report will tell us what is different, it doesn’t directly tell us why. Fully analyzing the system to understand why the data is different is beyond the scope of this blog post. But hopefully we’ve produced enough useful artifacts to help you discover the source.
In summary for this section, a staging server is used to provide HA capabilities to AADConnect but building a staging server and getting it properly configured can be difficult. I’ve highlighted the main data elements we need to be managing to ensure the staging server has good data on which to operate. And I’ve laid out a methodology for comparing a staging server to an active server, so we can be aware of where differences occur. Finally, we have a way to validate the changes the staging server wants to make to the connected systems before cutting over and before any changes are made.

Upgrades

There are two ways to manage AADConnect upgrades. The first is to simply perform an in-place upgrade on the existing active server by running the new installer. It will make whatever changes it wants to make to the rules and will export the results to Azure AD. You can even automate the upgrade process by enabling the Automatic Upgrade feature. My recommendation for automatic upgrades is to use this method only if you’ve done no customizations to the rules.
The second way is to make use of the staging server for a swing migration. Operationally, it will be identical to the process we already went through to build out the staging server in the first place. Run the upgrade first on the staging server. Perform the comparisons to ensure no unexpected changes were introduced in the upgrade process. If you’re comfortable with the data sitting in the export queue, proceed with putting the current active server into staging mode, then put the new staging server into active mode and it will start exporting the changes it was holding in its export queue. If all the changes processed without causing any errors, then you can upgrade the first server to bring it up to the same level as the newly active server. Now you’ve fully swapped between the active and staging servers.
I hope you’ve found this discussion useful and have a better understanding of how to manage your AADConnect infrastructure to provide the best possible foundation for your hybrid Azure experience.

Appendix A: Connecting OUs

First, we need an object type in the metaverse to represent OUs. In the Sync Manager, go to the Metaverse Designer, create a new object and call it organizationalUnit. For the attributes, include displayName and sourceAnchorBinary.
clip_image008
In the Rules Editor, create a new Inbound rule. The connector is your AD forest, the precedent must be a unique, non-zero number (I picked 90). The object type is organizationalUnit. The Link Type is Provision. For the Join rules, include objectGUID = sourceAnchorBinary. For the transformations, include direct objectGUID to sourceAnchorBinary and direct dn to displayName.
clip_image010
clip_image012
clip_image014

Updated 2020.04.03 moved links to the GitHub Repository