Service Levels and Active Directory Disaster Recovery

Active Directory Anxiety and Service Level Stress

Are you having problems sleeping? Are you waking up each night in a cold sweat, plagued by an unshakable feeling of dread?

If so, have you ever considered it could be linked to that niggling suspicion that everything is not as it should be with your organization’s Active Directory deployment?  You understand that Active Directory is a critical service for your organization on which so much depends, yet you have no tried and tested disaster recovery plan and have certainly never given much thought to service levels.

Now that I have your attention, I would like to offer some pointers to help you change this, and maybe even get you some sleep!

Planning for Active Directory Recovery

The only thing harder than planning for an emergency is explaining why you didn’t.

You may have heard that saying before, and there’s a lot of truth in it.  Many people have learned that truth the hard way.  The statement implies that disaster recovery planning is a dreaded, overly cumbersome and time-consuming task—but if approached correctly, it doesn’t have to be.  Planning for disaster recovery and, more specifically, Active Directory disaster recovery doesn’t need to be overwhelming.

Many forms of Active Directory disaster can damage your business: domain controller failure, accidental bulk deletion or modification of objects and attributes, domain failure or forest-wide failure.  If you don’t understand why your organization needs a highly available Active Directory deployment with robust disaster recovery processes, ask yourself this: How much would it cost your business, in money and reputation, if it were to stand still for 12 hours, 24 hours or even more?  There is rarely a convenient or acceptable time for an Active Directory outage.

Here are three vital questions every business must answer when planning backup and recovery strategies for Active Directory:

  1. What is my recovery point objective (RPO)? The RPO is the maximum tolerable period over which data might be lost.  If your RPO is 12 hours, then backups should run at least every 12 hours.
  2. What is my recovery time objective (RTO)? The RTO is the maximum time after a disaster within which data must be restored to avoid the unacceptable consequences of a break in service continuity.
  3. What is my backup retention time? In other words, how far back does data need to be kept?  Your backup retention time needs to be less than the tombstone lifetime set in Active Directory.  How far back you need to go can depend on a number of factors, such as legal requirements or some other constraint.  Whatever the number is, it should be formally agreed.
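
Once you have answers, the two rules above (back up at least as often as the RPO, and keep retention below the tombstone lifetime) are simple enough to check mechanically.  The following is only a minimal Python sketch: the function name and all of the figures are hypothetical, and the 180-day tombstone lifetime is an assumed example value, so confirm the real setting in your own forest.

```python
from datetime import timedelta


def check_backup_plan(backup_interval, rpo, retention, tombstone_lifetime):
    """Return a list of problems with the plan; an empty list means it passes.

    All four arguments are datetime.timedelta values supplied by the caller.
    """
    problems = []
    # Rule 1: backups must run at least as often as the RPO allows.
    if backup_interval > rpo:
        problems.append("backup interval exceeds the RPO")
    # Rule 2: retention must be less than the tombstone lifetime.
    if retention >= tombstone_lifetime:
        problems.append("retention is not below the tombstone lifetime")
    return problems


# Example figures only (assumptions, not recommendations).
plan_problems = check_backup_plan(
    backup_interval=timedelta(hours=24),
    rpo=timedelta(hours=12),
    retention=timedelta(days=30),
    tombstone_lifetime=timedelta(days=180),  # assumed value; check your forest
)
print(plan_problems or "plan is consistent with the stated objectives")
```

With these example figures the daily backup is flagged, because a 24-hour interval cannot satisfy a 12-hour RPO.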

Your answers to these questions can also legitimize a request for new infrastructure, storage and tools to meet your business’s requirements.

One of the first rules of backup is redundancy, so it’s important that your backup and recovery plan does not have a single point of failure, whether a site, backup storage or a domain controller.  Back up multiple domain controllers from different physical locations, and be sure to have at least one physical domain controller per domain.  You might need to consider which domain controllers will be the control points for recovery, and this might involve reviewing your Active Directory topology and backup placement.

Managing Disaster Recovery Proactively

Disaster recovery plans and processes should never be static and reactive; they should be proactive and involve continuous improvement, testing and refinement processes.  If you’ve never tested your disaster recovery plan, you simply won’t be prepared to manage a real disaster properly.

Compiling and performing a selection of quarterly and yearly tests strengthens your company’s confidence in its ability to recover from disaster.  Regular testing not only validates the technical side of the recovery process, it also confirms the backups work as expected and helps your organization refine other processes you’ll need to follow during a disaster.

Of course, you must monitor all your backups to ensure they are successful and running according to schedule.
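
As an illustration of what that monitoring might look like, here is a minimal Python sketch that flags domain controllers whose last successful backup is older than an agreed maximum age.  The domain controller names, timestamps and the `stale_backups` helper are all hypothetical; in practice the timestamps would come from whatever reporting your backup tool provides.

```python
from datetime import datetime, timedelta


def stale_backups(last_success, now, max_age):
    """Return the sorted names of domain controllers whose most recent
    successful backup finished more than max_age before now."""
    return sorted(
        dc for dc, finished in last_success.items() if now - finished > max_age
    )


# Hypothetical data: domain controller name -> last successful backup time.
last_success = {
    "DC-LONDON-01": datetime(2024, 3, 1, 2, 0),
    "DC-NEWYORK-01": datetime(2024, 2, 28, 2, 0),
}
now = datetime(2024, 3, 1, 9, 0)

# Flag anything older than a 12-hour RPO (example value).
overdue = stale_backups(last_success, now, max_age=timedelta(hours=12))
print(overdue)
```

A check like this is most useful when it runs on a schedule and raises an alert, so that a silently failing backup job is noticed long before the backup is needed.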

Turning Up the Service Level

Once you’ve implemented a tried-and-tested backup and recovery solution that meets your business’s needs, take disaster recovery a step further by building out and agreeing on service levels with your business.  Service levels measure a system’s performance by determining whether—and to what extent—that system meets certain goals.  If your organization has not yet embraced IT service management and service levels, why not be one step ahead of the game?  In the long run your business will be better off.

Discussing and agreeing service levels is a healthy way to approach disaster recovery.  Questions need to be asked, and agreement reached, to remove ambiguity around the service and its recovery times.  To use an overworked business cliché, it helps ensure everyone is on the same page, with the same expectations.

If disaster does strike, people already know the expected recovery times.  Committing to an availability (uptime) percentage for Active Directory is also much easier if you have a well-thought-out, thoroughly tested disaster recovery plan.
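
To make an availability percentage concrete, it helps to translate it into hours of permitted downtime and back again.  The Python sketch below shows the arithmetic only; the function names are hypothetical, and the 99.9% target and 365-day measurement period are example assumptions rather than recommendations.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day measurement period


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by an availability target over the period."""
    return period_hours * (1 - availability_percent / 100)


def achieved_availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Availability percentage actually delivered, given total downtime."""
    return 100 * (1 - downtime_hours / period_hours)


# Example: what a 99.9% target permits, and what one 12-hour outage costs.
print(f"{allowed_downtime_hours(99.9):.2f} hours of downtime allowed per year")
print(f"{achieved_availability(12):.3f}% availability after a 12-hour outage")
```

Under these assumptions a 99.9% target leaves under nine hours of downtime a year, so a single 12-hour outage would already breach it.  That is why the agreed service level and the tested recovery time need to be set together.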

Sleeping Well Again

To reduce Active Directory downtime and the related loss of productivity, make it your priority this year to build an effective disaster recovery plan and ensure that it is regularly tested, reviewed and refined.  With that in place, offering service levels on Active Directory is achievable.  Do these things and it will be good for your business, and for your sleep!

If anything in this post resonates with you, or if you would like to discuss this post in more detail, please get in touch.
