Adjusting the Tombstone Lifetime
Posted by Yogi on 29/06/2012
I just had a pretty interesting discussion via a mailing list with some other Active Directory MVPs and some members of the Active Directory Product Group in Redmond.
As we know, there is a new default for the tombstone lifetime in Active Directory. The discussion initiated because there is an article on Technet which is incorrect: http://technet.microsoft.com/en-us/library/cc784932(WS.10).aspx. Currently point 8 states that the tombstone lifetime, if it is <not set>, depends on the version of the Operating System of the first DC in the forest. However this is not correct and the article is already being changed.
If you are not familiar with tombstones, I wrote Some details about Tombstones, Garbage Collection and Whitespace in the AD DB a while ago. Basically, a tombstone is an object which is deleted, however a small part of it is maintained in AD for 60 or 180 days (by default) to make sure that all DCs receive the information that the object needs to be deleted. When the 60 or 180 days are over (this is the tombstone lifetime) every DC will delete the object locally (this is not replicated, the DC simply calculates if “time-of-deletion + tombstone-lifetime < now”, if yes the object is cleaned up. This “cleaning up” is done during garbage collection, which is by default every 12 hours.
The tombstone lifetime therefore is also the limit of the “shelf live” of an backup – if you’d use an backup which is older it would reintroduce objects which were already deleted, so the maximum age of an backup is the same as the tombstone lifetime.
In Windows Server 2003 SP1 Microsoft decided to increase the tombstone lifetime to 180 days, as I wrote in Active Directory Backup? Don’t rush – you’ll get more time. However, in Windows Server 2003 R2 there was a minor slip so this version introduced 60 days again. To clarify, this only changes if you set up a new forest and the value will depend on the level of the operating system of that first DC.
|Operating System of first DC||tombstoneLifetime (days)|
|Windows 2000 Server||60|
|Windows Server 2003 w/o SP||60|
|Windows Server 2003 SP1/2||180|
|Windows Server 2003 R2 (SP1)||60|
|Windows Server 2003 R2 SP2||180|
|Windows Server 2008 and higher||180|
You can verify what your tombstone lifetime is by looking at the Attribute “tombstoneLifetime” of the object cn=directory service,cn=windows,cn=services in the Configuration-Partition.
|dsquery * “cn=directory service,cn=windows nt,cn=services,cn=configuration,dc=” –scope base –attr tombstonelifetime|
If the attribue has an value, tombstone lifetime is that value in days, if it has no value it is 60 days. What changed the default to 180 is the file schema.ini, which is creating the default objects in a new AD. The version of Windows Server 2003 SP1 and higher (see table above) of schema.ini sets simply the value 180 in the attribute tombstoneLifetime.
Is it recommended to adjust the Tombstone-Lifetime to the new default?
Over the years there were many infrastructures who’s DCs didn’t replicate within 60 days, leading to replication issues and lingering objects. There were many cases within Microsoft PSS and I’ve also seen a couple of infrastructures where I had to fix this. Therefore Microsoft decided to raise the default tombstone lifetime to 180 days, which also extends the lifetime of your backup. It is up to your company to decide whether to change the tombstone lifetime to the new default.
In the E-Mail-Thread we were also discussing if there are any issues with changing the tombstone lifetime.
If you lower the tombstone lifetime, there is no issue. The garbage collection process will be a bit more busy (usually it only needs to clean up changes from a 12 hour timeframe 60 or 180 days ago, but if we go down from 180 to 60 garbage collection needs to clean up the changes of 120 days the next time it is running). However this shouldn’t lead to a performance issue, and if you think it’ll be an issue you can stage it (e.g. moving from 180 to 150, waiting at least for replication + 12 hours, then go from 150 to 120 and so on).
However, if you want to raise the tombstone lifetime, e.g. from 60 to 180 to match the new default, there’s one scenario which needs to be considered:
Lets say we have two DCs, DC-Munich and DC-LA (L.A. because that where The Experts Conference will be in April). On DC-Munich we change the tombstoneLifetime from (=60) to 180. When garbage collection runs on DC-Munich it is bored – it already cleaned up all changes from 60 days ago but we instructed it to keep everything now to 180 days, so the next 120 days garbage collection does not need to do anything. However a bit later DC-LA (who hasn’t gotten replication with the new tombstoneLifetime yet) runs garbage collection and cleans up everything which happened in the 12h timespan 60 days ago.
In this scenario, DC-Munich has objects (tombstones) which were cleaned up on DC-LA, leading various detection mechanisms to identify them as lingering objects (repadmin will detect them, as well as various update processes which will prevent you from doing operations like schema updates for the next 120 days). This will resolve after 120 days, however is pretty inconvenient.
To increase tombstoneLifetime in big infrastructures, there is only one valid solution:
- make sure that garbage collection will not run instantly after you changed the attribute, then after changing the attribute force replication and make sure it’s replicated everywhere
- lower the tombstone lifetime before increasing it. e.g. set it to 55 and make sure it has been replicated everywhere, then wait at least 12 hours or ensure that garbage collection was running on all DCs. This ensures that there are no objects which need to be taken care of garbage collection for the next couple days. Then increase the tombstone lifetime to the value you intended, e.g. 180 days. Make sure that replication works and every DC is getting the update in the next few days, and you are on the safe side
Thanks to Jesko who discussed this scenario with me – I was wrong – increasing is always causing trouble with lingering objects. Controlling garbage collection is the only way to go.
I think this scenario is very interesting, so I wanted to share it.