One of the essential technical activities in a disaster recovery is rebuilding the server infrastructure, often from backups. Recovery of Microsoft Windows servers has generally been no different, but enhancements to the boot process introduced in Windows 2000 tie a server installation more tightly to its initial hardware configuration. These changes make hardware maintenance, upgrades and server recovery more challenging, and they led to a belief that Windows servers were not really recoverable but needed to be rebuilt from scratch, potentially a very time-consuming activity.

Migration of the in-house servers from NT4 to Windows 2003 Active Directory provided an unexpected opportunity to explore several (successful) recovery approaches. The migration required that the old and new environments be active concurrently. To simplify the process, a small swing server was acquired to host the build of the new environment. The boot drive was cloned and the original drive moved to the swing server for the upgrade. At completion, the new environment would need to be recovered on the main production server to finish the process.

Prior to Windows 2000, our experience had been that boot drives could be moved between similar platforms with relative ease, and that backups could be cross-restored as long as the HAL type and disk architecture were the same (i.e. SCSI or ISA/EIDE). Windows 2000 introduced Plug and Play device handlers for SCSI and ISA/EIDE disks. As a side effect of this new feature, Setup writes the hardware signature of the boot controller into the registry. Newer versions of Windows build on this approach.

When the new production environment was recovered on the old server hardware, the reboot failed with a fatal boot error: the BSOD, or blue screen of death. The same failure occurred when the boot drives were interchanged between the servers, which essentially restored the hardware to its pre-upgrade configuration.
A detailed search of the Microsoft TechNet site provided a wealth of detail on this error, and a number of alternate approaches to recovering from it. If an installation is physically or logically moved between unlike hardware (by doing a restore, swapping a disk, or replacing a motherboard or disk controller), the subsequent reboot attempt fails with a blue screen of death and the fault code "0x0000007B" (INACCESSIBLE_BOOT_DEVICE). Depending upon the boot firmware settings, the machine may simply loop through a reboot cycle, flashing the BSOD for a fraction of a second.

What is happening is that, after the primary bootstrap is complete, a check is made to match the registry key built at install time against the current Plug and Play ID. If they do not match, the machine is crashed. This check is not mentioned in Windows Internals, 4th Edition [1], an otherwise detailed walkthrough of the boot process.

There are two primary approaches documented in the Microsoft Knowledge Base for recovering from this type of error: performing an OS repair, or doing a pre-install followed by an NTBackup recovery. NTBackup knows about the registry key trick and has an internal list (another registry key) of key values that must not be overwritten on restore. Backup and restore is clearly now more than just a matter of moving "disk blocks".

Performing a Windows repair using the operating system installation disk is brutal and quick. This process fixes up any driver mismatches and corrects the registry key issues by deleting and rebuilding a big chunk of "system32...". Hot fixes and service packs must be reapplied, but everything else about the configuration is left as is. It is not recommended for recovering an Active Directory domain controller, particularly if there are other Active Directory servers in the network. But it may be the only choice when recovering from a disk controller or motherboard failure.
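
The selective restore that NTBackup performs, described above, can be pictured with a small model. The following is only a sketch under stated assumptions, not NTBackup's actual implementation: the registry is modelled as a plain dictionary of path to value, and the function name `selective_restore`, the parameter `keys_not_to_restore`, and every key path shown are hypothetical names chosen for illustration.

```python
def selective_restore(live, backup, keys_not_to_restore):
    """Model restoring `backup` over `live` while protecting some keys.

    `live` and `backup` map registry-style paths to values.
    Any path equal to, or nested under, an entry in `keys_not_to_restore`
    keeps the value from the live (freshly installed) system.
    Comparison is case-insensitive, as registry paths are.
    """
    protected = [p.lower() for p in keys_not_to_restore]

    def is_protected(path):
        lowered = path.lower()
        return any(lowered == p or lowered.startswith(p + "\\") for p in protected)

    restored = {}
    # Unprotected keys are taken from the backup.
    for path, value in backup.items():
        if not is_protected(path):
            restored[path] = value
    # Protected keys are carried over from the live system untouched.
    for path, value in live.items():
        if is_protected(path):
            restored[path] = value
    return restored


# Hypothetical example: the minimal install knows the new boot controller,
# while the backup still describes the old one.
live = {
    r"SYSTEM\Example\BootController\Signature": "new-hardware",
    r"SOFTWARE\Example\App\Version": "fresh-install",
}
backup = {
    r"SYSTEM\Example\BootController\Signature": "old-hardware",
    r"SOFTWARE\Example\App\Version": "patched",
    r"SOFTWARE\Example\App\Setting": "custom",
}
result = selective_restore(live, backup, [r"SYSTEM\Example\BootController"])
for path in sorted(result):
    print(path, "=", result[path])
```

In this model the application configuration and patch level come back from the backup, while the hardware-specific entry written by the fresh install survives, which is the property that lets the restored system boot on the new hardware.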
The second approach is the recommended way of recovering an Active Directory domain controller: restoring a full backup over the top of a minimal install of the correct OS version. Use of a network-attached disk array for backups was also helpful. This recovery approach relies on the selective registry restore capability mentioned above. We found this to be very clean, but the post-restore reboot process takes a long, long time. In our very small environment it took about 2 hours from restart until it was possible to log on. Active Directory, Exchange and SQL Server were recovered flawlessly, with neither configuration nor patch level affected. The only issue was a bit of weirdness around network device definitions that was obvious and easily fixed. If time were not an issue, this would be the preferred approach to recovery on potentially dissimilar hardware.

Our experience suggests that Windows is not any less recoverable than other platforms, even with dissimilar hardware. Our migration process tested both physical disk exchanges and recovery from backups, and both approaches were successful. The preferred approach in a recovery situation would depend very much on the specific business requirements and budget.

It is essential, however, to have offsite copies of the original Windows installation disks in addition to backups. A shop practice of keeping local copies of any service packs or other updates on the system drives, for potential reapplication at recovery time, could also be helpful. And of course, all backups should be tested regularly for readability, integrity and completeness.

Technology Strategists, Inc. is a Toronto, Ontario (Canada) based technology-focused management consulting firm established in 1995. Technology Strategists was formed to provide assistance to corporations seeking to better achieve business goals through more effectively utilizing investments in technology.

[1] Microsoft Windows Internals, 4th Edition, David Solomon, Mark Russinovich, 2005.
Copyright Technology Strategists, Inc. 2005