Can your VM be restored? VSS and VMware — Part 2 (updated)
This post was originally made in July 2010, and has new updates and a new section on Active Directory.
The backup job for your VM completed successfully, so the backup is good, right? Unfortunately, it’s not that simple, and failing to deal with VM backups effectively can result in data loss and perhaps even legal consequences.
In Part 1 we discussed VSS, why it is important, and how to make sure VMware Tools is configured to leverage VSS. Unfortunately, the level of VSS support varies by operating system, and those differences need to be considered. In this post we will first discuss the gaps and then some potential remedies.
Let’s start with a wake-up call:
If you have a VM running Exchange, SQL or SharePoint on Windows 2008, and are not running vSphere 4.1, your VM is not being backed up in an application-consistent state unless you have taken specific steps.
“Mr. Backup” at Backup Central brought attention to the VMware gap with Windows backups in a detailed post. In this post I’ll try to summarize this and explore a few things from a different angle.
The above chart needs some explanation. Volume consistency means that the volume is quiesced at the file level, but NOT at the application level. If you’re not quiescing at the application level, you may be unable to restore that application.
Applications also need to be notified that a backup has taken place so that they can truncate their logs. This is where you need to understand your applications. Exchange is especially vulnerable here because it is highly transactional and its logs are only truncated during backups. Some SQL databases may run in simple recovery mode and/or have stored procedures which will either back up or truncate the logs. But if a SQL database is running in full recovery mode with no process in place to truncate the logs, the logs will eventually fill up the disk and bring everything to a screeching halt.
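To make the "eventually fill up the disk" point concrete, here is a minimal sketch of the arithmetic. The function name and the sample figures (60 GB free, 4 GB of log growth per day) are hypothetical, and the estimate assumes roughly constant log growth with no truncation at all.

```python
def days_until_disk_full(free_gb: float, log_growth_gb_per_day: float) -> float:
    """Estimate how many days untruncated logs take to consume the free space.

    Assumes constant daily growth and that nothing truncates the logs.
    """
    if free_gb < 0:
        raise ValueError("free_gb must be non-negative")
    if log_growth_gb_per_day <= 0:
        # Logs that are not growing never fill the disk.
        return float("inf")
    return free_gb / log_growth_gb_per_day


if __name__ == "__main__":
    # Hypothetical figures: 60 GB free, logs growing about 4 GB per day.
    print(days_until_disk_full(60, 4))
```

Plugging in your own free space and observed growth gives a rough idea of how long you have before an unnoticed gap in log truncation becomes an outage.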
vSphere 4.1 corrects the gap with Windows 2008 application quiescing, as noted in the “What’s new in vSphere 4.1” notes:
VADP now offers VSS quiescing support for Windows Server 2008 and Windows Server 2008 R2 servers. This enables application-consistent backup and restore operations for Windows Server 2008 and Windows Server 2008 R2 applications.
UPDATE: Specific steps are needed to support application quiescing on Windows 2008 VMs created in vSphere 4.0 and earlier. Read this post for details.
Now that we understand the gaps, let’s take a look at some remedies.
Upgrade to vSphere 4.1
This is one way to fix the gap with applications not being quiesced in Windows 2008. At the time of this writing, however, it is not clear whether this includes the ability to notify apps of the backup so that they can truncate their logs (stay tuned).
Install a helper agent inside the VM
Both Veeam Backup and Quest (Vizioncore) vRanger Pro provide an additional VSS agent that can be installed in a VM. Each of these agents provides full support for both application quiescing AND notifying the application that a backup has taken place.
If you are using the current version of either Veeam Backup or Quest’s vRanger, you just need to install their agent into the VMs that require application-level integration and configure the backup job appropriately.
One of my favorite ways to solve this problem for databases is to use either Quest LiteSpeed or RedGate SQL Backup. These products will back up your SQL databases to highly compressed files that you can keep right on your VM. This means quicker restores, and the backup files are automatically captured by volume-level quiescing. You just need to make sure that your backup schedules are synchronized according to your organization’s RTO and RPO objectives, and that you have the proper monitoring in place.
Additionally, you can use the built-in SQL tools to configure backups (now compressed in SQL 2008 R2) or write a simple stored procedure to truncate the logs (if you don’t need point-in-time recovery).
UPDATE: A few weeks ago I watched in horror as a SQL Server restore took over 40 hours (they were using “legacy” agent-based backup to tape. Yuck!). Using a VSS-aware VM-level backup to disk would have saved most of these hours. In addition, using a product like RedGate SQL Backup or Quest LiteSpeed would have vastly improved the database restore time (if a database restore was necessary). I continue to be amazed that some organizations leave themselves vulnerable to long restore times when the problem is so easily overcome.
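As a rough illustration of the built-in route, the sketch below assembles a T-SQL BACKUP DATABASE statement for a compressed full backup to a file on the VM. The helper name, database name and path are hypothetical, and it assumes your SQL Server version and edition support the COMPRESSION option; you would still run the resulting statement through your own scheduler (a SQL Agent job, for example).

```python
def build_backup_statement(database: str, backup_path: str, compress: bool = True) -> str:
    """Return a T-SQL statement for a full database backup to a disk file."""
    # Escape the delimiters T-SQL uses for bracketed identifiers and string literals.
    safe_db = database.replace("]", "]]")
    safe_path = backup_path.replace("'", "''")

    # INIT overwrites existing backup sets in the file; CHECKSUM verifies pages as they are read.
    options = ["INIT", "CHECKSUM"]
    if compress:
        options.insert(0, "COMPRESSION")

    return (
        f"BACKUP DATABASE [{safe_db}] "
        f"TO DISK = N'{safe_path}' "
        f"WITH {', '.join(options)};"
    )


if __name__ == "__main__":
    # Hypothetical database name and target path.
    print(build_backup_statement("Sales", r"D:\Backups\Sales_full.bak"))
```

Writing the backup to a local file is what lets the VM-level backup pick it up through volume-level quiescing, as described above.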
Active Directory Domain Controllers
Some have made the statement that snapshots should not be used on an AD Domain Controller, especially for the purposes of restore. There are several reasons for this, including problems with the SYNC driver (discussed in Part One), which can crash your AD, and the fact that restoring a domain controller from a snap is not supported (and for good reason).
Since snaps are used in backups, does that mean you should not back up with snaps? I don’t agree with this interpretation. First, having a snap open for only the duration of a backup is OK in my opinion. Second, if VSS integration is enabled with the domain controller, VSS will automatically quiesce the System State, which includes the SYSVOL, NTDS.DIT and other elements of Active Directory. And third, I would recommend a separate backup of the System State using NTBACKUP as an additional level of protection. This is basically the same concept as using Quest LiteSpeed or RedGate SQL Backup on a SQL server: you let NTBACKUP back up the critical elements to a flat file on your system, and that file will be included in the VM-level backups. You just need to schedule the timing such that the System State backup file exists on your system at the time of the VM-level backup.
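One way to monitor that timing is a small check that runs shortly before the VM-level backup window. The following is only a sketch: the function name and the age threshold are assumptions, and the path you pass in would be wherever your in-guest backup job writes its file. The same check works for SQL backup files kept on the VM.

```python
import os
import time
from typing import Optional


def backup_file_is_fresh(path: str, max_age_hours: float, now: Optional[float] = None) -> bool:
    """Return True if the file exists, is non-empty, and was modified recently enough."""
    if not os.path.isfile(path):
        return False

    info = os.stat(path)
    if info.st_size == 0:
        # An empty file suggests the in-guest backup failed or has only just started.
        return False

    # Allow the caller to pass the planned VM backup start time (epoch seconds).
    current = time.time() if now is None else now
    age_hours = (current - info.st_mtime) / 3600
    # A negative age means the file is newer than the reference time, so it is not ready.
    return 0 <= age_hours <= max_age_hours
```

If the check fails, alert someone before the VM-level backup runs, since that backup would otherwise capture a stale or missing System State file.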
I believe that AD Domain Controllers can be successfully virtualized and that there are significant benefits to doing so — including being able to do offline testing against a “real” AD domain controller. But only use snaps for backups (never revert a DC to a previous snap!) and keep in mind that there are additional considerations to restoring AD objects from a previous state (authoritative restore, etc.).
I haven’t worked on Exchange for some time and thus sometimes I overlook it, but Veeam has a great post here which details how to address both VSS and granular restore with Microsoft Exchange.
This post originally appeared in July 2010. Please also see the post Application Consistent Quiescing in vSphere 4.1 for more details on Windows 2008 VSS support.