Can your VM be restored? VMware and VSS — Part 1
The backup job for your VM completed successfully, so the backup is good, right? Unfortunately it’s not that simple, and failing to handle VM backups properly can result in data loss and perhaps even legal consequences.
We will take a look at how VSS integrates with VMware, covering several known issues along the way.
VSS and Application Consistency
The first concept to understand is application consistency. When a snapshot of a VM is taken, all disk activity is frozen so that every file is consistent at that point in time. But what about complex file structures like databases, where the contents of the file are constantly changing?
What if a transaction was being written to a SQL, SharePoint, Active Directory, or an Exchange database but the transaction was in mid-step? What about the registry and other elements of the System State? You can end up losing data, and in some cases have the tables and indexes so corrupt that the database won’t even load.
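To make the mid-step problem concrete, here is a minimal Python sketch. It is a toy model, not how any real database or VSS writer works: the `TinyDatabase` class, the account names and the amounts are all invented for illustration. It shows a two-write transaction, one snapshot taken between the writes, and one taken after the transaction has finished (which is what quiescing aims for).

```python
import copy


class TinyDatabase:
    """Toy stand-in for a database whose transactions need two writes."""

    def __init__(self):
        # Hypothetical on-disk state: two account balances.
        self.disk = {"checking": 100, "savings": 0}

    def transfer_steps(self, amount):
        """Move money in two separate writes, pausing after each one."""
        self.disk["checking"] -= amount
        yield "debited"      # a snapshot taken here sees half a transaction
        self.disk["savings"] += amount
        yield "credited"     # the transaction is now complete


def snapshot(db):
    """Copy the disk state exactly as it is at this instant."""
    return copy.deepcopy(db.disk)


db = TinyDatabase()
steps = db.transfer_steps(40)

next(steps)                      # first write only
mid_step = snapshot(db)          # file-level consistent, transaction torn

next(steps)                      # second write
quiesced = snapshot(db)          # taken after the transaction completed

print("mid-step total:", sum(mid_step.values()))
print("quiesced total:", sum(quiesced.values()))
```

In the mid-step snapshot the money has left one account and not yet arrived in the other, so a restore from it would silently lose data even though the file itself is intact.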
Microsoft introduced a solution in Windows Server 2003 called VSS (Volume Shadow Copy Service), which is described here in a TechNet article. Microsoft applications including Active Directory, SQL, Exchange and SharePoint integrate with VSS so that Windows can ensure that the applications are in a transactionally consistent state for backups. Several third-party applications such as Oracle also provide VSS support for their databases.
VMware Tools and VSS Support
In earlier versions of VMware, the VMware Tools package included a locked-file driver (the SYNC driver) from Legato (an EMC backup product). The SYNC driver prevented issues with locked files, so that all files could be quiesced and consistent at the file level, but not at the transaction level. The trouble is that it caused many problems with Oracle, Exchange and Active Directory databases, and even caused data loss in some cases. AD and Exchange both use the Jet database and seemed to have most of the issues, which are discussed here in a VMware KB article.
In ESX 3.5 Update 2, VMware introduced VSS support. Now whenever a snapshot is taken, VMware Tools is instructed to invoke Microsoft’s VSS function inside the VM as part of the snapshot. This is leveraged whether your backup solution uses the legacy VCB system (no longer available in vSphere 4.1) or the newer vStorage API. So as long as your application is VSS aware, you should be good, right? Unfortunately it’s not quite that simple.
Many (like me) assumed that once VMware Tools was upgraded within the VM, the SYNC driver would be gone and VSS would be enabled. Imagine my concern when I saw AD databases going offline when backups were triggered because the SYNC driver was still present. I shared this with Duncan at Yellow Bricks, who wrote a post on this issue here.
Basically the burden here is on the operator to first discover whether the SYNC driver or VSS integration is being used. One quick check is to open Device Manager, enable “Show hidden devices”, and look for the SYNC driver there. If the SYNC driver is present, your backups are probably using it and not VSS.
One important point I want to be clear on: if the first version of VMware Tools installed in the VM was 3.5 U2 or later, you should be fine. However, if the VM started with an earlier version of VMware Tools, you are likely at risk for this issue.
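The Device Manager check can also be done from a driver listing. The sketch below is Python and assumes the listing comes from the Windows `driverquery /fo csv /nh` command, run inside the guest and saved as text. The module name `LGTO_Sync` is my assumption for how the Legato SYNC driver shows up, so confirm it against what Device Manager displays on an affected VM; the sample rows are made up.

```python
import csv
import io

# Assumption: the Legato SYNC driver appears under this module name.
# Verify against an affected VM before relying on it.
SYNC_MODULE_NAME = "LGTO_Sync"


def has_sync_driver(driverquery_csv, module_name=SYNC_MODULE_NAME):
    """Return True if the CSV driver listing contains the given module.

    The text is expected in the layout of `driverquery /fo csv /nh`,
    where the first column of each row is the module name.
    """
    wanted = module_name.lower()
    reader = csv.reader(io.StringIO(driverquery_csv))
    return any(row and row[0].strip().lower() == wanted for row in reader)


# Made-up sample rows, for illustration only.
sample_listing = (
    '"LGTO_Sync","LGTO_Sync","Kernel","1/1/2008 12:00:00 AM"\n'
    '"Disk","Disk Driver","Kernel","1/1/2008 12:00:00 AM"\n'
)

if has_sync_driver(sample_listing):
    print("SYNC driver found: backups are probably not using VSS")
else:
    print("SYNC driver not found")
```

A match only tells you the driver is installed. Whether snapshots actually use it is still worth confirming with a test backup.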
Once you’ve found a VM where the SYNC driver is being used, here is one way to remediate this and switch to VSS:
- Run the VMware Tools installer in interactive mode. If you are not at the current version you will have to upgrade first.
- Select the “Modify” option as shown below.
- If present, deselect the SYNC driver so that it has a red X next to it, and select VSS Support, which is highlighted in the image below.
Once you are done with this, backups triggered by VCB or the vStorage API should now properly invoke the VSS support.
It would be ideal if there were a PowerShell/WMI script that could query for any VMs that have this problem. I haven’t taken the time to look into this very closely but will hopefully be able to in the near future.
There’s still quite a bit more to know, however, about how VMware and VSS interact with different operating systems. For example, if you are running Exchange or SQL on Windows Server 2008, it is probably not being quiesced by VSS unless you have taken specific steps. We will explore this and more in detail in Part 2.
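As a rough starting point for such a report, here is a minimal sketch in Python rather than PowerShell. It covers only the reporting half: it assumes you have already collected the installed driver names for each VM by some means (WMI, a login script, an inventory tool) into a dictionary. The VM names are hypothetical, and `LGTO_Sync` is an assumed module name for the SYNC driver that should be verified on a real affected guest.

```python
def vms_using_sync_driver(inventory, module_name="LGTO_Sync"):
    """Return the sorted names of VMs whose driver list includes the module.

    inventory maps a VM name to an iterable of installed driver names.
    module_name is an assumption; the comparison is case-insensitive.
    """
    wanted = module_name.lower()
    return sorted(
        vm
        for vm, drivers in inventory.items()
        if any(driver.lower() == wanted for driver in drivers)
    )


# Hypothetical inventory, for illustration only.
inventory = {
    "dc01": ["Disk", "LGTO_Sync", "vmci"],
    "exch01": ["Disk", "vmci"],
    "sql01": ["lgto_sync", "Disk"],
}

for vm in vms_using_sync_driver(inventory):
    print(vm, "still has the SYNC driver; consider switching it to VSS")
```

Collecting the driver lists remotely is the harder part and is deliberately left out here.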
Originally posted on July 21, 2010