VM Snapshots — They can be a problem, but VVols in vSphere 6 can help
Snapshots in VMware have been an invaluable tool for years. The ability to create an application-consistent, point-in-time snapshot of a virtual machine has significant OPEX (or DevOps, take your pick) benefits. A snapshot can be used as an “undo” button for upgrades, and it can facilitate clones and some replication solutions, but perhaps most commonly it is used to facilitate backups of virtual machines.
For me, snapshots have always been a love-hate thing. They are a wonderful feature, but in some cases they’ve caused a lot of pain and disruption. In my vSphere 6 What’s New post I talk a bit about VVols, and what VVols mean for snapshots just might be one of the least known and least discussed features.
Here’s the issue. A snap is created, the backup runs, and then the snap is closed. It is during this closing of the snap that the VM can become “stunned” for significant lengths of time. I’ve seen this become an issue for highly transactional servers, from web servers to databases and email systems.
There’s even a VMware KB article that discusses this problem:
So what’s happening here? Think of it this way.
First the snap is opened. From this point forward writes are not committed to the base virtual disk (VMDK) but to a child VMDK. The more writes that occur while the snap is open (often a function of how long the backup takes), the larger this child VMDK grows, and all of it will have to be consolidated.
Above you can see a VMDK with three snapshots open. Writes go to the most recent snapshot, and the live state of the VMDK is actually a real-time calculation across this entire chain. Once I discovered an Exchange server for which the backups were not properly configured — there was a chain of 58 snaps supporting a production Microsoft Exchange server! Yikes!
There’s actually an additional snapshot file that is created for application quiescing (Microsoft VSS) but there’s no need to go into that here. Hopefully you already have an appreciation for how closing a snapshot can be problematic for transactional workloads. These snapshots — and the child VMDKs created for them — need to be written back into the base VMDK.
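The chain behavior described above can be sketched with a toy model. This is only a conceptual illustration, not how ESXi implements delta disks: the class name `DeltaChainDisk` and its methods are hypothetical, and a "disk" here is just a dictionary of block numbers. It shows writes landing in the newest child, reads resolving across the chain, and the consolidation work growing with the number of blocks written while the snap was open.

```python
class DeltaChainDisk:
    """Toy model of a VMDK with a chain of snapshot child disks (hypothetical)."""

    def __init__(self):
        self.base = {}     # block number -> data held in the base VMDK
        self.deltas = []   # child disks, oldest first

    def snapshot(self):
        # Opening a snapshot adds a new, empty child disk to the chain.
        self.deltas.append({})

    def write(self, block, data):
        # Writes land in the newest child if a snap is open, else in the base.
        target = self.deltas[-1] if self.deltas else self.base
        target[block] = data

    def read(self, block):
        # The live state is resolved newest-to-oldest across the whole chain.
        for delta in reversed(self.deltas):
            if block in delta:
                return delta[block]
        return self.base.get(block)

    def consolidate(self):
        # Closing the snaps: every child block is copied back into the base.
        copied = 0
        for delta in self.deltas:      # oldest first, so newer data wins
            self.base.update(delta)
            copied += len(delta)
        self.deltas.clear()
        return copied


disk = DeltaChainDisk()
disk.write(0, "a")
disk.snapshot()                        # backup starts
for block in range(1000):              # writes that arrive during the backup
    disk.write(block, "b")
blocks_copied = disk.consolidate()     # backup ends, snap is closed
```

In this sketch `blocks_copied` scales with the writes made while the snap was open, which is the work that has to happen at snap close time on a busy server.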
For many VMs you might be able to back up and run snapshots just fine. But in my experience, even an IIS server producing a steady stream of IIS log files can experience disruption during a snap close event, especially if you are doing a full backup. I’ve watched the snap close process freeze IIS servers to the point where web transactions are dropped and lost. And for large transactional databases you can just forget about it.
vSphere 6 and VVols
With the new VVol feature in vSphere 6 several things change. First of all, the base VMDK is ALWAYS the base VMDK. It is always the write target. The snapshots are now read-only reference files that do NOT exist in a chain. When the snap is closed, there’s nothing to ingest back into the base VMDK because it already has it!
This is a huge change from the previous method where writes went into the most recent snap in the chain and would have to be consolidated back into the base VMDK. Now there’s nothing to consolidate when the snap is closed — the base VMDK is always the live state. VMware’s Cormac Hogan has an excellent post on this which goes into far greater detail on how this process works.
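To make the contrast concrete, here is a similar toy model of the VVol-style behavior. It is a sketch under stated assumptions: the class name `VVolStyleDisk` and its methods are hypothetical, and it assumes a simple copy-on-write scheme in which each snapshot independently preserves the old contents of a block the first time that block is overwritten. Real arrays may do this quite differently; the point is only that the base stays the write target and deleting a snapshot copies nothing back.

```python
class VVolStyleDisk:
    """Toy model: base is always the write target (hypothetical sketch)."""

    def __init__(self):
        self.base = {}        # block number -> live data
        self.snapshots = {}   # snapshot name -> {block: preserved old data}

    def snapshot(self, name):
        # A snapshot starts empty; it only records blocks changed later.
        self.snapshots[name] = {}

    def write(self, block, data):
        # Assumed copy-on-write: each snapshot keeps the old contents of a
        # block the first time it is overwritten after that snapshot.
        for preserved in self.snapshots.values():
            if block not in preserved:
                preserved[block] = self.base.get(block)
        self.base[block] = data   # the base always receives the write

    def read(self, block):
        # The live state is simply the base; there is no chain to walk.
        return self.base.get(block)

    def read_snapshot(self, name, block):
        # Point-in-time view: preserved copy if the block changed, else base.
        preserved = self.snapshots[name]
        return preserved[block] if block in preserved else self.base.get(block)

    def delete_snapshot(self, name):
        # Closing the snap just drops the reference data.
        del self.snapshots[name]
        return 0                  # blocks copied into the base


vvol = VVolStyleDisk()
vvol.write(0, "a")
vvol.snapshot("backup")                              # backup starts
for block in range(1000):                            # writes during the backup
    vvol.write(block, "b")
view_during_backup = vvol.read_snapshot("backup", 0)
blocks_copied = vvol.delete_snapshot("backup")       # backup ends
```

Here the backup still sees the point-in-time data, yet closing the snap copies no blocks into the base, regardless of how many writes arrived while it was open.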
But that’s not all. VVols also make it possible to offload snapshot functions to the array controller. The implementation details may vary among storage vendors, but in some cases the whole snapshot process can be offloaded to the storage array itself, providing instant and non-disruptive snapshots.
This is a huge change from vSphere 5, and it should allow for backups and snap close operations on highly transactional servers where this might not have been possible in the past. Impact-free snapshot (and backup) operations.
Now, in full disclosure, I’ve not had the opportunity to work with VVols in production yet, but perhaps you can see why I’m rather excited. Non-disruptive snaps and backups for ALL workloads would be a welcome feature.
Do you have any experience with snapshots with vSphere 6 and VVols? Post in the comments below. Go VVols!