Why Microsoft?

This is a question that can be explored from many different angles, but I’d like to focus on it not JUST from a virtualization perspective, and not JUST from a cloud perspective, and not JUST from my own perspective as a vExpert joining Microsoft, but from a more holistic perspective which considers all of this, as well

Top 6 Features of vSphere 6

This changes things. It sounds cliché to say “this is our best release ever” because in a sense the newest release is usually the most evolved. However, as a four-year VMware vExpert, I do think that there is something special about this one. This is a much more significant jump than going from 4.x

vSphere 6.0 Public Beta — Sign Up to Learn What’s New

Yesterday, VMware announced the public availability of vSphere 6.0 Beta 2. I can’t tell you everything that’s in it due to the NDA, but you can still register for the beta yourself, read about what’s new, and download the code for your home lab. There’s some pretty exciting stuff being added to vSphere 6.0 in

Will VMware Start Selling Hardware? Meet MARVIN

The Register is running a story that VMware is preparing to launch a line of hardware servers.

VMware Pursues SDN With Upcoming NSX Offering

Earlier this week VMware announced VMware NSX – an upcoming offering that takes network virtualization to new levels. NSX appears to be somewhat of a fusion between Nicira’s SDN technology (acquired last year by VMware) and vCloud Networking and Security (vCNS – formerly known as vShield App and Edge). Since I already had intentions to

What Really Is Cloud Computing? (Triple-A Cloud)

What is cloud computing?  Ask a consumer, a CIO, and a salesman and you’ll likely get widely varying responses. The consumer will typically think of the cloud as a hosted service, such as Apple’s iCloud, or uploading pictures to Photobucket, and scores of similar services (just keep in mind that several such services existed before it

Agility Part 2 — The Evolution of Value in the Private Cloud

When an IT project is commissioned it can be backed by a number of different statements such as: “It will reduce our TCO,” “This is a strategic initiative,” “The ROI is compelling,” “There’s funds in the budget,” “Our competitors are doing it.” Some of these are better reasons than others, but here’s a question.  Imagine a

Stacks, the Vblock and Value — A Chat with EMC’s Chad Sakac

…I reached out to EMC’s Chad Sakac to gain more insights from his perspective on how the various stacks…well…stacked up….

HBA Best Practices with vSphere 4.1 (updated)

KB article 1030265 got my attention — it describes a potential issue in vSphere 4.1 where HBAs and PCI devices can stop responding.  The article currently doesn’t detail the exact circumstances that are known to cause this problem but the workaround does reveal what I think are some good practices for HBAs in general.

To work around this issue, ensure that you have a minimum of 2 HBAs in each host and that those HBAs are on different IRQs. This can be determined by reviewing /proc/vmware/interrupts.

In my ESX host designs I always preferred to use at least 2 HBAs when possible to eliminate an HBA card as a potential single point of failure.

The other issue that is sometimes overlooked is hardware interrupts.  To ensure the best performance and availability, always make sure different IRQs are used for each HBA.  For example:

cat /proc/vmware/interrupts | grep qla2xxx | cut -c270-330
0 <COS irq 19 (PCI level)>, VMK qla2xxx
0  <COS irq 20 (PCI level)>, VMK qla2xxx

UPDATE: If you are running ESXi you will not be able to run the above command. Setom reported the following in a comment to this post:

It seems that the following command can be used for finding the IRQ in ESXi:

vmkvsitools hwinfo -p

Look at the 4th column of the output; it shows ISA/irq/Vec. The middle number should be the IRQ of the device.

Also I found this post at Malaysia Hypervisor which gives more background on the vmkvsitools command.

Another best practice mentioned in the article is to “ensure that you have alarms set to alert you if path redundancy is lost”.  I know of a case where a VMware customer had issues on their SAN.  Long story short, after the initial failure on controller A, no one was aware that the path to this controller was still down.  The next weekend they went to do maintenance on controller B (the only active path) and brought the house down.  I’ll repeat again using the words from the KB article — ensure that you have alarms set to alert you if path redundancy is lost.  The “cannot connect to storage” rule in vCenter 4 should by default include triggers for “degraded storage path redundancy” and more.  Make sure that you are proactively monitoring for these events!
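
Since those default alarms can be accidentally disabled or left without a notification action, it is worth verifying them periodically. Below is a minimal PowerCLI sketch of such a check; the vCenter name and email address are placeholders, and cmdlet availability varies by PowerCLI version, so treat this as a starting point rather than a recipe.

# Connect to vCenter (placeholder hostname)
Connect-VIServer vcenter.example.com

# List the built-in storage alarms and confirm they are enabled
Get-AlarmDefinition -Name "*storage*" | Select-Object Name, Enabled

# Check whether the "Cannot connect to storage" alarm actually notifies anyone
$alarm = Get-AlarmDefinition -Name "Cannot connect to storage"
Get-AlarmAction -AlarmDefinition $alarm

# If no action is defined, add an email notification (placeholder address)
New-AlarmAction -AlarmDefinition $alarm -Email -To "storage-team@example.com"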

Let Your Fast Zebras Run Free (with a Vblock)

What makes an IT department effective and agile?  How can you let the stars on your team succeed while reducing gridlock and OPEX (operational expenses)?

Jon Katzenbach and Zia Khan have an intriguing post at Harvard Business Review on fast zebras.  What’s a fast zebra?  I’ll let them explain:

Mark Wallace, former US Ambassador to the United Nations, has a term for people who can quickly absorb information, adapt to new challenges, and get people aligned in the right direction: fast zebras. They are the people who can skirt around or blast through the kind of gridlock found not only in the political spectrum, but in organizations of every stripe.

The metaphor is based on the fast zebra on the African savannah who survives a trip to the drinking hole by moving quickly while slower herd members fall prey to waiting predators. Well, organizations are sometimes like the savannah; to the new-comer, they constitute vast, unexplored areas fraught with hidden dangers. The fast zebras in both contexts travel the terrain swiftly to accomplish significant goals while the naïve ones run into the predators of red tape, unaligned incentives, and unmotivated teams.

Before I get into the Vblock concept, I want to take a step back and talk about the management issues here and add to them with my own experiences.

In one organization I worked as a part of a core team of a handful of “fast zebras”. I was the SAN admin, the server admin, and often the application engineer as well. The network team was readily available and we had the luxury of engaging them without a formal process.   While there are some drawbacks to such a design, we were able to attack projects with great speed and it was satisfying as well.

Suddenly we became a much larger organization, and we had to deal with a new model of rigid silos of responsibility. The fast zebras in the organization were told that they had to be “corralled” (that was the actual word used), to work within the new system, and to NOT transcend the silos.

If you do a Google search for “organizational management silos” you won’t see a lot of praise for silos – which management can sometimes be seduced by from an accountability standpoint.  Rather, you’ll see posts about “breaking” and “tearing down” the silos, bridging them, avoiding them, poking holes in them, and even blaming CEOs for their rigidity. (One of my MBA classes last year focused on organizational management, and if I had my book handy I’d have a lot more great quotes!)

So what was the impact of the new siloed organization? We found that projects in general took 300-500% longer. We had meetings to reclarify statements and specifications, and then scheduled even more meetings.  Think of the difference between telling the network or storage team to do “X” versus having an engaged partner in the initiative who understands the business and technical drivers.  Those of you who have built virtual infrastructures know just how complex everything from firmware to spindles to fabric can be in creating a solid infrastructure.  At times we even uncovered single points of failure that needed to be corrected.

Another area that amazes me is how some organizations can focus so much effort on their processes for regulatory compliance, yet don’t focus on making those same processes effective for IT.  Even worse, real-world security can be an albatross when you pull back the curtain, even while all the regulatory compliance appears to be solid and in order. It’s as if management can take solace in a regulatory interpretation of security while the effective security is a near-disaster, lacking both standards and best practices.

Projects reigned, and best practices became an unaffordable luxury with no silo of their own.  Compliance and structure over effectiveness.  Perception over reality.

Getting back to the topic, some level of red tape will be necessary for accountability and regulatory compliance but what are we really gaining by corralling our fast zebras?

Fast zebras are not mavericks. They do not seek notoriety for overtly breaking rules, and then start enjoying breaking the rules for the sake of being noticed. Instead, fast zebras are relentlessly focused on results. They prefer bending to breaking rules. They achieve results by using their fact-based knowledge of the formal organization complemented by insight into the informal organization. They often have no preference for either and view both simply as means to the ends.

So how do you let your IT fast zebras run free? Part of it is process and organizational structure of course. Rather than endless meetings and task delegation across silos, identify your fast zebras and empower them within your processes and reporting structure to transcend the silos and get things done quickly and with a consistent vision.

Another part of it is your IT infrastructure itself.  This is what I love about the Vblock and complementary technologies like UIM and vCloud Director – not only are there significant OPEX reductions, but they help to knock down the silos between networking, storage, server and app, while improving organizational agility.  Imagine if the application engineer could quickly provision a new multi-tiered application — including servers, networking, firewall rules and storage, mostly from a single console.

In an upcoming “Agility” series I will discuss CAPEX, OPEX, value and the need for agility, then obstacles to agility, and then how solutions like the Vblock, UIM and vCloud Director (especially when put together) can create exciting value and opportunity in new ways.

Leaders can speed up their formal machinery with the lubrication that fast zebras provide by planting them wherever agility, responsiveness, and innovative approaches are needed most. But sometimes what fast zebras do is so important that it warrants changes to the whole system so that fast zebra behavior is adopted more broadly by others. When a few fast zebras won’t do, and a herd is needed, leaders need to change the ecosystem of the organizational savannah. If they don’t, they run the risk of encountering their own unpleasant predators.

Your fast zebras in IT want to run free – it is their natural state. If they are herded, both morale and productivity will suffer – perhaps badly enough that your fast zebras will find somewhere else to run.

Quest Acquires BakBone Software To Enhance Backup Portfolio

Quest Software announced today an agreement to acquire BakBone Software for $55 million.  Quest expects the acquisition to further boost its existing data recovery product lines, including vRanger and LiteSpeed.

It’s not yet clear exactly how the products and technology will be integrated, but a post on Quest’s vCommunity blog suggests that some of the technologies will be integrated with vRanger.  BakBone has some interesting technologies including:

  • Real-Time CDP Protection
  • Post-process byte-level de-duplication, which BakBone claims can reduce the storage footprint by up to 12 times while also reducing backup times/windows
  • Application-level backup integrations with Oracle, SQL, Exchange, SharePoint, MySQL, Notes and more
  • Support for multiple virtualization platforms including vSphere, Hyper-V and Xen
  • Individual retention policies, including the ability to move older backups to a different storage tier (disk, tape, VTL)
  • NDMP/Tape/VTL support
  • Bare Metal Recovery
  • Email and File Archiving

There’s some overlap in a few areas of course but it will be interesting to see how some of these technologies may be integrated into future versions of the vRanger product.

Please Give to the Make-A-Wish Foundation

Our 9-year-old daughter is a Make-A-Wish survivor.  This past winter, the Make-A-Wish Foundation granted our daughter’s wish and sent us to Hawaii for almost 2 weeks (and yes, she picked Hawaii by herself and we had no idea!).

During the summer she endured what can only be described as extreme abdominal surgery, for which there was great risk.  She spent about 2 months in the hospital — half of that time in the PICU, with over a dozen tubes and lines at times.  Her medical journey is not over yet as challenges remain (currently she is on tube feedings and has a severely slipped disc), but she is on a good trajectory at this time.

I can’t begin to explain how my impressions and understanding of the Make-A-Wish Foundation have changed and how impressed I am with their organization and what they do.

When more time allows, I will share more details about our experience with the Make-A-Wish Foundation and a bit on our daughter’s story as well.  I have some other posts to get to first, but until then please consider the Make-A-Wish Foundation in your holiday giving.  You can also follow them on twitter @makeawish.

How to kill a hung VM (or task) in ESX or ESXi

Yesterday I received a call asking for assistance with a VM that was in an inoperative state in an ESX 3.5 environment.  There was an active task to create a snapshot which was almost a day old and could not be cancelled.  Any attempt to invoke a new task or change the power state resulted in “another task is already in progress”.

The good news here is that VMware has some very good KB articles that do a great job of detailing some of the options here.  They have a KB article for ESX with the Service Console and another KB article for ESXi environments.

If you are in an ESX (Service Console) environment you also need to know how to use SSH (you can also use the physical server console or DRAC/ILO/KVM, but I prefer SSH).  I use PuTTY which is available here.  In an ESXi environment you can use the Remote CLI (which you will want to install on your workstation) to issue the commands as detailed in the ESXi KB article.

Now there may also be times when it is necessary to kill the task (e.g. one creating or closing a snapshot) without affecting the VM state.  You can follow the procedures in this KB article to restart the management agents on the ESX host, or you can play the video below.  Just keep in mind that this resets the connection between the host and vCenter, and that some tasks (such as closing snaps) can take a long time and in some cases even exceed the timeout in vCenter even though the ESX host is still performing the task.  Restarting the agents does not impact what the ESX host is doing (including hosting VMs) beyond resetting the connection with vCenter.
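
If you have PowerCLI handy (cmdlet availability depends on the version), it can also be worth seeing what vCenter thinks is still running and attempting a cancel from there before dropping to the console. A truly hung task, like the day-old snapshot above, will usually ignore this, which is when the KB procedures come into play. A rough sketch, with a placeholder vCenter name:

Connect-VIServer vcenter.example.com

# Show tasks that are still marked as running and how long they have been at it
Get-Task -Status Running | Select-Object Name, State, PercentComplete, StartTime

# Attempt to cancel any snapshot task that has been running for more than 12 hours.
# Hung tasks often will not respond to this, and the console steps are then required.
Get-Task -Status Running |
    Where-Object { $_.Name -like "*Snapshot*" -and $_.StartTime -lt (Get-Date).AddHours(-12) } |
    Stop-Task -Confirm:$false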

Introducing Project BLUE SPHERE (PBS)

I have been wanting to blog about Vblocks, UIM, vCD in the OPEX/Agility space and much more, but it’s rather difficult to gain access to some of these things and learn about them — especially for an independent blogger who was unable to attend VMworld or access the online sessions.  That leaves me with just theory and vSphere to blog about.  Pretty dull, huh?  In fact the whole thing made me a bit sad…

Now I do happen to be starting a major vSphere upgrade project, but there’s not much there that hasn’t been said, right?  Looking at other blogs I still see a lot of activity on things that didn’t seem terribly interesting to me.  Then it occurred to me that just because I may not find something exciting or relevant doesn’t mean that others in the community wouldn’t.  And there’s always more that can be learned about vSphere and integrating all of the elements.

Introducing PROJECT BLUE SPHERE (PBS)

I am leading a project consolidating medium-sized ESX 3.5 and ESX 3.0 farms supporting over 300 VMs to a 20-host vSphere 4.1 environment, so why not blog about the items of technical interest during the project?  It’s kind of like a reality show for vGeeks, but the best part is there are no eliminations, no immunity, no one gets fired (we hope) and no “most dramatic rose ceremony ever” just after the commercial break.  Just good old vSphere.

We won’t cover absolutely everything of course, but I’ll outline our technical experiences with upgrading vCenter, vCenter Heartbeat, ESXi 4.1, backups, some other challenges and much more as we proceed through the project which hopefully will provide some valuable lessons learned and best practices.

The project is in the early stages as I’m still collecting details and preparing for kickoff meetings, but be sure to tune in to Blue Shift and track our progress during our ESX 3.x to vSphere 4.1 upgrade.  I’ll also be starting an “Agility” series soon which will cover some of the things I find exciting about the Vblock, UIM, vCD and more.

Quest vRanger and Direct-To-Target Backups

Direct-To-Target backups can significantly increase parallelism and backup performance.  In the vRanger architecture, each ESX host acts as the “data mover,” collecting the bits and writing them directly to the backup target (CIFS or NFS).  Quest has noted that customers have experienced significant performance improvements with Direct-To-Target over proxy architectures, where the “data mover” can become a bottleneck.

ESXi, however, created a problem for this Direct-To-Target capability.  With no Service Console in ESXi, it was no longer possible to inject run-time binaries onto the host.  Thus when ESXi is used, all the backup traffic passes through the vRanger server itself, which acts as a backup proxy.  This is why some have noticed a drop in backup performance after deploying ESXi in a vRanger environment.

What can be done for ESXi hosts?  For the moment, all you can do is use fibre-based (LAN-free) backups and/or leverage vSphere’s CBT (Changed Block Tracking) to reduce backup times.

But for true Direct-To-Target capability with ESXi, vRanger customers will have to wait until next quarter (Q1 2011), when a VA (virtual appliance) architecture will be introduced.  In this design, an appliance VM will run on each ESXi host and use the hot-add capability of the vStorage API to mount each VMDK as read-only and perform the backup operation.

Quest has a detailed whitepaper on this VA solution, and an illustration is provided below.

New Application HA Whitepaper

Previously I posted an overview of Symantec Application HA here.  VMware and Symantec have just released a new whitepaper detailing how Application HA protects SQL Server, which you can read in full here.

A quick summary:

The Application HA monitoring API was introduced in vSphere 4.1.  HA vendors (such as Symantec) will provide the in-guest application monitoring components, and then these components can integrate directly with VMware HA for intelligent remediation.

Other solutions either have technical limitations (VM Fault Tolerance) or are very complex and/or expensive (e.g. MSCS), and they don’t integrate with either HA or vCenter.  Symantec Application HA lists for $350 per VM and integrates directly with both HA and the vCenter client.

Can your VM be restored?  VMware and VSS — Part 1

The backup job for your VM completed successfully so the backup is good, right?  Unfortunately it’s not that simple and a failure to effectively deal with VM backups can result in data loss and perhaps even legal consequences.

We will take a look at several issues, including VSS integration with VMware, covering several known issues along the way.

VSS and Application Consistency

The first concept to understand is application consistency.  When a snapshot is taken of a VM, it will freeze all disk activity so that every file is consistent.  But what about complex file structures like databases, where the contents of the file are constantly changing?

What if a transaction was being written to a SQL, SharePoint, Active Directory, or an Exchange database but the transaction was in mid-step?  What about the registry and other elements of the System State?  You can end up losing data, and in some cases have the tables and indexes so corrupt that the database won’t even load.

Microsoft introduced a solution in Windows 2003 called VSS (Volume Shadow Copy Service), which is described here in a TechNet article.  Microsoft applications including Active Directory, SQL, Exchange and SharePoint integrate with VSS so that Windows can ensure that the applications are in a transactionally consistent state for backups.  Several 3rd-party applications such as Oracle also provide VSS support for their databases.

VMware Tools and VSS Support

In earlier versions of VMware, the VMware Tools package included a locked-file driver (the SYNC driver) from Legato (an EMC backup product).  The SYNC driver would prevent issues with locked files, so that all files could be quiesced and consistent at the file level, but not the transaction level.  The problem with the SYNC driver is that it caused many problems with Oracle, Exchange and Active Directory databases, and even caused data loss in some cases.  AD and Exchange both use the Jet database engine and seemed to have most of the issues, as discussed here in a VMware KB article.

In ESX 3.5 Update 2, VMware introduced VSS support.  Now, whenever a snapshot is taken, VMware Tools is instructed to invoke Microsoft’s VSS function inside the VM as part of the snapshot.  This is leveraged whether your backup solution uses the legacy VCB system (no longer available in vSphere 4.1) or the newer vStorage API.  So as long as your application is VSS-aware, you should be good, right?  Unfortunately it’s not quite that simple.

Many (like me) made the assumption that once VMware Tools was upgraded within the VM, the SYNC driver would be gone and VSS would be enabled.  Imagine my concern when I saw AD databases going offline when backups were triggered because the SYNC driver was still present.   I shared this with Duncan at Yellow Bricks, who wrote a post on this issue here.

Basically the burden here is on the operator to first discover if the SYNC driver or VSS integration is being used.  One quick check is to use Device Manager and look for the SYNC driver there (show hidden devices).  If the SYNC driver is present, your backups are probably using this and not VSS.

One important issue I want to be clear on:  if VMware Tools was first installed at version 3.5 U2 or later, you should be fine.  However, if the VM started with an earlier version of VMware Tools, you are likely at risk for this issue.

Once you’ve found a VM where the SYNC driver is being used, here is one way to remediate this and switch to VSS:

  • Run the VMware Tools installer in interactive mode.  If you are not at the current version you will have to upgrade first.
  • Select the “Modify” option as shown below

  • If present, deselect the SYNC driver so that it has a red X next to it, and select VSS Support which is highlighted in the image below.

Once you are done with this, backups triggered by VCB or the vStorage API should now properly invoke the VSS support.

It would be ideal if there were a PowerShell/WMI script that could query for the existence of any VMs that have this problem.  I haven’t taken the time to look into this very closely but will hopefully be able to in the near future.
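
In the meantime, here is a rough sketch of the kind of check I have in mind, using plain WMI from PowerShell. The guest names are placeholders and the "*sync*" name pattern is an assumption, so verify it against the driver name that Device Manager actually shows in your environment before trusting the results.

# Placeholder list of Windows guests to check (could also come from Get-VM in PowerCLI)
$guests = "vm-ad01", "vm-exch01"

foreach ($guest in $guests) {
    # Query the guest OS for installed kernel drivers and flag anything
    # that looks like the legacy SYNC (file-level quiescing) driver
    Get-WmiObject -Class Win32_SystemDriver -ComputerName $guest |
        Where-Object { $_.Name -like "*sync*" -or $_.DisplayName -like "*sync*" } |
        Select-Object @{n="Guest";e={$guest}}, Name, DisplayName, State, StartMode
}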

UPDATE: Another blogger has written a script for this here.

There’s still quite a bit more to know, however, about how VMware and VSS interact with different operating systems.  For example, if you are running Exchange or SQL on Windows 2008, it is probably not being quiesced by VSS unless you have taken specific steps. We will explore this and more in detail in Part 2.

Originally posted on July 21, 2010

Load Based Teaming in vSphere 4.1

It is commonly assumed that VMware will balance network traffic across all the physical NICs in the same team, but this is not necessarily the case.  Let’s take a closer look at this, and also at how a new feature in vSphere 4.1 can help to prevent network I/O bottlenecks.

Let’s say you have a vSwitch made up of 4 physical NIC interfaces which is used by 5 different VMs, each with 2 vNICs.  How does network traffic get distributed across these 4 interfaces?

As of vSphere 4.0 there were 3 teaming policies available.  Let’s look at the default first:

Route based on the originating virtual port ID (default)

This is the default setting, and it means that each virtual switch port is given affinity to a single physical NIC, with the assignments alternating across the NICs in the team.

The numbered blue boxes above represent ports inside the virtual switch.  Each port is assigned affinity to a physical NIC in round-robin fashion.  The first vNIC that joins the switch will be assigned pNIC1, the 2nd gets pNIC2 and so on.

There is absolutely no consideration of traffic load here.  Furthermore, the port assignment is effectively random – it is based on the order in which the vNICs came online.  Another key point is that any one vNIC never has access to more than one physical NIC at a time.

All this could potentially result in an allocation of network resources that is less than optimal – especially if you are using 1 Gig Ethernet rather than 10G.

The other two options in vSphere 4.0 were Route based on IP hash and Route based on source MAC hash.  These options distribute traffic based on a hash (of the source and destination IP addresses, or of the source MAC address), so different conversations can traverse different physical NICs in the team.  While this can result in a more balanced traffic distribution, it will likely take some collaboration with your network team (IP hash, for example, requires matching link aggregation on the physical switch) in order to implement successfully.

VMware sums up the problem with this statement:

These three policies provide static mapping from vSwitch port to pNIC adapter. It is possible that two heavy network load virtual machines are mapped to same physical adapter that is congested while the other adapters still have free bandwidth.

Now in vSphere 4.1 there is an additional option which is displayed as “Route based on physical NIC load” for distributed vSwitches, which the marketing folks refer to as “Load Based Teaming”:

With Load Based Teaming in vSphere 4.1, after initial port based assignment, the load balancing algorithm regularly checks the load of all the teaming NICs. If one gets overloaded while another one has bandwidth available, it reassigns the port-NIC mapping to reach a balanced status. During the period until the next check is performed, the mapping is stable.

Note: Bandwidth is still limited to the maximum bandwidth a single pNIC provides.

This feature is a significant improvement in the sense that it can guard against a pNIC being overloaded.  It does not balance ALL traffic, but it will balance the connections (based on load) across the pNICs and optimize them.  A vNIC can still only have affinity to a single pNIC at a time, but the Load Based Teaming feature in vSphere 4.1 can protect against network resources being imbalanced and overloaded within a team.

Load Based Teaming in vSphere 4.1 requires a distributed virtual switch (vDS) which requires vSphere Enterprise Plus.
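
If you want a quick inventory of which teaming policy your existing standard vSwitches are using today, a PowerCLI sketch along the lines below can help. The vCenter name is a placeholder, the -Standard switch is not present in older PowerCLI builds, and LBT itself still has to be configured on a distributed port group (which in this era generally means the vSphere Client or the vDS APIs rather than these cmdlets).

Connect-VIServer vcenter.example.com

# LoadBalanceSrcId = originating virtual port ID (default), LoadBalanceIP = IP hash,
# LoadBalanceSrcMac = source MAC hash, ExplicitFailover = no load balancing
foreach ($vmhost in Get-VMHost) {
    foreach ($vswitch in Get-VirtualSwitch -VMHost $vmhost -Standard) {
        $policy = Get-NicTeamingPolicy -VirtualSwitch $vswitch
        New-Object PSObject -Property @{
            Host          = $vmhost.Name
            vSwitch       = $vswitch.Name
            Uplinks       = ($vswitch.Nic -join ", ")
            LoadBalancing = $policy.LoadBalancingPolicy
        }
    }
}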

For more details, review the following KB article (which was the source of the above quotes):

http://kb.vmware.com/kb/1022590 Load Based Teaming in vSphere 4.1

Van Halen on Cloud Security

What in the name of rock-and-roll does Van Halen have to do with the cloud?  Join us on a magical journey filled with wonderment and perplexity as we seek to understand this parable.

Your guide for our magical journey into the cloud. And Waldo, please sit down...

Van Halen introduced a new kind of rock music which was bold, extreme and uncompromising, so naturally the band would adopt a persona of bravado and attitude to match their music.  The 1983 Rolling Stone Record Guide called Roth “the most obnoxious singer in human history, an achievement notable in the face of long tradition and heavy competition.”  While many rock artists would demand perks in their contracts to match their egos, Van Halen took such demands to a new level.

Van Halen added a rider to their contract insisting that a bowl of M&M’s be provided backstage with all the brown M&M’s removed.  According to lead singer David Lee Roth’s autobiography (as recalled by Snopes), Article 126 in the contract rider stated:

“There will be no brown M&M’s in the middle of the backstage area, upon pain and forfeiture of the show, with full compensation”

So with “full compensation” for the performance at risk, hundreds of thousands of dollars depended on whether or not every single brown M&M was removed from the backstage candy dish.  Another example of out of control rock star behavior, right?  Perhaps not…


The mammoth stage also doubles as an acrobatics platform

First, Roth explained that the tour was technically demanding:

We’d pull up with nine eighteen-wheeler trucks full of gear, where the standard was three trucks max.  And there were many, many technical errors – whether it was the girders couldn’t support the weight, or the flooring would sink in, or the doors weren’t big enough to move the gear through….The contract rider read like a version of the Chinese Yellow Pages because there was so much equipment….it would say “Article 148:  There will be fifteen amperage voltage sockets and twenty-foot spaces, evenly providing nineteen amperes…”  This kind of thing.

So when you’re in a new city most every night, how can you be certain that all the technical details in your contract are being followed by third parties?  There wasn’t enough time or resources to check every detail in the contract, so the infamous M&M clause was born.  Roth explains:

So, when I would walk backstage, if I saw a brown M&M in that bowl….well, line check the entire production.  Guaranteed you’re going to arrive at a technical error.  They didn’t read the contract.  Guaranteed that you’d run into a problem.  Sometimes it would threaten to just destroy the whole show.

Roth in fact described one incident where as a result of the contract not being read, the staging sank through the arena floor and did $80,000 of damage.

The M&M’s served as an early-warning system — a predictor of how likely it might be that fine details of the contract were not being followed.  And if the contract was not followed in detail it could threaten the show and even the band’s brand.

Now, what does this have to do with cloud computing again?

SECURITY AND CLOUD COMPUTING

Contracts often have SLAs, which can often be verified by performance and availability metrics.  But security isn’t a simple binary function – it can be very complex to contractually enforce – not unlike Van Halen’s touring requirements.  First, let’s take a look at the cloud security problem itself.

Can you afford to trust the security of your data to just anyone?

It’s challenging enough to account for data access controls and security governance when the data is on your private network.  Most companies fall under multiple regulatory requirements, such as PCI, HIPAA, E-Discovery and many more, which require strict governance around managing data confidentiality, integrity and availability.

In cloud scenarios you now have public networks, third-party companies, or both carrying your data.  Are your security controls still effective?  Can you prove it?

To provide just a few examples of concern:

  • Can data accidentally “bleed” over into other networks and systems due to misconfiguration or other factors?
  • Can the data be protected from other entities who may have processes running on the same hardware?
  • If you are connecting internal databases to SaaS applications in the cloud, how can you be assured transactions to your internal databases and directories are secure?
  • Does the hosting and/or SaaS provider adhere to a level of security audits and controls that are comparable to what your organization has adopted internally?

There are many opportunities for the confidentiality and integrity of your data to be compromised – the opportunity can arise out of either negligence or malice; the catalyst can be an individual from a 3rd party under contract or even another cloud customer.

“We’re from the government, and we’re here to help”

Don't allow your security to be shredded -- like Eddie's guitar

For cloud applications which are hosted within the United States, security may be about to get even more complicated.  The Obama Administration is reportedly crafting legislation which would call on communication firms to be able to decrypt secure communications for the FBI.  With a new back door to encrypted communications, cloud security and governance could become a much greater challenge than it already is.  Securosis, a leading security research and advisory firm,  describes the proposal as follows:

To allow a communications service to decrypt messages, they will need an alternative decryption key (master key). This means that anyone with access to that key has access to the communications. No matter how well the system is architected, this provides a single point of security failure within organizations and companies that don’t have the best security track record to begin with. That’s not FUD — it’s hard technical reality.

A post on ReadWriteCloud goes into more detail on this issue, and how this poses a challenge for cloud security firms like enStratus, which provides encryption across cloud platforms.

ARE ALL CLOUDS CREATED EQUAL?

Some might be tempted to say “I’m just building a private cloud so this doesn’t really concern me”.  While it’s true there can be different scenarios depending on private, public and hybrid cloud models, as Edward Haletky points out in a recent post, in the long run the security issues are all the same:

There is a difference between public and private cloud security, but it is very easy for a private cloud to in essence become a public cloud with all the Secure Multi-Tenancy issues that entails. This means that all clouds are alike and the security of any cloud could be handled by a single set of controls and security policies.

WHAT SHOULD BE CONSIDERED IN CLOUD CONTRACTS?

Sarabjeet Chugh – a senior manager at VMware — recently posted some questions to ask infrastructure service providers such as:

How transparent are their security standards and compliance audits?

Eddie Van Halen and Valerie Bertinelli probably aren't signing a contract here

Or, in other words, is the third party diligently adhering to the terms of the contract and the security standards they claim to follow, or should we expect to find brown M&M’s backstage, along with security issues?

InformIT has an informative series on cloud security which encourages organizations to look for several things in a cloud services contract including:

  • Is the cloud provider contractually obligated to protect the customer’s data at the same level as the customer’s own internal policies?
  • Do the provider’s security policies comply with all applicable regulatory rules?
  • Is the provider willing to undergo on-demand or periodic audits and security certifications?
  • What are the provider’s policies on data handling/management and access control? Do adequate controls exist to prevent impermissible copying or removal of customer data by the provider, or by unauthorized employees of the company?

And before you sign that contract, consider your exit strategy.  The cloud computing model may be agile, but this can be threatened if you cannot safely, securely and promptly remove your data from the premises of a third party.

ABOUT THOSE BROWN M&M’s

There’s much advice available on cloud security, but be creative when crafting cloud security agreements.  The contract may demand certain security protocols and safeguards but how can you be certain that they are being followed effectively?  There are third-party tools like RSA’s Solution for Cloud Security and Compliance which can help here, but also consider opportunities to insert “brown M&M clauses” into the contract as an alarm system to help determine if your services provider is indeed following the security requirements of the contract.

So the next time you sign a cloud services contract, ask yourself if you’re smarter than this guy:

AND NOW  SOMETHING COMPLETELY DIFFERENT

This post started out on the fun side and ended up a bit serious, so let’s end on a lighter note.  Van Halen music fans should be excited, as indications are that the band is currently in the studio with David Lee Roth (for the first time since “1984”) and apparently is planning to launch an album and tour next spring.

Now here’s some music trivia.  What song by David Lee Roth was based on a song from 1915, which Louis Prima turned into a medley in 1945, and sports a video featuring “cameos” of Michael Jackson, Cyndi Lauper, Willie Nelson, Boy George, Richard Simmons and more?  Enjoy the video!

vCloud Request Manager Video

VMware just announced a new product which should help transcend organizational silos and increase business agility within the private cloud.

This past August, vCloud Director was launched, which replaced vCenter Lifecycle Manager.  Rather than limiting provisioning and lifecycle management to a single VM, vCloud Director expands these roles to an application scope, by including networking and security elements.  Now you can automate application provisioning, but what about the business request process?

vCloud Request Manager was announced just this week at VMworld Copenhagen for exactly this purpose.  Business groups can submit a request for a new application from a web UI that also supports tablets like the iPad.  Check out the video below for a quick introduction to vCloud Request Manager.