Why Microsoft?

This is a question that can be explored from many different angles, but I’d like to focus on it from not JUST a virtualization perspective, and not JUST a cloud perspective, and not JUST from my own perspective as a vExpert joining Microsoft, but a more holistic perspective which considers all of this, as well

Top 6 Features of vSphere 6

This changes things. It sounds cliché to say “this is our best release ever” because in a sense the newest release is usually the most evolved. However, as a four-year VMware vExpert, I do think that there is something special about this one. This is a much more significant jump than going from 4.x

vSphere 6.0 Public Beta — Sign Up to Learn What’s New

Yesterday, VMware announced the public availability of vSphere 6.0 Beta 2.  I can’t tell you what’s all in it due to the NDA, but you can still register for the beta yourself, read about what’s new and download the code for your home lab. There’s some pretty exciting stuff being added to vSphere 6.0 in

Will VMware Start Selling Hardware? Meet MARVIN

The Register is running a story that VMware is preparing to launch a line of hardware servers.

VMware Pursues SDN With Upcoming NSX Offering

Earlier this week VMware announced VMware NSX – an upcoming offering that takes network virtualization to new levels. NSX appears to be somewhat of a fusion between Nicira’s SDN technology (acquired last year by VMware) and vCloud Networking and Security (vCNS – formerly known as vShield App and Edge). Since I already had intentions to

What Really Is Cloud Computing? (Triple-A Cloud)

What is cloud computing?  Ask a consumer, CIO, and salesman and you’ll likely get widely varying responses. The consumer will typically think of the cloud as a hosted service, such as Apple’s iCloud, or uploading pictures to Photobucket, and scores more of like services (just keep in mind that several such services existed before it

Agility Part 2 — The Evolution of Value in the Private Cloud

When an IT project is commissioned it can be backed by a number of different statements such as: “It will reduce our TCO” “This is a strategic initiative” “The ROI is compelling” “There’s funds in the budget” “Our competitors are doing it” Some of these are better reasons than others, but here’s a question.  Imagine a

Stacks, the Vblock and Value — A Chat with EMC’s Chad Sakac

…I reached out to EMC’s Chad Sakac to gain more insights from his perspective on how the various stacks…well…stacked up….

“Dude! Where’s my Server?” – Firewall Edition

Remember when server virtualization was still new and untested and we (endearingly) referred to the skeptics as “server huggers”? You know the type. They’d walk into the server room and say “which server is mine?”  You could always answer in confidence and tell them that their server is “somewhere in one of these first 3 rows of server racks”. Maybe they just wanted to know where to put the asset tag? Or perhaps give it one last hug and feel the warmth emanating from the air vents.  And when it came to P2V, remember the look on their faces right before they said “you want to do what to my server?!?”

We humans don’t naturally accept change very well, but eventually most server huggers would come to accept server virtualization as being safe. Not only has virtualization become socially normalized, but the economic drivers of CAPEX, OPEX, agility – and even performance – have won over many former server huggers. After all, the abstraction of physical resources is perhaps the biggest enabler of this new paradigm of benefits – and to enable and take advantage of those benefits we had to think differently when it came to servers.

WHAT ABOUT FIREWALLS?

Firewalls can be abstracted too. When we go over our Visio diagrams of networks and think about VLANs, routes and security, we often think in terms of physical hardware. “I need to have two firewalls here – load balancers there, and another firewall for this remote web farm.”  But what if we could abstract firewalls and virtualize them such that, for some elements at least, we didn’t need to purchase and deploy a physical firewall?

VMware vSphere customers who are at the Enterprise Plus level essentially just got a free upgrade to vCloud Suite Standard which includes virtual application firewall capabilities in both vShield App and vShield Edge. And those who upgrade to vCloud Suite Advanced also gain a virtualized load balancer.  Cisco also makes a virtual edition of their Adaptive Security Appliance (ASA) – the Cisco ASA 1000V – which can be integrated into VMware vSphere environments as well.

 

Virtual Firewalls? Sweet Dude!

With solutions like these – abstracting firewalls and network security – it is now possible in many cases to build your security policy into your virtualized environment. Need a web server policy to open 443 and 80 for a specific group of servers while only allowing a custom high-numbered SQL port back inside? We can do that.  Firewalls between servers which might even be running on the same physical host?  No problem.  By abstracting network security to logical boundaries we might be able to provision applications more quickly and more securely — and perhaps without needing to purchase as much physical network hardware as we are accustomed to. And with VMware’s acquisition of Nicira, this movement to abstract the network layer has only just begun.
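To make the idea of “security policy as part of the environment” a bit more concrete, here is a tiny sketch in plain Python – deliberately not any particular vendor’s firewall API – of what a logical web-tier policy might look like when expressed as data tied to logical groups instead of physical boxes. The group names and the custom SQL port number are made up for illustration.

```python
# Hypothetical, vendor-neutral sketch of a logical firewall policy.
# Rules reference logical groups of VMs rather than physical appliances.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    source: str        # logical group, or "any"
    destination: str   # logical group
    ports: tuple       # allowed destination ports
    action: str = "allow"

# Web tier: 80/443 in from anywhere; only a custom SQL port back to the DB tier.
policy = [
    Rule(source="any",      destination="web-tier", ports=(80, 443)),
    Rule(source="web-tier", destination="db-tier",  ports=(48100,)),  # made-up port
]

def is_allowed(src_group: str, dst_group: str, port: int) -> bool:
    """Default deny: traffic passes only when an explicit rule matches."""
    return any(
        r.action == "allow"
        and r.source in (src_group, "any")
        and r.destination == dst_group
        and port in r.ports
        for r in policy
    )

print(is_allowed("any", "web-tier", 443))       # True
print(is_allowed("web-tier", "db-tier", 1433))  # False – only the custom port is open
```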

Physical network hardware isn’t going away, but as we review our designs we might want to start considering virtualized network security components as an option. Over the longer run, I suspect we will see even more abstraction at the networking level.

It’s Christmas in August for VMware Customers

Some huge announcements were made at VMworld today, many of which will be warmly received by VMware customers. I’m not referring to some of the great new features like the vSphere 5.1 web client, but to things like new product entitlements, improved licensing and more.

Licensing and vRAM

Last year VMware announced a new vRAM licensing component with some controversy. If customers wanted to deploy a large server with 512GB of RAM for example, they would have to purchase additional vSphere licenses to accommodate the memory consumption.  This “vTax” on memory-dense servers has now been lifted, leaving per processor licensing as the only component.  This will enable customers to immediately leverage the new 64 vCPU capabilities of vSphere 5.1 for large virtual machines without having to incur additional licensing costs.
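As a back-of-the-envelope illustration of why this matters for memory-dense hosts, here is a quick calculation. Treat the 96GB-per-license vRAM entitlement as my recollection of the revised Enterprise Plus figure under the old model – an assumption, so adjust for whatever your entitlement actually was.

```python
# Rough license count for a 2-socket, 512GB host.
# ASSUMPTION: 96GB vRAM entitlement per Enterprise Plus license under the
# old (now retired) vRAM model; adjust if your entitlement differed.
import math

sockets = 2
host_ram_gb = 512
vram_entitlement_gb = 96  # assumed per-license entitlement

licenses_vram_model = max(sockets, math.ceil(host_ram_gb / vram_entitlement_gb))
licenses_per_cpu    = sockets  # per-processor licensing only

print(f"Old vRAM model:      {licenses_vram_model} licenses")  # 6
print(f"Per-processor model: {licenses_per_cpu} licenses")     # 2
```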

vShield

vShield was formerly sold separately as three products:

  • vShield Endpoint (antivirus protection for VMs)
  • vShield App (Virtual Application Layer Firewall)
  • vShield Edge (gateway security appliance: firewall/NAT, VPN, etc.)

vShield Endpoint is now included in vSphere 5.1 Standard and higher editions. Current vSphere customers with support essentially are getting vShield Endpoint for free.

The other two vShield products, App and Edge (not to be confused with Bono and Edge) have been rolled up into a new product called vCloud Networking and Security (vCNS) which is a part of the new vCloud Suite. vSphere customers at the Enterprise Plus tier are entitled to a free upgrade to the Standard edition of vCloud Suite. Eligible customers who take advantage of this are essentially getting vShield App and Edge for free.

vShield features such as high availability for firewalls and load balancing will require an upgrade to the Advanced Edition of vCloud Suite.

vCloud Director (vCD)

vCloud Director is a key and integral part of VMware’s IaaS solution framework, and because it is also a part of the new vCloud Suite, eligible customers will essentially get vCloud Director (and vCloud Connector) for free. Very nice!

Breaking It Down

Here’s a basic table summary of the new product entitlements available to vSphere customers:

New vSphere Benefits

  • Standard: no vRAM license restriction; vShield Endpoint
  • Enterprise: no vRAM license restriction; vShield Endpoint
  • Enterprise Plus: no vRAM license restriction; vShield Endpoint; vShield App & Edge; vCloud Director & Connector

Those are some pretty nice benefits which I suspect many VMware customers will be very pleased with.

VMware vSphere 5.1 Feature Preview

vSphere 5.1 might technically be a “minor” update, but just like vSphere 4.1 there are some very significant new features – many around increased scalability and improved operations.  I had the opportunity to spend a little bit of time with the vSphere 5.1 beta and I thought I’d quickly share some of the new feature highlights and perhaps drill into some of these in more detail in the future.

UPDATE:  

vSphere 5.1 now includes vShield Endpoint for AV protection!  For more details on what’s new in vSphere 5.1 also see this whitepaper from VMware

SCALABILITY

The maximum number of hosts which can share a file in read-only mode has been increased from 8 to 32.  These improvements in VMFS locking will enable larger clusters in environments which use linked clones, such as VDI and vCD environments.

The “Monster VM” has also been taking his vitamins as 64 – yes, sixty-four! – vCPUs can now be assigned to a virtual machine.  VMware also announced the end of vRAM as a licensing component which means organizations will be able to utilize these features in larger VMs without an additional license penalty.

And finally, improvements have been made in MSCS support to enable Failover Clusters with up to five nodes.

VMOTION

Host vMotion can now take place in the absence of shared storage.  Needless to say, a low-latency environment (such as a Metro Area Network or better) is ideal, but this can empower new migration scenarios and more.

Also, Storage vMotion now supports up to 4 parallel operations across 8 volumes.

STORAGE

  • Install ESXi onto an FCoE LUN
  • Support for 16Gb HBAs
  • Improvements for handling All Paths Down (APD) conditions
  • Improved Storage DRS for Linked Clones
  • Space Efficient Sparse Virtual Disks for Linked Clones

The last is especially interesting, as in the past you had to use SDELETE and cumbersome manual steps to reclaim space from a VM.  Recall that Windows does not delete blocks when a file is deleted; it only removes the directory entry.  Now VMware Tools can initiate a scan in the OS for unused blocks and reorganize the used blocks so that the free space forms a contiguous region at the end of the disk.  Then a SCSI UNMAP command will be sent to the SAN allowing the space to be reclaimed from your thin disks.
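If it helps to picture the mechanics, here is a toy model of the reclaim process in Python – purely conceptual, not the actual VMware Tools or array implementation.

```python
# Toy model of guest-assisted space reclaim on a thin-provisioned disk:
# 1) identify blocks the guest filesystem no longer uses,
# 2) compact the used blocks toward the front of the virtual disk,
# 3) report the freed tail back to the array (conceptually, SCSI UNMAP).

def reclaim(blocks):
    """blocks: list of payloads, None = deleted/unused in the guest filesystem."""
    used = [b for b in blocks if b is not None]
    compacted = used + [None] * (len(blocks) - len(used))
    unmap_range = list(range(len(used), len(blocks)))  # tail handed back to the SAN
    return compacted, unmap_range

disk = ["os", None, "data1", None, None, "data2", None, None]
compacted, unmap = reclaim(disk)
print(compacted)   # ['os', 'data1', 'data2', None, None, None, None, None]
print(unmap)       # [3, 4, 5, 6, 7] -> blocks the array can reclaim as free space
```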

Another advantage of the new sparse disk model is that snapshot chains no longer consume additional space when deleting snapshots.  In the past it was possible to get into a situation where the volume was full and you didn’t have any free space available to consolidate any open snaps.  Because each snap in the chain now consolidates directly into the base disk (as opposed to consolidating into other snaps and then the base last), no additional free space is required to consolidate snapshots.

NETWORKING

At times, physical network uplinks on the ESX host can be configured in such a way that proper function is not possible.  vSphere 5.1 includes a new network health monitor which checks teaming, VLAN and even MTU settings, raising an alarm if a network configuration might not deliver the intended results.

In many environments, backups of switch and router configurations are maintained and always at the ready in the event a prior state needs to be restored.  Now your vDS switches and their port groups can be backed up and restored, which could prove useful if your vCenter server were to become unavailable.  The system can also automatically roll back to the previous networking state if networking is found to have been disrupted.

In addition, vDS now supports both Port Mirroring and LACP.

TAGGING

This is a nice feature which can help to organize various vSphere building blocks (VMs, networks, volumes, etc.) and relate them to applications, teams or whatever groupings might be helpful.  It works much like tagging pictures for anything to do with “cooking”, “bird watching” or “family” for example.  You can tag virtual machines that exist across different vSphere clusters as being part of the same application, business unit or whatever construct you find useful.  Then when you search for a tag, you can quickly bring up a list of all the objects which possess that tag.
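Conceptually, tags are just a cross-inventory index. The sketch below (plain Python, not the actual vSphere tagging API) shows why a single tag query can pull back objects of different types from different clusters.

```python
# Conceptual sketch: tags as a cross-cluster, cross-object index
# (not the actual vSphere tagging API).
from collections import defaultdict

tag_index = defaultdict(set)   # tag -> names of tagged inventory objects

def tag_object(obj: str, *labels: str) -> None:
    for label in labels:
        tag_index[label].add(obj)

tag_object("vm-web-01",    "app:billing", "tier:web")   # lives in cluster A
tag_object("vm-db-01",     "app:billing", "tier:db")    # lives in cluster B
tag_object("datastore-07", "app:billing")               # not even a VM

print(sorted(tag_index["app:billing"]))
# ['datastore-07', 'vm-db-01', 'vm-web-01'] – one search returns everything
# tagged to the billing app, regardless of object type or cluster
```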

Web Client

The vSphere Web Client has gone through a complete overhaul and I think most will be pleasantly surprised at just how much of the vSphere GUI functionality is now in the web client.  I tested the web client with Google Chrome and it was fast, responsive and an enjoyable experience, such that the web client did not feel like a second-class citizen.

vSphere 5.1 Web Interface on Google Chrome

Single Sign On

Single Sign On is now provided across the web client and the vCloud infrastructure suite, without having to log in to the components individually.

VMware Tools & Upgrades

For those who remember the pre-virtualization days, it was often a chore to update hardware driver components (think HP SIM, Dell OpenManage, etc.).  Oftentimes you’d get a notification that a certain driver was not the current release, but did that mean there was a compelling reason to upgrade?  Now VMware Tools are yesterday’s hardware drivers, and with the version changing with some ESX patches, how important is it to update VMware Tools and incur a reboot on your guests?

The new model hopes to reduce this churn by mapping VMware Tools to the virtual hardware version (now “virtual machine compatibility” in the Web UI).  Furthermore, reboots will be required less often in future VMware Tools releases (after the update to 5.1) due to improved driver management.  I think that reboot-free VMware Tools upgrades will be a popular feature in many environments.

GRAPHICS

VMware View environments will benefit from the ability to leverage GPUs to increase the quality of virtual desktops, especially in the areas of full motion video, 3D graphics, and more.

AUTODEPLOY

Auto Deploy now supports stateless caching to enable operations to continue when an Auto Deploy server becomes unavailable.  Also a new Stateful Install option can make it possible to deploy an ESX host more quickly in several scenarios.

SUMMARY

Those are some of the bigger highlights I was able to capture.  I’ll be looking for even more details as VMworld progresses.

Certification Challenge Update

A few months ago I posted a note on my certification adventure and thought I’d post a brief update.

My feelings on certification are mixed.  A decade ago I had the highest certifications available from Microsoft and Novell, but at times I would work with others who had the same certifications and I didn’t always see the knowledge and/or troubleshooting skills one would expect.  Once I had a level of experience which I felt spoke more loudly than my technical certs, I gave up on them.  I allowed them to expire and focused on an MBA instead, which (thus far) hasn’t helped my career, but I really enjoyed the knowledge and understanding I gained from it.

Having said that, certifications do have value in that they often serve as a checklist or validation that you have current knowledge in a fast-changing field.  The best certification exam I ever took was a Novell Uber CNE or something like that, where the test was to fix and repair a REAL broken NDS tree.  As is often the case in the real world, there was more than one possible solution to the problem, and you were graded on the efficiency of the path you took.  Most certification exams prove little more than that you can pass a multiple choice exam — but the VCDX “exam” is the exception here.  To become a VCDX you must present a complete datacenter design and defend every decision you make in front of a panel of judges.  Now that’s a certification you can’t fake!

Back to me.  My goals were to get certifications from NetApp (NCDA), Microsoft (MCITP) and VMware (VCP & VCDA).

I passed the NCDA exam and I’m about halfway done with the MCITP path.  For me the most interesting and relevant certifications are the VMware ones, but unfortunately a mandatory 5-day class is required to take the VCP exam and budget and logistical constraints are an obstacle here.  When resources allow for it, I’m really looking forward to starting the VMware path, with a longer term goal of working towards VCDX certification.

iSCSI MultiPathing with VMware vSphere

Every now and then I come across an iSCSI configuration which does not conform to best practices.  There are several great posts that cover this, but I thought I’d briefly cover some of the basics and FAQs in this post.

There are some unique problems which could be lurking under the hood of some environments if they do not conform to best practices – let’s take a look.

LINK AGGREGATION

This is something most VMware administrators are familiar with.  You might have four 1Gb NIC ports bonded together into a single vSwitch and available for use by virtual machines.  Does this mean that you essentially have 4Gb of bandwidth available for your VMs?  Well….yes and no….

Yes, because you have a total pool of 4Gb available.  No, because a single VM (or conversation) will only use one pNIC (physical NIC) at a time.  Stated another way, you have a 4Gb logical pool of bandwidth, but a single VM/session cannot use more than what is available on one pNIC at a time (1Gb in this case).  For a bit more detail on this see an earlier post on Load Balancing in vSphere.
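The reason a single conversation tops out at one pNIC is that the teaming policy hashes each virtual port (or IP conversation) to exactly one uplink. Here is a simplified sketch of that selection logic – illustrative only, not the ESX implementation.

```python
# Why one VM/session tops out at one pNIC: the teaming policy hashes each
# virtual port (or IP conversation) to a single uplink. Sketch only.
uplinks = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]   # four 1Gb ports in the team

def uplink_by_port_id(virtual_port_id: int) -> str:
    """'Route based on originating virtual port ID' style selection."""
    return uplinks[virtual_port_id % len(uplinks)]

def uplink_by_ip_hash(src_ip: str, dst_ip: str) -> str:
    """'Route based on IP hash' style selection - one uplink per conversation."""
    return uplinks[hash((src_ip, dst_ip)) % len(uplinks)]

# A given VM (or a given IP conversation) always lands on the same uplink, so its
# throughput ceiling is that single 1Gb link even though the team totals 4Gb.
print(uplink_by_port_id(17))
print(uplink_by_ip_hash("10.0.0.15", "10.0.1.200"))
```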

Getting to iSCSI, we more or less have the same thing.  Let’s say you created an iSCSI port group (VMkernel) and gave it access to two active NICs within the vSwitch, such that the NIC Teaming for your iSCSI port group looks like this:

Does this mean you have 2Gb available for iSCSI?  Absolutely not, and you have no multipathing either.

In ESX 3.5 only a single iSCSI session / TCP connection to a target is supported, as noted in the iSCSI Configuration Guide, which explains that “storage systems with a single target containing multiple LUNs have all LUN traffic on that one connection.”  ESX 4.0 was improved to allow multiple iSCSI sessions, but you can’t get to where you want to be just by aggregating the NICs.  Here is what the iSCSI Multipathing whitepaper (ESX 4 and 5) says:

In case of simple network adapter teaming, traffic will be redirected at the network layer to the second network adapter during connectivity failure through the first network card, but failover at the path level will not be possible, nor will load balancing between multiple paths.

So we do have fault tolerance (at the NIC/port level), but we have no load balancing or multipathing.  What you really want is two iSCSI port groups (each with its own IP), each with one active NIC.  If you’ll forgive my graphic skills, I’ve attempted to visualise this below:

 

Above we have two iSCSI port groups, each with its own VMkernel IP.  Each port group has only one active pNIC assigned, and ideally each goes to a different physical switch on the network.  With this configuration we have true multipathing being done by VMware’s iSCSI Software Initiator within ESX.  To make sure an iSCSI port group is only using one NIC, you should modify the NIC Teaming to look like this:

And of course for a final step we need to bind our vmknics to the Software iSCSI adapter.  All of this is well detailed in the Multipath Configuration for Software iSCSI whitepaper.
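To visualize why the two bound VMkernel ports buy you real multipathing, here is a conceptual model in Python: two independent paths, each tied to one pNIC, with a round-robin style policy alternating I/O between them. This is a sketch of the idea, not the actual ESX path selection code.

```python
# Conceptual model of software iSCSI port binding: two VMkernel ports, each
# with exactly one active pNIC, giving two independent paths that a round-robin
# style policy can alternate across. Sketch only - not the ESX implementation.
from itertools import cycle

class IscsiPath:
    def __init__(self, vmk: str, pnic: str, target: str):
        self.vmk, self.pnic, self.target = vmk, pnic, target

    def __repr__(self) -> str:
        return f"{self.vmk}({self.pnic}) -> {self.target}"

# One port group per path, one active uplink each, ideally on separate switches.
paths = [
    IscsiPath("vmk1", "vmnic2", "iqn.example:array"),
    IscsiPath("vmk2", "vmnic3", "iqn.example:array"),
]

round_robin = cycle(paths)
for io in range(4):                     # successive I/Os alternate between paths
    print(f"I/O {io} via {next(round_robin)}")
# If vmnic2 fails, only its path goes down; I/O continues on the surviving path.
```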

One more quick note — if you are running vSphere 5.0 please make sure you have Update 1 installed as this corrects a bug in ESX 5.0 in which an All Paths Down (APD) condition can occur due to iSCSI traffic taking the wrong path — even with a correct iSCSI configuration — which can severely cripple the affected ESX host(s).

Not going to VMworld? Register for VMware NOW on Monday August 27th

If you’re like me and unable to attend VMworld this year, you can still “virtually” attend the conference with on-demand access to keynotes, new product overviews, demos and more.

I’m especially excited about this year’s VMworld as not only is there the potential to learn about new product announcements, but we will get more insight into VMware’s vision and strategy for the “software defined datacenter” following VMware’s blockbuster purchase of SDN vendor Nicira for $1.2 billion (with a B).  The hypervisor continues to mature and there should be much to talk about there (along with complementary products), but this year’s event will be a bit more revolutionary following the Nicira purchase and management changes, and I’m very excited to learn more about how VMware plans to execute and provide solutions around this new vision of the datacenter.

To register for VMware NOW and learn about new offerings, solutions and strategies on Monday, August 27, follow this link.

VMware Making Bold and Strategic Moves

This month VMware announced two acquisitions – DynamicOps and Nicira – the latter being a $1.26 billion acquisition.  These are bold and strategic moves which I think tell us a lot about where both VMware and the IT world are heading.

First let’s take a look at cloud management.  I touched on part of the issue in “Enter The Hybrid Cloud” – increasingly, organizations are leveraging cloud computing and services but still have a need for internal or private cloud.  We see large companies like GM pulling back from outsourcing, and concerns in several orgs about a “pure” public cloud, while still wanting to leverage public cloud as a tactical solution.  How do organizations enforce governance and consistency across these disparate clouds?  DynamicOps is a huge piece of the puzzle and greatly expands VMware’s ability to promote their vision and provide effective management for private and hybrid clouds.

Nicira is a huge acquisition — $1.26 billion for a solution that is only now hitting the market – SDN, or Software Defined Networking.  SDN at a high level is essentially abstracting the network stack from networking hardware, much as VMware has done for server hardware.  Networking functions of switches and routers become abstracted from the traditional network hardware model.  In “What Really Is Cloud Computing?” I discussed how (I think) abstraction is a key to the efficiencies of cloud computing by providing an abstraction layer from which you can orchestrate and manage.  Imagine if we could provision networks through an abstraction layer, combine that with VXLAN, and combine it with cloud orchestration across the rest of the stack.  This has big implications for everything from how we provision and manage to how we think about DR scenarios.

There are many possibilities here and both speak very well to VMware’s vision of the “Software Defined Datacenter”.  Once you abstract servers, storage and networking you can orchestrate, and VMware just picked up two companies that could be key to such a vision.   I think that both acquisitions are excellent strategic moves by VMware which tell us much about VMware’s vision and I think there is much for customers to be excited about as well.

The Storage Hypervisor Part 3 — Storage Efficiency

In the first post in this series we discussed how ONTAP – the #1 storage platform in terms of revenue – is a storage hypervisor of sorts – providing benefits which parallel those provided by virtualization.  In the second post we covered WAFL and a new Flash Pool feature, and in this post we will cover storage efficiency.

In the interest of time, only a brief introduction to the features which comprise storage efficiency will be discussed here, with links to whitepapers for those who wish to dig deeper.  In future posts, we will look at additional features which will build upon and leverage and even extend these capabilities for new efficiencies as well as explore their combined value.

DEDUPLICATION

There are many deduplication systems on the market, but few storage offerings provide deduplication within the primary storage tier.  Oftentimes you’ll have to purchase an additional device running a different platform for such a capability.   With NetApp’s ONTAP platform, the entire FAS product line from high end to low end has enjoyed this capability for years.  After the data is written to disk, a post-process scan (which can be scheduled for off-peak hours) looks for duplicate blocks (at a granularity as fine as 4K) and deduplicates them – reclaiming the redundant blocks as free space within the volume.
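If it helps to see the mechanism, here is a toy model of post-process block deduplication in Python – hash each block and keep one physical copy per unique hash. This is purely illustrative; it is not ONTAP’s actual engine and the block contents are made up.

```python
# Toy model of post-process block deduplication: hash each block and keep a
# single physical copy per unique hash. Illustrative only - not ONTAP's engine.
import hashlib

def dedupe(blocks):
    store = {}       # digest -> physical block
    pointers = []    # logical view: one pointer per original block
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        pointers.append(digest)
    return store, pointers

# Three VMs whose OS blocks are largely identical:
blocks = [b"win-kernel"] * 3 + [b"pagefile-A", b"pagefile-B", b"win-kernel"]
store, pointers = dedupe(blocks)
print(f"logical blocks: {len(pointers)}, physical blocks: {len(store)}")
# logical blocks: 6, physical blocks: 3 -> half the space reclaimed in this toy case
```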

How much space can be reclaimed?  It depends on the environment and how the storage is implemented, but when best practices are followed in virtualized environments, the reduction in storage will often range between 30% and 75% depending on the data set.  Think of all your common operating systems, which have common files and therefore common blocks – in VMware environments it is common to see a 75% reduction in storage for operating system drives.  A typical file share will often see a reduction of around 30% from common documents, media files, etc.

Not only does deduplication reduce storage consumption, but it also increases performance.  Imagine the scenario of a VDI boot storm, or a failed ESX host with many VMs powering back on at once.  Because the common blocks are deduplicated, the I/O activity is concentrated on a smaller set of SAN blocks, providing more opportunities for cache hits.  When ONTAP deduplication is combined with either Flash Cache or Flash Pools, a significant performance improvement can be realized in these and similar scenarios.

COMPRESSION

ONTAP also provides compression, which does not work on a per-file basis but rather against a collection of adjacent blocks of up to 32K.  Intelligent algorithms determine the “compressibility” of the blocks and will only attempt to compress if significant benefits can be realized.  Compression can be set for inline compression (on write), post-process compression, or a combination of both.  The post-process method is a bit more comprehensive and will compress blocks that the inline method may have passed over.
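Here is a toy sketch of the compression-group idea in Python: gather roughly 32K of adjacent data, test how well it compresses, and only store the compressed form if the savings clear a threshold. The 25% threshold is an assumption for illustration; this is not NetApp’s actual algorithm.

```python
# Toy model of compression groups: compress ~32K of adjacent data only when
# the savings clear an (assumed) threshold. Not the actual ONTAP algorithm.
import os
import zlib

GROUP_SIZE = 32 * 1024
SAVINGS_THRESHOLD = 0.25   # assumed: skip groups that don't save at least 25%

def maybe_compress(group: bytes):
    candidate = zlib.compress(group)
    if len(candidate) <= len(group) * (1 - SAVINGS_THRESHOLD):
        return "compressed", candidate
    return "raw", group            # not worth the CPU - store it untouched

log_like    = (b"INFO request served in 12ms\n" * 1200)[:GROUP_SIZE]
random_like = os.urandom(GROUP_SIZE)   # stands in for already-compressed media

for name, group in [("log data", log_like), ("incompressible data", random_like)]:
    kind, data = maybe_compress(group)
    print(f"{name}: stored {kind}, {len(data)} of {len(group)} bytes")
```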

 

Compression can of course save I/O operations on both reads and writes, but at the expense of CPU cycles.  Generally speaking you will want to enable both inline and post-process compression on your archive and backup tiers, while the optimal settings for other tiers will vary based on both the application and how it is configured.  The following table gives an overview of the space that can be saved using combinations of dedupe and compression on different data sets:

 

Percent of Storage Saved with (averages):

Application Type                         | Compression Only | Dedupe Only | Dedupe + Compression
File Services: Home Directories          |       50%        |     30%     |         65%
File Services: Engineering Data          |       55%        |     30%     |         75%
File Services: Geoseismic                |       75%        |      3%     |         75%
Virtual Servers & Desktops (OS Volumes)  |       55%        |     70%     |         70%
Database: Oracle ERP                     |       65%        |      0%     |         65%
Database: SAP                            |       70%        |     15%     |         70%
Email: Exchange 2010                     |       35%        |     15%     |         40%

More detail on deduplication and compression within ONTAP is available in this whitepaper

FlexVol — Thin Provisioning

In the ONTAP platform all FlexVols are thin provisioned – meaning that no SAN space is physically consumed until those blocks are actually written to and utilized.  This not only saves space, but can improve performance by helping to maintain a higher spindle-to-data ratio.

Thin provisioning is commonly found on several storage platforms, but in the ONTAP platform, not only is thin provisioning the default for all FlexVols (and across all storage protocols), but you can actually both grow and shrink – yes shrink! – a FlexVol after it has been provisioned.  This provides the maximum opportunity for both storage efficiency as well as flexibility.

FlexClone — Efficient Snapshots

Many SANs have snapshot/clone capabilities, but often they come with severe limitations.  For example, several use the “copy on write” method, which can be expensive both in terms of disk space and performance.  The FlexClone feature (datasheet) within the ONTAP storage platform enables the rapid creation of clone copies of production volumes.  When a FlexClone is created, a small metadata update is made and then only new or changed blocks are written to disk.  No “copy-on-write” is performed, and common blocks between the parent and child are fully leveraged.  This space-efficient approach minimizes overhead and enables up to 255 snaps per volume.


Because of the FlexClone architecture, ONTAP can provide up to 255 snapshots per volume without the space and/or performance penalties typically associated with snapshots
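To contrast this with copy-on-write, here is a toy model in Python of a pointer-based clone: the clone starts as metadata referencing the parent’s blocks, and only new or changed blocks consume additional space. Conceptual only – not how ONTAP is actually implemented.

```python
# Toy contrast with copy-on-write: a pointer-based clone starts as metadata
# referencing the parent's blocks; only changed blocks use new space.

class Volume:
    def __init__(self, blocks=None, parent=None):
        self.blocks = dict(blocks or {})   # block# -> data unique to this volume
        self.parent = parent

    def clone(self):
        return Volume(parent=self)         # metadata-only operation, no data copied

    def write(self, block_no, data):
        self.blocks[block_no] = data       # new blocks only; parent is untouched

    def read(self, block_no):
        if block_no in self.blocks:
            return self.blocks[block_no]
        return self.parent.read(block_no) if self.parent else None

prod = Volume({0: "boot", 1: "app", 2: "db"})
dev = prod.clone()                 # instant, near-zero space
dev.write(2, "db-with-test-data")  # only the changed block is stored
print(dev.read(0), dev.read(2), "| extra blocks used by clone:", len(dev.blocks))
```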

“But what about my databases and VMs” you ask?  That’s an excellent question as those snapshots won’t be very useful for either QA, development or recovery if they are not application consistent.  This is where NetApp SnapManager comes in, which has the ability to properly quiesce applications including Exchange, SAP, Oracle, UNIX, Windows and VMware virtual machines.

Bottom line is that FlexClones allow you to quickly and effectively take point-in-time application consistent snapshots of your production data, while avoiding the storage capacity and performance penalties which are typically associated with snapshots.  This has profound benefits for QA and development (build up/ tear down) as well as backup and DR as we will get to in future posts.

SnapDrive and Space Reclaim

When Windows deletes data from the NTFS file system it simply updates the directory table, but leaves those blocks physically in use on disk.  In other words, the data is still there, but it’s just no longer “listed” in the directory.  This creates a disparity with the VMFS and SAN levels which are only concerned with whether a block contains data or not.  SnapDrive for Windows is NTFS aware and can extend information about deleted blocks to ONTAP allowing for the space to be reclaimed.

SnapDrive for Windows has other capabilities as well, but reclaiming NTFS space can have a compounding effect, especially where FlexClones are used.

RAID-DP

RAID-DP is the default RAID method used on ONTAP storage.  By integrating with ONTAP’s WAFL technology (reviewed in Part 2), RAID-DP provides double-parity protection without the usual performance penalties.  According to NetApp, the performance penalty of ONTAP’s RAID-DP is between 2 and 3 percent relative to RAID-4, whereas the traditional write penalty of RAID-6 is often around 30%.  Additionally, RAID-DP is more space efficient than most RAID-5 implementations by enabling a larger number of spindles (up to 26 data spindles and 2 parity spindles per array).
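A quick bit of arithmetic shows why the large group size matters. The 26+2 figure comes from the paragraph above; the 7+1 RAID-5 group is an assumed example for comparison.

```python
# Rough capacity comparison using the figures quoted above (26 data + 2 parity
# per RAID-DP group); the 7+1 RAID-5 group is an assumed example for contrast.
def usable_fraction(data_disks: int, parity_disks: int) -> float:
    return data_disks / (data_disks + parity_disks)

raid_dp = usable_fraction(26, 2)   # large double-parity group
raid5   = usable_fraction(7, 1)    # assumed single-parity group

print(f"RAID-DP 26+2 usable capacity: {raid_dp:.0%}, survives 2 disk failures")
print(f"RAID-5   7+1 usable capacity: {raid5:.0%}, survives 1 disk failure")
```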

But RAID-DP is mostly about protection, which is key when using large SATA drives with longer rebuild times.  With RAID-DP you can afford to lose 2 spindles within a RAID set while your hot spare(s) are joining the array.  A double-parity scheme (such as RAID-6) would be standard in more arrays if it weren’t for the performance penalty it brings, but RAID-DP solves this problem, allowing the best of both worlds — improving protection, maintaining performance and optimizing capacity.

STORAGE EFFICIENCY

So far we’ve covered deduplication, compression, thin provisioning (FlexVol), efficient snapshots (FlexClone), SnapDrive space reclaim, and RAID-DP.  When you combine the sum of all these efficiencies you can understand why NetApp offers their guarantee that you will use at least 50% less storage compared to other offerings.  And all these ONTAP features are supported across any protocol — iSCSI, FC, FCoE and NFS — and across the entire FAS product line.

For organizations already using a different storage array, you can still put a NetApp V-Series in front of most storage arrays and immediately gain the benefits of the ONTAP platform.  In fact, NetApp will guarantee a 35% storage reduction in this scenario, as well as guarantee that the V-Series will pay for itself within 9 months.

I’ll be discussing value in more detail in future posts, but for now consider this quote from Mercy Healthcare (Innovator of the Year Winner 2012) and what they did across over 30 hospitals and 400 clinics:

Mercy Healthcare built a state-of-the-art data center and implemented a flexible cloud infrastructure to effectively deploy an Electronic Medical Health Record for storing and protecting patient information and, in the future, to support smaller clinics and healthcare systems. With the help of the NetApp FlexPod(R) architecture, we have saved over 40% of storage space, reduced power consumption by 50%, and now provide rapid access to and data protection for 1,742K patients.

In this post we introduced the technologies behind storage efficiency and in future posts we will take a more specific look at various scenarios – including backups and DR – to see how ONTAP as a storage hypervisor can provide benefits and agility which parallel and complement those provided by VMware.  And we haven’t even gotten to cluster mode yet!  Stay tuned….

The Storage Hypervisor Part 2 — Flash Pools

In Part 1 of this series we talked about how ONTAP could be seen as a storage hypervisor and how the benefits of this could mirror the benefits of a compute hypervisor like VMware vSphere.  The key ingredient is a common OS or platform from which to abstract.

To give an example of this, NetApp is currently #2 in overall storage market share, but the ONTAP platform is the #1 storage operating system in use today, serving up exabytes of data (the number 5 storage OS is NetApp’s own Engenio line). And because of this common storage hypervisor, some pretty amazing possibilities come into play which can really position an organization for agility – let alone plain old value.  And the benefits aren’t just limited to NetApp storage as the V-Series and the ONTAP Edge Storage Appliance can extend these benefits into more areas — but more on that in future posts.

In this series I’d like to first take a look at some of the unique capabilities inherent in the ONTAP 8.1.1 platform – ranging from storage efficiencies, multi-protocol support, and scalable “infinite and immortal” volumes to more – and then build on this to show how these provide more value in everything from disaster recovery, private cloud, and test/dev to, of course, agility.  Before I get into storage efficiencies, I thought I’d talk about a new feature just announced in ONTAP 8.1.1 – Flash Pools.  But to best understand Flash Pools, let’s take a step back and look at various technologies as well as ONTAP’s unique way of processing writes.

“You Don’t Have [Technology-X] So Mine’s Better!”

This hot rod might just have a turbo engine

So you frequently race cars with your friends and suddenly a new ACME turbo charger becomes available promising a 30% increase in horsepower.  You run out and buy it, hook it up to your car and you’re feeling great about your new turbo-charged wheels.  Your car is indeed much faster now.  By extension it must be superior to anything else that doesn’t have the ACME turbo charger!

Imagine your surprise when your friend’s car, which doesn’t have the shiny new ACME turbo charger, is still keeping up with you.  What happened?  You thought your car was superior but now you’re not so sure.  This mystery might only be answered by popping the hood on the other car and seeing how they do things.

Such is the issue with NetApp.  There’s been criticism from certain quarters along the lines of “you don’t use flash for primary storage or automated storage tiering” without the context and understanding of how NetApp’s ONTAP handles I/O.  At the end of the day what matters is performance and reliability.

Yesterday — WAFL (Write-Anywhere-File-Layout)

“A stack built upon a WAFL is a stack optimized for success” — Confucius

Most SANs have an NVRAM cache that the controllers write to, but ONTAP instead journals the write requests to NVRAM.  This method not only consumes less NVRAM and improves recoverability, but also improves response times and allows disk writes to be optimized.   The NVRAM has two buffers — when the first is full, a consistency point is triggered to write all the journaled entries to disk.  Unlike some other file systems, WAFL stores metadata within files, which allows much more flexibility in how data is written to disk — hence “write anywhere” — significantly improving write performance.  OK, this part was kind of technical, but the takeaway here is that thanks to WAFL, NetApp has always been exceptionally efficient at writes and didn’t always need the fashionable technologies that others were implementing to keep pace (for more detail on WAFL, here’s a whitepaper from 2006).
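For the curious, here is a toy model of the journaling idea in Python: acknowledge a write once it is journaled to NVRAM, and flush a whole buffer of writes to disk at a consistency point when it fills. Again, a conceptual sketch, not WAFL itself.

```python
# Toy model of journaling writes to NVRAM and flushing them to disk at a
# consistency point when a buffer fills. Purely illustrative - not WAFL.

class Journal:
    def __init__(self, buffer_size=4):
        self.buffers = [[], []]       # two NVRAM buffers
        self.active = 0
        self.buffer_size = buffer_size
        self.on_disk = []

    def log_write(self, request):
        """Acknowledge as soon as the request is journaled, not when it hits disk."""
        buf = self.buffers[self.active]
        buf.append(request)
        if len(buf) >= self.buffer_size:
            self._consistency_point()

    def _consistency_point(self):
        full, self.active = self.active, 1 - self.active   # swap buffers
        self.on_disk.extend(self.buffers[full])            # write out, optimally placed
        self.buffers[full] = []

j = Journal()
for i in range(6):
    j.log_write(f"write-{i}")
print("committed to disk:", j.on_disk)                  # first 4 writes flushed at the CP
print("still journaled in NVRAM:", j.buffers[j.active]) # the remaining 2 writes
```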

Flash Cache and now – Flash Pools

So Flash Cache has been around for a bit now and is part of NetApp’s Virtual Storage Tier.  It’s basically a PCIe flash card sitting next to your controllers.  The controller has fast access to the flash storage and can use it to effectively cache random read patterns.  This can dramatically reduce latency and can, for example, increase the number of concurrent mailboxes that can be serviced by up to 67% in some tests.  This provides excellent results for reads, and it also improves write performance by offloading some read traffic from the hard disk system.   WAFL along with Flash Cache did a great job of delivering performance for years, but today the demands of the most write-intensive transactional systems could benefit from something more.

Flash Pool

New in ONTAP 8.1.1 is support for Flash Pools.  A Flash Pool (using patented technology which is an extension to WAFL) is essentially augmenting a logical aggregate of hard disk drives (HDD) with flash SSD drives.    But here’s the twist – the primary data stays on the HDD tier and never moves around (except for write cached blocks once they become “cold”).  The intelligent algorithms will populate the SSDs with the most frequently used read data to accelerate reads.

At the same time, random writes will take place to SSD drives while sequential writes will use the HDD drives, allowing for the most effective use of HDD/SSD depending on the pattern.  So from a write perspective the SSD drives are used to offload the I/O activity of random writes.  Now some applications will intensively write, then read, and then overwrite the same data.  FlashPool is uniquely equipped to service this type of data, as it will offer both read and write acceleration for this data.  The read-write cache will automatically adjust for your workload patterns — set it and forget it.  And Flash Cache and Flash Pool are designed to complement each other, giving you the option to experience the benefits of both working together.
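A rough sketch of the caching policy might look like the following – random writes and hot reads land on SSD, sequential writes and cold data stay on HDD. The thresholds are made up for illustration; this is not NetApp’s actual algorithm.

```python
# Toy model of the Flash Pool placement idea: random writes and "hot" reads are
# served from SSD, sequential writes and cold data stay on HDD. The thresholds
# below are made up for illustration - this is not NetApp's algorithm.

def place_write(sequential: bool) -> str:
    return "HDD" if sequential else "SSD"      # offload random writes to SSD

def place_read(block: str, read_counts: dict, hot_threshold: int = 3) -> str:
    read_counts[block] = read_counts.get(block, 0) + 1
    return "SSD cache" if read_counts[block] >= hot_threshold else "HDD"

reads: dict = {}
print(place_write(sequential=False))   # SSD - small random write staged to flash
print(place_write(sequential=True))    # HDD - sequential streams go to disk
for _ in range(4):
    tier = place_read("block-42", reads)
print(tier)                            # SSD cache - block-42 became hot enough
```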

Compare this to automated storage tiering – those systems will move your primary blocks around between the different tiers, whereas Flash Pool is an extension to the WAFL technology which leverages the SSD pool for block caching and write staging.  Some are inclined to think that a SAN without an automated storage tiering system moving primary blocks around to different tiers is somehow deficient.  Automated storage tiering can certainly be used to drive more performance, but so can Flash Pools.  Before you write off a solution because it doesn’t have automated storage tiering, take a look under the hood and find out what’s really going on.

PERFORMANCE AND EFFICIENCY

At the end of the day the primary two benefits are performance and efficiency.  Performance to drive faster reads and writes than were possible before.  Efficiency to use fewer resources.

Consider the following scenario – a pair of FAS6210s with 240 SAS 600GB drives for a total of 144TB.  Now make the following changes – replace those 240 SAS 600GB drives with 216 SATA 1TB drives, and add twelve 100GB SSDs into the Flash Pool.  What’s the net effect?

According to NetApp this increased capacity by 47% and reduced cost by 23%.  That’s roughly a 50% reduction in cost on a per-TB basis, all while consuming 26% less power.  What about performance?  According to NetApp, IOPS did not change more than 2% (plus or minus) from the baseline, but response times were significantly improved.  In other words, improvements in both capacity and cost were realized without any decrease in performance.  That’s a nice combination to have when you can get it.
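Here is the back-of-the-envelope math on that configuration. Note this is raw arithmetic only – the raw capacity delta works out to about 50%, so NetApp’s 47% figure presumably reflects right-sized or usable capacity – and the cost figure is simply taken from their claim.

```python
# Back-of-the-envelope check of the configuration above (raw capacity only;
# the twelve SSDs act as cache, not capacity, and the -23% cost is NetApp's claim).
baseline_tb = 240 * 0.6      # 240 x 600GB SAS
new_tb      = 216 * 1.0      # 216 x 1TB SATA

capacity_gain = new_tb / baseline_tb - 1
cost_ratio    = 0.77         # 23% lower cost
per_tb_change = cost_ratio / (new_tb / baseline_tb) - 1

print(f"raw capacity: {baseline_tb:.0f}TB -> {new_tb:.0f}TB ({capacity_gain:+.0%})")
print(f"cost per TB:  {per_tb_change:+.0%}")   # roughly -49%, i.e. about half
```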

And of course what enables the Flash Pool?  ONTAP does.  Any SAN in the FAS family running ONTAP 8.1.1 or later can use Flash Pools.  In the next post in this series we’ll take a look at some of the ways in which the ONTAP platform makes storage more efficient.  Towards the end of this series I’ll shift focus into showing how the multitude of ONTAP (storage hypervisor) benefits can lead to savings and agility that parallels what VMware has enabled.

Enter The Hybrid Cloud

I think there’s a huge opportunity (and need) for hybrid cloud management tools.  Let me explain.

First of all, there is a great deal of intellectual property that is either highly sensitive and/or subject to regulatory controls: actuarial data, R&D, medical records, big data analytics and more.  Many organizations will not allow such profoundly sensitive data onto externally hosted infrastructure for several reasons.  They are familiar with their own security protocols and governance, and confident in their ability to demonstrate compliance to auditors.  For these reasons I think that a lot of organizations will choose to keep sensitive intellectual property in private clouds (control) as opposed to public clouds.

But what about all the workloads which don’t share the same intellectual property concerns?  Here the public cloud has some key advantages.  One is that it has a lower per-unit cost structure due to “datacenters of scale”.  Second, public clouds are more elastic – it will often be quicker to consume capacity on a hosted public cloud than it would be to add capacity to a private cloud.  A third benefit of public cloud is a reduction in the datacenter operational burden on the IT department.  Running a datacenter is expensive and challenging – why increase your internal operational burdens?

So it seems we have strong use cases for both private and public cloud.  Enter the hybrid cloud.  But now we have new challenges….

Having a different management portal for private and public clouds raises some new challenges.  It’s not just about having “single-pane-of-glass” visibility into the aggregated environment – how are you going to maintain consistent security and governance across your hybrid cloud?  How are you going to demonstrate and ensure compliance for HIPAA, PCI and many other auditable requirements?  And if you’re doing chargeback, how do you effectively keep track of it all?

For these reasons and more, I think there is a huge opportunity in the future for hybrid cloud management tools.  Virtustream is one such vendor with their xStream offering, and I’m sure there are others – as well as products under development right now.

What do you think?  Are hybrid clouds going to become more commonplace, and is there a strong need for cloud management tools which can transcend and manage across the elements in hybrid clouds?  And while we are on the topic, what about multiple hypervisors as well?  Do we need some sort of reference architecture for cloud components in order to enable more effective management solutions?

The Storage Hypervisor – Part 1

One of the keys – I think – to the benefits of cloud computing is abstraction. By abstracting our workloads from the boundaries of physical hardware we find ourselves able to do things and manage our resources in ways we haven’t before. When using hardware virtualization – such as VMware vSphere – the hypervisor not only frees us from the limitations of physical hardware boundaries and compatibilities, but provides a new entry point for management, creating a new paradigm.

Things we used to do by installing agents on each OS – such as monitoring, backups and more – can now be orchestrated through a single hypervisor. Provisioning an OS on bare metal used to be cumbersome, and often complicated and expensive tools were needed to do this at scale. With virtualization, not only can we push out a pre-configured OS in minutes, but with the right tools we can do the same for multi-tiered applications complete with firewall rules. These benefits enable us to position the IT organization for agility. What if we could do the same with a storage hypervisor?

The concept of a storage hypervisor is similar – abstract away the traditional boundaries and limitations of storage with a unifying platform that enables a new paradigm of efficiencies leading towards agility. In fact, Wikipedia defines storage hypervisor as follows:

The storage hypervisor, a centrally-managed supervisory software program, provides a comprehensive set of storage control and monitoring functions that operate as a transparent virtual layer across consolidated disk pools to improve their availability, speed and utilization.

Before we begin looking at some of the benefits a storage hypervisor could provide, let’s first look at some of the requirements.

UNIFIED STORAGE

This term gets tossed around as much as “cloud computing” or “open source” and, as is often the case, it can mean wildly different things. Perhaps the best way to illustrate the concept of unified storage is with contrast. Consider a SAN solution which runs three different operating systems – two versions of storage code, plus a third Linux OS for management. To support all this, the SAN solution requires 9U of rack space, compared to just 5U for a competing SAN that uses only a single – and unified – set of code.

The picture above helps us to visualize just how much cost and complexity – and rack space – is reduced when a truly unified solution is used. Only the rack space savings is visualized, but other benefits are implied and these efficiencies will be detailed in future posts.

NetApp’s ONTAP as the Storage Hypervisor

NetApp’s FAS product line (which excludes the Engenio acquisition) is unified around the ONTAP operating system. One platform, one common set of code, from which to abstract storage and provide some outstanding features that deliver new levels of efficiency – and which, when effectively combined, can provide a platform for agility.

This series will explore these benefits in detail, but here’s just one to start with. With most SANs, if you want to migrate from the “mid-range SAN” to the “high-end SAN” you have to go through a costly and time-consuming data migration to the new environment. Within the NetApp FAS series, you can upgrade simply by swapping out the controllers. The existing disks and data stay in place, allowing you to leverage your existing disk/SAN investment without any complicated data migrations.  That’s a pretty big benefit!  Or should I say agile?

Earlier this morning Vijay Swami ( @veverything ) and I were discussing a Morningstar article which suggested that another storage vendor was more resilient and “bullet proof” because they had “developed a broad portfolio of solutions, each with a specific target” as opposed to NetApp having a higher “concentration of risk” to potential market “disruption” due to having a single platform.

Let’s think about this carefully for a minute. Is a vendor more “bulletproof” to potential market disruption because their product mix is a patchwork of different solutions that are less than fully integrated? I’d suggest that what’s paramount here is a vendor’s ability to effectively service different market segments (high-end, mid-range, low-end, etc.). If a vendor can effectively serve these same markets with a unified platform instead of a more fragmented offering with different technologies for different markets, that makes the vendor more agile. They will have a lower cost structure and support and integration is simplified. Perhaps the customer benefits the most by having a common OS across all classes of storage – even DR and backups.

In this series I’ll try to take a look at the ways in which ONTAP provides unique benefits for storage efficiency in a virtualized environment, and then build on these features to show how they can be combined with VMware and others in a virtualized environment to empower organizations with Agility.

Is IT a Cost Center? Evolving Towards Agility

Over a decade ago – and well before today’s cloud concepts – I was encountering limitations in defining and expressing the value of various IT initiatives, when I came across a Gartner article discussing how evolved IT organizations would move away from the concept of IT as a cost center.  What exactly does that mean?

The idea of a cost center is basically a necessary expense of doing business.  You need email and your traditional client-server applications, and these require servers and operational overhead.  Thus IT was often perceived as a necessary cost of doing business, and the way to get projects approved was to demonstrate ROI and TCO benefits.  But not all benefits can be effectively measured with metrics or against alternative options, and thus we have intangibles.

Intangibles are essentially benefits which are not easily measured.  What is the value of my sales force having access to SaaS applications on their iPads in the field?  What’s the value of rewriting my legacy application to newer agile methods?  What’s the value of choosing an IT strategy and architecture that enables IT to quickly respond to business demands – or in other words, Agility?

This is the horizon at which we realize that IT doesn’t need to be just a cost center, but it can be a vehicle to provide value to the organization.  Long before this “cloud” thing, Bill Gates wrote about the value of quickly getting valuable information to the right people in “Business @ The Speed of Thought”.  The concepts are the same but today in the cloud era we have a new set of technology enablers towards this end.  Today we see this concept exploring new avenues such as leveraging “Big Data” to deliver analytics which could profoundly improve decision making at the highest level.  And what is the value of improving decision making at the highest levels?  What is the value of being first to market?  What is the value of out-maneuvering your competitors?

Above is a value triangle I used in previous posts on Agility within the context of IaaS.  At the bottom level we use virtualization to reduce our capital expense (CAPEX) by reducing servers and associated variable costs.  But then as the organization evolves they discover new capabilities and methods to reduce operational expense with virtualization, automation and orchestration.  Building on this foundation, the evolved IT organization will make strategic moves to capture the value within the Agility zone – such as converged infrastructure and self-service orchestration at the top of the stack.  This is a paradigm shift in both how IT is utilized and perceived.  No longer is IT a mere cost center – now the IT organization is a trusted business partner that can assist in the rapid execution of business strategy.  What’s the value of that?

Some recent good posts I’ve seen that touch on this concept are Chuck Hollis’ post on The ROI Trap and also Mark Thiele’s  “Cloud is a State of Business”.  I’ll be expanding on this concept in future posts at multiple levels of the “stack”.