VMware Pursues SDN With Upcoming NSX Offering

Earlier this week VMware announced VMware NSX – an upcoming offering that takes network virtualization to new levels. NSX appears to be somewhat of a fusion between Nicira’s SDN technology (acquired last year by VMware) and vCloud Networking and Security (vCNS – formerly known as vShield App and Edge). Since I already had intentions to…


What Really Is Cloud Computing? (Triple-A Cloud)

What is cloud computing?  Ask a consumer, a CIO, and a salesman and you’ll likely get widely varying responses. The consumer will typically think of the cloud as a hosted service, such as Apple’s iCloud or uploading pictures to Photobucket, and scores of similar services (just keep in mind that several such services existed before it…


Agility Part 2 — The Evolution of Value in the Private Cloud

When an IT project is commissioned it can be backed by a number of different statements such as: “It will reduce our TCO” “This is a strategic initiative” “The ROI is compelling” “There’s funds in the budget” “Our competitors are doing it” Some of these are better reasons than others, but here’s a question.  Imagine a…


Stacks, the Vblock and Value — A Chat with EMC’s Chad Sakac

…I reached out to EMC’s Chad Sakac to gain more insights from his perspective on how the various stacks…well…stacked up….


Should You Virtualize vCenter Server (and everything else?)

When concerns are raised around virtualizing vCenter Server, in my experience they usually revolve around either performance or out-of-band management. The VROOM! blog at VMware just published a whitepaper that looks closely at vCenter Server performance as a VM versus native (physical), which speaks to these concerns as well as to other workloads. vCenter Performance…


Can your VM be restored? VSS and VMware — Part 2 (updated)

The backup job for your VM completed successfully so the backup is good, right? Unfortunately it’s not that simple and a failure to effectively deal with VM backups can result in data loss and perhaps even legal consequences.


Let Your Fast Zebras Run Free (with a Vblock)

What makes an IT department effective and agile?  How can you let the stars on your team succeed while reducing gridlock and OPEX (operational expenses)? Jon Katzenbach and Zia Khan have an intriguing post at Harvard Business Review on fast zebras.  What’s a fast zebra?  I’ll let them explain: Mark Wallace, former US Ambassador to…

Zebra galloping in the Masai Mara, Kenya.

Van Halen on Cloud Security

What in the name of rock-and-roll does Van Halen have to do with the cloud? Join us on a magical journey filled with wonderment and perplexity as we seek to understand this parable.


LG Optimus Zip Review — Help Raise One Million Dollars for the Make-A-Wish Foundation

The Make-A-Wish Foundation has partnered with Straight Talk Wireless in an effort to raise one million dollars for the Make-A-Wish Foundation.  I was provided with one of the phones, and I wanted to take the opportunity to review the LG smartphone, the Straight Talk service and this special promotion which benefits the Make-A-Wish Foundation (you can read about my family’s own experience with the Make-A-Wish Foundation here).

I received my phone last week and activated it, but I did not have time to write up the review until a week later.  Fortunately there is still one more week left on the promotion.

THE PROMOTION

The promotion is that for each demonstration of a Straight Talk phone at a Walmart, one dollar will be donated to the Make-A-Wish Foundation (visit oneminuteonemillion.com to check the dates and hours at your local Walmart store).  If you are unable to demo a phone in person, ten cents will also be donated for every unique view of the promotional video at oneminuteonemillion.com.

THE SERVICE – STRAIGHT TALK WIRELESS

LG Optimus Zip

Straight Talk is a service provided by TracFone Wireless and, according to their website, they are America’s largest no-contract provider with over 19 million subscribers.  The way it works is that you first purchase a phone, which you then own outright.  Then you purchase time in monthly (30-day) increments.  There is no contract, and you never have to worry about counting minutes.  For $45 you get all the voice and data you can consume for 30 days.  If you decide that you don’t need to maintain service on the phone, simply don’t refill it after the 30 days.  On the flip side, you also have the option of purchasing multiple 30-day “chunks”, which will automatically be added to the end of your current term.

Keep in mind that this is unlimited voice AND data for $45 a month.  For comparison, I just reviewed prices for Sprint and Verizon: Sprint is $110 monthly for unlimited voice and data, and Verizon is $100 for unlimited voice and 10GB of data.  If you want an unlimited data plan, the Straight Talk service has the potential to save you $50 a month or more, and you never have to commit to a contract.

As for the quality of the service, I did not notice any significant difference between the Straight Talk service and the Sprint service on my own phone.  Almost every time I checked, the signal meter was just as strong as my Sprint service, if not stronger (I tested the phone exclusively in northeast New Jersey).

ACTIVATION

I received my phone and the activation process took about 20 minutes.  You have the option of performing the activation over the phone or using the Straight Talk website.  I went to the Straight Talk website, created a profile and then registered the serial number of my phone.  Then I added the keycode for the first 30 days of service, dialed a few numbers to walk through a guided programming of the phone, rebooted the phone, and I was able to place and receive calls.  For most people the activation process will take around 20-30 minutes, and you can do it all from the comfort of your own home.

THE PHONE – LG OPTIMUS ZIP

The LG Optimus Zip (pictured below) runs Android 2.3 (Gingerbread), so you can run most of your favorite Android apps, including Netflix (which I tested successfully), Pandora, YouTube, Google Maps and many more.  Here’s a short list of the basic phone capabilities:

  •    Android 2.3 (Gingerbread)
  •    3.2” touch screen
  •    GPS, Wi-Fi, and Bluetooth 3.0 enabled
  •    3.2 megapixel rear-facing camera with video capability (VGA resolution)
  •    Slide-out QWERTY keyboard

The LG Optimus Zip has a 3.2” touch screen, which makes the viewing area just a bit smaller than the iPhone 4 (3.5”) as a reference point.  The phone is nice and compact, but the slide-out keyboard on the bottom does add a bit of thickness and weight.

LG Optimus Zip with keyboard extended

No, this is not an iPhone 5 or a Samsung Galaxy S III, but it is a very capable smartphone with a full slide-out keyboard, and it can do most anything we expect our smartphones to do, from GPS navigation, email and text messaging to music/movies/books and the full array of Android apps available in the Google Play store.  Some parents may even like this type of phone for their children, as the monthly renewal of the service could be made contingent on maintaining adequate grades, for example, and you never have to worry about exceeding data limits or additional charges.

LG Optimus Zip (left) next to an iPhone 4 (right)

The bottom line is that this is a very capable Android smartphone, and some will find the $45 unlimited plan quite appealing.  You never have to worry about contracts or minutes – just purchase time in 30-day increments at a rate that is often half of what competing networks would charge for a month of unlimited voice and data.

And there’s another good reason to check this phone out – for each person who tries out the phone at a participating Walmart, $1 will be donated to the Make-A-Wish Foundation, with a goal of raising one million dollars for an incredible organization that does so many wonderful things.  If you get a chance, take one for a test drive.  For more information on the Straight Talk and Make-A-Wish promotion, please visit oneminuteonemillion.com.

One Minute — One Million

The Make-A-Wish Foundation is close to my family’s heart (our experience is detailed here), so when I learned that Straight Talk Wireless was teaming up with the Make-A-Wish Foundation for a great promotion, I wanted to help spread the word.

Straight Talk Wireless is donating one dollar for each individual who demos an LG smartphone at their local Walmart on either Saturday, October 6, or Saturday, October 13.  Just for demoing the phone, Straight Talk Wireless will donate $1 to the Make-A-Wish Foundation — and if you can’t make it to a Walmart, each video viewed on oneminuteonemillion.com will result in a donation of 10 cents.  The goal of the program is to raise one million dollars for the Make-A-Wish Foundation — all you have to do is experience one of their phones!

Straight Talk Wireless and the Make-A-Wish Foundation were kind enough to provide me with an LG Optimus Zip this past Friday and I’m just now getting around to setting it up.  I’ll post back here in a day or two with a more detailed review of the phone and the Straight Talk wireless service.  For more information about the program between Straight Talk Wireless and the Make-A-Wish Foundation, please click the graphic below.

[One Minute — One Million promotional graphic]

“Dude! Where’s my Server?” – Firewall Edition

Remember when server virtualization was still new and untested and we (endearingly) referred to the skeptics as “server huggers”? You know the type. They’d walk into the server room and ask “which server is mine?”  You could always answer in confidence and tell them that their server was “somewhere in one of these first 3 rows of server racks”. Maybe they just wanted to know where to put the asset tag? Or perhaps give it one last hug and feel the warmth emanating from the air vents.  And when it came to P2V, remember the look on their faces right before they said “you want to do what to my server?!?”

We humans don’t naturally accept change very well, but eventually most server huggers came to accept server virtualization as safe. Not only has virtualization become socially normalized, but the economic drivers of CAPEX, OPEX, agility – and even performance – have won over many former server huggers. After all, the abstraction of physical resources is perhaps the biggest enabler of this new paradigm of benefits – and to enable and take advantage of these benefits we had to think differently when it came to servers.

WHAT ABOUT FIREWALLS?

Firewalls can be abstracted too. When we go over our Visio diagrams of networks and think about VLANs, routes and security, we often think in terms of physical hardware: “I need to have two firewalls here – load balancers there, and another firewall for this remote web farm.”  But what if we could abstract firewalls and virtualize them, such that for some elements we didn’t need to purchase and deploy a physical firewall at all?

VMware vSphere customers at the Enterprise Plus level essentially just got a free upgrade to vCloud Suite Standard, which includes virtual application firewall capabilities in both vShield App and vShield Edge. Those who upgrade to vCloud Suite Advanced also gain a virtualized load balancer.  Cisco also makes a virtual edition of their Adaptive Security Appliance (ASA) – the Cisco ASA 1000V – which can be integrated into VMware vSphere environments as well.


Virtual Firewalls? Sweet Dude!

With solutions like these – abstracting firewalls and network security – it is now possible in many cases to build your security policy into your virtualized environment. Need a web server policy that opens 443 and 80 for a specific group of servers while only allowing a custom high-numbered SQL port back inside? We can do that.  Firewalls between servers which might even be running on the same physical host?  No problem.  By abstracting network security to logical boundaries we can provision applications more quickly and more securely — and perhaps also avoid purchasing as much physical network hardware as we are accustomed to. And with VMware’s acquisition of Nicira, this movement to abstract the network layer has only just begun.
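
To make this concrete, here is a minimal sketch, in Python and purely illustrative (not the vShield App API or its rule syntax), of expressing such a policy as data attached to logical groups rather than to physical boxes. The VM names and the SQL port number are hypothetical.

```python
# Conceptual sketch only: a security policy as data, not hardware.
WEB_TIER = ["web01", "web02", "web03"]   # hypothetical VM names
DB_TIER = ["sql01"]

policy = [
    # Allow HTTP/HTTPS from anywhere to the web tier
    {"src": "any", "dst": WEB_TIER, "ports": [80, 443], "action": "allow"},
    # Allow only a custom high-numbered SQL port from web tier to DB tier
    {"src": WEB_TIER, "dst": DB_TIER, "ports": [48151], "action": "allow"},
    # Default deny everything else
    {"src": "any", "dst": "any", "ports": "any", "action": "deny"},
]

def evaluate(src, dst, port):
    """Return the action of the first rule matching the flow."""
    for rule in policy:
        src_ok = rule["src"] == "any" or src in rule["src"]
        dst_ok = rule["dst"] == "any" or dst in rule["dst"]
        port_ok = rule["ports"] == "any" or port in rule["ports"]
        if src_ok and dst_ok and port_ok:
            return rule["action"]
    return "deny"

print(evaluate("client", "web01", 443))    # allow
print(evaluate("web01", "sql01", 48151))   # allow
print(evaluate("client", "sql01", 48151))  # deny
```

Notice that nothing in the policy says which physical host or firewall enforces it; that is exactly the property that lets two VMs on the same host sit behind a firewall boundary.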

Physical network hardware isn’t going away, but as we review our designs we might want to start considering virtualized network security components as an option. Over the longer run, I suspect we will see even more abstraction at the networking layer.

It’s Christmas in August for VMware Customers

Some huge announcements were made at VMworld today, many of which will be warmly received by VMware customers. I’m not just referring to great new features like the vSphere 5.1 web client, but to things like new product entitlements, improved licensing and more.

Licensing and vRAM

Last year VMware announced a new vRAM licensing component amid some controversy. If customers wanted to deploy a large server with 512GB of RAM, for example, they would have to purchase additional vSphere licenses to accommodate the memory consumption.  This “vTax” on memory-dense servers has now been lifted, leaving per-processor licensing as the only component.  This will enable customers to immediately leverage the new 64-vCPU capabilities of vSphere 5.1 for large virtual machines without incurring additional licensing costs.

vShield

vShield was formerly sold separately as three products:

  • vShield Endpoint (antivirus protection for VMs)
  • vShield App (Virtual Application Layer Firewall)
  • vShield Edge (gateway security appliance: firewall/NAT, VPN, etc.)

vShield Endpoint is now included in vSphere 5.1 Standard and higher editions. Current vSphere customers with support essentially are getting vShield Endpoint for free.

The other two vShield products, App and Edge (not to be confused with Bono and The Edge), have been rolled up into a new product called vCloud Networking and Security (vCNS), which is a part of the new vCloud Suite. vSphere customers at the Enterprise Plus tier are entitled to a free upgrade to the Standard edition of vCloud Suite. Eligible customers who take advantage of this are essentially getting vShield App and Edge for free.

vShield features such as high availability for firewalls and load balancing will require an upgrade to the Advanced Edition of vCloud Suite.

vCloud Director (vCD)

vCloud Director is a key and integral part of VMware’s IaaS solution framework, and because it is also a part of the new vCloud Suite, eligible customers will essentially get vCloud Director (and vCloud Connector) for free. Very nice!

Breaking It Down

Here’s a basic table summary of the new product entitlements available to vSphere customers:

New vSphere Benefits

                               Standard      Enterprise    Enterprise Plus
No vRAM license restriction    Yes           Yes           Yes
vShield Endpoint               Yes           Yes           Yes
vShield App & Edge             –             –             Yes
vCloud Director & Connector    –             –             Yes

Those are some pretty nice benefits which I suspect many VMware customers will be very pleased with.

VMware vSphere 5.1 Feature Preview

vSphere 5.1 might technically be a “minor” update, but just like vSphere 4.1 there are some very significant new features – many around increased scalability and improved operations.  I had the opportunity to spend a little bit of time with the vSphere 5.1 beta and I thought I’d quickly share some of the new feature highlights and perhaps drill into some of these in more detail in the future.

UPDATE:  

vSphere 5.1 now includes vShield Endpoint for AV protection!  For more details on what’s new in vSphere 5.1 also see this whitepaper from VMware

SCALABILITY

The maximum number of hosts which can share a file in read-only mode has been increased from 8 to 32.  These improvements in VMFS locking will enable larger clusters in environments which use linked clones, such as VDI and vCD environments.

The “Monster VM” has also been taking his vitamins as 64 – yes, sixty-four! – vCPUs can now be assigned to a virtual machine.  VMware also announced the end of vRAM as a licensing component which means organizations will be able to utilize these features in larger VMs without an additional license penalty.

And finally, improvements have been made in MSCS support to enable failover clusters with up to five nodes.

VMOTION

Host vMotion can now take place in the absence of shared storage.  Needless to say, a low-latency environment (such as a Metro Area Network or better) is ideal, but this can empower new options for migrations and many other scenarios.

Also Storage vMotion is now supported for up to 4 parallel operations across 8 volumes.

STORAGE

  • Install ESXi onto an FCoE LUN
  • Support for 16Gb HBAs
  • Improvements for handling All Paths Down (APD) conditions
  • Improved Storage DRS for linked clones
  • Space-Efficient Sparse Virtual Disks for linked clones

The last is especially interesting, as in the past you had to use SDelete and cumbersome manual steps to reclaim space from a VM.  Recall that Windows does not delete blocks when a file is deleted; it only removes the directory entry.  Now VMware Tools can initiate a scan inside the guest OS for unused blocks and reorganize those blocks to leave a contiguous collection of free blocks at the end of the disk.  Then a SCSI UNMAP command is sent to the SAN, allowing the space to be reclaimed from your thin disks.
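
Here is a toy simulation of that reclaim flow under assumed semantics (not how VMware Tools or VMFS are actually implemented): live blocks get compacted toward the front of the disk, and the trailing free region becomes eligible for UNMAP.

```python
def reclaim(disk):
    """disk: list of blocks; None marks a block the OS no longer uses."""
    live = [b for b in disk if b is not None]        # keep only in-use blocks
    compacted = live + [None] * (len(disk) - len(live))
    unmapped = len(disk) - len(live)                 # blocks returnable to the SAN
    return compacted, unmapped

disk = ["A", None, "B", None, None, "C"]
compacted, freed = reclaim(disk)
print(compacted)  # ['A', 'B', 'C', None, None, None]
print(freed)      # 3 blocks now eligible for SCSI UNMAP
```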

Another advantage of the new sparse disk model is that snapshot chains no longer consume extra space when deleting snapshots.  In the past it was possible to get into a situation where the volume was full and you didn’t have any free space available to consolidate open snaps.  Because each snap in the chain now consolidates directly into the base disk (as opposed to consolidating into other snaps and then the base last), no additional free space is required to consolidate snapshots.

NETWORKING

At times, physical network uplinks on the ESX host can be configured in such a way that proper function is not possible.  vSphere 5.1 includes a new network health monitor which checks teaming, VLAN and even MTU settings, raising an alarm if a network configuration might not deliver the intended results.

In many environments, backups of switch and router configurations are maintained and kept at the ready in the event a prior state needs to be restored.  Now your vDS switches and their port groups can be backed up and restored, which could prove useful if your vCenter server were to become unavailable.  The system can also automatically roll back to the previous networking state if networking is found to have been disrupted.

In addition vDS will now support both Port Mirroring and LACP.

TAGGING

This is a nice feature which can help to organize various vSphere building blocks (VMs, networks, volumes, etc.) and relate them to applications, teams or whatever groupings might be helpful.  It works much like tagging pictures for anything to do with “cooking”, “bird watching” or “family” for example.  You can tag virtual machines that exist across different vSphere clusters as being part of the same application, business unit or whatever construct you find useful.  Then when you search for a tag, you can quickly bring up a list of all the objects which possess that tag.
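
Conceptually, a tag index is just a many-to-many mapping from tags to objects. Here is a small illustrative sketch (not the vSphere API) of how tag-based search works; the object and tag names are made up.

```python
from collections import defaultdict

tag_index = defaultdict(set)

def tag(obj, *tags):
    """Attach one or more free-form tags to any object."""
    for t in tags:
        tag_index[t].add(obj)

# VMs in different clusters, a datastore, and a port group can share a tag
tag("vm-web01", "payroll-app", "production")
tag("vm-sql01", "payroll-app", "production")
tag("datastore-03", "payroll-app")
tag("pg-dmz", "payroll-app")

# Searching a tag returns every object that carries it
print(sorted(tag_index["payroll-app"]))
# ['datastore-03', 'pg-dmz', 'vm-sql01', 'vm-web01']
```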

Web Client

The vSphere Web Client has gone through a complete overhaul and I think most will be pleasantly surprised at just how much of the vSphere GUI functionality is now in the web client.  I tested the web client with Google Chrome and it was fast and responsive – an enjoyable experience, such that the web client did not feel like a second-class citizen.

vSphere 5.1 Web Interface on Google Chrome

Single Sign On

Single Sign-On is now provided across the web client and the vCloud infrastructure suite, without having to log in to the components individually.

VMware Tools & Upgrades

For those who remember the pre-virtualization days, it was often a chore to update hardware driver components (think HP SIM, Dell OpenManage, etc.).  Oftentimes you’d get notification that a certain driver was not the current release, but did that mean there was a compelling reason to upgrade?  Now VMware Tools are yesterday’s hardware drivers, and with the version changing with some ESX patches, how important is it to update VMware Tools and incur a reboot on your guests?

The new model hopes to reduce this by mapping VMware Tools to the virtual hardware version (now “virtual machine compatibility” in the Web UI).  Furthermore, reboots will be required less often in future VMware Tools releases (after the update to 5.1) due to improved driver management.  I think that reboot-free VMware Tools upgrades will be a popular feature in many environments.

GRAPHICS

VMware View environments will benefit from the ability to leverage GPUs to increase the quality of virtual desktops, especially in the areas of full motion video, 3D graphics, and more.

AUTODEPLOY

Auto Deploy now supports stateless caching to enable operations to continue when an Auto Deploy server becomes unavailable.  Also a new Stateful Install option can make it possible to deploy an ESX host more quickly in several scenarios.

SUMMARY

Those are some of the bigger highlights I was able to capture.  I’ll be looking for even more details as VMworld progresses.

Certification Challenge Update

A few months ago I posted a note on my certification adventure and thought I’d post a brief update.

My feelings on certification are mixed.  A decade ago I had the highest certifications available from Microsoft and Novell, but at times I would work with others who had the same certifications and I didn’t always see the knowledge and/or troubleshooting skills one would expect.  Once I had a level of experience which I felt spoke more loudly than my technical certs, I gave up on them.  I allowed them to expire and focused on an MBA instead, which (thus far) hasn’t helped my career, but I really enjoyed the knowledge and understanding I gained from it.

Having said that, certifications do have value in that they often serve as a checklist or validation that you have some current knowledge in a fast-changing field.  The best certification exam I ever took was a Novell “Uber CNE” or something like that, where the test was to fix and repair a REAL broken NDS tree.  As is often the case in the real world, there was more than one possible solution to the problem, and you were graded on the efficiency of the path you took.  Most certification exams prove little more than that you can pass a multiple-choice exam — but the VCDX “exam” is the exception here.  To become a VCDX you must present a complete datacenter design and defend every decision you made in front of a panel of judges.  Now that’s a certification that you can’t fake!

Back to me.  My goals were to get certifications from NetApp (NCDA), Microsoft (MCITP) and VMware (VCP & VCDX).

I passed the NCDA exam and I’m about halfway done with the MCITP path.  For me the most interesting and relevant certifications are the VMware ones, but unfortunately a mandatory 5-day class is required before taking the VCP exam, and budget and logistical constraints are an obstacle here.  When resources allow for it, I’m really looking forward to starting the VMware path, with a longer-term goal of working towards VCDX certification.

iSCSI MultiPathing with VMware vSphere

Every now and then I come across an iSCSI configuration which does not conform to best practices.  There are several great posts that cover this, but I thought I’d briefly cover some of the basics and FAQs in this post.

There are some unique problems which could be lurking under the hood of some environments if they do not conform to best practices – let’s take a look.

LINK AGGREGATION

This is something most VMware administrators are familiar with.  You might have four 1Gb NIC ports bonded together into a single vSwitch and available for use by virtual machines.  Does this mean that you essentially have 4Gb of bandwidth available for your VMs?  Well…yes and no.

Yes, because you have a total pool of 4Gb available.  No, because a single VM (or conversation) will only use one pNIC (physical NIC) at a time.  Stated another way, you have a 4Gb logical pool of bandwidth, but a single VM/session cannot use more than what is available on one pNIC at a time (1Gb in this case).  For a bit more detail on this, see an earlier post on Load Balancing in vSphere.
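
A rough sketch of why this happens: with the default “route based on originating virtual port ID” policy, each virtual port is pinned to a single uplink. The modulo assignment below is a simplification of the real scheduler, but the effect is the same: one conversation, one pNIC.

```python
uplinks = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]  # four 1Gb ports in a team

def uplink_for(virtual_port_id):
    # Deterministic pinning: the same virtual port always maps to one pNIC
    return uplinks[virtual_port_id % len(uplinks)]

for port_id in range(6):
    print(f"VM on virtual port {port_id} -> {uplink_for(port_id)}")

# Traffic from any one port rides one uplink: the team gives 4Gb in
# aggregate, but no single conversation gets more than 1Gb.
```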

Getting to iSCSI, we have more or less the same thing.  Let’s say you created an iSCSI port group (VMkernel) and gave it access to two active NICs within the vSwitch, such that the NIC teaming for your iSCSI port group looks like this:

Does this mean you have 2Gb available for iSCSI?  Absolutely not – and you have no multipathing either.

In ESX 3.5, only a single iSCSI session / TCP connection to a target is supported, as noted in the iSCSI Configuration Guide, which explains that “storage systems with a single target containing multiple LUNs have all LUN traffic on that one connection.”  ESX 4.0 was improved to allow multiple iSCSI sessions, but you can’t get to where you want to be just by aggregating the NICs.  Here is what the iSCSI Multipathing whitepaper (ESX 4 and 5) says:

In case of simple network adapter teaming, traffic will be redirected at the network layer to the second network adapter during connectivity failure through the first network card, but failover at the path level will not be possible, nor will load balancing between multiple paths.

So we do have fault tolerance (at the NIC/port level), but we have no load balancing or multipathing.  What you really want is to have two iSCSI port groups (each with its own IP), each port group with one active NIC.  If you’ll forgive my graphic skills, I’ve attempted to visualize this below:

[Diagram: two iSCSI port groups, each bound to a single active pNIC]

Above we have two iSCSI port groups, each with its own VMkernel IP.  Each port group has only one active pNIC assigned, and ideally each one goes to a different physical switch on the network.  With this configuration we have true multipathing being done by VMware’s iSCSI Software Initiator within ESX.  To make sure an iSCSI port group is only using one NIC, you should modify the NIC teaming to look like this:

And of course, as a final step, we need to bind our vmknics to the Software iSCSI adapter.  All of this is well detailed in the Multipath Configuration for Software iSCSI whitepaper.
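
For reference, here is a small wrapper sketch of that binding step using esxcli from the ESXi shell. The software iSCSI adapter name (vmhba33) and the vmkernel port names are assumptions; substitute your own values (or just type the esxcli commands directly) and treat the whitepaper as the authoritative procedure.

```python
import subprocess

SW_ISCSI_ADAPTER = "vmhba33"   # find yours with: esxcli iscsi adapter list

# One vmkernel port per iSCSI port group (vmk1/vmk2 are assumptions)
for vmk in ["vmk1", "vmk2"]:
    subprocess.run(
        ["esxcli", "iscsi", "networkportal", "add",
         "--adapter", SW_ISCSI_ADAPTER, "--nic", vmk],
        check=True,
    )

# Verify the bindings
subprocess.run(["esxcli", "iscsi", "networkportal", "list"], check=True)
```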

One more quick note — if you are running vSphere 5.0, please make sure you have Update 1 installed, as this corrects a bug in ESX 5.0 in which an All Paths Down (APD) condition can occur due to iSCSI traffic taking the wrong path — even with a correct iSCSI configuration — which can severely cripple the affected ESX host(s).

Not going to VMworld? Register for VMware NOW on Monday August 27th

If you’re like me and unable to attend VMworld this year, you can still “virtually” attend the conference with on-demand access to keynotes, new product overviews, demos and more.

I’m especially excited about this year’s VMworld as not only is there the potential to learn about new product announcements, but we will get more insight into VMware’s vision and strategy for the “software-defined datacenter” following VMware’s blockbuster purchase of SDN vendor Nicira for $1.2 billion (with a B).  The hypervisor continues to mature and there should be much to talk about here (along with complementary products), but this year’s event will be a bit more revolutionary following the Nicira purchase and management changes, and I’m very excited to learn more about how VMware plans to execute and provide solutions around this new vision of the datacenter.

To register for VMware NOW and learn about new offerings, solutions and strategies on Monday, August 27, follow this link.

VMware Making Bold and Strategic Moves

This month VMware announced two acquisitions – DynamicOps and Nicira – the latter being a $1.26 billion acquisition.  These are bold and strategic moves which I think tell us a lot about where both VMware and the IT world are heading.

First let’s take a look at cloud management.  I touched on part of the issue in “Enter The Hybrid Cloud” – increasingly, organizations are leveraging cloud computing and services but still have a need or requirement for internal or private cloud.  We see large companies like GM pulling back from outsourcing, and concerns in several organizations about a “pure” public cloud, while still wanting to leverage public cloud as a tactical solution.  How do organizations enforce governance and consistency across these disparate clouds?  DynamicOps is a huge piece of the puzzle and greatly expands VMware’s ability to promote their vision and provide effective management for private and hybrid clouds.

Nicira is a huge acquisition — $1.26 billion for a solution that is only now hitting the market – SDN, or Software Defined Networking.  SDN at a high level is essentially abstracting the network stack from networking hardware, much as VMware has done for server hardware.  Networking functions of switches and routers become abstracted from the traditional network hardware model.  In “What Really Is Cloud Computing?” I discussed how (I think) abstraction is a key to the efficiencies of cloud computing by providing an abstraction layer from which you can orchestrate and manage.  Imagine if we could provision networks through an abstraction layer, combine that with VXLAN, and then orchestrate across the rest of the stack.  This has big implications for everything from how we provision and manage to how we think about DR scenarios.

There are many possibilities here, and both acquisitions speak very well to VMware’s vision of the “Software Defined Datacenter”.  Once you abstract servers, storage and networking, you can orchestrate – and VMware just picked up two companies that could be key to such a vision.  I think both acquisitions are excellent strategic moves which tell us much about VMware’s vision, and there is much for customers to be excited about as well.

The Storage Hypervisor Part 3 — Storage Efficiency

In the first post in this series we discussed how ONTAP – the #1 storage platform in terms of revenue – is a storage hypervisor of sorts, providing benefits which parallel those provided by virtualization.  In the second post we covered WAFL and the new Flash Pool feature, and in this post we will cover storage efficiency.

In the interest of time, only a brief introduction to the features which comprise storage efficiency will be given here, with links to whitepapers for those who wish to dig deeper.  In future posts, we will look at additional features which build upon and even extend these capabilities for new efficiencies, as well as explore their combined value.

DEDUPLICATION

There are many deduplication systems on the market, but few storage offerings include deduplication within the primary storage tier.  Oftentimes you’ll have to purchase an additional device running a different platform for such a capability.  With NetApp’s ONTAP platform, the entire FAS product line from high-end to low-end has enjoyed this capability for years.  After the data is written to disk, a post-process scan (which can be scheduled for off-peak hours) will look for duplicate blocks (at a granularity of 4K) and deduplicate them – reclaiming the redundant blocks as free space within the volume.

How much space can be reclaimed?  It depends on the environment and how the storage is implemented, but when best practices are followed in virtualized environments, the reduction in storage will often range between 30% and 75% depending on the data set.  Think of all your common operating systems, which have common files and therefore common blocks – in VMware environments it is common to see a 75% reduction in storage for operating system drives.  A typical file share will often see a reduction of around 30% from common documents, media files, etc.

Not only does deduplication reduce storage consumption, but it can also increase performance.  Imagine the scenario of a VDI boot storm, or a failed ESX host with many VMs powering back on at once.  Because the common blocks are deduplicated, the I/O activity is concentrated on a smaller set of SAN blocks, providing more opportunities for cache hits.  When ONTAP deduplication is combined with either Flash Cache or Flash Pools, a significant performance improvement can be realized in these and similar scenarios.
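
To illustrate the mechanics, here is a toy block-level dedupe in Python: hash each 4K block and keep one physical copy per unique digest. ONTAP’s post-process dedupe is far more sophisticated; this just shows why identical OS images collapse so dramatically.

```python
import hashlib

BLOCK = 4096  # 4K granularity, as described above

def dedupe(volume: bytes):
    store = {}      # digest -> one physical copy of the block
    block_map = []  # logical order of digests
    for i in range(0, len(volume), BLOCK):
        blk = volume[i:i + BLOCK]
        digest = hashlib.sha256(blk).hexdigest()
        store.setdefault(digest, blk)   # first writer keeps the block
        block_map.append(digest)
    return store, block_map

# Three "VMs" sharing a common OS image dedupe down to one copy of it
os_image = b"\x01" * BLOCK * 10
volume = os_image * 3 + b"\x02" * BLOCK     # plus one unique block
store, block_map = dedupe(volume)
print(len(block_map))   # 31 logical blocks
print(len(store))       # 2 physical blocks retained
```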

COMPRESSION

ONTAP also provides compression, which does not work on a per-file basis, but rather against a collection of adjacent blocks of up to 32K.  Intelligent algorithms determine the “compressibility” of the blocks and will only attempt to compress if significant benefits can be realized.  Compression can be set to inline (on write), post-process, or a combination of both.  The post-process method is a bit more comprehensive and will compress blocks that the inline method may have passed over.
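
The “only compress when it pays” idea can be sketched in a few lines. The 25% savings threshold below is an assumption for illustration, not ONTAP’s actual heuristic.

```python
import os
import zlib

GROUP = 32 * 1024   # a 32K group of adjacent blocks
THRESHOLD = 0.25    # assumed minimum savings to bother (illustrative)

def maybe_compress(group: bytes) -> bytes:
    compressed = zlib.compress(group)
    if len(compressed) <= len(group) * (1 - THRESHOLD):
        return compressed   # worth storing compressed
    return group            # savings too small; store as-is

text_like = (b"the quick brown fox " * 1700)[:GROUP]  # highly redundant
random_like = os.urandom(GROUP)                       # incompressible
print(len(maybe_compress(text_like)))    # far smaller than 32768
print(len(maybe_compress(random_like)))  # stored uncompressed: 32768
```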


Compression, of course, can save I/O operations on both reads and writes, but at the expense of CPU cycles.  Generally speaking, you will want to enable both inline and post-process compression on your archive and backup tiers; the optimal settings for other tiers will vary based on both the application and how it is configured.  The following table gives an overview of the space that can be saved using combinations of dedupe and compression on different data sets:


Percent of Storage Saved with:

Application Type                           Compression Only   Dedupe Only   Dedupe + Compression
                                           (average)          (average)     (average)
File Services: Home Directories            50%                30%           65%
File Services: Engineering Data            55%                30%           75%
File Services: Geoseismic                  75%                3%            75%
Virtual Servers & Desktops (OS Volumes)    55%                70%           70%
Database: Oracle ERP                       65%                0%            65%
Database: SAP                              70%                15%           70%
Email: Exchange 2010                       35%                15%           40%

More detail on deduplication and compression within ONTAP is available in this whitepaper.

FlexVol — Thin Provisioning

In the ONTAP platform all FlexVols are thin provisioned – meaning that no SAN space is physically consumed until those blocks are actually written to and utilized.  This not only saves space, but can improve performance by helping to maintain a higher spindle-to-data ratio.

Thin provisioning is commonly found on several storage platforms, but in the ONTAP platform, not only is thin provisioning the default for all FlexVols (and across all storage protocols), but you can actually both grow and shrink – yes shrink! – a FlexVol after it has been provisioned.  This provides the maximum opportunity for both storage efficiency as well as flexibility.
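
Conceptually, a thin-provisioned volume is a sparse map: it advertises a logical size but allocates physical blocks only on first write. A minimal model, assuming simple block-granular writes:

```python
class ThinVolume:
    def __init__(self, logical_blocks):
        self.logical_blocks = logical_blocks
        self.blocks = {}                 # offset -> data, allocated on write

    def write(self, offset, data):
        if offset >= self.logical_blocks:
            raise IndexError("write past end of volume")
        self.blocks[offset] = data       # physical space consumed only here

    def physical_used(self):
        return len(self.blocks)

vol = ThinVolume(logical_blocks=1_000_000)   # a "1M-block" volume
vol.write(0, b"boot")
vol.write(42, b"data")
print(vol.physical_used())   # 2 blocks consumed, not 1,000,000
```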

FlexClone — Efficient Snapshots

Many SANs have snapshot/clone capabilities, but they often come with severe limitations.  For example, several use the “copy-on-write” method, which can be expensive in terms of both disk space and performance.  The FlexClone feature (datasheet) within the ONTAP storage platform enables the rapid creation of clone copies of production volumes.  When a FlexClone is created, a small metadata update is made, and then only new or changed blocks are written to disk.  No copy-on-write is performed, and common blocks between the parent and child are fully leveraged.  This space-efficient approach minimizes overhead and enables up to 255 snaps per volume.


Because of the FlexClone architecture, ONTAP can provide up to 255 snapshots per volume without the space and/or performance penalties typically associated with snapshots.

“But what about my databases and VMs?” you ask.  That’s an excellent question, as those snapshots won’t be very useful for QA, development or recovery if they are not application-consistent.  This is where NetApp SnapManager comes in, which has the ability to properly quiesce applications including Exchange, SAP, Oracle, UNIX, Windows and VMware virtual machines.

The bottom line is that FlexClones allow you to quickly and effectively take point-in-time, application-consistent snapshots of your production data, while avoiding the storage capacity and performance penalties typically associated with snapshots.  This has profound benefits for QA and development (build up / tear down) as well as backup and DR, as we will get to in future posts.
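
The key distinction from copy-on-write is that a clone is born as a small metadata copy sharing every block with its parent, and only subsequent writes allocate new blocks. A rough model (an illustration, not ONTAP internals):

```python
class Volume:
    def __init__(self, block_map=None):
        self.block_map = dict(block_map or {})   # small metadata copy only

    def clone(self):
        return Volume(self.block_map)            # shares all parent blocks

    def write(self, offset, data):
        self.block_map[offset] = data            # new block; parent untouched

parent = Volume({0: b"base", 1: b"data"})
child = parent.clone()               # near-instant: a metadata update
child.write(1, b"changed")
print(parent.block_map[1])   # b'data'     (parent unaffected)
print(child.block_map[1])    # b'changed'  (only the delta consumes space)
```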

SnapDrive and Space Reclaim

When Windows deletes data from the NTFS file system it simply updates the directory table, but leaves those blocks physically in use on disk.  In other words, the data is still there; it’s just no longer “listed” in the directory.  This creates a disparity with the VMFS and SAN layers, which are only concerned with whether a block contains data or not.  SnapDrive for Windows is NTFS-aware and can pass information about deleted blocks down to ONTAP, allowing the space to be reclaimed.

SnapDrive for Windows has other capabilities as well, but reclaiming NTFS space can have a compounding effect, especially where FlexClones are used.

RAID-DP

RAID-DP is the default RAID method used on ONTAP storage.  By integrating with ONTAP’s WAFL technology (reviewed in Part 2), RAID-DP delivers double-parity protection without the usual performance penalties.  According to NetApp, the performance penalty of ONTAP’s RAID-DP is between 2 and 3 percent relative to RAID-4, whereas the traditional write penalty of RAID-6 is often around 30%.  Additionally, RAID-DP is more space-efficient than most RAID-5 implementations by enabling a larger number of spindles (up to 26 data spindles and 2 parity spindles per RAID group).

But RAID-DP is mostly about protection, which is key when using large SATA drives with longer rebuild times.  With RAID-DP you can afford to lose 2 spindles within a RAID set while your hot spare(s) are joining the array.  A double-parity scheme (such as RAID-6) would be standard in more arrays if it weren’t for the performance penalty it brings, but RAID-DP solves this problem, allowing the best of both worlds — improving protection, maintaining performance and optimizing capacity.
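
Some quick back-of-the-envelope math on parity overhead, using the group sizes mentioned above (the 7+1 RAID-5 layout is an assumed typical comparison point):

```python
def parity_overhead(data_disks, parity_disks):
    """Fraction of raw capacity consumed by parity."""
    return parity_disks / (data_disks + parity_disks)

raid_dp = parity_overhead(26, 2)   # up to 26 data + 2 parity per group
raid_5 = parity_overhead(7, 1)     # a common 7+1 RAID-5 layout (assumption)
print(f"RAID-DP capacity lost to parity: {raid_dp:.1%}")  # ~7.1%
print(f"RAID-5  capacity lost to parity: {raid_5:.1%}")   # ~12.5%
# Double-parity protection with less parity overhead than a typical RAID-5 set.
```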

STORAGE EFFICIENCY

So far we’ve covered deduplication, compression, thin provisioning (FlexVol), efficient snapshots (FlexClone), SnapDrive space reclamation, and RAID-DP.  When you combine the sum of all these efficiencies, you can understand why NetApp offers their guarantee that you will use at least 50% less storage compared to other offerings.  And all these ONTAP features are supported across every protocol — iSCSI, FC, FCoE and NFS — and across the entire FAS product line.

For organizations already using a different storage array, you can still put a NetApp V-Series in front of most storage arrays and immediately gain the benefits of the ONTAP platform.  In fact, NetApp will guarantee a 35% storage reduction in this scenario, as well as guarantee that the V-Series will pay for itself within 9 months.

I’ll be discussing value in more detail in future posts, but for now consider this quote from Mercy Healthcare (Innovator of the Year Winner 2012) and what they did across over 30 hospitals and 400 clinics:

Mercy Healthcare built a state-of-the-art data center and implemented a flexible cloud infrastructure to effectively deploy an Electronic Medical Health Record for storing and protecting patient information and, in the future, to support smaller clinics and healthcare systems. With the help of the NetApp FlexPod(R) architecture, we have saved over 40% of storage space, reduced power consumption by 50%, and now provide rapid access to and data protection for 1,742K patients.

In this post we introduced the technologies behind storage efficiency and in future posts we will take a more specific look at various scenarios – including backups and DR – to see how ONTAP as a storage hypervisor can provide benefits and agility which parallel and complement those provided by VMware.  And we haven’t even gotten to cluster mode yet!  Stay tuned….

The Storage Hypervisor Part 2 — Flash Pools

In Part 1 of this series we talked about how ONTAP could be seen as a storage hypervisor, and how its benefits can mirror those of a compute hypervisor like VMware vSphere.  The key ingredient is a common OS or platform to abstract from.

To give an example, NetApp is currently #2 in overall storage market share, but the ONTAP platform is the #1 storage operating system in use today, serving up exabytes of data (the number 5 storage OS is NetApp’s own Engenio line). And because of this common storage hypervisor, some pretty amazing possibilities come into play which can really position an organization for agility – let alone plain old value.  And the benefits aren’t limited to NetApp storage, as the V-Series and the ONTAP Edge storage appliance can extend them into more areas — but more on that in future posts.

In this series I’d like to first take a look at some of the unique capabilities inherent in the ONTAP 8.1.1 platform – ranging from storage efficiencies and multi-protocol support to scalable “infinite and immortal” volumes and more – and then build on this to show how these provide value in everything from disaster recovery, private cloud and test/dev to, of course, agility.  Before I get into storage efficiencies, I thought I’d talk about a new feature just announced in ONTAP 8.1.1 – Flash Pools.  But to best understand Flash Pools, let’s take a step back and look at various technologies, as well as ONTAP’s unique way of processing writes.

“You Don’t Have [Technology-X] So Mine’s Better!”

This hot rod might just have a turbo engine

So you frequently race cars with your friends, and suddenly a new ACME turbocharger becomes available promising a 30% increase in horsepower.  You run out and buy it, hook it up to your car, and you’re feeling great about your new turbocharged wheels.  Your car is indeed much faster now.  By extension it must be superior to anything else that doesn’t have the ACME turbocharger!

Imagine your surprise when your friend’s car, which doesn’t have the shiny new ACME turbocharger, is still keeping up with you.  What happened?  You thought your car was superior, but now you’re not so sure.  This mystery can only be answered by popping the hood on the other car and seeing how they do things.

Such is the issue with NetApp.  There’s been criticism from certain quarters along the lines of “you don’t use flash for primary storage or automated storage tiering” without the context and understanding of how NetApp’s ONTAP handles I/O.  At the end of the day, what matters is performance and reliability.

Yesterday — WAFL (Write-Anywhere-File-Layout)

“A stack built upon a WAFL is a stack optimized for success” — Confucius

Most SANs have an NVRAM cache that the controllers write data to.  ONTAP, however, journals the write requests to NVRAM.  This method not only consumes less NVRAM and improves recoverability, but also improves response times and allows disk writes to be optimized.  The NVRAM has two buffers — when the first is full, a consistency point is triggered to write all the entries to disk.  And unlike some other file systems, WAFL stores metadata within files, which allows much more flexibility in how to write to disk — hence “write anywhere” — significantly improving write performance.  OK, this part was kind of technical, but the takeaway is that thanks to WAFL innovation, NetApp has always been exceptionally efficient at writes and didn’t always need the fashionable technologies that others were implementing to keep pace. (For more detail on WAFL, here’s a whitepaper from 2006.)
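
A highly simplified model of that double-buffered journal: writes are acknowledged as soon as they are journaled, and when one buffer fills, its contents are flushed to disk as a single optimized batch while the other buffer keeps accepting writes. This is loosely rendered and not WAFL’s actual data structures.

```python
class Journal:
    def __init__(self, capacity):
        self.capacity = capacity
        self.active, self.flushing = [], []
        self.disk = []

    def log_write(self, request):
        self.active.append(request)          # journal the request, ack fast
        if len(self.active) == self.capacity:
            self.flushing, self.active = self.active, []
            self._flush()                    # second buffer keeps taking writes

    def _flush(self):
        # Batch the buffered requests into one optimized write pass
        self.disk.extend(sorted(self.flushing))
        self.flushing = []

j = Journal(capacity=4)
for req in [5, 2, 9, 1, 7]:
    j.log_write(req)
print(j.disk)     # [1, 2, 5, 9]  flushed as one ordered batch
print(j.active)   # [7]           accumulating in the other buffer
```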

Flash Cache and now – Flash Pools

Flash Cache has been around for a while now and is part of NetApp’s Virtual Storage Tier.  It’s basically a PCIe flash card sitting next to your controllers.  The controller has fast access to the flash storage and can use it to effectively cache random read patterns.  This can dramatically reduce latency and can, for example, increase the number of concurrent mailboxes that can be serviced by up to 67% in some tests.  This provides excellent results for reads, and it also improves write performance by offloading some read traffic from the hard disk system.  WAFL along with Flash Cache did a great job of delivering performance for years, but today the most write-intensive transactional systems could benefit from something more.

Flash Pool

New in ONTAP 8.1.1 is support for Flash Pools.  A Flash Pool (using patented technology which extends WAFL) essentially augments a logical aggregate of hard disk drives (HDD) with flash SSD drives.  But here’s the twist – the primary data stays on the HDD tier and never moves around (except for write-cached blocks once they become “cold”).  Intelligent algorithms populate the SSDs with the most frequently read data to accelerate reads.

At the same time, random writes go to the SSD drives while sequential writes use the HDD drives, allowing the most effective use of HDD/SSD depending on the pattern.  So from a write perspective, the SSDs are used to offload the I/O activity of random writes.  Now, some applications will intensively write, then read, and then overwrite the same data.  Flash Pool is uniquely equipped to service this type of workload, as it offers both read and write acceleration for this data.  The read-write cache automatically adjusts to your workload patterns — set it and forget it.  And Flash Cache and Flash Pool are designed to complement each other, giving you the option to experience the benefits of both working together.
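
Here is a crude sketch of the placement policy just described: random writes to SSD, sequential writes to HDD, and hot reads promoted into the flash tier. The thresholds and the sequential-run detection are invented for illustration.

```python
from collections import Counter

read_counts = Counter()
ssd_cache = set()
HOT_READS = 3                      # assumed promotion threshold

def place_write(offset, prev_offset):
    # Treat a write adjacent to the previous one as part of a sequential run
    return "HDD" if offset == prev_offset + 1 else "SSD"

def read(offset):
    read_counts[offset] += 1
    if read_counts[offset] >= HOT_READS:
        ssd_cache.add(offset)      # hot block now served from flash
    return "SSD" if offset in ssd_cache else "HDD"

print(place_write(101, 100))  # HDD: sequential write
print(place_write(500, 100))  # SSD: random write
for _ in range(3):
    tier = read(7)
print(tier)                   # SSD once the block turns hot
```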

Compare this to automated storage tiering – those systems move your primary blocks around the different tiers, whereas Flash Pool is an extension to WAFL which leverages the SSD pool for block caching and write staging.  Some are inclined to think that a SAN without an automated storage tiering system moving primary blocks between tiers is somehow deficient.  Automated storage tiering can certainly be used to drive more performance, but so can Flash Pools.  Before you write off a solution because it doesn’t have automated storage tiering, take a look under the hood and find out what’s really going on.

PERFORMANCE AND EFFICIENCY

At the end of the day, the two primary benefits are performance and efficiency: performance to drive faster reads and writes than were possible before, and efficiency to use fewer resources.

Consider the following scenario – a pair of FAS6210s with 240 600GB SAS drives for a total of 144TB.  Now make the following changes – replace those 240 600GB SAS drives with 216 1TB SATA drives, and add twelve 100GB SSDs as a Flash Pool.  What’s the net effect?

According to NetApp, this increased capacity by 47% and reduced cost by 23%.  That’s roughly a 50% reduction in cost on a per-TB basis, all while consuming 26% less power.  What about performance?  According to NetApp, IOPS did not change more than 2% (plus or minus) from the baseline, but response times were significantly improved.  In other words, improvements in both capacity and cost were realized without any decrease in performance.  That’s a nice combination to have when you can get it.
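
Running the rough numbers from that scenario (raw capacity; NetApp’s 47% figure presumably reflects usable capacity, so the raw math lands in the same ballpark):

```python
sas_tb = 240 * 0.6            # 240 x 600GB SAS drives
sata_tb = 216 * 1.0           # 216 x 1TB SATA (plus 12 x 100GB SSD cache)
print(f"SAS configuration:  {sas_tb:.0f} TB raw")     # 144 TB
print(f"SATA + Flash Pool:  {sata_tb:.0f} TB raw")    # 216 TB

capacity_gain = 0.47          # NetApp's stated capacity increase
cost_change = -0.23           # NetApp's stated cost reduction
per_tb = (1 + cost_change) / (1 + capacity_gain) - 1
print(f"Cost per TB change: {per_tb:.0%}")            # ~-48%, about half
```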

And of course, what enables the Flash Pool?  ONTAP does.  Any SAN in the FAS family running ONTAP 8.1.1 or later can use Flash Pools.  In the next post in this series we’ll take a look at some of the ways the ONTAP platform makes storage more efficient.  Towards the end of this series I’ll shift focus to showing how the multitude of ONTAP (storage hypervisor) benefits can lead to savings and agility that parallel what VMware has enabled.

Enter The Hybrid Cloud

I think there’s a huge opportunity (and need) for hybrid cloud management tools.  Let me explain.

First of all, there is a great deal of intellectual property that is either highly sensitive and/or subject to regulatory controls: actuarial data, R&D, medical records, big data analytics and more.  Many organizations will not allow such profoundly sensitive data onto externally hosted infrastructure, for several reasons.  They are familiar with their own security protocols and governance, and confident in their ability to demonstrate compliance to auditors.  For these reasons I think a lot of organizations will choose to keep sensitive intellectual property in private clouds (control) as opposed to public clouds.

But what about all the workloads which don’t share the same intellectual property concerns?  Here the public cloud has two key advantages: first, it has a lower per-unit cost structure due to “datacenters of scale”; second, public clouds are more elastic.  It will often be quicker to consume capacity on a hosted public cloud than it would be to add capacity to a private cloud.  A third benefit of public cloud is a reduction in the datacenter operational burden on the IT department.  Running a datacenter is expensive and challenging – why increase your internal operational burdens?

So it seems we have strong use cases for both private and public cloud.  Enter the hybrid cloud.  But now we have new challenges….

Having a different management portal for private and public clouds raises some new challenges.  It’s not just about having “single-pane-of-glass” visibility into the aggregated environment: how are you going to maintain consistent security and governance across your hybrid cloud?  How are you going to demonstrate and ensure compliance with HIPAA, PCI and many other auditable requirements?  And if you’re doing chargeback, how do you effectively keep track of it all?

For these reasons and more, I think there is a huge opportunity in the future for hybrid cloud management tools.  Virtustream is one such vendor with their xStream offering, and I’m sure there are others – as well as products under development right now.

What do you think?  Are hybrid clouds going to become more commonplace, and is there a strong need for cloud management tools which can transcend and manage across the elements in hybrid clouds?  And while we’re on the topic, what about multiple hypervisors as well?  Do we need some sort of reference architecture for cloud components in order to enable more effective management solutions?