Why Microsoft?

This is a question that can be explored from many different angles, but I’d like to focus on it from not JUST a virtualization perspective, and not JUST a cloud perspective, and not JUST from my own perspective as a vExpert joining Microsoft, but a more holistic perspective which considers all of this, as well

Top 6 Features of vSphere 6

This changes things. It sounds cliché to say “this is our best release ever” because in a sense the newest release is usually the most evolved.  However, as a four-year VMware vExpert I do think that there is something special about this one.  This is a much more significant jump than going from 4.x

vSphere 6.0 Public Beta — Sign Up to Learn What’s New

Yesterday, VMware announced the public availability of vSphere 6.0 Beta 2.  I can’t tell you everything that’s in it due to the NDA, but you can still register for the beta yourself, read about what’s new and download the code for your home lab. There’s some pretty exciting stuff being added to vSphere 6.0 in

Will VMware Start Selling Hardware? Meet MARVIN

The Register is running a story that VMware is preparing to launch a line of hardware servers.

VMware Pursues SDN With Upcoming NSX Offering

Earlier this week VMware announced VMware NSX – an upcoming offering that takes network virtualization to new levels. NSX appears to be somewhat of a fusion between Nicira’s SDN technology (acquired last year by VMware) and vCloud Networking and Security (vCNS – formerly known as vShield App and Edge). Since I already had intentions to

What Really Is Cloud Computing? (Triple-A Cloud)

What is cloud computing?  Ask a consumer, a CIO, and a salesman and you’ll likely get widely varying responses. The consumer will typically think of the cloud as a hosted service, such as Apple’s iCloud, or uploading pictures to Photobucket, and scores of similar services (just keep in mind that several such services existed before it

Agility Part 2 — The Evolution of Value in the Private Cloud

When an IT project is commissioned it can be backed by a number of different statements such as: “It will reduce our TCO,” “This is a strategic initiative,” “The ROI is compelling,” “There’s funds in the budget,” “Our competitors are doing it.” Some of these are better reasons than others, but here’s a question.  Imagine a

Stacks, the Vblock and Value — A Chat with EMC’s Chad Sakac

…I reached out to EMC’s Chad Sakac to gain more insights from his perspective on how the various stacks…well…stacked up….

The NoCloud Organization — Part 2

No Clouds Allowed

In Part 1 of the NoCloud organization I discussed how complex today’s IT systems can be, and I mentioned an organization that believed that if they purchased everything from a single vendor, it would be easier for them to plug “monkeys” into their system and pull the levers of the IT machinery.  In this post I’ll discuss the many things I think this organization approached incorrectly, and in Part 3 (and possibly 4) I’ll try to discuss these points in more detail — in the context of what the organization SHOULD have been doing to become more efficient and effective.

Our NoCloud organization recently went through some major changes.  The parent company had just purchased another company in the US that had been losing money.  The current North American headquarters were in New Jersey, but the datacenter needed some cooling upgrades, and a decision was made to put IT under the control of the newly purchased company and then to move the entire datacenter down to Georgia, where more space was available.  Over the next few weeks and months we would learn a lot about the new organization.  We saw a few things in meetings that raised our collective eyebrows, and over time many fears would be validated.

The new organization, it turned out, was not terribly evolved.  They followed the traditional view that IT is a cost center, and as far as technology, they were well behind us in every area — clustered SQL, Active Directory, and when it came to VMware they were running ESX 3.0 on captive disk (DAS).  When I got around to finding the right people to talk to, I learned that the reason they were running captive disk was that they were never able to get shared storage to work reliably.  They had several production web servers and other systems in this environment which took them about 3 days to restore when the 3rd disk in the RAID-6 array failed because no one was monitoring it (speaking of monitoring, several production LUNs were lost in a separate incident because all parity and hot spare drives had failed over 18 months and no one had noticed). The VMs were being backed up with traditional agents to tape.  As of August 2011 this environment was still running the same workloads on ESX 3.0 (unsupported), and possibly still is.

Our virtualization infrastructure was significantly more evolved, but the new organization didn’t have a high opinion of the concept — based undoubtedly on their own results.  I developed a plan to migrate about 350 physical and virtual servers from New Jersey to Georgia using virtualization and then replication for about 90% of them.  It got a lot of pushback initially, but eventually the concept was approved as it could be demonstrated to carry less risk than any of the other methods, not to mention meet the aggressive timetable (to which a promotion to VP would be linked).

Within about a month we designed and ordered new hardware to support this new vSphere environment.  I worked 80-100 hour weeks over the entire summer interviewing application owners, designing the project plan and the migration schedule, and doing most of the heavy lifting.  Many believed the technology (VMware) would not be able to handle such an ambitious project, and many believed our process would fail or we would be crushed by logistics.  In the end we exceeded everyone’s expectations.  Most application owners said their workloads performed better, and we even virtualized applications that spanned over a dozen servers including Novell NetWare and Windows NT, working alongside HP-UX systems.

Because the timetable was so aggressive, there was no opportunity to develop best practices and operations, so we worked on setting up a sustainable environment (including core monitoring and backups) and we were told — promised — that we would be given the opportunity to circle back later and build an operational framework.  Well, after six months of working nearly every weekend (no comp days), guess what happened!

For getting the datacenter move done in a ridiculous time frame, the new IT director got his promotion to Vice President, which was both celebrated and demonstrated by a shiny new Hummer appearing in the newly designated executive parking space.  Of course no thank-yous could be afforded to anyone else – certainly not those who sacrificed every single weekend over the summer — lest they think they played a significant role and try to work outside management’s intended boundaries.  One manager began collecting information for both recognition and monetary reward for those involved in the huge undertaking, but that effort was quickly shut down by managers above him.

As for operations in the new datacenter, suddenly that wasn’t important.  I would be assigned a new role where I would work on projects as assigned by the CIO – and many technical parameters (including “don’t use the risky virtualization stuff”) would be mandated from the start.  So what happened to the virtual infrastructure that now housed nearly 400 servers, about 70% of them production?  The new team had no experience with virtualization, so I would find out that their provisioning process was to install the OS manually from CD each time.  They were instructed that servers had to be built this way because audits for SOX compliance required it.  I called several managers and explained to them that this was an incorrect interpretation and that they needed to challenge their auditors to let us do what our competitors were doing.  Nothing changed.  So when I needed VMs (for test/dev elements) it would usually take more than a week, and at times up to 3 weeks, before the VM was turned over to me.  If I tried to fix anything I would get yelled at (literally) for daring to touch the VMware environment — “that belongs to operations and you are not operations!”  Never mind that systems were broken and they had asked for my help.  Some of this is what inspired a previous post entitled “Let Your Fast Zebras Run Free”.  Others would have similar experiences in other disciplines, as the best employees were ironically the ones whom new management wanted “corralled” within their respective pens.

Snapshots would be used as backups and left open for months, and then hours would be lost troubleshooting the performance and backup issues that spilled over into production.  Resource limits were set on VMs for no apparent reason, effectively limiting systems with as much as 32GB of RAM to 2GB.   When I delivered the environment, every single VM was backed up by default leveraging the vStorage API, but now VMs were no longer automatically backed up (or backed up efficiently, in many cases).  The organization would be told to disable DRS because the movement of VMs across different servers made it confusing to match up asset tags to servers — a manual vMotion would now require a help desk ticket.

Monitoring?  I had led an initiative to build a service map to monitor key elements of our ERP system (Oracle) and critical infrastructure (SAN, AD, SQL, vSphere, backups and more) and expose it to operations.   We were instructed to abandon our investment in these solutions, as they would be replaced with “whatever the new IT org uses,” which we eventually learned was essentially nothing.  Well, they did have SCOM, but we ended up learning it and then teaching them how to use it, to a point that was a fraction of the competency we had previously.

What about the strategic direction of the IT organization?  After asking nearly every week to attend the strategy meetings (to which engineers were never invited), I eventually got the call-in information, and we didn’t know whether to laugh or cry.  It turned into a great source of comedy in our office to counter the low morale.  Management didn’t want to be questioned by engineers, and soon the meetings would be “cancelled” but continued privately.  Management couldn’t avoid being questioned entirely, however, as our parent company in Japan would quietly send over a team of more than a dozen people to attempt to understand why, ever since the takeover, IT projects were taking so much longer and at much greater cost.  I wonder why!

What about agility?  You might have gotten some hints already, but in order to get anything done — such as poking a hole in a firewall, for example — you would have to reach out to all the required teams, and after the obligatory “I need to set up a meeting with my manager” and calendar logistics, we might be able to make progress after several weeks and several meetings (“what was this for again?”), and then we could finally collect the approvals and submit to management.  Of course then we would hope that senior management would actually approve the ticket in time, or else we’d have to get everyone together to pick out a new time window. Several of us became convinced that managers would create processes just to slow things down to a pace they felt comfortable with.

Morale was terrible among the employees.  Training for employees was universally rejected.  I obtained a voucher to attend VMworld and offered to use my own transportation to attend – I just needed to get vacation time approved for that week (a month in advance).  The vacation request was denied with no explanation given.  When my daughter was recovering from surgery in intensive care, I checked my mail to make sure everything was going smoothly and responded to a colleague’s question about logistics for an upcoming meeting with a vendor.  This earned me a phone call from my manager, who screamed at me for the next 20 minutes for “working” during paid time off and asked for the hospital room information so he could come by and confiscate my phone and laptop.  The nurse (and other visitors) heard the screaming on the other end and expressed concern about the negative energy affecting the patient (my daughter). I would later learn that others would have to go so far as to seek professional help for how they were treated at the workplace.

In many different examples, it became clear to several of us that the new “regime” operated under Theory X — yell at employees (including managers) when the rainbows and unicorns are not aligned as you have envisioned.  We saw the use of fear and intimidation to “motivate” employees, and efforts to inhibit communication with management and keep employees in the dark about future plans and direction.  Empowering employees with information could present a threat, so just treat them like pawns so they won’t deviate from the boundaries designed by management, and just keep pulling the assembly-line levers.

I don’t want to offer too much commentary here; I’ll just let the scenario speak for itself for now.  In a couple of weeks I’ll post Part 3, in which I’ll attempt to look back at this post from a different perspective — looking at some principles that should be inherent in a well-functioning cloud organization, and how this org specifically failed to meet most (all?) of them.

Virtual Machine Considerations For NetApp Storage

In our environment we’ve done something which at first glance might seem a bit unconventional to many.  We’ve consolidated all OS-based disks (VMDKs) on a dedicated set of datastores, and have done the same with application drives, page files and even VM swap files.

Why go through the extra effort to position each VMDK for each VM on a different datastore?  The driving force behind this idea is deduplication.  NetApp storage has the ability to deduplicate identical blocks, and the boundary for this deduplication is the volume (or FlexVol in NetApp terms).   How many common blocks might there be on the C: drives of all those Windows VMs?  Probably quite a few.  By segregating the OS, application, page file (OS) and vSwap drives and then placing them onto common volumes, we can maximize our ability to find common blocks within a single volume, and thus maximize our disk savings.
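
As a rough sketch of what this looks like on the controller (assuming Data ONTAP 7-Mode syntax and a hypothetical volume named vm_os_datastore backing the OS-disk datastore), deduplication is enabled, run, and verified per volume:

sis on /vol/vm_os_datastore
sis start -s /vol/vm_os_datastore
sis status /vol/vm_os_datastore
df -s /vol/vm_os_datastore

The -s flag scans the blocks already on the volume so existing VMDKs get deduplicated too, and df -s reports the space saved — which is exactly where grouping like drives onto the same volume pays off.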

Of course there are other reasons as well.  If you are doing any type of DR ranging from SRM to NetApp replication, you might want to exclude the page file from all that replication activity.  Since it’s all consolidated onto one or more datastores, just exclude those datastores from your DR plan and/or replication.  Easy!
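
To sketch how that exclusion might look with NetApp replication (hypothetical 7-Mode controller and volume names; an SRM setup would do the equivalent in its protection groups), the /etc/snapmirror.conf on the DR controller simply lists only the volumes worth replicating:

# hourly replication of the OS and application datastores only
prodfiler:vm_os_datastore   drfiler:vm_os_datastore_dr   -  0 * * *
prodfiler:vm_app_datastore  drfiler:vm_app_datastore_dr  -  0 * * *
# page file and vSwap datastores intentionally omitted

Nothing special is required to “exclude” the page file and vSwap volumes — they are simply never configured for replication in the first place.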

The NetApp and VMware Storage Best Practices whitepaper does mention consolidating vSwap and OS page files onto common volumes, but does not make direct mention of consolidating OS volumes and app volumes — but it does hint at it by stating “NetApp recommends grouping similar operating systems and similar applications into datastores, which ultimately reside on a deduplication-enabled volume.”

50% GUARANTEE

This is also a big part of the reason for NetApp’s 50% guarantee, which promises that you will use 50% less disk space if you enable the following features:

  • Deduplication
  • RAID-DP
  • Thin Provisioning
  • NetApp Snapshots

Thin provisioning of course is common to many storage systems, but the other three are unique to NetApp.  RAID-DP allows for a second parity disk without the traditional performance and capacity penalties, while NetApp’s Snapshot copies use far less space than most other snapshot implementations, which have to move more blocks for the same operations.  And of course NetApp offers block-level deduplication on primary storage that can meet mission-critical requirements, as opposed to deduplication approaches that are far slower and usually reserved for backup tiers only.

In summary, NetApp storage offers some unique capabilities which in many cases can be maximized by grouping like drives onto the same volumes.

FlexPod In the Enterprise — GE and More

In a recent post I made the case that the FlexPod is not JUST an SMB solution for IaaS, but has been quite successful in the enterprise space as well.  Well it turns out that GE (General Electric) will be speaking at Cisco Live in June to share details and best practices on how they use the FlexPod to provide cloud services to their global business units.

Also, the University of Tennessee will be hosting a session on a FlexPod powered solution which provides over 27,000 students and staff access to virtual desktops and applications.  Details below:

Transforming GE’s Cloud with FlexPod (1:30pm Tuesday, June 12th — BRKPCS-4387)

Learn how General Electric’s corporate IT function supports global business units with private cloud and desktop virtualization solutions built on Cisco UCS, Cisco Nexus and NetApp storage as part of the pre-validated FlexPod architecture.

Cisco Data Center Technologies VP Paul Perez and NetApp Solutions and Integrations Group VP Patrick Rogers will join GE Principal Technologist Steven Fearn to discuss how GE has designed and deployed its cloud environment, best practices, the evolution of data center technologies, and new business models created by IT’s transformation into a service organization.

  • Steve Fearn, GIS Principal Technologist, GE
  • Patrick Rogers, VP Data Center Platforms, PASM HQ, NetApp
  • Paul Perez, CTO, Server Access and Virtualization Business Unit, Cisco

University of Tennessee Launches Their Virtual Desktop Revolution (Wednesday, June 13th — BRKPCS-4381)

The University of Tennessee at Knoxville launched “Apps@UT,” a program implementing Citrix XenDesktop that runs on a Cisco infrastructure to deliver full Windows desktops and applications as an on-demand service. Apps@UT will give UT Knoxville’s more than 27,000 students and staff anytime access to virtual desktops, applications, personal files and network resources.

Join Mike Stanley, Sr. IT Director, University of Tennessee and Sean Connelly, Sr. Director Enterprise Services, Citrix Systems as they discuss and share the virtual desktop revolution underway at UT, including:

  • UT’s vision for delivering apps and desktops to students and faculty to provide a higher degree of flexibility and availability of technology to them.
  • UT’s critical decision points in designing and deploying Citrix XenDesktop on the Cisco UCS platform with NetApp storage as part of a FlexPod datacenter architecture.
  • How UT optimized the performance of networks, applications and services.
  • How UT expects to save time and resources by enabling easy, centralized management of desktops and applications, and to reduce the cost of software refreshes, security patches, and other maintenance.
  • How UT will introduce new, thin clients, as older PCs in the university labs are phased out and replaced under the Apps@UT program to generate energy savings and reduce both the frequency and cost of hardware refreshes.

 

For more details check out the Cisco Live Session Catalog.

What Is Converged Infrastructure? (a response to ZDNet)

Recently there was an article posted on ZDNet entitled “Converged Infrastructure vs Reference Architecture” which has attracted a bit of attention, and I wanted to add my thoughts regarding what I felt were some misleading conclusions.

If the article read as if it were written by an enthusiastic VCE employee, that’s because it was.  Now that’s not inherently a bad thing – the Vblock is a great product with much to be excited about (as my own blog posts reveal).  However, I walked away feeling as if the article did not accurately describe the market or the trade-offs involved with the various solutions.  I observed this article being shared multiple times, but mostly by people with an agenda rather than the more independent analysts.  In the spirit of full disclosure, I work for a NetApp partner that offers FlexPod solutions.  Now let’s take a look at that article….

CONVERGED INFRASTRUCTURE vs. ?

Let’s start at the beginning.  The first problem I have with the article is the title: “Converged Infrastructure vs. Reference Architecture”.  This suggests of course that reference architectures are NOT converged infrastructure and that customers must therefore choose between the two.  Is that truly the reality of this market?  Let’s start by defining what converged infrastructure is.  Here is a selection of excerpts from the Wikipedia definition of converged infrastructure:

Converged infrastructure packages multiple information technology (IT) components into a single, optimized computing solution. Components of a converged infrastructure solution include servers, data storage devices, networking equipment and software for IT infrastructure management, automation and orchestration.

Converged infrastructure is used by IT organizations to centralize the management of IT resources, consolidate systems, increase resource utilization rates, and lower costs. These objectives are enabled by the creation of pools of computers, storage and networking resources that can be shared by multiple applications and managed in a collective manner using policy driven processes.

If we look at the benefits of converged infrastructure we can further improve our definition:

Writing in CIO magazine, Forrester Research analyst Robert Whiteley noted that converged infrastructures, [combine] server, storage, and networks into a single framework, help to transform the economics [of] running the datacenter thus accelerating the transition to IP storage to help build infrastructures that are “cloud-ready”.[3]

Can a reference architecture do these things?  Could a FlexPod or VSPEX solution, for example, be used to create “pools of computers, storage and networking resources that can be shared by multiple applications and managed in a collective manner using policy driven processes”?  That’s not only what they were designed to do but it is exactly how they are being used today.  In the case of the FlexPod – the market-leading reference architecture – customers choose their own automation/orchestration solutions to best match their needs and business model, ranging from Cloupia, CA, Cisco IA and more.  And of course these solutions automate and orchestrate across Cisco UCS (compute), NetApp storage and Cisco networking solutions, leading to…tada!…converged infrastructure (some real-world examples in a bit).

Now a solution using the product model (such as Vblock) could indeed claim to be “more converged” when it comes to procurement and support, because you are dealing with one product and one vendor.  This is absolutely true, but it is quite different from suggesting that a reference architecture is not converged infrastructure, given the established definitions.  Furthermore, this overlooks the acclaimed support model NetApp has cultivated with their FlexPod partners, which allows customers to make the initial call with the vendor of their choice; the call is then seamlessly managed across other vendors in the stack as needed, with shared ownership along the way.

It seems that someone is trying to bend the definition of converged infrastructure into a shape that better suits their own agenda, as even solutions like HP CloudSystem and IBM PureSystems are labeled as somehow not being converged infrastructure.  In the spirit of this election year, I rate the suggestion that reference architectures are not converged infrastructure as “Pants on Fire”.  Let’s move on…..

REFERENCE ARCHITECTURES – SMB ONLY?

The article opines that “…Reference Architectures are a great solution for a low budget SMB that is looking to introduce itself to the world of Cloud” and goes on to attempt to explain that only a “true converged infrastructure” is suitable for “serious contenders” looking for a “robust, scalable…private cloud infrastructure” and of course finally concludes that “the only such solution….is VCE’s Vblock”.  One wonders how this advertisement passed editorial review at ZDNet.

It is certainly true that the Vblock is well suited for high-end enterprise/cloud environments, and that the FlexPod is very accessible to the SMB customer.  It is understandably tempting to confine these products to those relative positions as a result, but the clear fact is that reference architectures like the FlexPod are being successfully used in world-class enterprise and cloud deployments.  I’ve only so much room here, but here are just a few examples.

Accenture uses FlexPod to host a private cloud for SAP customers as well as a FlexPod-powered cloud for the French government (G-cloud).  Euronet chose FlexPod for their central datacenter supporting over $50 billion in financial transactions in Europe annually.   Terremark (Verizon) is one of several hosting providers using the FlexPod, along with several other blue-chip names – one of which will be revealed this week.  In fact, if you look at the list of DaaS/IaaS and other service providers already using NetApp storage, it’s probably not a big stretch that several are FlexPod customers as well.  ING Direct and EGGER are among some of the larger international FlexPod customers, and if you’ve been a patient at one of the Cook County Health Care System facilities in Chicago, your patient records – along with those of another half million patients each year – are stored in a FlexPod-powered datacenter.  So as you can see, just because the FlexPod may be more accessible to the SMB market does not mean that it can’t be – and isn’t being – used in world-class enterprise cloud offerings.  Now ask yourself if EMC launched the VSPEX offering to focus only on the SMB market.  Here we see clear examples of a reference architecture providing world-class enterprise cloud services well beyond the SMB space, with more to come.  Pants still on fire.

ENGINEERING TRADE OFFS

On a recent trip to IKEA I found myself trying to match different parts of different sizes in order to scale out an existing (cabinet) solution.  It would have been nice if there were a document showing me which parts were designed to work together, to save me some engineering pain.  In this respect, I think the reference architecture will replace the traditional “build-your-own” approach and gain further traction in the marketplace.  Now there is no denying that the Vblock – being fully “productized” – has several advantages.  There is much more engineering that goes into a Vblock, for example, than into a reference architecture like a FlexPod.  And while there are clear advantages to this approach, there are also more parameters that one might have to operate within as a result.  For example, you might have to wait significantly longer to upgrade to the latest build/release until all the components have been fully engineered to support that release, and there may be other boundaries and limitations required for support.  For many this will not be an issue, but for some it will.  There is less engineering that is pre-established for you in the reference architecture, and yes, this can increase the complexity of support, but it can also empower the customer with more flexibility – including being able to select the orchestration suite that best meets your needs and objectives rather than the one offering that is built into the product.  I’m told by a partner that one Vblock customer ended up dissatisfied with the solution and has placed four different FlexPod orders over the past 6 months.  There are a few trade-offs at play here and there is no “one-size-fits-all” solution.

The ZDNet article read to me as if there were only one viable solution, and I just don’t see the market this way.  The Vblock is an outstanding product, but in my opinion so is the FlexPod – customers should take the time to understand these trade-offs and what works best for them.  I love being able to explain the value proposition of the FlexPod to customers and how they can choose among orchestration solutions that are the best fit for their goals.  Yes, there is more engineering put into a “full product” like the Vblock than into reference architectures – whether this is the right fit may depend on your goals, preferences and operational model.  At the end of the day, what we all seek to do is provide converged infrastructure in a manner which enables and empowers businesses to achieve new benefits from this utility model of computing, and there’s more than one solution that can effectively achieve this result for customers.

Now here’s my spin — I love what the FlexPod solution provides to us across hosted and on-premises IaaS/cloud solutions.  Cisco UCS + VMware + NetApp + choose your own orchestration + a converged support model = value for our customers and our own offerings.  It’s tremendously exciting to work with customers and market opportunities with the FlexPod and come up with solutions that provide phenomenal value and potential that has all parties incredibly excited.  At the end of the day, there are several different converged infrastructure solutions providing great value in the marketplace.  I don’t think that a reference architecture is somehow an incomplete or insufficiently “robust” solution just because it is not fully “productized”.  In my opinion, a reference architecture — such as the FlexPod — can provide an outstanding basis for IaaS, private cloud, value and opportunity, and I’m excited to have the opportunity to design solutions around it.

 

The Stack Market: Vblock, FlexPod, VSPEX (and me)

I’ve written several times on this blog about converged infrastructure and a few recent events have compelled me to revisit the topic. These events include the launch of EMC’s VSPEX solution along with my decision to accept a new position with a NetApp/FlexPod partner.

I’ve spoken very favorably in the past about the Vblock and even interviewed EMC’s Chad Sakac on this blog about the Vblock, the value proposition, and how it compares to other “stacks”.  Have my opinions changed? And what does the introduction of VSPEX mean for the market? Let’s start by looking at stacks using a car analogy.

THE SPORTS CAR ANALOGY

Imagine that you are a sports car enthusiast and you have three different choices for obtaining a vehicle. One option is to pick out the best individual components and build your shiny new custom sports car from the ground up. You get exactly what you want from whomever you want, but you have to incur the engineering burden of figuring out how best to get all those parts to work together —  and if you change a part or two in the future you’ll have to re-engineer it then as well. This build-your-own approach is the traditional way of doing things, but it requires good amounts of both engineering expertise and time.

Don’t want an off-the-shelf Corvette? Customize it!

Now there’s one company out there who has selected best-of-breed components from multiple vendors — the best chassis, tires, engine, transmission, and more — all engineered to work together to form one fantastic sports car. This car can then be purchased and supported as a single product. Simply select the size and model you want and you’ve got one great sports car ready to burn some rubber.

But what if you had a favorite gearbox or set of tires that you wanted to use in order to change your ratios or to achieve some other specific goal? You’re kinda stuck. Now depending on your goals that might not be such a bad thing, but some might want a more middle-of-the-road solution that offers more flexibility.

Some may prefer to step into the fashion stylings of this custom car

Fortunately, there is a second company that has put together a reference architecture. In this model, the company has selected a finite list of best-of-breed solutions and has certified these solutions to work together to achieve the desired results. Like the first solution, the engineering burden is removed from you the customer, and you’re getting a solution that was designed to work together. In addition you have a bit more freedom to pick best-of-breed components, but you are forgoing some of the benefits of the single product model such as procurement and support. In short, you gain more flexibility while taking a step back from the “pure product” concept.

THE STACK MARKET

The Vblock is of course the “product” solution in the example above. It’s an excellent solution comprised of EMC Storage, VMware vSphere, Cisco server (UCS) and networking hardware, and Unified Infrastructure Manager for orchestration across these components. The FlexPod uses Cisco UCS and Nexus (as does the Vblock) but offers a choice of hypervisor and leverages NetApp storage solutions. FlexPod customers can also select from a variety of orchestration solutions ranging from Cloupia, CA Automation Suite, Cisco IA, and many more.

This stack is comprised of three components designed to deliver a tasty breakfast

For a variety of reasons the FlexPod has been catching on as of late, enjoying more than 400% growth in customer adoption in the past year. While the Vblock has experienced a measure of success, it has become clear that for reasons ranging from the technical to the political, there will be a market need for more “flexible” solutions – as opposed to the single-product model – for some time. EMC reacted to this and the success of the FlexPod by launching their own reference architecture, called VSPEX, to fill this gap in their portfolio.

SO WHICH IS BEST?

There are certainly some for which the product approach is a good fit, and others will opt for the relative flexibility of the reference architecture model. Now everyone with an agenda will try to give their own spin on why reference architecture X or storage solution Y is better so here’s my opinion/spin:  they are all great solutions — you need to find the right fit for your business and technical goals.

We can split hairs and compare EMC storage to NetApp storage, and I’m sure each side will find what they believe to be relative advantages. As for VSPEX, yes, it does seem to cover a broader array of hardware than the FlexPod, but I happen to be very comfortable with Cisco UCS blades, just as the Vblock solution is. And with so many strong management solutions available (Cloupia, CA, BMC, Cisco IA, etc.) it becomes easier to tailor the solution to your needs, whether as a solutions provider or an IT organization.

A sample FlexPod for VMware configuration

In several ways I believe that EMC’s decision to introduce VSPEX is a validation of the success of the FlexPod solution, which has been extremely well received in the channel.  Some will of course espouse that VSPEX is somehow superior because it includes more vendors and components, but I think that at a certain point the value of a solution can be diluted.  The components of the FlexPod are certainly best-of-breed, including Cisco UCS, VMware vSphere (or Hyper-V) and NetApp storage, and customers can then select the orchestration and management elements that best align with their needs and goals.  In fact, I really don’t need to state that I think the FlexPod is a great solution — the clear market success, combined with EMC’s decision to attempt to emulate it with their own reference architecture (VSPEX), speaks for itself I think.

When we take our propeller-hats off and take a step back, we see that the real value here to the customer isn’t component X, Y or Z, but rather the sum of all the parts in being able to provide private cloud and IaaS solutions which deliver capital, operational and strategic benefits to the organization (see “What is Cloud Computing?“). I couldn’t be more excited to have the opportunity to work with NetApp and FlexPod solutions that have proven to be remarkably successful and have delivered phenomenal value to their customers.

VALUE

Your business has a new initiative which requires a web farm of 10 servers, plus an additional 12 servers supporting databases and middleware.  How long would it take to design, purchase, engineer and then provision the resources for this new initiative?  Using the traditional “build-your-own” approach this could take months, significantly impacting the success and trajectory of your business initiatives.  A well designed converged infrastructure such as a FlexPod solution can provide the technical foundation for vastly reducing your “mean time to provision” and improving your business and profitability as a result.

How is it possible that companies like Instagram and Dropbox had only 13 and 89 employees respectively at last count?  Who maintains the datacenter, the cooling, the cabling, the halon system, and everything else that goes along with a datacenter?  The answer is that they leverage the cloud.  In this same manner, mid-to-large companies can leverage IaaS using solutions like the FlexPod in a variety of cloud scenarios to achieve the same type of success.  The FlexPod is an outstanding platform toward this end and I’m very excited about working with it to help customers achieve these benefits.  More to come.

The vExpert Program Just Rocks!

This week when I returned home I found a very nice gift package from VMware for the vExpert program, which included a vExpert laptop bag, luggage tag and certificate.  This was a very nice gift from a wonderful program, and I thought I’d take a quick moment to reflect on the significance of the vExpert program.

One vantage point is to compare the vExpert program with other communities and programs.  I’ve been working with Microsoft technologies for well over a decade, and while Microsoft did have an MVP program, it seemed (at least at the time) largely based on offering advice on forums.  Many vExperts do this as well, but in my opinion the vExpert program is a dynamic and thriving group that quite simply seems to be far more engaged (especially via social media) than other technical communities I have seen.  I’ve never personally met another vExpert, but with this thriving community, sometimes it feels as if I have.  Between blogs, user groups, speaking engagements, social media and much more, I just can’t recall having seen a technical community with quite so much passion, energy, expertise and of course – impact.  In my opinion, the VMware and vExpert communities are really something special, and this reflects very favorably on VMware, on the community, and on those who were instrumental in cultivating and developing it (and I can’t finish that sentence without mentioning John Troyer for his role in developing and fostering this community).  I simply can’t think of another IT-technical community that is even close to being as engaged and kinetic as this one.

The vExpert program of course has rewards that go well beyond being a part of a community of so many inspiring and great people.  Some of my personal favorite perks beyond the community itself are the ability to access VMworld presentations and especially VMware trial licenses so that I can expand my learning and exposure in my home lab (and sometimes blog about it).

What about you?  The simple reason that I had the privilege of being a part of the vExpert community is that one day I decided I had experience, knowledge and thoughts that I wanted to share, and I went out and started a blog where I could do just that.  Perhaps you have valuable experiences, ideas and thoughts that can promote the community — and yourself — in the process.  Get involved in your VMUG, start a blog, or take part in whatever community activities are available to you to share your experiences and expertise.  You’ll never know what could come of it unless you try.

Personal notes

I was very humbled to have been accepted into the VMware vExpert program in 2011 and I’m hoping to be considered for the honor again this year.  As some of you who watch my tweet stream know, I haven’t been very satisfied with either the quality or quantity of my blog posts this past year.  Being on the sidelines for a significant period of time denied me access to experience as well as technology, and it left me with a bit less to blog about (especially on products I wasn’t able to gain much exposure to beyond lab experimentation).  I also had several “big idea” blog posts and/or presentations that I simply never got around to, on topics ranging from converged infrastructure to private/public/vCloud trends and much more.  On the bright side, perhaps I can find the time to work on a few of these old ideas as well as some new ones now that I’m no longer on the sidelines.  I hope that in the coming weeks and months I’ll be able to make contributions that are both of value to the community and something I find satisfaction in.

In summary I think the vExpert community is really something special and I’m looking forward to more great people joining the vExpert community and hopefully increasing my own contributions as well.  If you think you may have something to offer, I’d like to encourage you to step up and share your gifts, thoughts and experiences.  Perhaps one day you can join the ranks of the vExperts as well!

Dell XPSz Laptop (It’s Thin and Fast)

Many of us in the IT field need laptops for our profession so I thought I would review my experience so far with the MacBook-like Dell XPSz.

I recently accepted a new opportunity which required me to provide my own laptop.  I had an existing line of credit with Dell (plus I’ve had good experiences with their products) so I went to explore their laptop lineup which is when I noticed the newer XPSz model.

Obligatory Platform Tangent:  While I do have an iPad, it is the only Apple product I have.  Many seem to be enthusiastic about the Apple Mac, but I don’t mind the “complexity” of Windows.  In fact I’d rather have a platform where the complexity is available for me to work with, rather than hidden from me, even though the interface might be elegant.  When working in IT there’s simply better support/compatibility for Windows today and so far I just haven’t seen a compelling reason to switch to Apple.

The XPSz appears to be modeled directly after the Apple MacBook Pro.  They both have a metallic finish, are exceptionally thin, and share similar stylings.  Now this comparison is not totally fair, as the Apple MacBook has a true unibody design while the Dell XPSz tries to emulate this look without having a true unibody frame.  However, as you’ll see later, this is a tradeoff I’m comfortable with.

Here’s side-by-side shots of the two systems from the side:

The MacBook from the top:

The Dell XPSz from the top:

As you can see the visual appearance is very similar.  The keyboard layout is practically identical and both have a very slim profile with the metallic chrome finish.

I selected the top-of-the-line XPSz, which included a 250GB Samsung SSD, and the computer is blazing fast.  The time it takes from when I initiate a restart command to when I am back at a login prompt?  22 seconds.   And if I need more space, I can always connect a drive to one of the USB 3.0 or eSATA ports.

One thing that took some adjustment was the keyboard.  The keyboard is very nicely backlit but has a different feedback and texture to it (I don’t know how the MacBook compares here, but visually the keyboards appear to be identical).  After a week of using it, however, I’ve gotten accustomed to the new feel and I find it quite durable as well.

As a point of interest, and for my own curiosity, I decided to compare several specifications of the Dell XPSz against a comparably equipped MacBook Pro:

             Dell XPS 15z                   MacBook Pro 15
Processor    Intel i7 2.8 GHz               Intel i7 2.5 GHz
RAM          8GB DDR3 1333MHz               8GB DDR3 1333MHz
Display      1920 x 1080                    1680 x 1050
Disk         250GB SSD                      250GB SSD
Graphics     NVIDIA® GeForce® GT 525M 2GB   AMD Radeon 1GB
Weight       5.54 lbs                       5.6 lbs
Height       0.97 inches                    0.95 inches
List Price   $1,599                         $3,249

You can draw your own conclusions from looking at the features and price above.  I think the Dell XPS 15z is a great laptop for the money.  I am enjoying mine and I highly recommend it.

SQL 2012 (Denali) Enables Exciting New HA Scenarios

Up until now there have been several ways to try to achieve high availability with SQL Server, but all present significant complications.  There’s SQL clustering, but this is expensive and still leaves us with only one copy of the database (shared storage).  There’s mirroring, but this is done on a per-database level and presents challenges with some applications.  There’s log shipping, which is somewhat difficult to set up and requires manual effort for failover, among other challenges.

There are also other challenges around disk performance and availability.  In a previous post, I wrote about some PCI Flash storage solutions that can be used to improve performance, but there are challenges here with vMotion and more.

What if the best of clustering, mirroring and log shipping could be combined into a new HA paradigm, and this paradigm could also solve some problems around PCI Flash (i.e. vMotion) as well?  I don’t want to present SQL 2012’s AlwaysOn feature as a magic bullet, but it seems to me that there’s a lot here to be excited about.

To start with, SQL 2012 is still based on the cluster concept, so that doesn’t go away.  But on top of the clustering foundation, SQL 2012 will replicate databases (and groups of databases) to other hosts in the cluster (no shared storage required).  In addition, the passive copies of the data are accessible and can be used for searches and queries.  You can now essentially have one “write” copy of the database and multiple “read-only” copies within the cluster that are ready to automatically fail over and become primary if needed.

This opens up many new possibilities for scalability, performance and HA.  You can now select a group of databases (databases managed as a single logical unit) to protect and have automatically replicated to other host(s) within the cluster.  Because shared storage is not being used, corruption or availability issues at the DB/LUN level need not incur downtime.  And these secondary copies can be configured to be accessible for reporting and other read-only functions – allowing such activity to take place without adding any load on the primary DB server.  You could even configure a host for asynchronous replication to create a remote replica without slowing down the primary DB/host.
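
To make that a bit more concrete, here is a rough T-SQL sketch of what an availability group might look like.  The instance names, database names and endpoint URLs are all hypothetical, and since I haven’t worked with SQL 2012 hands-on yet, treat this as an outline rather than a recipe:

-- Run on the primary instance; the mirroring endpoints must already exist on each node
CREATE AVAILABILITY GROUP [AG_Production]
FOR DATABASE [SalesDB], [ReportingDB]
REPLICA ON
  N'SQLNODE1' WITH (ENDPOINT_URL = N'TCP://sqlnode1.example.local:5022',
                    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                    FAILOVER_MODE = AUTOMATIC),
  N'SQLNODE2' WITH (ENDPOINT_URL = N'TCP://sqlnode2.example.local:5022',
                    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                    FAILOVER_MODE = AUTOMATIC,
                    SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY)),
  N'SQLNODE3' WITH (ENDPOINT_URL = N'TCP://sqlnode3.example.local:5022',
                    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                    FAILOVER_MODE = MANUAL,
                    SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- Then on each secondary instance:
-- ALTER AVAILABILITY GROUP [AG_Production] JOIN;
-- ALTER DATABASE [SalesDB] SET HADR AVAILABILITY GROUP = [AG_Production];

In this sketch SQLNODE2 is the synchronous, automatic-failover partner, while SQLNODE3 is an asynchronous, readable replica that could absorb reporting traffic — which maps directly onto the “one write copy, multiple read-only copies” idea described above.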

And as for the challenges with PCI Flash cards and vMotion, that problem pretty much goes away.  You can now leverage PCI Flash in your host servers to maximize I/O performance and allow SQL 2012 AlwaysOn to provide the HA (via replication), without being so concerned about trying to vMotion across non-shared storage (PCI Flash).

I haven’t had the opportunity to work with SQL 2012 yet but it sounds very promising and I am very much looking forward to the new possibilities.  Microsoft’s SQL 2012 virtual launch event is scheduled for March 7, 2012 and more details are available here. 

Installing VMware Tools on CentOS

There’s more than one way of course but I ended up using the following series of commands with good results.

Before you begin make sure that the proper Linux distro is selected in the virtual machine’s configuration (options tab, guest operating system) and that you initiate the VMware Tools install (which presents the appropriate CDROM to the VM).  Then enter the following commands:

mkdir /media/cdrom
mount /dev/cdrom /media/cdrom
cd /tmp
tar -xf /media/cdrom/VM*
cd vm*
./vmware-install.pl

At this point you should be able to follow the prompts.  On a few older distros I found that I had to add the “z” option to the tar command in order for it to work properly.
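
For those older distros, that just means the extraction step above becomes:

tar -zxf /media/cdrom/VM*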

As Bob Plankers (@plankers) mentioned, you may also want to run the following command to clean up the TMP directory (unless you use other methods like tmpclean):

rm -r /tmp/vmware-tools-distrib

Blue Shift Is Now 100% Open Source!

Blue Shift is now open source!

I am pleased to announce that Blue Shift as an online hosted service is now 100% open source!  We here at Blue Shift are excited that this move will empower our brand, reduce costs, and allow us to provide a better service to you all while opening up our broader vision to the community.

As other Internet pioneers have done, anyone can now access our HTML source code and improve it!  We invite developers to look at our source code and find ways to improve it and improve our community.  After all, you don’t want to use a hosted service that’s not “open”, do you?

VENDOR LOCK-IN

Vendor lock-in can be a big concern, and we see our commitment to open source and open principles as a solution to this threat.  If you were building a house, you wouldn’t want to limit yourself to only Andersen windows and Kohler plumbing, would you?  Vendor lock-in can be scary!  Instead of relying on a single solution and single point of support, we will be leveraging a myriad of community-based tools to make sure our service remains as “open” as possible!

We are excited about being able to slap the “100% Open Source” and “100% Cloud Powered” logos on our site and you should be just as excited as we are!

WHAT’S THAT?

This isn’t what you wanted?  Maybe what you really wanted instead of an “open” cloud infrastructure was a reliable foundation that you could effectively control along with open APIs that others can tap into and leverage in order to improve and extend the solution?  And maybe even have seamless portability between clouds?

NOTE:  This is not a rant against open source per se – most of us are familiar with the value of LAMP stacks and more for example.  But when it comes to IaaS/PaaS, exactly what kind of “open” do we really want and need?

Building Blocks of SAN 2.0 — Flash, Thunder & Lightning

EMC Adds To Their Flash Storage Solutions with “Thunder” and “Lightning”

This week EMC announced two new flash based storage products, appropriately codenamed Lightning and Thunder.  There’s a recorded webcast here, and also excellent posts by Chuck Hollis (1, 2, and 3) and Greg Schulz  (1 and 2) and also Chad Sakac on the new solutions.  I thought I’d take a moment as well to look at these announcements from perhaps a slightly different perspective, using other products as a starting point.

[Standard disclaimer:  I’m just an IT guy interested in technology sharing my observations and opinions.   Please feel free to offer comments, corrections and concerns by commenting at the end of this post]

The Need for Speed

This part is pretty simple.  With Moore’s Law CPU capabilities have roughly doubled every 18 months, but what about storage?  We have faster and faster processors driving more and more storage transactions (IOPS), but there hasn’t been matching improvement in the storage arena to keep up with the increasing demand for IOPS.  Flash-based storage – which we’ve seen in everything from phones, tablets and SSDs – just may be the paradigm shift that’s needed to help storage catch up to the demands being placed upon it.

Not only is flash storage faster, but by the terabyte it’s much smaller and uses less power than an array of spinning hard drive platters.  When you think about how SANs have evolved over the past few decades, it’s not inconceivable that flash could essentially become the basis for SAN 2.0.  It may be a while before the economics and technology and maturity of solutions completely replace hard drives, but for now we are seeing flash storage being introduced in strategic locations in the storage ecosystem and that is what Lightning and Thunder are all about.

The PCI Flash Card

FusionIO is the most popular PCI Flash solution today, with cards that can provide over 500,000 IOPS and very low latency – so low that rather than being measured in milliseconds, it is measured in microseconds.  Current FusionIO cards have read and write latencies of less than 50 microseconds.

Part of the low latency comes from the fact that the storage is on the PCI bus, close to the CPU and memory.  While transports like Fibre Channel generally have low latency, latency increases once traffic hits the wire, and the most demanding high-transaction environments will “feel” this difference.

I had the opportunity to witness the impact of a SQL database being relocated from a SAN environment to a PCI-based FusionIO card in an ESXi 5.0 environment, and the impact was profound.  Both data throughput and IOPS increased by 400% over the previous SAN configuration (SATA disk), and even more at peak levels.  As great as this solution was from a performance perspective, it also introduced new challenges.

The Shared Storage Problem

Using a PCIe-based card for storage now meant that the storage was essentially captive to the host server.  The storage could not be shared with other hosts, making things which we sometimes take for granted, like vMotion and high availability, a challenge — in addition to giving up capabilities that may have been offered by the storage array.  Fortunately there is a way to work around this somewhat.

What if we moved the databases back to the SAN, and instead used the PCIe Flash cards as a read-only cache?  This way the databases are on shared storage, enabling again possibilities for vMotion, high availability and more.  This is exactly what FusionIO has done with their IOTurbine product.

The IOTurbine solution effectively converts the FusionIO PCI card in the host into a read-only cache to accelerate database performance.  A driver is installed inside the guest OS of the VM which enables it to leverage the FusionIO card as cache while remaining compatible with, and transparent to, vMotion.  As virtual machines are vMotioned to different hosts, the cache on the PCI Flash card will adjust over time to optimize itself for the workloads currently running on that host.

You might be thinking, “this is all great, but what about my write performance?”  It is true that in this configuration write performance is not directly accelerated, but it may very well be indirectly accelerated.  All the read IOPS that are served up by the PCI Flash card have been offloaded from both your SAN and your SAN transport (i.e. Fibre Channel), leaving more IOPS and bandwidth for your write transactions.  I’ve read reports of some environments seeing write performance improve by as much as 300% due to this phenomenon.  The other consideration is your read-to-write ratio.  If 85% of your IOPS are reads and 15% are writes, for example, you should see an excellent performance increase from this type of architecture.

So perhaps we can have the best of both worlds – PCI-based Flash storage (used as read-only cache) while still having all the benefits of shared storage, enabling vMotion, high availability and more.  With this foundation, let’s take a look at EMC’s “Thunder” and “Lightning”.

Thunder and Lightning

EMC has been a storage leader for some time now, so if Flash is going to play a key role in storage in the future, EMC will need to increase their investments in this area.  Flash storage is not new to EMC.  In 2010, EMC introduced flash storage as the “gold” tier on storage arrays using FAST Cache, which basically moves the “hottest” blocks on the array to flash storage in order to boost performance.  In fact EMC claims that 1.3 Exabytes of flash storage are running under EMC’s FAST solution today.  EMC is now introducing new flash solutions strategically within the storage chain to further improve options for performance.

The first product is VFCache (codename Lightning), which uses a Micron PCI Flash card in much the same manner as the read-only cache scenario discussed earlier.  The VFCache card will vastly accelerate reads, while allowing writes to safely pass through to the storage array.  Based on my own experience with PCI Flash cards, I have no reason to doubt EMC’s claims that augmenting a VNX SAN with VFCache delivers performance increases of 201% and 260% for Oracle and Microsoft SQL respectively.  What did raise my eyebrow a bit was the slide below:

Note that while the cards are mostly similar, the FusionIO card selected is only PCIe 4x while the EMC/Micron card is PCIe 8x.  FusionIO does have PCIe 8x cards (ioDrive2 Duo) which FusionIO rates as significantly faster than their 4x counterparts.  Taking all this into consideration, I’m tempted to postulate that, to the extent the hardware is equal, the difference between the two solutions may not be quite as large as suggested above.  Nonetheless, both solutions are certain to provide a substantial performance boost in the host systems in which they are deployed.

Like the FusionIO solution, EMC’s VFCache uses a filter driver within the guest operating system, which allows the flash cache to be targeted and isolated to specific workloads.  It also features a vCenter plugin which allows for the cache settings to be modified as well as to display some related metrics.

What caught me a bit off guard about EMC’s VFCache solution is its less-than-complete support for vMotion today (although I’m sure this will change).  With the 1.0 release of VFCache it seems that vMotion to another host with a VFCache card is only possible with a series of scripts (provided by EMC) which basically disable the cache in the VM, relocate the cache, and then re-enable the cache in the VM (this is detailed better here).  In contrast, the IOTurbine solution (as I understand it) will operate a bit more transparently, with the caveat that the cache on the “new” host will have to be repopulated over time with the “correct” blocks in order for the previous performance level to be fully realized.  (As a side note, I’m speculating that it wouldn’t be terribly difficult to configure the environment to support vMotion to a host server without a VFCache card and forgo the performance benefits temporarily.)

Having said all this, VFCache is definitely a 1.0 solution today and EMC has many more capabilities they intend to introduce in the future including:

  • Mezzanine VFCache cards for blade servers (i.e. Cisco UCS)
  • De-Duplication in the card to increase logical capacity
  • Distributed-Cache (this could facilitate a seamless vMotion scenario with no temporary performance loss).
  • More focused caching algorithms and larger capacity.

In a nutshell that’s Lightning.  Here’s a short video review by Demartek that breaks down the Lightning / VFCache solution:

THUNDER

As mentioned before, you get the best performance when the storage is close to the CPU and memory of the host system, but for many workloads the latency of a more traditional over-the-wire transport will be adequate.   For this scenario EMC will be introducing later this year “Thunder” – which is basically a 2U or 4U flash-based SAN.    Details on the specs aren’t available at this time but it sounds as if the appliances can be “stacked” in order to provide terabytes of networked flash storage.

If I took notes properly, I believe that both Ethernet and InfiniBand interfaces will be available for connecting to “Thunder” based flash storage.   There’s also an RDMA over Ethernet (RoE) possibility, and it would be interesting to see performance measurements as they become available, since this scenario would fit the converged networking trend which began in earnest with FCoE (Fibre Channel over Ethernet).  And of course in those cases where latency needs to be under 50 microseconds, Thunder can be augmented with “Lightning” to provide that high-performance/low-latency profile.

WHAT IT ALL MEANS

It took a long time for SANs to develop and evolve to where they are today.  Similarly, a transition away from the traditional hard drive in the enterprise storage array is not going to happen overnight.  But as a leader in storage, EMC is strategically introducing flash solutions at various points in the storage chain (FAST, full array, and the PCI bus) in order to alleviate the storage performance gap that many environments have experienced.  Of course, EMC is not the first vendor to offer a comprehensive Flash PCI card based solution, but it will be interesting to see how EMC continues to develop the solution and integrate it within the broader EMC, VMware and even Vblock ecosystems over time.

VFCache is a 1.0 solution today, and EMC has shared their development roadmap in which they plan to continue to invest in the platform, making improvements regarding capacity, VMware integration, performance and more.  And I would certainly expect at some point to see Lightning and/or Thunder make appearances within the Vblock product line as well.  Flash – when properly positioned within the storage chain – has the potential to solve many of today’s storage performance challenges and it will be exciting to see these new solutions continue to be developed and introduced in the coming year(s).

What’s your take on the future of flash?  Did I get it right?  Join the conversation with any questions, comments or concerns below!

VMware vSphere Distributed Switch Best Practices

A few days ago I shared a link to VMware’s latest whitepaper on Twitter and many others have as well.  Due to the somewhat temporary and transient nature of Twitter I thought it would be good to post a more permanent link to the white paper here.

This really is a very strong design document and whitepaper that goes well beyond just a few best practices.  If you work with vSphere Distributed Switches at all, you’ll definitely want to take some time to read through this document.

VMware vSphere Distributed Switch Best Practices 

http://www.vmware.com/resources/techresources/10250