vSphere 6.0 Public Beta — Sign Up to Learn What’s New

Yesterday, VMware announced the public availability of vSphere 6.0 Beta 2.  I can’t tell you what’s all in it due to the NDA, but you can still register for the beta yourself, read about what’s new and download the code for your home lab. There’s some pretty exciting stuff being added to vSphere 6.0 in…


Will VMware Start Selling Hardware? Meet MARVIN

The Register is running a story that VMware is preparing to launch a line of hardware servers.


VMware Pursues SDN With Upcoming NSX Offering

Earlier this week VMware announced VMware NSX – an upcoming offering that takes network virtualization to new levels. NSX appears to be somewhat of a fusion between Nicira’s SDN technology (acquired last year by VMware) and vCloud Networking and Security (vCNS – formerly known as vShield App and Edge). Since I already had intentions to…


What Really Is Cloud Computing? (Triple-A Cloud)

What is cloud computing?  Ask a consumer, CIO, and salesman and you’ll likely get widely varying responses. The consumer will typically think of the cloud as a hosted service, such as Apple’s iCloud, or uploading pictures to Photobucket, and scores of similar services (just keep in mind that several such services existed before it…


Agility Part 2 — The Evolution of Value in the Private Cloud

When an IT project is commissioned it can be backed by a number of different statements such as: “It will reduce our TCO,” “This is a strategic initiative,” “The ROI is compelling,” “There’s funds in the budget,” “Our competitors are doing it.” Some of these are better reasons than others, but here’s a question.  Imagine a…


Stacks, the Vblock and Value — A Chat with EMC’s Chad Sakac

…I reached out to EMC’s Chad Sakac to gain more insights from his perspective on how the various stacks…well…stacked up….


Should You Virtualize vCenter Server (and everything else?)

When concerns are raised around virtualizing vCenter Server, in my experience they usually revolve around performance and/or out-of-band management. The VROOM! blog at VMware just published a whitepaper that looks closely at vCenter Server performance as a VM versus native (physical), which speaks to these concerns as well as to other workloads. vCenter Performance…


Can your VM be restored? VSS and VMware — Part 2 (updated)

The backup job for your VM completed successfully so the backup is good, right? Unfortunately it’s not that simple and a failure to effectively deal with VM backups can result in data loss and perhaps even legal consequences.


Mortgages in post-TARP America (and some venting)

This is a non-tech post but I wanted to share some quick facts about our mortgage horror story (and also to vent a bit so that I can move on). If you’ve ever wondered how it could be possible to pay two-thirds the value of a tiny summer cottage built 90 years ago and still owe the bank over a half-million dollars, read on.

There’s A LOT of information here but I want to share just a small sliver to show just how ridiculous this is. I’ve heard people say things along the lines of “people who are in bad situations did dumb things to get there.” Let me know if any of you still feel that way after the next few paragraphs. Before I go into the mortgage history there’s some necessary background to explain how this situation developed in the first place.

Our daughter had an extremely unusual medical condition and we were advised to relocate to a short list of hospitals which had the staff to appropriately care for her. We found a small 1BR cottage (600 sq. ft.) available for rent which was close to a support system (which turned out to be essential and may have saved her life).

At one point my wife needed an emergency appendectomy but was errantly sent home by the hospital. Different doctors called the next day and said, “I read your scans, please get here ASAP.” Long story short, the doctors believed she was within minutes of having large amounts of toxins released. The doctor said “I’ll understand if you want to sue” and we explained that we were grateful for their emergency work and that we weren’t like that. We tried to pay the doctors first and then the hospital (a several-night stay) with what disposable income we had. The week that the statute of limitations expired, the hospital sued us for a VERY large amount and moved to have my wages garnished.

We got legal help and filed for bankruptcy. When we were before the judge, we presented less than $4,000 of credit card debt. The judge was incredulous that we had no more consumer debt and asked several times whether there was more to claim. We didn’t want to claim our vehicles, so we only claimed the $4K in credit cards and the hospital bill.

Not even one month after being discharged from bankruptcy, our landlord came to us and said “I need to sell this property. Either you buy it or I will sell it to someone else who will evict you.” Not an ideal situation, but we had to put our daughter first – moving at that time was not an option. We reached out to our bankruptcy attorney, who advised that if we could afford the insane mortgage payments for the first year and made every payment, we could later refinance. So we entered into a mortgage at 10.8% plus an additional 3% in pre-paid interest on the premise that we could refinance. Per the Truth in Lending statement we would have paid over one million dollars in INTEREST ALONE over the 30-year term of the mortgage.

With that background, the below table summarizes the initial mortgage and every modification that was available along the way:

Modification | Accepted | Increase in Debt Owed | Interest Rate | Decrease in Monthly Payment | Break-Even Point
Origination | NA | NA | 10.8% | NA | NA
Mod 1 | Paid $200; denied by bank | $6,000 | 6.75% | 35.4% | 6 months
HOPE | Yes | $12,985 | 9.125% | 5.3% | 6 years
Mod 2 | No | $20,436 | 8%, increasing annually | 28.9% | 2 years
HAMP | No | $37,720 | 7.25% | 1.2% | 87 years

 

In the first 18 months of the mortgage we paid to the bank exactly 66% of what the house was appraised for just this summer. Go back and read that sentence again to let it sink in. After countless calls to the bank we finally got pre-approved for a modification (Mod 1) that would have been sustainable. It would have reduced our monthly payments by over one-third. We paid $200 to lock in this deal, but the bank denied it. Why? Because there wasn’t enough equity in the mortgage and therefore they couldn’t add more debt to the mortgage. Remember that when we look at the next mods.

The HOPE mod was a pre-TARP Federal program. It added $13,000 in debt to be financed. The savings from the reduced mortgage payments would have taken 6 years just to break even on this “deal” but we accepted because we were desperate enough that a 5% reduction in monthly payments would help us keep food on the table.

We kept going back to the bank asking for help and we were offered a mod (Mod 2) that would have added an ADDITIONAL $20,000 to our mortgage. We said no thanks.

Then we applied for HAMP, which is a TARP program for which banks were given BILLIONS in taxpayer dollars. While the banks accepted the taxpayer dollars, they had great latitude in how to implement the program in their own interest. The HAMP solution would have added over $37,000 in ADDITIONAL debt while reducing monthly mortgage payments by 1.2%. It would have taken 87 years for the savings from the reduced mortgage payments to cover JUST the additional debt that was added. Just to be clear, that’s an 87-year payback on just the mod for a 30-year mortgage from a bank that accepted tens of billions in taxpayer TARP dollars.

So far I’ve only covered the modifications and none of the countless examples of bad faith by the bank. Just two quick stories. For one of these deals we received a FedEx letter on December 28 announcing a deal that had to be returned, signed, by January 3 – making any legal or professional review impossible.

In another incident, the bank said a package on its way to us would be our only offer before legal action, but we never received it. Long story short, we had to work with our Congressman, who was eventually able to find the tracking number, and a manager at the bank was absolutely livid that this information got out. The tracking revealed that the letter made it to a distribution facility near us before the bank RECALLED it and it was sent back to their facility. The bank knew how much we had already paid into the mortgage in a very short time, so they figured they’d try to have their cake and eat it too and finish what they’d been doing to us for years.

And that’s just a taste of the background that led to our current situation. We went to court and had no opportunity to present our evidence – we were forced to either accept a new modification or be evicted AND owe the bank a half-million dollars. So after paying 66% of the appraised value in 3 years, what would be the new mod? To now finance an amount that is 35% larger than what the property was appraised at. When we add the legal fees we now owe, our monthly payments are basically the same as when the loan was originated, and we still have no equity AND an underwater mortgage.

For me it’s not so much about the money, but the impact on my family. Some might say “living in a small house is cute” but this is a bit beyond that. Meals are eaten kneeling on the floor, and when the refrigerator door is open no one can move between rooms. Inconveniences, yes, and things could be far worse, but we now have 3 children and living in this space is beyond “dysfunctional” or “inconvenient”. It affects our moods. It affects our time. It affects our studies and our careers. It affects how we raise our children. Many of the things I always imagined doing with my children are simply not possible in this environment. And now the bank – with the help of the court system – has essentially forced us to stay here and to spend every waking hour working to try to keep this one roof over our heads.

So yes, it’s a very demoralizing situation and occasionally I vent about it, but I try to remind myself to be thankful for our health and all that we do have. I’m going to try not to vent as much now that I’ve gotten this out, but if I do I hope you’ll understand and forgive me. The other thing that drives me nuts is the implication I hear on occasion that “they made bad choices”. There was only one point where we ever really had any choice – and we made the only and right one.

VMware vSphere 6 — What’s New?

vSphere 6 has been in public beta for several months now, and this week at VMworld some of the new capabilities are now public. vSphere 6 remains in beta for a future release (sign up here!), but let’s take a quick look at some of the new features that have been announced (so far).

SMP for Fault Tolerance

Just a quick overview here. Fault Tolerance is a pretty neat feature that can keep a second copy of a VM in complete lockstep for HA purposes. The second VM has its own VMDKs, which can sit on a different datastore or SAN, while each CPU transaction is maintained on both servers. This is a great way to provide redundancy for applications which can’t afford to lose cycles during a failover event, but the Achilles’ heel was always that it was limited to a single vCPU.


VMware announced earlier this year that it would be discontinuing vSphere Heartbeat and now we know why. With Fault Tolerance being able to support VMs with up to 4 vCPUs in vSphere 6, it would no longer be necessary for high availability to be provided by in-OS clustering.  VMs of up to 4 vCPUs and 64GB of RAM can now enjoy the benefits of VMware Fault Tolerance.

vMotion Improvements

Some of the vMotion improvements announced include being able to vMotion across different vCenter instances, across routed networks (this “may” work now but was never formally supported), and perhaps most importantly long-distance vMotion.

The latency tolerance for vMotion will be increased from 10ms to 100ms in vSphere 6! With such a generous latency tolerance, many more vMotion scenarios become possible without the usual geographic penalties. Personally, I think VMware should demonstrate this capability by vMotioning a VM to an EVO:RAIL cluster in a hot air balloon with a 4G LTE wireless network.

vVOLS

This is a huge feature in my opinion – a whole evolution beyond what VAAI introduced — and rather than try to drill deep here I’ll try to stick to a simple overview. A vVol is a new logical construct that appears as a datastore in your admin tools, which allows the virtual disk to be a “first class citizen” in storage (versus the LUN or volume). A vVol does not use VMFS but is a new abstraction layer that enables object based storage access (with your VMDKs being the objects).

I found a good illustration of a “before and after” view of all these pieces on Greg Schulz’ StorageIO blog which are shown below:

BEFORE vVOLs

WITH vVOLs

 

There are several things going on here which I’ll just quickly touch on. First, there is one protocol endpoint now versus many, as illustrated below. This enables more API capabilities to be exposed, and if I understand correctly, VMware has plans to allow third parties to develop filter APIs here.

Protocols are consolidated into a single endpoint

 

vVols are hardware-integrated much like VAAI, which means the storage vendors will develop their definitions for the API to activate the capabilities of their storage arrays. For example, one capability is the ability to offload the snapshot function – rather than using a copy-on-write delta file, the storage array handles it. While snapshots are an awesome feature of vSphere (and are not backups, by the way), I’m not a big fan of the copy-on-write delta file method. I’ve seen snap chains 40+ levels deep (without anyone knowing) and snaps that were left open for months until the datastore filled up. By offloading snapshots and other operations to the storage array these things can be handled a lot more effectively.

I didn’t even get to storage profiles yet, which allow you to define what characteristics a certain VMDK should have. There are many scenarios here, but at a high level just removing the complexity of LUNs and RAID characteristics from admins is a big deal. When a VM is provisioned the admin needs only to select the storage policy (or one is selected for them) and the desired settings are enforced without the complexity being visible.

With that very basic intro, I highly encourage you to read one or more of the following blog posts, which go FAR deeper into vVols, how they work, and their benefits.

Also check out what EMC, NetApp and Nimble Storage are doing with vVols, just to name a few.

Virtual Datacenter

This is a new logical construct within vSphere which allows you to join multiple vSphere clusters into one construct to enforce consistent policy settings, provide a top-level management point and facilitate cross-cluster vMotion.

Improved Web Client

The web client has improved significantly with each release, but many (like me) still find it a bit slow at times. It’s clear that VMware has spent some time on this: from using the beta I can assure you that the 6.0 web client responds significantly faster than the 5.5 web client.

SUMMARY

That’s just a quick summary of some of the features that were mentioned in the general session. Even more details should be available over time as vSphere 6 grows closer and closer to a GA (General Availability) release. If you’re anything like me, you probably can’t wait for vSphere 6 – perhaps the biggest feature I’m looking forward to is vVols. Until then, happy virtualizing!

VMware Announces EVO:RAIL for the Software Defined Data Center

The much rumored “MARVIN” has manifested today as EVO:RAIL, which represents VMware’s entry into the “Infrastructure In-A-Box” or hyper-converged market.

Each “RAIL” consists of a block of four (4) x86 rack-mount servers available from a list of partners, with VMware vSphere and VSAN. Because EVO can scale, this solution will likely find acceptance in branch offices as well as in larger scale-out designs – all with an HTML5 front end. Customers can now simply procure virtualization infrastructure — including storage — by purchasing multiple “RAILs” as needed for scale.

“Shall we dance?”

This is truly a software-defined infrastructure solution which enables IT shops to procure infrastructure through a single vendor and scale out as needed. Nutanix was the first to find success with this business model, and others will be sure to follow (also see Cisco and SimpliVity). I expect that this will be an increasingly popular (and disruptive) trend in the marketplace.

EVO:RACK was also announced as a tech preview, intended to scale to multiple server racks of SDDC infrastructure.


More details will be announced later in the day, but for now be sure to check out Duncan Epping’s announcement post as well as VMware’s EVO:RAIL site.

UPDATE:  Also see VMware CTO Chris Wolf’s announcement post on EVO RAIL here.

Get Excited for VMworld 2014!

Captain Picard can’t contain his enthusiasm for VMworld 2014

It’s the season for VMworld and all of us are getting a bit excited. I’ve never been to VMworld (and won’t this year either) but I’m still quite excited about what this VMworld will bring. Why? I’m glad you asked.

The two big reasons are what’s going to be announced/revealed as well as all the great ways to follow VMworld remotely (I’m an expert at this now!). My mind is already racing with designs, use cases and plans for deploying several of the new capabilities we expect to be announced.

vSphere 6

This is all under NDA so we can’t talk about all the exciting new capabilities just yet, but if you’ve participated in the vSphere 6 beta you know that there are some pretty major features we can expect to be announced here, and possibly a surprise or two yet. One of the features we do know a bit about is…

vVOLs

vVols aren’t really a new concept, as they were introduced at VMworld 2012 as a preview of where VMware would be going with storage. Over two years in the making and now with the overwhelming support of VMware’s storage partners (EMC, NetApp, Nimble Storage and more), vVols are poised to make a big splash. More on this after the embargo is lifted, but here’s some available content from VMware on vVols until then.

Infrastructure-In-a-Box

Call it hyperscale, scale-out, or software defined (all of these work) but we are basically talking about modular hardware sold as single units which can be joined to form large pools of vSphere infrastructure. We’re not just talking about vSphere here, but also software-defined storage (i.e. VSAN) and possibly SDN as well (i.e. NSX). Nutanix is one vendor that already sells hardware based on this model, and quite successfully.

There have been rumors all summer about VMware offering such a single-box model, which has so far been named MARVIN, Magic and Mystic if I’m not mistaken. Now we’ll get a chance to see the details behind what may be VMware’s entry into the hardware market. Also I wouldn’t be surprised at all to see some other big names making similar moves in this new and growing space.

PernixData FVP 2.0

I posted on PernixData FVP 1.5 here (it won “Best New Product” at VMworld 2013), and 2.0 will be a big jump with some exciting features – including the ability to use memory on your ESXi hosts as an acceleration tier (read cache and clustered write offloading).

PernixData is planning on having a big presence at VMworld this year – be sure to check them out.

vCloud Air

Today VMware announced the re-branding of vCloud Hybrid Service (vCHS) as vCloud Air and also introduced some new vCloud Air pricing calculators. I think VMware has a growing story here with their public cloud offering and its integration with vSphere-based on-prem private clouds.

A growing differentiation point among cloud providers is services on top of the stack, and VMware introduced disaster recovery a few months ago. I hope to see some enhancements and possibly even new services and/or pricing options announced at VMworld.

Sessions

Sessions are a huge part of the value of VMworld. These sessions are recorded and (in time) are made available for on-demand playback (access required). Duncan Epping has taken the time to highlight some of this year’s “must attend” VMworld sessions here.

Keeping Up Remotely

Like I said, I’m an expert at this. Several of the general sessions will be available via live stream, and there’s Twitter and bloggers as well. It’s not the same as being there, but it’s not hard to keep up with some of the details and big news either.

All the details on VMworld social media from hashtags to bloggers and more are available here. Also don’t forget the official VMworld app and the live stream of the general sessions.

Looking forward to a great VMworld and some exciting new solutions and offerings that will help us solve problems, fill gaps and create value. Have an enjoyable and safe VMworld whether you’re attending in person or remotely!

Software Defined Speed — A Look at PernixData FVP

PernixData FVP is a solution I’ve worked with in one environment for perhaps the past 6 months or so. I’ve been meaning to write about it (more than just tweets anyway) for some time, but I’m only now getting around to it.

The first question of course is “what does PernixData FVP do and why might I want it in my vSphere infrastructure?”. The short answer I usually give is that it’s nitrous oxide for your storage tier – just add FVP to your existing storage infrastructure and enjoy the speed (plus it’s legal)!

The longer answer is a bit more detailed than that, and first it would be helpful to have a quick overview of various storage architectures.

STORAGE ARCHITECTURES

Traditional Storage Array

Here we are talking about hardware that is designed to offer up storage, usually via Fibre Channel, iSCSI or NFS protocols. For the purposes of this article, most any hardware-based storage array from NetApp, EMC, Nimble Storage, HP, Dell and many others fits this definition.  This is a tried and true design, but as our capacity and performance needs grow, scale-out ability can become an issue in some environments (especially Google, Facebook, etc.).  In fairness some storage array vendors have implemented scale-out capabilities in their solutions, but for our purposes here I am simply trying to draw a distinction between architectures at a VERY high level.

Hyper-Scale

Remember scale-out NFS and Hadoop? These designs typically did not rely on a monolithic storage array but on multiple nodes using direct-attached storage, logically joined by…software.  First we had “software defined” compute with VMware abstracting the CPU and memory resources of server hardware.  Now we are abstracting at the storage controller level as well to unlock more potential.

Recently several vendors have had success with incorporating hyper-scale concepts into virtual storage arrays for vSphere, including Nutanix, VMware (VSAN), SimpliVity, and more. Hyper-scale infrastructure is truly “software defined” as software and logical controllers are the key to making this distributed and scalable architecture work.


Occasionally this design is referred to as “Web Scale” as it does evoke a highly parallel environment designed for scale, but I prefer the term hyper-scale for several reasons, including that the use cases go far beyond just “web”. We’re talking about applying web-scale principles to present “software defined storage”.

Considerations with Hyper-Scale

If write activity is in progress on a server node and it crashes hard before the data is replicated, what happens? (The answer is “nothing good”.) The solution here is to write in parallel to two or more nodes (depending on your failure-tolerance settings). This is why a 10GbE or better backbone is critical for hyper-scale designs – every write needs to be copied to at least one more host before it is considered to be committed.

Another consideration is locality to the processor. For some applications anything under 20ms of latency is “adequate”, but some mission-critical OLTP systems measure latency in fractions of a millisecond. For these applications, latency can be significantly reduced by having the data closer to the CPU rather than having to fetch it from other nodes (more on this later).

Enter PernixData FVP

So let’s say you have an existing vSphere infrastructure and a storage array that you are otherwise comfortable with, though it could benefit from better performance. With PernixData FVP you can keep your existing storage array — eliminating the CAPEX burden of a new storage array — and accelerate it by decoupling performance from the storage array onto a new logical “flash cluster” that spans your server nodes.


There are other solutions for adding flash-based read cache to your environment, including vSphere’s vFlash capability, but most are local only (no flash cluster concept) and don’t offer the ability to cache writes.  PernixData FVP is unique in my experience in that it is a true flash cluster spanning your server nodes that accelerates BOTH reads and writes.

INSTALLATION

I’ve done this more than a few times now and I must say it’s rather straightforward.

First you will need to install some flash in your servers. In the environment I worked on we used Fusion-io PCIe cards, but SSDs will work as well. How much flash should you use? It depends on your performance profile and objectives, but as a general starting point, about 10% of the total size of the dataset you wish to accelerate is usually a good place to start (for example, roughly 400GB of flash for a 4TB working set).

Then you install PernixData FVP, which is done in two steps. First there’s a component you install on your vCenter server which adds an additional database to track some new flash performance metrics. Once installed you can manage and view the flash cluster from the vSphere Client (including the vSphere Web Client as of FVP 1.5).

Managing the Flash Cluster from the vSphere Web Client

The second step is to install the FVP VIB (vSphere Installation Bundle) on each ESXi host. I must have installed and uninstalled the FVP VIB several dozen times by now and it’s quite easy – just a standard ESXCLI VIB install.

First put the ESXi host into maintenance mode (stopping any active I/O), perform the install (a single ESXCLI command), exit maintenance mode, and repeat for each additional ESXi host in the cluster.
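For reference, the per-host sequence boils down to something like the sketch below. This is just a rough outline; the VIB filename and datastore path are placeholders for whatever your FVP download is actually called, and you’d want to vMotion any running VMs off the host first.

# Place the host in maintenance mode (stops active I/O on this host)
esxcli system maintenanceMode set --enable true
# Install the FVP host extension (path and filename are placeholders)
esxcli software vib install -v /vmfs/volumes/datastore1/pernixdata-fvp.vib
# Take the host out of maintenance mode and move on to the next host
esxcli system maintenanceMode set --enable false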

CONFIGURATION

Once you define and create the flash cluster, you can designate policy by datastore or VM. The two policies are write-through and write-back. With a write-through policy you are only using the flash cluster for reads – the most commonly used blocks, as determined by efficient algorithms, are maintained on the flash cluster for quick access. Not only does this reduce storage latency, but it reduces the IOPS load that your storage controller must process, which should result in a performance improvement on the storage controller as well.

With the write-back policy, writes are also processed by the flash cluster. Writes are written to the flash cluster (two nodes for failure tolerance) and are then de-staged back to the storage array as performance allows. The net result is that the commit time or latency from the application’s perspective is vastly reduced — incredibly important for write-intensive (e.g. OLTP) applications.

1 Day IOPS Chart for a database VM

The graph above (from the vSphere Web Client) shows a database server accelerated by PernixData FVP over the past day. The purple line shows the latency that is incurred at the storage controller level, but the blue line is what the VM or application “feels”. The orange line represents the latency to local flash, which is measured in fractions of a millisecond. The distance between the purple and blue lines is latency that has been effectively removed from the application by PernixData FVP.

One nice feature about FVP is that it reminds you, right in the vSphere Client, what it is doing for you.  In the environment I work on, it has saved almost 2 billion IOPS (pronounced “Beeeeeelion”) and 87TB of storage traffic just in the past 25 days.

Nitrous Oxide For Your Storage Array

In review, now you can see why I say PernixData FVP is much like adding nitrous oxide to a car (and it’s legal, of course). You don’t have to buy a new car – you can just make the one you already have faster. And if you buy a new car (or storage array) you can still use your server-side flash cluster to accelerate it.

Much of what makes PernixData FVP special is the clustered file system that enables it to quickly and efficiently process writes to multiple hosts at once. This capability makes PernixData FVP a great fit for write-intensive transactional applications for which latency is key. Or maybe you have an array with slower SATA disk and you might find it more cost effective to simply accelerate it rather than getting a new storage array. Either way, adding a server-side flash cluster to your vSphere cluster will significantly boost your performance. The DBA team in this environment has seen the run time of some batch jobs improve by over 900% (roughly a 9x speedup).

What’s Next?

PernixData isn’t done yet. Their next release will include the following features:

  • RAM (memory) as a storage tier
  • NFS Support
  • Network Compression (reducing replication throughput)
  • Topology Aware Replica Groups (control over the hosts used for DR and/or performance considerations).

The biggest feature there is RAM support. That’s right, you’ll be able to skip the flash if you prefer and use the RAM in your host servers as your clustered read and write cache. Just buy your host servers with the extra RAM capacity you want to use as cache and add FVP. And because memory is close to the CPU it should be quite fast. I’m looking forward to testing this capability when it comes out of beta and I’ll try to follow up with a post on that experience when the time comes.

The addition of network compression should also reduce the amount of data that has to be transmitted across the cluster. ESXi already compresses memory pages because, even with the CPU overhead, it increases performance by reducing swapping; FVP is applying the same concept here to its replication traffic.

In summary, I found PernixData FVP a pleasure to use. It’s not difficult to install and it decouples most of the performance pain from the storage controller onto the server-side flash cluster (or RAM cluster in the next release). But the best result was seeing the impact on database performance and transaction times. If you have a write-intensive application that can benefit from server-side caching (not just reads but writes too!) then you owe it to yourself to take a look at PernixData FVP. I’ll be taking another look when 2.0 becomes available.

Monitoring Storage Elements with LSI Controllers in ESXi

Cisco UCS servers have made quite an impact in the market and are currently #1 in blades.  Most UCS Servers don’t use any local storage beyond maybe booting ESXi from an SD card.  But what if you had a use case where you needed to use direct attached storage? Not a common use case today, but VMware VSAN is likely to change that.

The problem I encountered is that ESXi on UCS servers would not report health for the local storage elements.  Cisco UCS servers use LSI controllers and we were completely blind to events like a hard drive failure, RAID rebuild, predictive failure and so forth. The use case here was a single UCS-C server with direct-attached storage, which hasn’t been a common use case until just now with VMware VSAN.

Using different combinations of drivers blessed by VMware and Cisco I was unable to get physical drive and controller health to report in ESXi. I did my due diligence with a few Google searches but was unable to find any solution.

Then I went on the LSI website to look at the available downloads and something caught my eye — an SMI-S provider for VMware. I remembered that SMI-S is basically CIM, which is what ESXi uses to collect health information. This is a separate VIB that is independent of the megaraid_sas driver in ESXi.  With the SMI-S provider installed in ESXi suddenly I could see all the things that were missing in the health section such as:

  • Controller health
  • Battery health
  • Physical drive health
  • Logical drive health

Basically the moral of the story is this — if you have an LSI array controller (common in UCS-C) then you’ll need to follow these steps to get health monitoring on your storage elements (a consolidated command-line example follows the steps):

1) Go to LSI’s website and download the current SMI-S provider for VMware for your card.

2) Upload the VIB file to a VMFS datastore

3) From an SSH shell type “esxcli software vib install -v [full path to vib file]”

4) Reboot
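If it helps, here is the same procedure (steps 2-4) consolidated into the commands you would type from an SSH session. The provider filename below is a placeholder; use whatever LSI’s download for your controller is actually named.

# Install the LSI SMI-S provider VIB (filename is a placeholder)
esxcli software vib install -v /vmfs/volumes/datastore1/lsi-smis-provider.vib
# Reboot the host so the provider loads
reboot
# After the reboot, confirm the VIB is present
esxcli software vib list | grep -i lsi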

I’m not clear on why this capability is not exposed by the driver, but it seems for the time being that installing this additional VIB is required to get ESXi to monitor the health of storage elements on LSI controllers.

I hope some of you will find this valuable.

vSphere 6.0 Public Beta — Sign Up to Learn What’s New


Yesterday, VMware announced the public availability of vSphere 6.0 Beta 2.  I can’t tell you what’s all in it due to the NDA, but you can still register for the beta yourself, read about what’s new and download the code for your home lab.

There’s some pretty exciting stuff being added to vSphere 6.0 in quite a few areas.  One of these new areas is vVols — a new abstraction for volumes that enables tighter integration with storage arrays through the VASA API. You can read more about vVols in vSphere 6.0 in Rawlinson’s post.

One more thing — after you sign up for the beta you will be able to attend the following two webinars on the vSphere 6.0 beta:

  • Introduction / Overview – Tuesday, July 8, 2014
  • Installation & Upgrade – Thursday, July 10, 2014

Needless to say there’s some pretty awesome stuff in the 6.0 Beta.  Start your download engines!

https://communities.vmware.com/community/vmtn/vsphere-beta

 

Nimble Storage Revisited: The CS700 and Adaptive Flash

Back in 2010 I noted the entry of Nimble Storage into the storage market with this blog post. With the release of their new CS700 line and what they call Adaptive Flash, I figured it was a good time for a second look.

CASL Architecture

Before we look at the new offerings, a quick refresher on Nimble Storage’s CASL architecture is in order. CASL stands for Cache Accelerated Sequential Layout and Nimble describes the key functions here:

CASL collects or coalesces random writes, compresses them, and writes them sequentially to disks.

Nimble states that this approach to writes can be “as much as 100x faster” than traditional disks.  The image below is a bit fuzzy, but if you click to expand it should be readable.

CASL Features

It is important to note that both the compression and the automated storage tiering to flash are inline (no post-processing or bolt-ons), which adds additional efficiency. Features such as snaps, data protection, replication and zero-copy clones are also included.

For more details on CASL (including a 75 minute video deep dive) visit Nimble Storage’s CASL page here: http://www.nimblestorage.com/products/architecture.php

New Offering: CS700

The CS700 is the new model which features Ivy Bridge processors, 12 HDDs and 4 SSDs for a hybrid storage pool that Nimble claims is up to 2.5x faster than previous models, with up to 125K IOPS from just one shelf.

Now you can buy expansion shelves for the CS700, including an All-Flash shelf, and this is where something called “Adaptive Flash” kicks in. The All-Flash shelves host up to 12.8TB of flash each in a 3U shelf and are used exclusively for reads. The product materials on Adaptive Flash I found to be a bit light on technical details, but from what I can discern some of the secret sauce is provided by a back-end cloud engine.

Nimble Storage has a robust “phone home” feature called InfoSight which sends health, configuration and utilization information to cloud services for analysis. Several vendors do this, but the twist here seems to be that they are using the resources of the cloud-based engine to “crunch” your utilization data and send guidance back to your controllers on how they should be leveraging the flash tier. In summary, the big idea here seems to be that leveraging greater computing resources “big data” style in the cloud can produce better decisions on cache allocation and tuning than the controllers themselves could make.

The Big Picture

Nimble uses a scale-out architecture to combine storage nodes into clusters. Nimble Storage claims that a four (4) node cluster with Adaptive Flash can support a half-million IOPS.

Below is a table (created by Nimble Storage) which positions the CS700 in a 4-node cluster against EMC’s VNX7600 with XtremIO. I’d like to see an independent comparison, but it appears Nimble Storage may be on to something with this architecture.


All-Flash arrays are nice but they aren’t the only game in town. Nimble Storage seems to have a compelling story around a hybrid solution which is driven by both controller software, as well as back-end software hosted on cloud services.

Patch Available for NFS APD issue on ESXi 5.5 U1

There is an issue with using NFS on ESXi 5.5 U1 where intermittent APD (All Paths Down) conditions occur, which can disrupt active workloads. The KB for the issue is here.

Patch 5.5 E4, which fixes this issue, was released on June 10.  The patch can be obtained here and the KB for the patch is here.
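If you maintain hosts with ESXCLI rather than Update Manager, applying a patch like this from an offline bundle is straightforward. A rough sketch follows; the bundle path and profile name are placeholders, so list the profiles contained in the bundle you actually downloaded and substitute accordingly.

# Put the host into maintenance mode first
esxcli system maintenanceMode set --enable true
# See which image profiles the downloaded bundle contains (path is a placeholder)
esxcli software sources profile list -d /vmfs/volumes/datastore1/ESXi550-patch-bundle.zip
# Apply the appropriate profile from the bundle, then reboot
esxcli software profile update -d /vmfs/volumes/datastore1/ESXi550-patch-bundle.zip -p <profile-name-from-previous-command>
reboot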

Will VMware Start Selling Hardware? Meet MARVIN


UPDATE:  After a Twitter discussion this morning with Christian Mohn (@h0bbel — see his MARVIN post here) I think we are in agreement on what MARVIN may be.  This post has been updated accordingly.

The Register is running a story that VMware is preparing to launch a line of hardware servers leveraging vSphere and VSAN:

Evidence for MARVIN’s existence comes from two sources.

One is this trademark filing, describing MARVIN as “Computer hardware for virtualization; computer hardware enabling users to manage virtual computing resources that include networking and data storage”.

The second source is the tweet below, which depicts a poster for MARVIN on a VMware campus.


If this pans out to be true it would be a very interesting development indeed.  It is important to note that the trademark specifically says that MARVIN is “hardware”. But will it be VMware’s hardware?  As Christian pointed out in his post, it would go against VMware’s DNA to sell its own hardware.  But EMC — VMware’s majority owner — already has VSPEX, a confederated hardware offering from multiple OEMs but purchased through EMC.  It seems more plausible that VMware would leverage a VSPEX-like model and utilize Dell, Cisco, SuperMicro, etc. hardware for MARVIN.  What VMware really needs is a way to sell converged infrastructure nodes as one SKU (mitigating design risk) and one point of support — a VSPEX-like model for MARVIN would accomplish exactly this without VMware actually selling its own hardware.

MARVIN at first glance would also seem to be a validation of the Nutanix model — build a scale-out storage solution and sell as boxes that include the full stack.  That’s not an apples to apples comparison and it’s not my intent to split hairs here, but one of the attractive things about the Nutanix model is that “you just buy a box”.  By combining VMware VSAN with vSphere and hardware, VMware can offer a scale-out modular solution where customers just need to “buy a box” as well.

Of course it’s possible to build your own VSAN-enabled vSphere cluster using hardware of your choice from the HCL, but as noted with some recent issues there’s some risk in not selecting the optimal components. Offering a complete IaaS stack as a modular hardware unit eliminates this “design risk” for the customer and enables more support options.

One more thing to keep in mind.  EMC recently acquired DSSD with the goal of developing persistent storage that sits on the memory bus, and therefore closer to the CPU. It wouldn’t surprise me to see this introduced in future editions as well.

This could be an interesting development.  What are your thoughts about the potential entry of VMware into the hardware market?

Also what could MARVIN stand for?  How about…

Modular ARray of Virtualization Infrastructure Nodes?

Be Mindful of vSphere Client Versions When Working with OVAs

A colleague of mine was working with an OVA he had used several times before.  After upgrading ESXi with the Heartbleed patches (5.5 Update 1a) he found that he received a generic “connection failed” error when uploading the OVA.

(Note: in this drill there is no vCenter; we were connecting directly to the ESXi host.)

I noticed that his vSphere client was a slightly older build — pre-Heartbleed, but new enough that it would appear to work fine with the 5.5 host. Knowing that Heartbleed is about SSL, I recommended that he update the vSphere client to the same build that was released with the Heartbleed patch.  This changed the error but did not fix the problem. I’m not sure exactly what the underlying issue is, but the existing OVAs (that were created with 5.5) could no longer be deployed.

Using the latest vSphere client he tried exporting the source VMs into a new OVA and was able to import with no issues.
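For what it’s worth, the same export and re-import can also be done from the command line with VMware’s ovftool, which sidesteps the vSphere Client entirely. A rough sketch, with hypothetical host, VM and path names:

# Export the existing VM to a freshly built OVA (host name, VM name and path are hypothetical)
ovftool vi://root@esxi01.example.com/MyVM /tmp/MyVM.ova
# Deploy the new OVA back to the host
ovftool --datastore=datastore1 /tmp/MyVM.ova vi://root@esxi01.example.com/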

I’m not sure of the exact interaction, but I’m assuming that the OVAs are signed with a private key and that the Heartbleed patch somehow “breaks” some interaction here such that the OVA is not accepted. Perhaps there will be a KB on this in the future, but for the time being make sure that you have the latest build of the vSphere Client when creating and importing OVAs.

Why vCenter Log Insight is a “Must Have” for vSphere Environments

I recently took VMware’s vCenter Log Insight (2.0 beta) for a test drive and I was impressed by the time-to-value as well as the benefits relative to cost. Before I get started, I’d like to step back a bit, look at vSphere monitoring, and explore the benefits of log monitoring.

UPDATE 6-11-2014:  vCenter Log Insight 2.0 is now GA and has been released!

Monitoring vSphere with vCenter

vCenter out of the box does a great job of monitoring the vast majority of the things you’d want to know about: hardware failures, datastore space, CPU/memory utilization, failed tasks and so on. But chances are that on more than one occasion you have had to peruse ESXi host logs and/or vCenter log files to either find more detail or perhaps discover errors for conditions that vCenter doesn’t report on.

For example, are you seeing SCSI errors or warnings? Path failures or All Paths Down (APD) errors? Any unauthorized intrusion attempts? Are API calls timing out? Is one host logging more errors than others? The bottom line is that for full, holistic monitoring of a vSphere environment, log monitoring is a required element. The traditional problem here is time – SSH into one host at a time and manually peruse the log files? There needs to be a better way.
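To give a sense of what that manual spelunking looks like (the problem a log tool is meant to solve), here is the sort of thing you end up typing on each host over SSH. The log paths are the standard ESXi 5.x locations; the search patterns are just examples.

# Hunt for SCSI errors in the VMkernel log on one host
grep -i scsi /var/log/vmkernel.log | tail -20
# Look for APD (All Paths Down) events
grep -iE "apd|all paths down" /var/log/vobd.log
# Watch the management agent log in real time
tail -f /var/log/hostd.log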

Splunk is a popular option for log monitoring as it has the capability to ingest logs from multiple sources so that you can correlate events and/or time frames across multiple devices. There is a vSphere app for Splunk which I understand works fairly well; however, one of the issues seems to be cost. As ESXi and vCenter can generate large volumes of log data, this increases costs, since Splunk is usually priced around the volume of log data that is ingested.

Enter vCenter Log Insight

vCenter Log Insight is designed for vSphere environments and list pricing starts at $250 per device (a device being an ESXi host, a vCenter Server, SAN, switch/router, firewall, etc.).

I decided to download the beta of Log Insight 2.0 and give it a spin. It’s simply a pre-built virtual appliance that you import as an OVA. Once I had the appliance running, I logged into the web interface and added details and credentials to access the vCenter server. Within 30 minutes of downloading I was exploring the interface, which was already collecting logs from vCenter and all the ESXi hosts defined within it.
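For reference, what’s happening underneath for the ESXi hosts is essentially syslog forwarding to the appliance. If you ever need to configure a host by hand (or point some other syslog-capable device at Log Insight), it amounts to a couple of commands; the appliance address below is a hypothetical placeholder.

# Forward this host's syslog to the Log Insight appliance (address is a placeholder)
esxcli system syslog config set --loghost='udp://loginsight.example.com:514'
esxcli system syslog reload
# Make sure the outbound syslog firewall ruleset is enabled
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true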

One of the first things I noticed was the clean, fast and snappy HTML5-based interface. Compared to the Flash-based vSphere Web Client it’s hard not to notice the difference (which increases my anticipation of the next vSphere release, which I hope will have an HTML5-based interface).

Out of the box, Log Insight comes with dashboards and content packs for both vSphere and vCenter Operations Manager (vCOPS). In the image below you will see in the left pane several dashboard views that can be selected within the vSphere pack. In the main window, one can click on any point in time on the top graph, an element of the pie chart, or even the “has results” link of one of the queries and be instantly taken to an “Interactive Analytics” view where you can view the log events in detail.

vCenter Overview Dashboard in Log Insight 2.0 (beta)

If you were on the “Storage – SCSI Latency Errors” screen, for example, you’d see bar graphs for SCSI errors by device, path and host to quickly identify anomalies, as well as some pre-built queries as shown below.  Clicking on any “Has Results” text will take you to a drill-down view of the events that match the query.


The next day we ran into an issue where a certain VM failed to vMotion to another host. I logged into vCenter Log Insight, selected the “vCenter Server – Overview” tab, set the time range to “past 5 minutes”, and instantly identified the time interval of the failure. I clicked on it and in a blink I was looking at all the relevant log entries. It literally took me seconds to log in and get to this point – a huge time saver!

But wait, there’s more!

vCenter Log Insight is at its core a syslog engine. While it is designed to immediately exploit vSphere log elements, it can also be used for SANs, switches, firewalls and more. If you browse the Solution Exchange you will see that content packs already exist for NetApp, HyTrust, VCE, Cisco UCS, vCAC, Brocade, EMC VNX, Puppet and more. In summary, you can point Log Insight at anything that outputs logs, with a growing library of content packs to provide even more value.

The Bottom Line

The bottom line is that if you want to see everything going on in your vSphere environment you need to be looking at logs. Log Insight can be used to create alarms as well as to vastly expedite the process of perusing log files from multiple sources to see what is going on.

I was impressed by how easy it was to deploy and how quickly we received value from it. At a list price of $250 per device (per year) it seems like a no-brainer for many mission-critical vSphere environments.

vCenter Log Insight 1.0 is available today, but if you’re evaluating, give the beta of Log Insight 2.0 a try.

Also take a look at the following whitepapers:

End Your Data Center Logging Chaos with VMware vCenter Log Insight

VMware vCenter Log Insight Delivers Immediate Value to IT Operations