
A Ball of Destruction…

I’m not sure that EMC haven’t started an unwelcome trend. I had a road-map discussion with a vendor this week where they started to talk about upcoming changes to their product; my questioning, ‘but surely that’s not just a disruptive upgrade but a destructive one?’, was met with an affirmative. Of course, as with EMC, the upgrade would not be compulsory but probably advisable.

The interesting thing with this one is that it was not a storage hardware platform but a software-defined storage product. And we tend to be a lot more tolerant of such disruptive and potentially destructive upgrades. Architecturally, as we move to more storage as software, as opposed to software wrapped in hardware, this is going to be more common, and we are going to have to design infrastructure platforms and applications to cope with this.

This almost inevitably means that we will need to purchase more hardware than previously, so that we can build zones of availability and carry out upgrades to core systems as non-disruptively as possible. And when we start to dig into the nitty-gritty, we may find that this starts to push costs and complexity up…whether these costs go up so much that the whole commodity storage argument starts to fall to pieces is still open to debate.
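As a back-of-envelope illustration of why zones of availability mean buying more hardware, here is a minimal Python sketch. It assumes capacity is spread evenly across zones and that the full usable capacity must stay online while some zones are down for an upgrade. The function names and the figures are purely illustrative, and the sum deliberately ignores replication, erasure coding and performance headroom, all of which would push the real number higher.

```python
def capacity_to_buy(usable_tb: float, zones: int, zones_offline: int = 1) -> float:
    """Raw capacity needed so that `usable_tb` stays available while
    `zones_offline` zones are taken down for an upgrade.

    Assumption (illustrative only): every zone is the same size.
    """
    if zones_offline < 0 or zones <= zones_offline:
        raise ValueError("need more zones than are taken offline at once")
    surviving = zones - zones_offline
    per_zone = usable_tb / surviving   # survivors must carry everything
    return per_zone * zones            # but every zone has to be bought


def overhead(zones: int, zones_offline: int = 1) -> float:
    """Extra capacity as a fraction of usable capacity."""
    return capacity_to_buy(1.0, zones, zones_offline) - 1.0


if __name__ == "__main__":
    # Hypothetical estate of 100 TB usable, one zone upgraded at a time.
    for zones in (2, 3, 4, 8):
        total = capacity_to_buy(100.0, zones)
        print(f"{zones} zones: buy {total:.1f} TB ({overhead(zones):.0%} extra)")
```

Under these assumptions the overhead works out to zones_offline / (zones - zones_offline), so more, smaller zones reduce the extra capacity but add to the number of things to manage, which is the cost-versus-complexity trade-off described above.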

I think for some businesses it might well do, especially those who don’t really understand the cloud model and start to move traditional applications into the cloud without a great deal of thought and understanding.

Now this doesn’t let EMC off the hook at all but, to be honest, EMC have a really ropey track-record on non-disruptive upgrades…more so than most realise. Major Enginuity upgrades have always come with a certain amount of disruption and my experience has not always been good; the levels of planning and certification required have kept many storage contractors gainfully employed. Clariion upgrades have also been scary in the past and, even today, Isilon upgrades are nowhere near as clean as they would have you believe.

EMC could, of course, have got away with the recent debacle if they’d simply released a new hardware platform; everyone would have accepted that this was going to involve data migration.

Still, the scariest upgrade I ever had was an upgrade of an IBM Shark which failed half-way and left us with one node at one level of software and one at a different level, and IBM scratching their heads. But recently, the smoothest upgrades have been on IBM’s V7000; even elephants can learn to dance.

As storage vendors struggle with a number of issues, including the setting of the sun on traditional data protection schemes such as RAID, I would expect the number of destructive and disruptive upgrades to increase, and the marketing spin around them from everyone to reach dizzying heights. As vendors manipulate the data we are storing in more and more complex and clever ways, the potential for disruptive and destructive upgrades is going to increase.

Architectural mistakes are going to be made; wrong alleys will be followed…Great vendors will admit this and support their customers through these changes. This will be easier for those who are shipping software products wrapped with hardware; it is going to be much harder for the software-only vendors. If a feature is so complex that it seems like magic, you might not want to use it…I’m looking for simple to manage, operate and explain.

An argument for Public Cloud? Maybe, as this will take the onus away from you to arrange. Caveat emptor though: this may just mean that disruption is imposed upon you, and if you’ve not designed your applications to cope with this…Ho hum!






  1. Love the perspective – nice post

  2. alpharob says:

    ” including the setting of the sun on traditional data protection schemes such as RAID;”

    Not so sure about that one. I want a chunk of data protected three-ways in a box.
    That’s RAID6 or equiv. Even Moshe and crew’s latest and greatest is RAID6 equiv under the covers. RAID6 from here on out. Triple parity?!? Bah.

    “15 minute disk rebuilds with RAID6-equivalent data protection.”

  3. chadsakac says:

    Disclosure – EMCer here.

    Thanks for the post and adding to the dialog.

    Interestingly, ‘Bod – even in the largest public clouds, disruptive upgrades happen (Xen update in a percentage of AWS EC2 instances:

    Should people hold their partners up to a high standard of lessening any and all disruption to them (which can be in upgrades, it can be in failing to meet performance/functional requirements) – ABSOLUTELY.

    Persistence stacks are tricky.

    The only way to insulate oneself completely is at the application layer in the stack.

    1. storagebod says:

      Definitely…application layer is where a lot of this belongs now. Of course, these sorts of non-functional requirements aren’t very sexy and developers aren’t often interested in them…

  4. alpharob says:

    ” I would expect the number of destructive and disruptive upgrades to increase.”

    There is no excuse for this. Data is as important as it gets in IT. The fact that V7000 upgrades are so smooth? I would speculate that a lot of testing took place. Why in the world wouldn’t you bang away at that upgrade code? Stupidity?

  5. John says:

    Good point here. Previously vendors have used hardware refresh cycles to do this (VNXe to VNXe gen2 a great example of this, as they have massively redone the FLARE/DART layering, chunk size, and all kinds of things, like making controller failover be functionally useful). A LOT OF disruptive upgrades have been hidden this way.

    Also think of the downtime required with Type 1 Scale up system where you have to swap out controller heads.

    There are a couple of differences with a software storage system (I recently switched our core systems to VSAN). Even with a disruptive migration, you can roll one node at a time, and as we’ve gone farther into the commodity side (SuperMicro now has 4 hour parts replacement), the costs associated with going N+2 to help with things like this make it a lot easier. I so that the core IO services can focus on performance and availability and that’s it. At least as a Type 3 loosely coupled system the ability to roll nodes in and out non-disruptively is possible.

    I was freaked out by EMC announcements but the thing that I forget with XtremIO is it is not a true scale out loosely coupled system. It is more like VMAX in that it is a tightly coupled global cache (why it requires RDMA) and, as a Type 2 and not a Type 3, doing rolling upgrades with major underlying changes is going to be a lot harder.
