
Big Ideas

5 Minutes

One of the frustrations when dealing with vendors is actually getting real availability figures for their kit; you will get generalisations, like ‘it is designed to be 99.999% available’ or perhaps 99.9999% available. But what do those figures really mean to you and how significant are they?

Well, 99.999% available equates to a bit over 5 minutes of downtime a year and 99.9999% equates to a bit over 30 seconds. And in the scheme of things, that sounds pretty good.
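As a quick back-of-the-envelope check (a minimal Python sketch, nothing vendor-specific), the conversion from an availability percentage to downtime is simple arithmetic:

# Downtime per year implied by an availability percentage.
MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_minutes_per_year(availability_pct):
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes/year")

# 99.999%  -> ~5.26 minutes/year
# 99.9999% -> ~0.53 minutes/year (a bit over 30 seconds)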

However, these are design criteria and aims; what are the real-world figures? Vendors, you will find, are very coy about this; in fact, every presentation I have had with regards to availability has been under very strict NDA and sometimes not even notes are allowed to be taken. Presentations are never allowed to be taken away.

Yet, there’s a funny thing…I’ve never known a presentation where the design criteria were not met or indeed significantly exceeded. So why are the vendors so coy about their figures? I have never been entirely sure; it may be that their ‘mid-range’ arrays display very similar real-world availability figures to their more ‘Enterprise’ arrays…or it might be that once you have real-world availability figures, you might start to ask some harder questions.

Sample size: raw availability figures are not especially useful if you don’t know the sample size. Availability figures are almost always quoted as an average and, unless you’ve got a really bad design, more arrays can skew the figures.
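To illustrate (an entirely hypothetical fleet; the numbers are invented for the example), one array that was down for a whole day barely dents the average once there are enough well-behaved boxes around it:

# Hypothetical fleet: 99 arrays with no downtime, 1 array down for 24 hours.
HOURS_PER_YEAR = 365.25 * 24

fleet_downtime_hours = [0.0] * 99 + [24.0]
mean_downtime = sum(fleet_downtime_hours) / len(fleet_downtime_hours)
average_availability = 100 * (1 - mean_downtime / HOURS_PER_YEAR)

print(f"Fleet average availability: {average_availability:.4f}%")  # ~99.9973%
# ...which still looks better than 'four nines', even though one customer lost a whole day.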

Sample characteristics: I’ve known vendors, when backed into a corner to provide figures, do some really sneaky things; for example, they may provide figures for a specific model and software release. This is often done to hide a bad release. You should always try to ask for the figures for the entire life of a product; this will allow you to judge the quality of the code. If possible, ask for a breakdown on a month-by-month basis annotated with the code release schedule.

There are many tricks that vendors try to pull to hide causes of downtime and non-availability but, instead of focusing on the availability figures, as a customer it is sometimes better to ask different, specific questions.

What is the longest outage that you have suffered on one of your arrays? What was the root cause? How much data loss was sustained? Did the customer have to invoke disaster recovery or any recovery procedures? What is the average length of outage on an array that has gone down?

Do not believe a vendor when they tell you that they don’t have these figures and information closely and easily to hand. They do, and if they don’t, they are being pretty negligent about their QC and analytics. Surely they don’t just use all their Big Data capability to crunch marketing stats? Scrub that, they probably do.

Another nasty thing that vendors are in the habit of doing is forcing customers not to disclose to other customers that they have had issues and what those issues were. And of course we all comply and never discuss such things.

So 5 minutes…it’s about long enough to ask some awkward questions.

Defined Storage…

Listening to the ‘Speaking In Tech’ podcast got me thinking a bit more about the software-defined meme and wondering if it is a real thing as opposed to a load of hype; so for the time being I’ve decided to treat it as a real thing or at least that it might become a real thing…and in time, maybe a better real thing?

So Software Defined Storage?

The role of the storage array seems to be changing at present or arguably simplifying; the storage array is becoming where you store stuff which you want to persist. And that may sound silly but basically what I mean is that the storage array is not where you are going to process transactions. Your transactional storage will be as close to the compute as possible or at least this appears to be the current direction of travel.

But there is also a certain amount of discussion and debate about storage quality of service, guaranteed performance and how we implement it.

Bod’s Thoughts

This all comes down to services, discovery and a subscription model. Storage devices will have to publish their capabilities via some kind of API; applications will use this to find what services and capabilities an array has and then subscribe to them.

So a storage device may publish available capacity, IOPS capability and latency, but it could also publish that it has the ability to do snapshots, replication, and thick and thin allocation. It could also publish a cost associated with each of these.

Applications, application developers and support teams might make decisions at this point about which services they subscribe to; perhaps a fixed capacity and IOPS, perhaps taking the array-based snapshots but doing the replication at the application layer.
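As a very rough sketch of what I mean (the capability names, prices and matching logic below are all invented for illustration; no real array publishes exactly this today):

# Hypothetical capability document a storage device might publish via its API.
array_capabilities = {
    "capacity_gb": 500_000,
    "max_iops": 250_000,
    "latency_ms": 1.5,
    "features": {"snapshots", "replication", "thin-provisioning", "thick-provisioning"},
    "cost_per_gb_month": 0.12,  # illustrative pricing
}

# An application (or its support team) subscribes only to what it needs, e.g. fixed
# capacity and IOPS plus array snapshots, with replication done at the application layer.
subscription = {
    "capacity_gb": 20_000,
    "reserved_iops": 15_000,
    "features": {"snapshots"},
}

def can_fulfil(capabilities, request):
    return (request["capacity_gb"] <= capabilities["capacity_gb"]
            and request["reserved_iops"] <= capabilities["max_iops"]
            and request["features"] <= capabilities["features"])

print(can_fulfil(array_capabilities, subscription))  # True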

Applications will have a lot more control over what storage they have and use; they will make decisions about whether certain data is pinned in local SSD or never gets anywhere near the local SSD, and whether it needs sequential storage or random access. Each application might have its RTO and RPO parameters, making decisions about which transactions can be lost and which need to be committed now.

And when this happens, the data-centre becomes something which is managed as a whole as opposed to a set of siloed components.

I’ve probably not explained my thinking as well as I could do but I think it’s a topic that I’m going to keep coming back to over the months.


Enterprising Marketing

I love it when Chuck invents new market segments; ‘Entry-Level Enterprise Storage Arrays’ appears to be his latest and he’s a genius when he comes up with these terms. And it is always a space where EMC have a new offering.

But is it a real segment or just m-architecture? Actually, the whole Enterprise Storage Array thing is getting a bit old; I am not sure whether it has any real meaning any more and it is all rather disparaging to the customer. You need Enterprise, you don’t need Enterprise…you need 99.999% availability, you only need 99.99% availability.

As a customer, I need 100% availability; I need my applications to be available when I need them. Now, this may mean that I actually only need them to be available an hour a month but during that hour I need them to be 100% available.

So what I look for from vendors is the way that they mitigate failure and understand my problems, but I don’t think the term ‘Enterprise Storage’ brings much value to the game; especially when it is constantly being misused and appropriated by the m-architecture consultants.

But I do think it is time for some serious discussions about storage architectures; dual-head, scale-up architectures vs multiple-head, scale-out architectures vs RAIN architectures; understanding the failure modes and behaviours is probably much more important than the marketing terms which surround them.

EMC have offerings in all of those spaces; all at different cost points but there is one thing I can guarantee, the ‘Enterprise’ ones are the most expensive.

There is also a case for looking at the architecture as a whole; too many times I have come across the thinking that what we need to do is make our storage really available, when the biggest cause of outage is application failure. Fix the most broken thing first; if your application is down because it’s poorly written or architected, no amount of Enterprise anything is going to fix it. Another $2000 per terabyte is money you need to invest elsewhere.

Just How Much Storage?

A good friend of mine recently got in contact to ask my professional opinion on something for a book he was writing; it always amazes me that anyone asks my professional opinion on anything…especially people who have known me for many years. But as he’s a great friend, I thought I’d try to help.

He asked me how much a petabyte of storage would cost today and when I thought it would be affordable for an individual. Both parts of the question are interesting in their own way.

How much would a petabyte of storage cost? Why, it very much depends; it’s not as much as it cost last year but not as cheap as some people would think. Firstly, it depends on what you might want to do with it; capacity, throughput and I/O performance are just part of the equation.

Of course then you’ve got the cost of actually running it; 400-500 spindles of spinning stuff takes a reasonable amount of power, cooling and facilities. Even if you can pack it densely, it is still likely to fall through the average floor.

There are some very good deals to be had, mind you, but you are still looking at several hundred thousand pounds, especially if you look at a four-year cost.

And when will the average individual be able to afford a petabyte of storage? Well, without some significant changes in storage technology, we are some time away from this being feasible. Even with 10-terabyte disks, we are talking over a hundred disks.
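The back-of-the-envelope arithmetic (the protection and formatting overheads here are assumptions, not quotes from any vendor) looks something like this:

# Roughly how many drives for a usable petabyte?
usable_tb = 1_000          # 1PB expressed in TB
drive_tb = 10              # per-drive capacity
protection_overhead = 1.3  # assumed RAID/erasure-coding and spares overhead
usable_fraction = 0.93     # assumed loss to formatting and filesystem

raw_tb_needed = usable_tb * protection_overhead / usable_fraction
print(f"~{raw_tb_needed / drive_tb:.0f} x {drive_tb}TB drives")  # roughly 140 drives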

But will we ever need a petabyte of personal storage? That’s extremely hard to say; I wonder if we will see the amount of personal storage peak in the next decade?

And as for on-premises personal storage?

That should start to go into decline; for me it is already beginning to do so. I carry less storage around than I used to…I’ve replaced my 120GB iPod with a 32GB phone, though if I’m out with my camera, I’ve probably got 32GB+ of cards with me. Yet with connected cameras coming and 4G (once we get reasonable tariffs), this will probably start to fall off.

I also expect to see the use of spinning rust go into decline as PVRs are replaced with streaming devices; it seems madness to me that a decent proportion of the world’s storage is storing redundant copies of the same content. How many copies of EastEnders does the world need stored on locally spinning drives?

So I am not sure that we will get to a petabyte of personal storage any time soon but we already have access to many petabytes of storage via the Interwebs.

Personally, I didn’t buy any spinning rust last year and although I expect to buy some this year; this will mostly be refreshing what I’ve got.

Professionally, it looks like over a petabyte per month is going to be pretty much the run-rate.

That is a trend I expect to see continue; the difference between commercial and personal consumption is going to grow. There will be scary amounts of data around about you and generated by you; you just won’t know it or access it.

2013 – The Year of Stew!

Bubble, bubble…there’s lots of things bubbling away in the storage pot at the moment and it appears to be almost ready to serve. Acquisitions are adding ingredients to the stew and we will see a spate in early 2013 as well; the fleshing out of the next generation of storage arrays will continue.

Yes, we will see some more tick-tock refreshes; storage roadmaps have become tied to the Intel/AMD roadmap as arrays have become commoditised. More IOPS, more capacity and more features that you will possibly never use. And the announcements will make you yawn; certainly the roadmap presentations that I have had are not exactly stimulating.

It is the Flash announcements and finally shipping product that will generate the most interest; latency, long the enemy of performance and utilisation, will be slain or at least have GBH visited upon it.

The question is going to be how to implement Flash and the options are going to be myriad; there is going to be significant focus on how to bring this low-latency device closer to the server. I would expect to see an explosion in cache devices, both in the server and in appliance format.

And we will finally see some all-Flash arrays starting to ship from the big boys; this will bring credibility to some of the smaller players. It is easier to compete with something than trying to introduce a completely new class of array.

But I think the really interesting stuff is going to be happening in the file-system space; Ceph will grow in maturity and with OpenStack gaining traction, expect this to mature fast. This is going to force some of the object storage vendors to move away from their appliance model and also encourage some more mature vendors to look at their file-systems and see them as potential differentiators.

CDMI also appears to be actually beginning to happen; I have been very sceptical about this, but a growing number of vendors are beginning to ship CDMI-compatible product and the momentum is building.

Another trend I am seeing is the deployment of multiple storage solutions within a data-centre; few people are currently standardising, there’s a lot of experimentation and there is an acknowledgement that one size really does not fit all.

Expect a lot of pain as infrastructure teams try to make things just work; Dev-Ops teams will continue to forge ahead, traditional infrastructure teams will be playing catch-up until new ways of working can be put in place.  This is not one way traffic though; expect some fun and games in 2014/2015 as some chickens come home to roost.

Management tools are going to be big again…expect lots of attempts to build single-pane-of-glass management tools which cater for everything. APIs and automation will be held up as some kind of universal magic toolset; expect that cauldron to bubble over and cause a mess as the Sorcerer gets more apprentices who try to short-cut.

I see a year of fun and change….and some tasty bowls of nourishment with some really soggy horrible dumplings floating about.

End Of Year Thoughts

I constantly wonder at hype and how it takes hold; Storage seems to be especially vulnerable to this at present, we are under constant bombardment that this technology or that technology will make our lives easier, our businesses more profitable and the world a better place. From the use of SSDs to the deployment of big data analytics for marketing; it is a barrage not seen in IT since the claims that 4GL languages were going to make every programmer in the world redundant.

Every year is going to be the year of the Cloud, Big Data, BYOD, VDI; every year it seems to be the year of something new but also the year of last year’s product; Product Will Eat Itself, every meme perpetuating another meme.

Analysts struggle to make sense and come up with meaningless tools to demonstrate that they know even less about the real world than you could possibly have thought; Magic Quadrants, Points of Proof, Hype-Cycles and Fluffy Clouds are used to try to influence people and con the influencers.

More and more products come to the market and vanish to all intents and purposes. Some find homes in niches allowing the vendor to claim some kind of success. Just don’t prod and poke too hard. There are simply too many start-ups in storage at the moment to make much sense of the market; many with the same USP.

Yet how many of us make time to talk properly to the vendors and give them some honest feedback? And how often is that feedback well received? Unfortunately we are all too nice in general and possibly afraid of upsetting someone. This will probably surprise many of the vendors out there but many of us do hold back (there are some grumpy exceptions).

Some vendors are very good at getting out and talking at the C-level; I’d like to see more vendors getting out and talking with the levels below. I’d like 2013 to be the year that you solve some of my problems and not just the CIO’s…because if you do, you might just find that I have time to implement this year’s product and maybe buy some more product instead of bitching about the maintenance costs of last year’s product.

Flash is dead but still no tiers?

Flash is dead; it’s an interim technology with no future and yet it continues to be a hot topic and technology. I suppose I really ought to qualify the statement: Flash will be dead in the next 5-10 years and I’m really thinking about the use of Flash in the data-centre.

Flash is important as it is the most significant improvement in storage performance since the introduction of the RAMAC in 1956; disks really have not improved that much and although we have had various kickers which have allowed us to improve capacity, at the end of the day they are mechanical devices and are limited.

15k RPM disks are pretty much as fast as you are going to get and although there have been attempts to build faster spinning stuff, reliability, power and heat have really curtailed these developments.

But we now have a storage device which is much faster and has very different characteristics to disk and as such, this introduces a different dynamic to the market. At first, the major vendors tried to treat Flash as just another type of disk; then various start-ups questioned that and suggested that it would be better to design a new array from the ground-up and treat Flash as something new.

What if they are both wrong?

Storage tiering has always been something that has had lip-service paid to it but no-one has ever really done it with a great deal of success. And when you had only spinning rust, the benefits were less realisable, it was hard work and vendors did not make it easy. They certainly wanted to encourage you to use their more expensive Tier 1 disk and moving data around was hard.

But Flash came along with an eye-watering price-point; the vendors wanted to sell you Flash but even they understood that this was a hard sell at the sort of prices they wanted to charge. So Storage Tiering became hot again; we have the traditional arrays with Flash in them and the ability to automatically move data around the array. This appears to work with varying degrees of success but there are architectural issues which mean you never get the complete performance benefit of Flash.

And then we have the start-ups who are designing devices which are Flash only; tuned for optimal performance and with none of the compromises which hamper the more traditional vendors. Unfortunately, this means building silos of fast storage and everything ends up sitting on this still expensive resource. When challenged about this, the general response you get from the start-ups is that tiering is too hard and just stick everything on their arrays. Well obviously they would say that.

I come back to my original statements: Flash is an interim technology and will be replaced in the next 5-10 years with something faster and better. It seems likely that spinning rust will hang around for longer and we are heading to a world where we have storage devices with radically different performance characteristics; we have a data explosion and putting everything on a single tier is becoming less feasible and sensible.

We need a tiering technology that sits outside of the actual arrays; so that the arrays can be built optimally to support whatever storage technology comes along. Where would such a technology live? Hypervisor? Operating System? Appliance? File-System? Application?

I would prefer to see it live in the application and have applications handle the life of their data correctly, but that’ll never happen. So it’ll probably have to live in the infrastructure layer and ideally it would handle a heterogeneous, multi-vendor storage environment; it may well break the traditional storage concepts of a LUN and other sacred cows.
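To make the idea concrete, here is a minimal sketch of the sort of infrastructure-level tiering policy I have in mind; the tier names, thresholds and the mover interface are all hypothetical:

# A policy engine sitting outside the arrays: place extents on a tier based on
# observed access temperature, regardless of which vendor's box backs that tier.
def choose_tier(iops_last_24h, age_days):
    if iops_last_24h > 1_000:
        return "flash"
    if iops_last_24h > 50 or age_days < 30:
        return "fast-disk"
    return "capacity-disk"

def rebalance(extents, mover):
    """extents: objects with .id, .tier, .iops_last_24h and .age_days attributes.
    mover: anything exposing a move(extent_id, target_tier) method."""
    for extent in extents:
        target = choose_tier(extent.iops_last_24h, extent.age_days)
        if target != extent.tier:
            mover.move(extent.id, target)  # the array just stores; the policy lives up here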

But in order to support a storage environment that is going to look very different, or at least should look very different, we need someone to come along and start again. There are various stop-gap solutions in the storage virtualisation space but these still enforce many of the traditional tropes of today’s storage.

I can see many vendors reading this and muttering ‘HSM, it’s just too hard!’ Yes it is hard but we can only ignore it for so long. Flash was an opportunity to do something; mostly squandered now but you’ve got five years or so to fix it.

The way I look at it, that’s two refresh cycles; it’s going to become an RFP question soon.


Software Sucks!

Every now and then, I write a blog article that could probably get me sued, sacked or both; this started off as one of those and has been heavily edited so as to avoid naming names…

Software Quality Sucks; the ‘Release Early, Release Often’ meme appears to have permeated into every level of the IT stack; from the buggy applications to the foundational infrastructure, it appears that it is acceptable to foist beta quality code on your customers as a stable release.

Running a test team for the past few years has been eye-opening; by the time my team gets its hands on your code, there should be no P1s and very few P2s, but the amount of fundamentally broken code that has made it to us is scary.

And then, also running an infrastructure team, this goes beyond scary and heads into the realms of terror. Just to make things nice and frightening, every now and then I ‘like’ to search vendor patch/bug databases for terms like ‘data corruption’, ‘data loss’ and other such cheery terms; don’t do this if you want to sleep well at night.

Recently I have come across such wonderful phenomena as a performance monitoring tool which slows your system down the longer it runs; clocks that drift for no explicable reason and can lock out authentication; reboots which can take hours; non-disruptive upgrades which are only non-disruptive if run at a quiet time; errors that you should ignore most of the time but which sometimes might be real; files that disappear on renaming; an update replacing an update which makes a severity 1 problem worse…even installing fixes seems to be fraught with risk.

Obviously no-one in their right mind ever takes a new vendor code release into production; certainly your sanity needs questioning if you put a new product which has less than two years’ GA into production. Yet often the demands are that we do so.

But it does leave me wondering, has software quality really got worse? It certainly feels that it has. So what are the possible reasons, especially in the realms of infrastructure?

Complexity? Yes, infrastructure devices are trying to do more; nowhere is this more obvious than in the realms of storage, where both capabilities and integration points have multiplied significantly. It is no longer enough to support the FC protocol; you must support SMB, NFS, iSCSI and integration points with VMware and Hyper-V. And with VMware on pretty much a 12-month refresh cycle, it is getting tougher for vendors and users to decide which version to settle on.

The Internet? How could this cause a reduction in software quality? Actually, the Internet as a distribution method has made it a lot easier and cheaper to release fixes; before, if you had a serious bug, you would find yourself having to distribute physical media and often, in the case of infrastructure, mobilising a force of engineers to upgrade software. This cost money, took time and generally you did not want to do it; it was a big hassle. Now, send out an advisory notice with a link and let your customers get on with it.

End-users? We are a lot more accepting of poor quality code; we are used to patching everything from our PC to our Consoles to our Cameras to our TVs; especially, those of us who work in IT and find it relatively easy to do so.

Perhaps it is time to start a ‘Slow Software Movement’ which focuses on delivering things right first time?

Why So Large?

One of the most impressive demonstrations I saw at SNW Europe was from the guys at Amplidata; on their stand, they had a tiny implementation of Amplistor with the back-end storage being USB memory-sticks. This enabled a quick and effective demonstration of their erasure coding protection and the different protection levels on offer: pull one stick and both video streams kept working, pull another one and one stopped while the other kept playing.

It was a nice little demonstration of the power of their solution; well, I liked it.
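For what it’s worth, the arithmetic behind those protection levels is straightforward; in a k-of-n erasure coding scheme the data is cut into n fragments, any k of which are enough to rebuild it (the 16/12 split below is purely illustrative, not necessarily what was running on that demo):

# k-of-n erasure coding: n fragments written, any k sufficient to rebuild the data.
def tolerated_failures(n_fragments, k_required):
    return n_fragments - k_required

print(tolerated_failures(16, 12))   # survives 4 lost devices
print(f"overhead: {16 / 12:.2f}x")  # ~1.33x raw capacity, versus 2x-3x for plain replication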

But it did start me thinking: why do we assume that object stores should be large? Why do the start-ups really only target petabyte+ requirements? Certainly those who are putting together hardware appliances seem to want to play in that space.

Is there not a market for a consumer-level device? Actually, as we move to a mixed-tier environment even at the consumer level with SSD for application/operating system and SATA for content, this might start to make a lot of sense.

We could start to choose protection levels for content appropriate to the content; so we might have a much higher level of protection for our unique content, think photos and videos of the kids; we might even look at some kind of Cloud storage integration for off-site.

And then I started to think some more; is there not a market for a consumer device which talks NFS, SMB and S3? Probably not yet, but there may well be in the future as applications begin to support things like S3 natively. I can see this playing especially well for consumers who use tablets as their primary computing device; many apps already talk to the various cloud storage providers and it is not a stretch to think that they might be able to talk to a local cloud/object store as well.
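Technically there is not much standing in the way; an S3-compatible endpoint on a local box looks just like the real thing to client code (a sketch using boto3, with the endpoint, bucket and credentials below being placeholders rather than any real product):

# An application talking S3 to a local/consumer object store rather than to AWS.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://homestore.local:9000",  # hypothetical local object store
    aws_access_key_id="local-key",
    aws_secret_access_key="local-secret",
)

s3.upload_file("holiday-video.mp4", "family-media", "2013/holiday-video.mp4")
print(s3.list_objects_v2(Bucket="family-media").get("KeyCount"))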

I have seen home NAS boxes which support S3 as a back-up target; in fact, another device that I saw at SNW, which is more an SMB device than a home NAS, supports a plethora of Cloud Storage options. The Imation Dataguard Data Protection Device looks very interesting from that point of view. So when will we see the likes of Synology, Drobo and their competitors serve object storage and not just use it as a back-up target?

I think it will happen but the question is, will Microsoft, Apple etc beat the object storage vendors to the punch and integrate it into the operating system?

Good Enough Isn’t?

One of the impacts of the global slowdown has been that many companies have been focussing on services and infrastructure which is just good enough. For some time now, many of the mainstream arrays have been considered to be good enough. But the impact of SSD and Flash may change our thinking and in fact I hope it does.

So perhaps Good Enough Isn’t Really Good Enough? Good Enough is only really Good Enough if you are prepared to stagnate and not change; if we look at many enterprise infrastructures, they haven’t really changed that much over the past 20 years and the thinking behind them has not changed dramatically. Even virtualisation has not really changed our thinking because, despite the many pundits and bloggers like me who witter on about service thinking and Business alignment, for many it is still just hot air.

There appears to be a lack of imagination that permeates our whole business; if a vendor turns up and says ‘I have a solution which can reduce your back-up windows by 50%’, the IT manager could think ‘Well, I don’t have a problem with my back-up windows; they all run perfectly well and everyone is happy…’. What they don’t tend to ask is ‘If my back-up windows are reduced by 50%, what can I do with the time that I have saved; what new service can be offered to the Business?’

Over the past few years, the focus has been on Good Enough; we need to get out of this rut and start to believe that we can do things better.

As storage people, we have been beaten up by everyone with regards to cost and yet I still hear it time and time again that storage is the bottleneck in all infrastructures: time to provision, performance, and capacity; yet we are still happy to sit comfortably talking about ‘Good Enough Storage’.

Well, let me tell you that it isn’t ‘Good Enough’ and we need to be a lot more vocal in articulating why it isn’t and why doing things differently would be better; working a lot more closely with our customers to explain the impact of ‘Good Enough’ and letting them decide what is ‘Good Enough’.