Corporate IT

Storage is Interesting…

A fellow blogger has a habit of referring to storage as snorage and I suspect that is the attitude of many. What’s so interesting about storage? It’s just that place where you keep your stuff. And many years ago, as an entry-level systems programmer, there were two teams that I was never going to join…one being the test team and the other being the storage team, because they were boring. Recently I have run both a test team and a storage team and enjoyed the experience immensely.

So why do I keep doing storage? Well, firstly I have little choice but to stick to infrastructure; I’m a pretty lousy programmer and it seems that I can do less damage in infrastructure. If you ever received more cheque-books in the post from a certain retail bank, I can only apologise.

But storage is cool; firstly it’s BIG and EXPENSIVE; who doesn’t like raising orders for millions? It is also so much more than that place where you store your stuff; you have to get it back for starters. I think that people are beginning to realise that storage might be a little more complex than first thought; a few years ago, the average home user only really worried about how much disk they had, but the introduction of SSDs into the consumer market has hammered home how much the type of storage matters and the impact it can have on the user experience.

Spinning rust platters keep getting bigger but for many, this just means that the amount of free disk keeps increasing; the increase in speed is what people really want. Instant On…it changes things.

So even in the consumer market, storage is taking on a multi-dimensional personality; it scales not just in capacity but also in speed. In the Enterprise, things are more interesting.

Capacity is obvious; how much space do you need? Performance? Well, performance is more complex and has more facets than most realise. Are you interested in IOPS? Are you interested in throughput? Are you interested in aggregate throughput or single stream? Are you dealing with large or small files? Large or small blocks? Random or sequential?
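To see why IOPS and throughput are different questions, here is a rough sketch of how the two relate through block size. The 10,000 IOPS figure and the block sizes are illustrative assumptions, not measurements from any particular array.

```python
def throughput_mb_per_s(iops: float, block_size_kib: float) -> float:
    """Return throughput in decimal MB/s for an IOPS rate and a block size in KiB."""
    bytes_per_s = iops * block_size_kib * 1024
    return bytes_per_s / 1_000_000


# The same (assumed) 10,000 IOPS looks very different at different block sizes.
for block_kib in (4, 64, 1024):
    mb_s = throughput_mb_per_s(10_000, block_kib)
    print(f"{block_kib:>5} KiB blocks: {mb_s:,.1f} MB/s")
```

A workload that is happy at 10,000 small random IOPS may need only tens of MB/s, while the same operation count at large blocks needs gigabytes per second; 'fast' means something different in each case.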

Now for 80% of use-cases; you can probably get away with taking a balanced approach and just allocating storage from a general purpose pool. But 20% of your applications are going to need something different and that is where it gets interesting.

Most of the time when I have conversations with application teams or vendors and I ask what type of storage they require, the answer that comes back is generally ‘fast’. There then follows a conversation as to what fast means and whether the budget meets their desire to be fast.

If we move to ‘Software Defined Storage’, this could be a lot more complex than people think. Application developers may well have to really understand how their applications store data and how they interact with the infrastructure that they live on. If you pick the wrong pool, your application performance could drop through the floor; pick the wrong availability level and you could experience a massive outage.
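As a minimal sketch of what 'picking a pool' might look like once it becomes a developer's problem: the pool names, their limits and the `suitable_pools` helper below are all hypothetical, invented purely for illustration, and do not describe any vendor's product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    max_iops: int
    max_throughput_mb_s: int
    availability_nines: int  # e.g. 5 means a design target of 99.999%


@dataclass(frozen=True)
class Requirement:
    iops: int
    throughput_mb_s: int
    availability_nines: int


def suitable_pools(pools, req):
    """Return the pools that meet every stated requirement, least capable first."""
    matches = [
        p for p in pools
        if p.max_iops >= req.iops
        and p.max_throughput_mb_s >= req.throughput_mb_s
        and p.availability_nines >= req.availability_nines
    ]
    return sorted(matches, key=lambda p: (p.max_iops, p.max_throughput_mb_s))


# Hypothetical pools; the numbers are made up.
POOLS = [
    Pool("general", 20_000, 500, 4),
    Pool("fast-random", 200_000, 1_000, 5),
    Pool("streaming", 10_000, 5_000, 4),
]

# A large-file, single-stream workload: modest IOPS, lots of throughput.
req = Requirement(iops=5_000, throughput_mb_s=2_000, availability_nines=4)
print([p.name for p in suitable_pools(POOLS, req)])  # ['streaming']
```

Asking for 'fast' here would probably have landed the application on the fast-random pool, which cannot deliver the throughput it actually needs.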

So if you thought storage was snorage, and most developers and people still do, you might want to start taking an interest. If infrastructure becomes code, I may need to get better at coding but some of you are going to have to get better at infrastructure. Move beyond fast and large and understand the subtleties; it is interesting…I promise you!

Snakebite….

So EMC have unveiled ViPR, their software defined storage initiative; like many EMC World announcements, there’s not a huge amount of detail, especially if you aren’t at EMC World. It has left many of my blogger peers scratching their heads and wondering what the hell it is and whether it is something new.

Now like them, I am in that very same camp but unlike them, I am foolish enough to have a bit of a guess and make myself look a fool when the EMCers descend on me and tell me how wrong I am.

Firstly, let me say what I think it isn’t; I really don’t believe that it is a storage virtualisation product in the same way that SVC and VSP are. The closest EMC have to a product like this is VPLEX; a product which sits in the data-path and virtualises the disk behind it. I don’t think ViPR is a product like this. Arguably these products are mis-named anyway; I think of these as Storage Federation products.

So that is what ViPR isn’t (and can I say that I really hate products with a mix of upper and lower case in their names!).

It is worth looking back in time to one of EMC’s most hated products (by me and many users); Control Center. I think ViPR might have some roots in ECC; to me it feels that someone has taken Control Center and turned it into a web-service; so instead of interacting by a GUI, you interact via the API.

And I wonder if that was how the control component of ViPR came about; when rewriting the core of ECC, I posit that it was abstracted away from the GUI component and perhaps some bright spark came along and thought…what if we exposed the core via an API?

Okay, it might not have been ECC and it could have been Unisphere but this seems a fairly logical thing to do. So perhaps the core of ViPR is nothing really that new; it’s just a change in presentation layer.

[Update: So a lot of the code came from Project Orion which Chad talks about here. So it has been kicking around in EMC for some time, this kind of programmable interface was being discussed and asked for at various ECC user-group/briefings prior to that.]

Then EMC have brought some additional third party arrays into the mix; NetApp seems to be the first one. Are they using IP that EMC picked up when they bought the UK company WysDM, who had both a very nice backup reporting tool and a NAS/Fileserver management tool?

Building additional third party support should be relatively simple using either their CLI or in some cases an exposed API.

So there you go, ViPR is basically a storage management tool without a GUI, or at least it is GUI optional. And with its REST API, perhaps you could build your own GUI or your own CLI? Or perhaps your development teams can get on and generally consume all the storage you’ve got but in a programmatic way.

It all seems pretty obvious and raises the question of why no-one did this before. I think it might have been arrogance and complacency; this tool should make it easier to plug anyone’s storage into your estate.

But if this was all ViPR was, it’d be pretty tedious. Still EMC obviously read my blog and obviously read this and rapidly turned it into a product or perhaps they simply talk to lots of people too. If I’ve thought it, plenty of others have.

Object Storage has struggled to find a place in many Enterprises; it doesn’t lend itself to many applications and many developers just don’t get it. But for some applications it is ideal; it seems that it would be better to have both Object and File Access to the same data, and you probably don’t want to store it twice either.

So yet again, it’s all about changing the presentation layer without impacting the underlying constructs. However unlike the more traditional gateways into an Object Store, EMC are putting an Object Gateway onto an NFS/SMB share (note to Chuck: call it SMB, not CIFS). Now this is almost certainly going to have to sit in the data-path for Objects. There will be some interesting locking/security model challenges and the like; simultaneous NFS/SMB and Object access is going to be interesting.

It will also require the maintenance of a separate metadata-store, something with a fast database to get that metadata out of. And perhaps EMC own some technologies to do this as well. A loosely coupled metadata store does bring some problems but it allows EMC to leverage Isilon’s architecture and also grab hold of data sitting on 3rd party devices.

[Update: Seems like EMC are using Cassandra as their underlying database. Whether it is Object on File or File on Object; not sure but whatever happens, it is allowing you access via Object or File.]

So ViPR is really at least two products, not one. So…perhaps it’s a Snakebite…

Question is…will it leave them and us lying in the gutter staring at the stars wondering why everyone is looking at us strangely?

 

Can Pachyderms Polka?

Chris’ pieces on IBM’s storage revenues here and here make for some interesting reading. Things are not looking great with the exception of XIV and Storwize products. I am not sure if Chris’ analysis is entirely correct as it is hard to get any granularity from IBM. But it doesn’t surprise me either; there are some serious weaknesses in IBM’s storage portfolio.

Firstly, there is still an awful lot of OEMed kit from NetApp in the portfolio; it certainly appears that this is not selling, or being sold, as well as it was in the past. So IBM’s struggles have some interesting knock-on effects for NetApp.

IBM are certainly positioning the Storwize products in the space which was traditionally occupied by the OEMed LSI (now NetApp) arrays; pricing is pretty aggressive and places them firmly in the space occupied by other competing dual-head arrays. And they finally have a feature set to match their competitors, well certainly in the block space.

XIV seems to compete pretty well when put up against the lower-end VMAX and HDS ‘enterprise-class’ arrays. It is incredibly easy to manage, performs well enough but is not the platform for the most demanding applications. But IBM have grasped one of the underlying issues with storage today; that is, it all needed to be simplified. I still have some doubts about the architecture but XIV have tried to solve the spindle-to-gigabyte issue. There is no doubt in my mind that traditional RAID-5 and 6 are long term broken. If not today, very soon. The introduction of SSDs into the architecture appears to have removed some of the more interesting performance characteristics of the architecture. XIV is a great example of ‘good enough’.

So IBM have some good products from the low-end to the lowish-enterprise block space. Of course, there is an issue in that they seriously overlap; nothing new there though, I’ve never known a company compete against itself so often.

DS8K only really survives for one reason; that is to support the mainframe. If IBM had been sensible and had the foresight to do so; they would have looked at FiCon connectivity for SVC and done it. Instead IBM decided that the mainframe customers were so conservative that they would never accept a new product or at least it would have taken 10 years or so for them to do so. So now they are going to end-up building and supporting the DS8K range for another 10 years at least; if they’d invested the time earlier, they could be considering sunsetting the DS8K.

But where IBM really, really suffer and struggle is in the NAS space. They’ve had abortive attempts at building their own products; they re-sell NetApp in the form of nSeries these days and also have SONAS/V7000-Unified. Well the nSeries is NetApp; it gets all of the advantages and disadvantages that brings, i.e. a great product whose best days seem behind it at present.

SONAS/V7000-Unified are not really happening for IBM; although built on solid foundations, the delivery has not been there and IBM really have no idea how to market or sell the product. There have been some quality issues and arguably the V7000-Unified was rushed and not thought all the way through. I mean, who thought a two node GPFS cluster was ever a good idea for a production system?

And that brings me onto my favourite IBM storage product; GPFS. The one that I will laud to the hills; a howitzer of a product which will let you blow your feet off but also could be IBM’s edge. Yet in the decade and a bit that I have been involved with it, IBM almost never sells it. Customers buy it but really you have to know about it; most IBM sales would have no idea where to start or even when it might be appropriate.

At the GPFS User Group this week, I saw presentations on GPFS with OpenStack, Hadoop, hints of object-storage and more. But you will probably never hear an IBMer outside of a very select bunch talk about it. If IBM were EMC, you’d never hear them shut-up about it.

One of the funniest things I heard at the GPFS User Group was about the guys who repurposed an Isilon cluster as a GPFS cluster. It seems it might work very well.

I personally think it’s about time that IBM open-sourced GPFS and put it into the community. It’s too good not to and perhaps the community could turn it into the core of a software-defined-storage solution to shake a few people up. I could build half-a-dozen interesting appliances tomorrow.

Still I suspect like Cinderella, GPFS will be stuck in the kitchen waiting for an invite to the ball.

Object Paucity

Another year, another conference season sees me stuck on this side of the pond watching the press releases from afar, promising myself that I’ll watch the keynotes online or ‘on demand’ as people have it these days. I never find the time and have to catch up with the 140 character synopses that regularly appear on Twitter.

I can already see the storage vendors pimping their stuff at NAB; especially the Object storage vendors who want to push their stuff. Yet, it still isn’t really happening….

I had a long chat recently with one of my peers who deals with the more usual side of IT; the IT world full of web-developers and the likes. He’d spent many months investigating Object Storage; putting together a proposition firmly targeted at the development community; Object APIs and the likes. S3 compatible, storage-on-demand built on solid technology.

And what has he ended up implementing? A bloody NFS/CIFS gateway into their shiny-new object storage because it turns out what the developers really want is a POSIX file-system.

Sitting here on the broadcast/media side of the fence where we want gobs of storage provisioned quickly to store large objects with relatively intuitive metadata; we are finding the same thing. I’ve not gone down the route of putting in an Object storage solution because finding one which is supported across all the tools in today’s workflows is near impossible. So it seems that we are looking more and more to NFS to provide us with the sort of transparency we need to support complex digital workflows.

I regularly suggest that we put in feature requests to the tools vendors to at least support S3; the looks I generally get are ones of quiet bemusement or outright hostility, with mutterings about Amazon and Cloud.

Then again, look how long it has taken for NFS to gain general acceptance and for vendors to not demand ‘proper’ local file-systems. So give it 20 years or so and we’ll be rocking.

If I was an object storage vendor and I didn’t have my own gateway product; I’d be seriously considering buying/building one. I think it’s going to be a real struggle otherwise and it’s not the Operations teams who are your problem.

Me, I’d love for someone to put an object-storage gateway into the base operating system; I’d love to be able to mount an object-store and have it appear on my desktop. At least at that point, I might be able to con some of the tools to work with an object-store. If anyone has a desktop gateway which I can point at my own S3-like store, I’d love to have a play.

 

/dev/null – The only truly Petascale Archive

As data volumes increase in all industries and the challenges of data management continue to grow; we look for places to store our increasing data hoard and inevitably the subject of archiving and tape comes up.

It is the cheapest place to archive data by some way; my calculations currently give it a four-year cost somewhere in the region of five to six times cheaper than the cheapest commercial disk alternative. However tape’s biggest advantage is almost its biggest problem; it is considered to be cheap and hence for some reason no-one factors in the long-term costs.

Archives by their nature live for a long time; more and more companies are talking about archives which will grow and exist forever. And as companies no longer seem to be able to categorise data into data to keep and data not to keep, and with exponential data-growth and generally bad data-management, multi-year, multi-petabyte archives will eventually become the norm for many.

This could spell the death for the tape-archive as it stands or it will necessitate some significant changes in both user and vendor behaviour. A ten year archive will see at least four refreshes of the LTO standard on average; this means that your latest tape technology will not be able to read your oldest tapes. It is also likely that you are looking at some kind of extended maintenance and associated costs for your oldest tape-drives; they will certainly be End of Support Life. Media may be certified for 30 years; drives aren’t.

Migration will become a way of life for these archives and it is this that will be a major challenge for storage teams and anyone maintaining an archive at scale.

It currently takes 88 days to migrate a petabyte of data from LTO5-to-LTO6; this assumes 24×7, no drive issues, no media issues and a pair of drives to migrate the data. You will also be loading about 500 tapes and unloading about 500 tapes. You can cut this time by putting in more drives but your costs will soon start to escalate as SAN ports, servers and peripheral infrastructure mount up.

And then all you need is for someone to recall the data whilst you are trying to migrate it; 88 days is extremely optimistic.

Of course a petabyte seems an awful lot of data but archives of a petabyte+ are becoming less uncommon. The vendors are pushing the value of data; so no-one wants to delete what is a potentially valuable asset. In fact, working out the value of an individual datum is extremely hard and hence we tend to place the same value on every byte archived.

So although tape might be the only economical place to store data today, as data volumes grow it becomes less viable as a long-term archive unless it is a write-once, read-never (and I mean never) archive…if that is the case, perhaps in Unix parlance, /dev/null is the only sensible place for your data.
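The 88-day figure is easy enough to sanity-check. This is a rough sketch rather than the original calculation: it assumes a decimal petabyte, treats a pair of drives (one reading, one writing) as a single stream, and uses LTO-5’s native rate of 140 MB/s as the ceiling. The 131 MB/s effective rate is an assumption, chosen because it roughly reproduces 88 days once mounts and repositioning eat into the native rate.

```python
def migration_days(data_tb: float, effective_mb_per_s: float, streams: int = 1) -> float:
    """Days needed to copy data_tb terabytes at a sustained per-stream rate, running 24x7."""
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (effective_mb_per_s * streams)
    return seconds / 86_400


PETABYTE_TB = 1_000

print(f"LTO-5 native rate (140 MB/s): {migration_days(PETABYTE_TB, 140):.0f} days")
print(f"Assumed effective rate (131 MB/s): {migration_days(PETABYTE_TB, 131):.0f} days")
print(f"Two pairs of drives at 131 MB/s: {migration_days(PETABYTE_TB, 131, streams=2):.0f} days")
```

Even at the full native rate with no interruptions, a single pair of drives needs about 83 days; doubling the drives halves the time, which is where the SAN ports and servers start to add up.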

But if you think your data has value or more importantly your C-levels think that your data has value; there’s a serious discussion to be had…before the situation gets out of hand. Just remember, any data migration which takes longer than a year will most likely fail.

Service Power..

Getting IT departments to start thinking like service providers is an up-hill struggle; getting beyond cost to value seems to be a leap too far for many. I wonder if it is a psychological thing driven by fear of change but also a fear of assessing value.

How do you assess the value of a service? Well, arguably, it is quite simple…it is worth whatever someone is willing to pay for it. And with the increased prevalence of service providers vying with internal IT departments, it should be relatively simple. They’ve pretty much set the base-line.

And then there are the things that the internal IT department just should be able to do better; they should be able to assess Business need better than external. They should know the Business and be listening to the ‘water cooler’ conversations.

They should become experts in what their company does; understand the frustrations and come up with ways of doing things better.

Yet there is often a fear of presenting the Business with innovative and better services. I think it is a fear of going to the Business and presenting a costed solution; there is a fear of asking for money. And there is certainly a fear of Finance but present the costs to the Business users first and get them to come to the table with you.

So we offer the same old services and wonder why the Business are going elsewhere to do the innovative stuff and while they are at it; they start procuring the services we used to provide. Quite frankly, many Corporate IT departments are in a death spiral; trying to hang-on to things that they could let go.

Don’t think ‘I can’t ask the Business for this much money to provide this new service’…think, ‘what if the Business want this service and ask someone else?’ At least you are going to be bidding on your own terms and not being forced into a competitive bid against an external service provider; when it comes down to it, the external provider almost certainly employs a better sales-team than you.

By proposing new services yourself or perhaps even taking existing ‘products’ and turning them into a service; you are choosing the battle-ground yourselves…you can find the high ground and fight from a position of power.

5 Minutes

One of the frustrations when dealing with vendors is actually getting real availability figures for their kit; you will get generalisations, like it is designed to be 99.999% available or perhaps 99.9999% available. But what do those figures really mean to you and how significant are they?

Well, 99.999% available equates to a bit over 5 minutes of downtime and 99.9999% equates to a bit over 30 seconds downtime over a year. And in the scheme of things, that sounds pretty good.
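The arithmetic behind those numbers is simple enough to check yourself. A minimal sketch, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime per year, in minutes, implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999, 99.9999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% -> {minutes:.2f} minutes ({minutes * 60:.0f} seconds) per year")
```

Each extra nine cuts the allowance by a factor of ten, so the gap between five and six nines is the gap between roughly five minutes and roughly half a minute a year.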

However, these are design criteria and aims; what are the real world figures? Vendors, you will find, are very coy about this; in fact, every presentation I have had with regards to availability has been under very strict NDA and sometimes not even notes are allowed to be taken. Presentations are never allowed to be taken away.

Yet, there’s a funny thing….I’ve never known a presentation where the design criteria were not met or even significantly exceeded. So why are the vendors so coy about their figures? I have never been entirely sure; it may be that their ‘mid-range’ arrays display very similar real world availability figures to their more ‘Enterprise’ arrays…or it might be that once you have real world availability figures, you might start to ask some harder questions.

Sample size; raw availability figures are not especially useful if you don’t know the sample size. Availability figures are almost always quoted as an average and, unless you’ve got a really bad design, more arrays can skew the figures.

Sample characteristics; I’ve known vendors when backed into a corner to provide figures do some really sneaky things; for example, they may provide figures for a specific model and software release. This is often done to hide a bad release. You should always try to ask for the figures for the entire life of a product; this will allow you to judge the quality of the code. If possible, ask for a breakdown on a month-by-month basis annotated with the code release schedule.
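A rough illustration of how sample size can flatter an average; the fleet sizes and the single eight-hour outage are made-up numbers, not vendor data.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def fleet_availability(outage_minutes_per_array):
    """Mean availability (as a percentage) across a fleet of arrays over one year."""
    total_possible = MINUTES_PER_YEAR * len(outage_minutes_per_array)
    total_down = sum(outage_minutes_per_array)
    return 100 * (1 - total_down / total_possible)


# Hypothetical: one array in the fleet suffers an eight-hour outage, the rest have none.
large_fleet = [0] * 999 + [8 * 60]
small_fleet = [0] * 9 + [8 * 60]

print(f"Fleet of {len(large_fleet)}: {fleet_availability(large_fleet):.5f}%")
print(f"Fleet of {len(small_fleet)}: {fleet_availability(small_fleet):.5f}%")
```

Averaged over a thousand arrays, the outage still leaves the fleet comfortably above five nines; averaged over ten, it does not. The customer who lost eight hours gets no comfort from either number.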

There are many tricks that vendors try to pull to hide causes of downtime and non-availability but instead of focusing on the availability figures; as a customer, it is sometimes better to ask different specific questions.

What is the longest outage that you have suffered on one of your arrays? What was the root cause? How much data loss was sustained? Did the customer have to invoke disaster recovery or any recovery procedures? What is the average length of outage on an array that has gone down?

Do not believe a vendor when they tell you that they don’t have these figures and information closely and easily to hand. They do and if they don’t; they are pretty negligent about their QC and analytics. Surely they don’t just use all their Big Data capability to crunch marketing stats? Scrub that, they probably do.

Another nasty thing that vendors are in the habit of doing is forcing customers to not disclose to other customers that they have had issues and what they were. And of course we all comply and never discuss such things.

So 5 minutes…it’s about long enough to ask some awkward questions.

The Complexity Legacy

I don’t blog about my day-job very often but I want to relate a conversation I had today; I was chatting to one of the storage administrators who works on our corporate IT systems, they’ve recently put in some XIV systems (some might be an understatement) and I asked how he was getting on with them. He’s been doing the storage administrator thing for a long time and cut his teeth on the Big Iron arrays and I thought he might be a bit resentful at how easy the XIV is to administer but no…he mentioned a case recently when they needed to allocate a large chunk of storage in a real hurry; took 30 minutes to do a job which he felt would take all day on a VMAX.

And I believe him but…

Here’s the thing; in theory using the latest GUI tools such as Unisphere for VMAX, surely this should be the case for VMAX? So what is going on? Quite simply the Big Iron arrays are hampered by a legacy of complexity; even experienced administrators and perhaps especially experienced administrators like to treat them as complex, cumbersome beasts. It is almost as if we’ve developed a fear of them and treat them with kid gloves.

And I don’t believe it is just VMAX that is suffering from this; all of the Big Iron arrays suffer from this perception of complexity. Perhaps because they are still expensive, perhaps because the vendors like to position them as Enterprise beasts and not as something which is as easy to configure as your home NAS and perhaps because the storage community are completely complicit in the secret occult world of Enterprise storage?

Teach the elephants to dance…they can and they might not crush your toes.

Just How Much Storage?

A good friend of mine recently got in contact to ask my professional opinion on something for a book he was writing; it always amazes me that anyone asks my professional opinion on anything…especially people who have known me for many years but as he’s a great friend, I thought I’d  try to help.

He asked me how much a petabyte of storage would cost today and when I thought it would be affordable for an individual. Both parts of the question are interesting in their own way.

How much would a petabyte of storage cost? Why, it very much depends; it’s not as much as it cost last year but not as cheap as some people would think. Firstly, it depends on what you might want to do with it; capacity, throughput and I/O performance are just part of the equation.

Of course then you’ve got the cost of actually running it; 400-500 spindles of spinning stuff takes a reasonable amount of power, cooling and facilities. Even if you can pack it densely, it is still likely to fall through the average floor.

There are some very good deals to be had mind you but you are still looking at several hundred thousand pounds, especially if you look at a four year cost.

And when will the average individual be able to afford a petabyte of storage? Well without some significant changes in storage technology; we are some time away from this being feasible. Even with 10 Terabyte disks, we are talking over a hundred disks.
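For a feel for the spindle counts involved, here is a quick sketch. The 8+2 protection layout and the 3 TB disk size are assumptions picked for illustration; real arrays add hot spares and formatting overhead on top.

```python
import math


def disks_needed(usable_tb: float, disk_tb: float, data_disks: int, parity_disks: int) -> int:
    """Disks required for usable_tb of capacity, built from whole data+parity groups."""
    groups = math.ceil(usable_tb / (disk_tb * data_disks))
    return groups * (data_disks + parity_disks)


PETABYTE_TB = 1_000

print(disks_needed(PETABYTE_TB, 3, 8, 2))   # 420 with (assumed) 3 TB disks
print(disks_needed(PETABYTE_TB, 10, 8, 2))  # 130 with 10 TB disks
print(disks_needed(PETABYTE_TB, 10, 1, 0))  # 100 with 10 TB disks and no protection at all
```

Under those assumptions, 3 TB disks land in the 400-500 spindle range and even 10 TB disks need well over a hundred once any protection is added.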

But will we ever need a petabyte of personal storage? That’s extremely hard to say; I wonder if we will see the amount of personal storage peak in the next decade?

And as for on-premises personal storage?

That should start to go into decline; for me it is already beginning to do so. I carry less storage around than I used to…I’ve replaced my 120GB iPod with a 32GB phone but if I’m out with my camera, I’ve probably got 32GB+ of cards with me. Yet with connected cameras coming and 4G (once we get reasonable tariffs), this will probably start to fall off.

I also expect to see the use of spinning rust go into decline as PVRs are replaced with streaming devices; it seems madness to me that a decent proportion of the world’s storage is storing redundant copies of the same content. How many copies of EastEnders does the world need to be stored on a locally spinning drive?

So I am not sure that we will get to a petabyte of personal storage any time soon but we already have access to many petabytes of storage via the Interwebs.

Personally, I didn’t buy any spinning rust last year and although I expect to buy some this year; this will mostly be refreshing what I’ve got.

Professionally, looks like over a petabyte per month is going to be pretty much run-rate.

That is a trend I expect to see continue; the difference between commercial and personal consumption is going to grow. There will be scary amounts of data around about you and generated by you; you just won’t know it or access it.

Software Sucks!

Every now and then, I write a blog article that could probably get me sued, sacked or both; this started off as one of those and has been heavily edited so as to avoid naming names…

Software Quality Sucks; the ‘Release Early, Release Often’ meme appears to have permeated into every level of the IT stack; from the buggy applications to the foundational infrastructure, it appears that it is acceptable to foist beta quality code on your customers as a stable release.

Having run a test team for the past few years has been eye-opening; by the time my team gets hands on your code…there should be no P1s and very few P2s but the amount of fundamentally broken code that has made it to us is scary.

And then also running an infrastructure team, this is beyond scary and heading into realms of terror and just to make things nice and frightening, every now and then, I ‘like’ to search vendor patch/bug databases for terms like ‘data corruption’, ‘data loss’ and other such cheery terms; don’t do this if you want to sleep well at night.

Recently I have come across such wonderful phenomena as a performance monitoring tool which slows your system down the longer it runs; clocks that drift for no explicable reason and can lock out authentication; reboots which can take hours; non-disruptive upgrades which are only non-disruptive if run at a quiet time; errors that you should ignore most of the time but sometimes they might be real; files that disappear on renaming; updates replacing an update which make a severity 1 problem worse…even installing fixes seems to be fraught with risk.

Obviously no-one in their right mind ever takes a new vendor code release into production; certainly your sanity needs questioning if you put a new product which has less than two years’ GA into production. Yet often the demands are that we do so.

But it does leave me wondering, has software quality really got worse? It certainly feels that it has. So what are the possible reasons, especially in the realms of infrastructure?

Complexity? Yes, infrastructure devices are trying to do more; nowhere is this more obvious than in the realms of storage where both capabilities and integration points have multiplied significantly. It is no longer enough to support the FC protocol; you must support SMB, NFS, iSCSI and integration points with VMware and Hyper-V. And with VMware on a 12-month refresh cycle pretty much, it is getting tougher for vendors and users to decide which version to settle on.

The Internet? How could this cause a reduction in software quality? Actually, the Internet as a distribution method has made it a lot easier and cheaper to release fixes; before, if you had a serious bug, you would find yourself having to distribute physical media and often, in the case of infrastructure, mobilising a force of Engineers to upgrade software. This cost money, took time and generally you did not want to do it; it was a big hassle. Now, send out an advisory notice with a link and let your customers get on with it.

End-users? We are a lot more accepting of poor quality code; we are used to patching everything from our PC to our Consoles to our Cameras to our TVs; especially, those of us who work in IT and find it relatively easy to do so.

Perhaps it is time to start a ‘Slow Software Movement’ which focuses on delivering things right first time?