Storagebod

General IT Management

Reality is persistent

I see quite a few posts about how this storage technology or that is going to change everything, or has changed everything. And yet I see little real evidence that storage usage is really changing for many. So why is this?

Let’s take on some of the received wisdom that seems to be percolating around. 

Object Storage can displace block and file?

It depends; replacing block with object is somewhat hard. You can't really get the performance out of it, and you will struggle with the APIs, especially when trying to drive performance for random operations and partial updates.

Replacing file with object is somewhat easier; most unstructured data could happily be stored as object, and it is. It's an object called a file. I wonder how many applications, even those using S3 APIs, treat Object Storage as anything other than a file-store; how many use some of the extended metadata capabilities?
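To make the distinction concrete, here is a toy in-memory sketch (the class and method names are invented for illustration, not any real library): a file-store usage pattern just puts bytes under a path-like key, while an object-style usage attaches metadata that can later be read without fetching the payload. Real object APIs expose the same idea, e.g. the user-metadata arguments on S3 put/head operations.

```python
# Toy object store: each key maps to a payload plus arbitrary metadata.
# Applications that treat objects as files simply never set the metadata.
class ToyObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, body, metadata=None):
        self._objects[key] = {"body": body, "metadata": metadata or {}}

    def get(self, key):
        # Fetch the payload, file-style.
        return self._objects[key]["body"]

    def head(self, key):
        # Metadata is retrievable without touching the payload at all.
        return self._objects[key]["metadata"]


store = ToyObjectStore()
# File-store style: just bytes under a path-like key, no metadata.
store.put("reports/2014/q1.csv", b"col1,col2\n1,2\n")
# Object style: the same payload enriched with application metadata.
store.put("reports/2014/q2.csv", b"col1,col2\n3,4\n",
          metadata={"owner": "finance", "retention": "7y"})

print(store.head("reports/2014/q1.csv"))  # empty: used as a plain file-store
print(store.head("reports/2014/q2.csv")["retention"])
```

The point of the sketch is the `head` call: the extended metadata is only worth anything if the application actually reads and writes it, which in my experience most don't.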

In many organisations, what we want is cheaper block and file. If we can fake this by putting a gateway device in front of Object Storage, that's what we will do. The Object vendors have woken up to this, and that is what they are doing.

But if a vendor can do ‘native’ file with some of the availability advantages of a well-written erasure coding scheme at a compelling price point, we won’t care.

And when I can boot from Object…call me.

All new developers are Object Storage aficionados?

I'm afraid, from my limited sample size, I find this is rarely the case. Most seem to want to interact with file-systems or databases for their persistence layer. Now, the nature of the databases they want to interact with is changing, with more becoming comfortable with NoSQL databases.

Most applications just don't produce enough data to warrant Object Storage…or indeed any kind of persistence layer at all.

Developers rarely care about what their storage is; they just want it to be there and work according to their needs. 

Technology X will replace Technology Y

Only if Technology Y does not continue to develop, and only if Technology X has a really good economic advantage. I do see a time when NAND could replace rotational rust for all primary storage, but for secondary and tertiary storage we might still be a way off.

It also turns out that many people have a really over-inflated idea of how many IOPS their application needs; there appears to be a real machismo about claiming that you need 1000s of IOPS…when our monitoring shows that someone could write with a quill pen and still fulfil the requirement. Latency does turn out to be important; when you do your 10 IOPS, you want them to be quick.
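The arithmetic behind the machismo is easy to check. A back-of-the-envelope sketch (the workload figure is invented for illustration):

```python
# Average IOPS implied by a claimed daily operation count.
def average_iops(operations_per_day: int) -> float:
    return operations_per_day / 86_400  # seconds in a day

# An application logging a million writes a day sounds busy...
busy_app = average_iops(1_000_000)
print(busy_app)  # ...yet it averages under 12 IOPS.

# Latency is the part users actually feel: at 10 IOPS, each operation
# still wants to complete in milliseconds, not tens of milliseconds.
```

Averages hide bursts, of course, but even a generous peak-to-average multiplier rarely gets these workloads anywhere near the thousands of IOPS being claimed.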

Storage is either free or really cheap?

An individual gigabyte is basically free; a thousand of these is pretty cheap but a billion gigabytes is starting to get a little pricey.
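The scaling is easy to sketch; assuming a notional three cents per gigabyte (a made-up round figure for illustration, not anyone's quote):

```python
PRICE_PER_GB = 0.03  # assumed illustrative price in dollars

def cost(gigabytes: float) -> float:
    return gigabytes * PRICE_PER_GB

print(cost(1))              # a single GB: pocket change
print(cost(1_000))          # a terabyte: cheap
print(cost(1_000_000_000))  # a billion GB (an exabyte): ~$30 million
```

The unit price barely matters; it's the multiplier on the left that turns "basically free" into a line item the CFO notices.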

A Terabyte is not a lot of storage? 

In my real life, I get to see a lot of people who request a terabyte of storage for a server because, hell, even their laptop has that amount of storage. But for many servers, a terabyte is a huge amount of storage…many applications just don't have this level of requirement for persistent data. A terabyte is still a really large database for many applications; unless the application developers haven't bothered to write a clean-up process.

Software-Defined is Cheaper? 

Buy a calculator and factor in your true costs. Work out what compromises you might have to make and then work out what that is worth to you. 

Google/Amazon do it, so we can too?

You could, but is it really your core business? Don't try to compete with the web-scale companies unless you are one…focus on providing your business with the service it requires.

Storage Administration is dead?

It changed; you should change too, but there is still a role for people who want to manage the persistent data-layer in all its forms. It's no longer storage…it's persistence.

Mine is the only reality?

I really hope not…





A Ball of Destruction…

I'm not sure that EMC haven't started an unwelcome trend; I had a road-map discussion with a vendor this week where they started to talk about upcoming changes to their product, and my question 'but surely that's not just a disruptive upgrade but a destructive one?' was met with an affirmative. Of course, like EMC, the upgrade would not be compulsory but probably advisable.

The interesting thing with this one is that it was not a storage hardware platform but a software-defined storage product. And we tend to be a lot more tolerant of such disruptive and potentially destructive upgrades. Architecturally, as we move to more storage as software, as opposed to software wrapped in hardware, this is going to be more common, and we are going to have to design infrastructure platforms and applications to cope with it.

This almost inevitably means that we will need to purchase more hardware than previously, to allow us to build zones of availability so that upgrades to core systems can be carried out as non-disruptively as possible. And when we start to dig into the nitty-gritty, we may find that this starts to push costs and complexity up…whether these costs go up so much that the whole commodity storage argument falls to pieces is still open to debate.

I think for some businesses it might well do; especially those who don’t really understand the cloud model and start to move traditional applications into the cloud without a great deal of thought and understanding.

Now, this doesn't let EMC off the hook at all, but to be honest, EMC have a really ropey track-record on non-disruptive upgrades…more so than most realise. Major Enginuity upgrades have always come with a certain amount of disruption, and my experience has not always been good; the levels of planning and certification required have kept many storage contractors gainfully employed. Clariion upgrades have also been scary in the past, and even today, Isilon upgrades are nowhere near as clean as they would have you believe.

EMC could of course have got away with the recent debacle if they'd simply released a new hardware platform; everyone would have accepted that this was going to involve data migration and moving data around.

Still, the scariest upgrade I ever had was an upgrade of an IBM Shark which failed half-way and left us with one node at one level of software and the other at a different level. And IBM scratching their heads. But recently the smoothest upgrades have been IBM's…even elephants can learn to dance.

As storage vendors struggle with a number of issues, including the sun setting on traditional data protection schemes such as RAID, I would expect the number of destructive and disruptive upgrades to increase; and the marketing spin around them, from everyone, to reach dizzying heights. As vendors manipulate the data we are storing in more and more complex and clever ways, the potential for disruptive and destructive upgrades is going to increase.

Architectural mistakes are going to be made; wrong alleys will be followed…Great vendors will admit to these and support their customers through the changes. This will be easier for those who are shipping software products wrapped with hardware; it is going to be much harder for the software-only vendors. If a feature is so complex that it seems like magic, you might not want to use it…I'm looking for simple to manage, operate and explain.

An argument for Public Cloud? Maybe, as this takes the onus to arrange it away from you. Caveat emptor though; this may just mean that disruption is imposed upon you, and if you've not designed your applications to cope with it…Ho hum!





Buying High-End Storage

I was in the process of writing a blog about buying high-end storage…then I remembered this sketch. So, in a purely lazy blog entry, I think this sums up the experience of many storage buyers…


I think, as we head into a month or so of breathless announcements, bonkers valuations and industry hype, it is worth a watch…

But I do have some special audiophile SAN cables which will enhance the quality of your data if you want some!! It may even embiggen it!


I'm a big fan of Etherealmind and his blog; I like that it is a good mix of technical and professional advice. He's also a good guy to spend an hour or so chatting to; he's always generous with his time to peers, and even when he knows a lot more than you about a subject, you never really feel patronised or lectured to.

I particularly liked this blog; Greg and I are really on the same page with regards to work/life balance, but it is this paragraph that stands out…


Why am I focussed on work life ? After 25 or so years in technology, I have developed some level of mastery.  Working on different products is usually just a few days work to come up to speed on the CLI or GUI. Takes a few more weeks to understand some of the subtle tricks. Say a month to be competent, maybe two months. The harder part is refreshing my knowledge on different technologies – for example, SSL, MPLS, Proxy, HTTP, IPsec, SSL VPN. I often need to refresh my knowledge since it fades from my brain or there is some advancement. IPsec is a good example where DMVPN is a solid advancement but takes a few weeks to update the knowledge to an operational level.

Now, although he is talking about networking technologies, what he says is true of storage technologies and, actually, pretty much all of IT these days. You should be able to become productive on most technologies in a matter of days, providing you have the fundamentals; spend your early days becoming knowledgeable about the underlying principles, and avoid vendor-specific traps.

Try not to run a translation layer in your mind; too many storage admins are translating back to the first array that they worked on. They try to turn hypers and metas into aggregates; they worry about fan-outs without understanding why you have to in some architectures and not in others.

Understanding the underlying principles means that you can evaluate new products that much quicker; you are not working out why product 'A' is better than product 'B', which often results in biases. You understand why product 'A' is a good fit for your requirement, or why neither product is a good fit.

Instead of iSCSI bad, FC good…you will develop an idea as to the appropriate use-case for either.

You will become more useful…and you will find that you are less resistant to change; it becomes less stressful and easier to manage. Don’t become an EMC dude, become a Storagebod…Don’t become a Linux SysAdmin, become a SysAdmin.

Am I advocating generalism? To a certain extent, yes but you can become expert within a domain and not a savant for a specific technology.

And a final bit of advice; follow Etherealmind….he talks sense for a network guy!



A two question RFP….

Is it easy?

Is it cheap?

Pretty much, these are the only two questions which interest me when talking to a vendor these days; after years of worrying about technology, it has all boiled down to those two. Of course, if I were to produce an RFx document with simply those two questions, I'd probably be out of a job fairly swiftly.

But those two questions are not really that simple to answer for many vendors.

Is it easy? How simply can I get your product to meet my requirements and business need? My business need may be to provide massive capacity; it could be to support many thousands of VMs, it could be to provide sub-millisecond latency.  This all needs to be simple.

It doesn't matter if you provide me with the richest feature-set, the simplest GUI or backwards compatibility with the ENIAC if it is going to take a cast of thousands to do it. Yet still vendors struggle to answer the questions posed, and you often get the response to a question you didn't ask but the vendor wants to answer.

Is it cheap? This question is even more complicated, as vendors like to hide all kinds of things, but I can tell you: if you are not upfront with your costs and you start to present me with surprises, that is not good.

Of course features like deduplication and compression mean that the capacity costs are even more opaque but we are beginning to head towards the idea that capacity is free; performance costs. But as capacity becomes cheaper, the real value of primary storage dedupe and compression for your non-active set that sits on SATA and the likes begins to diminish.

So just make it easy, just make it cheap and make my costs predictable.

Be honest, be up-front and answer the damn questions….

Service Power..

Getting IT departments to start thinking like service providers is an up-hill struggle; getting beyond cost to value seems to be a leap too far for many. I wonder if it is a psychological thing driven by fear of change but also a fear of assessing value.

How do you assess the value of a service? Well, arguably, it is quite simple…it is worth whatever someone is willing to pay for it. And with the increased prevalence of service providers vying with internal IT departments, it should be relatively simple; they've pretty much set the base-line.

And then there are the things that the internal IT department should just be able to do better; they should be able to assess Business need better than any external provider. They should know the Business and be listening to the 'water cooler' conversations.

They should become experts in what their company does; understand the frustrations and come up with ways of doing things better.

Yet there is often a fear of presenting the Business with innovative and better services. I think it is a fear of going to the Business and presenting a costed solution; there is a fear of asking for money. And there is certainly a fear of Finance; so present the costs to the Business users first and get them to come to the table with you.

So we offer the same old services and wonder why the Business goes elsewhere to do the innovative stuff; and while they are at it, they start procuring the services we used to provide. Quite frankly, many Corporate IT departments are in a death spiral, trying to hang on to things that they could let go.

Don't think 'I can't ask the Business for this much money to provide this new service'…think 'what if the Business wants this service and asks someone else?' At least you would be bidding on your own terms and not being forced into a competitive bid against an external service provider; when it comes down to it, the external provider almost certainly employs a better sales-team than you.

By proposing new services yourself, or perhaps even taking existing 'products' and turning them into a service, you are choosing the battle-ground yourselves…you can find the high ground and fight from a position of power.

5 Minutes

One of the frustrations when dealing with vendors is actually getting real availability figures for their kit; you will get generalisations, like 'it is designed to be 99.999% available' or perhaps 99.9999% available. But what do those figures really mean to you, and how significant are they?

Well, 99.999% available equates to a bit over 5 minutes of downtime a year, and 99.9999% equates to a bit over 30 seconds. And in the scheme of things, that sounds pretty good.
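Those figures are easy to verify for yourself; a quick sketch, using a 365-day year:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

def downtime_seconds(availability: float) -> float:
    """Seconds of downtime per year permitted at a given availability."""
    return (1 - availability) * SECONDS_PER_YEAR

print(downtime_seconds(0.99999) / 60)  # "five nines": ~5.3 minutes
print(downtime_seconds(0.999999))      # "six nines": ~31.5 seconds
```

Note that these are annual budgets; nothing in the figure says whether the downtime arrives as thirty one-second blips or one five-minute outage, and those are very different experiences.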

However, these are design criteria and aims; what are the real-world figures? Vendors, you will find, are very coy about this; in fact, every presentation I have had with regard to availability has been under very strict NDA, and sometimes not even notes are allowed to be taken. Presentations are never allowed to be taken away.

Yet, there's a funny thing…I've never known a presentation where the design criteria are not met or even significantly exceeded. So why are the vendors so coy about their figures? I have never been entirely sure; it may be that their 'mid-range' arrays display very similar real-world availability figures to their more 'Enterprise' arrays…or it might be that once you have real-world availability figures, you might start to ask some harder questions.

Sample size: raw availability figures are not especially useful if you don't know the sample size. Availability figures are almost always quoted as an average, and unless you've got a really bad design, more arrays can skew the figures.
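A toy example of how averaging over a fleet masks a genuinely bad incident (all numbers invented):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760

def fleet_availability(array_count: int, total_outage_hours: float) -> float:
    """Average availability across a fleet, weighting all arrays equally."""
    return 1 - total_outage_hours / (array_count * HOURS_PER_YEAR)

# One array in a 100-array fleet down for a full day: a terrible event
# for that customer, yet the fleet average still reads ~99.997%.
print(fleet_availability(100, 24))
```

This is why the sample size matters: a big installed base can absorb an outage that would be catastrophic on any individual array, and the quoted average tells you nothing about how bad the worst case was.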

Sample characteristics: I've known vendors, when backed into a corner to provide figures, do some really sneaky things; for example, they may provide figures for a specific model and software release. This is often done to hide a bad release. You should always try to ask for the figures for the entire life of a product; this will allow you to judge the quality of the code. If possible, ask for a breakdown on a month-by-month basis, annotated with the code-release schedule.

There are many tricks that vendors try to pull to hide causes of downtime and non-availability but instead of focusing on the availability figures; as a customer, it is sometimes better to ask different specific questions.

What is the longest outage that you have suffered on one of your arrays? What was the root cause? How much data loss was sustained? Did the customer have to invoke disaster recovery or any recovery procedures? What is the average length of outage on an array that has gone down?

Do not believe a vendor when they tell you that they don't have these figures and this information closely and easily to hand. They do; and if they don't, they are pretty negligent about their QC and analytics. Surely they don't just use all their Big Data capability to crunch marketing stats? Scrub that, they probably do.

Another nasty thing that vendors are in the habit of doing is forcing customers not to disclose to other customers that they have had issues, and what those issues were. And of course we all comply and never discuss such things.

So 5 minutes…it’s about long enough to ask some awkward questions.

Patience is a Virtue?

Or is patience just an acceptance of latency and friction? A criticism often made of today's generation is that they expect everything now, and that this is a bad thing; but is it really?

If a bottle of fine wine could mature in an instant and be as good as a '61, would that be a bad thing? If you could produce a Michelin-quality meal in a microwave, would that be a bad thing?

Yes, today we do have to accept that such things take time but is it really a virtue? Is there anything wrong with aspiring to do things quicker whilst maintaining quality?

We should not just accept that latency and friction in process is inevitable; we should work to try to remove them from the way that we work.

For example, change management is considered to be a necessary ITIL process, but does it have to be the lengthy bureaucratic process that it is? If your infrastructure is dynamic, surely your change process should be dynamic too? If you are installing a new server, should you have to raise a change:

1) to rack and stack
2) to configure the network
3) to install the operating system
4) to present the storage
5) to add the new server to the monitoring solution etc, etc

Each of these is an individual change raised by a separate team. Or should you be able to do all of this programmatically? Now, obviously, in a traditional data-centre some of these require physical work, but once the server has been physically commissioned, there is nothing here which should not be able to be done programmatically, and pretty much automatically.
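As a sketch of what 'programmatically' might look like, here is one change expressed as a pipeline; the step functions are hypothetical stand-ins for real calls into your network, OS-deployment, storage, and monitoring tooling, not any particular product's API:

```python
# Hypothetical provisioning pipeline: each function stands in for a real
# API call (switch config, PXE install, LUN presentation, monitoring).
def configure_network(server):  return f"vlan assigned to {server}"
def install_os(server):         return f"os installed on {server}"
def present_storage(server):    return f"lun mapped to {server}"
def enable_monitoring(server):  return f"{server} added to monitoring"

PIPELINE = [configure_network, install_os, present_storage, enable_monitoring]

def commission(server):
    """Run every post-rack step as one programmatic change, in order."""
    return [step(server) for step in PIPELINE]

for line in commission("web01"):
    print(line)
```

The point is that the ordering and the record of what happened live in code, as a single auditable change, instead of in four change tickets queued across four teams.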

And so it goes for many of the traditional IT processes; they simply introduce friction and latency to reduce the risk of the IT department smacking into a wall. This is often deeply resented by the Business, who simply want to get their services up and running; it is also resented by the people following the processes; and then it is all thrown away in an emergency (which happens more often than you would possibly expect 😉 ).

This is not a rant against ITIL; it was a tool for a more sedate time, but in a time when patience is no longer really a virtue, we need a better way. Or perhaps something like an IT Infrastructure API?

Don’t throw away the rule-book but replace it with something better.

p.s. Patience was actually my grandmother; she had her vices, but we loved her very much.

Flash Changed My Life

All the noise about all flash arrays and acquisitions set me thinking a bit about SSDs and flash; how it has changed things for me.

To be honest, the flash discussions haven't yet really impinged on my reality in my day-to-day job. We do have the odd discussion about moving metadata onto flash, but we don't need it quite yet; most of what we do is sequential large I/O, and spinning rust is mostly adequate. Streaming rust, i.e. tape, is actually adequate for a great proportion of our workload. But we keep a watching eye on the market and where the various manufacturers are going with flash.

But flash has made a big difference to the way that I use my personal machines and if I was going to deploy flash in a way that would make the largest material difference to my user-base, I would probably put it in their desktops.

Firstly, I now turn my desktop off; I never used to unless I really had to, because waiting for it to boot or even wake from sleep was at times painful. And sleep had a habit of not sleeping, or flipping out on a restart; turning the damn thing off is much better. This has had the consequence that I now have my desktop on an eco-plug which turns off all the peripherals as well; good for the planet and good for my bills.

Secondly, the fact that the SSD is smaller means that I keep less crap on it and am a bit more sensible about what I install. Much of my data is now stored on the home NAS environment which means I am reducing the number of copies I hold; I find myself storing less data. There is another contributing factor; fast Internet access means that I tend to keep less stuff backed-up and can stream a lot from the Cloud.

Although the SSD is smaller and probably needs a little more disciplined house-keeping, running a full virus check, which I do on occasion, is a damn sight quicker, and there are no more lengthy defrags to worry about.

Thirdly, applications load a lot faster; although my desktop has lots of ‘chuff’ and can cope with lots of applications open, I am more disciplined about not keeping applications open because their loading times are that much shorter. This helps keeping my system running snappily, as does shutting down nightly I guess.

I often find on my non-SSD work laptop that I have stupid numbers of documents open; some have been open for days and even weeks. This never happens on my desktop.

So all-in-all; I think if you really want bang-for-buck and to put smiles on many of your users’ faces; the first thing you’ll do is flash-enable the stuff that they do everyday.



No Pain, No Gain?

I always reserve the right to change my mind, and I am almost at the stage of having changed it on blocks/stacks or whatever you want to call them. And for non-technical and non-TCO-related reasons.

I think in general that componentised, commodity-based stacks make huge sense; whether you are building out a private or a public infrastructure, a building-block approach is the only really scalable and sustainable one. And I wrote internal design documents detailing this approach eight or nine years ago; I know I'm not the only one, and we didn't call it cloud…we called it governance and sensible.

But where I have changed my opinion is on the pre-integrated vendor stacks. I still think that they are an expensive way of achieving a standardised approach to deploying infrastructure; from that I have not budged.

However, I think that this cost may well be the important catalyst for change; if you can convince a CFO/CEO/CIO/CTO etc. that this cost is actually an investment, but that to see a return on the investment you need to re-organise and change the culture of IT, it might well be worth paying.

If you can convince them that without the cultural change, they will fail….you might have done us all a favour. If it doesn’t hurt, it probably won’t work. If it is too easy to write things off when it’s tough…it’ll be too easy to fall back into the rut.

So EMC, VCE, Cisco, IBM, NetApp, HP etc….make it eye-wateringly expensive but very compelling please. Of course, once we’ve made the hard yards, we reserve the right to go and do the infrastructure right and cheap as well.