Storagebod

Big Ideas

Software Sucks!

Every now and then, I write a blog article that could probably get me sued, sacked or both; this started off as one of those and has been heavily edited so as to avoid naming names…

Software Quality Sucks; the ‘Release Early, Release Often’ meme appears to have permeated into every level of the IT stack; from the buggy applications to the foundational infrastructure, it appears that it is acceptable to foist beta quality code on your customers as a stable release.

Having run a test team for the past few years has been eye-opening; by the time my team gets hands on your code…there should be no P1s and very few P2s but the amount of fundamentally broken code that has made it to us is scary.

And then also running an infrastructure team, this is beyond scary and heading into realms of terror and just to make things nice and frightening, every now and then, I ‘like’ to search vendor patch/bug databases for terms like ‘data corruption’, ‘data loss’ and other such cheery terms; don’t do this if you want to sleep well at night.

Recently I have come across such wonderful phenomena as a performance monitoring tool which slows your system down the longer it runs; clocks that drift for no explicable reason and can lock out authentication; reboots which can take hours; non-disruptive upgrades which are only non-disruptive if run at a quiet time; errors that you should ignore most of the time but sometimes they might be real; files that disappear on renaming; updates replacing an update which makes a severity 1 problem worse…even installing fixes seems to be fraught with risk.

Obviously no-one in their right mind ever takes a new vendor code release into production; certainly your sanity needs questioning if you put a new product which has less than two years’ GA into production. Yet often the demands are that we do so.

But it does leave me wondering, has software quality really got worse? It certainly feels that it has. So what are the possible reasons, especially in the realms of infrastructure?

Complexity? Yes, infrastructure devices are trying to do more; nowhere is this more obvious than in the realms of storage where both capabilities and integration points have multiplied significantly. It is no longer enough to support the FC protocol; you must support SMB, NFS, iSCSI and integration points with VMware and Hyper-V. And with VMware on pretty much a 12-month refresh cycle, it is getting tougher for vendors and users to decide which version to settle on.

The Internet? How could this cause a reduction in software quality? Actually, the Internet as a distribution method has made it a lot easier and cheaper to release fixes; before, if you had a serious bug, you would find yourself having to distribute physical media and often, in the case of infrastructure, mobilising a force of Engineers to upgrade software. This cost money, took time and generally you did not want to do it; it was a big hassle. Now, send out an advisory notice with a link and let your customers get on with it.

End-users? We are a lot more accepting of poor quality code; we are used to patching everything from our PC to our Consoles to our Cameras to our TVs; especially, those of us who work in IT and find it relatively easy to do so.

Perhaps it is time to start a ‘Slow Software Movement’ which focuses on delivering things right first time?

Why So Large?

One of the most impressive demonstrations I saw at SNW Europe was from the guys at Amplidata; on their stand, they had a tiny implementation of Amplistor with the back-end storage being USB memory-sticks. This enabled a quick and effective demonstration of their erasure-coding protection and the different protection levels on offer; pull one stick and both video streams kept working, pull another one and one stopped, the other kept playing.

It was a nice little demonstration of the power of their solution; well I liked it.
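For anyone who did not see it, the behaviour on the stand can be sketched in a few lines. This is only an illustration of the idea of per-object protection levels, not Amplidata's implementation; the `Policy` class, the fragment counts and the one-fragment-per-stick layout are all my own assumptions, chosen so that one stream survives a single lost stick and the other survives two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical protection level: data fragments plus parity fragments,
    with one fragment per memory-stick (an assumption for this sketch)."""
    name: str
    data_fragments: int
    parity_fragments: int

    def readable_after(self, sticks_pulled: int) -> bool:
        # An erasure-coded object stays readable while at least
        # `data_fragments` of its fragments survive.
        total = self.data_fragments + self.parity_fragments
        return total - sticks_pulled >= self.data_fragments


# Assumed fragment counts; the real demo's settings were not stated.
STREAMS = [
    Policy("video_a", data_fragments=4, parity_fragments=1),
    Policy("video_b", data_fragments=4, parity_fragments=2),
]


def still_playing(sticks_pulled: int) -> dict:
    """Map each stream name to whether it can still be read."""
    return {p.name: p.readable_after(sticks_pulled) for p in STREAMS}


if __name__ == "__main__":
    for pulled in range(3):
        print(pulled, still_playing(pulled))
```

The point is that the protection level belongs to the content rather than to the box, which is what made the demonstration work so well.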

But it did start me thinking, why do we assume that object-stores should be large? Why do the start-ups really only target petabyte+ requirements? Certainly those who are putting together hardware appliances seem to want to play in that space?

Is there not a market for a consumer-level device? Actually, as we move to a mixed-tier environment even at the consumer level with SSD for application/operating system and SATA for content, this might start to make a lot of sense.

We could start to choose protection levels for content appropriate to the content; so we might have a much higher level of protection for our unique content, think photos and videos of the kids; we might even look at some kind of Cloud storage integration for off-site.

And then I started to think some more; is there not a market for a consumer device which talks NFS, SMB and S3? Probably not yet but there may well be in the future as applications begin to support things like S3 natively. I can see this playing especially well for consumers who use tablets as their primary computing device; many apps already talk to the various cloud storage providers and it is not a stretch to think that they might be able to talk to a local cloud/object store as well.

I have seen home NAS boxes which support S3 as a back-up target; actually another device that I saw at SNW which is more a SMB device than a home NAS supports a plethora of Cloud Storage options. The Imation Dataguard Data Protection Device looks very interesting from that point of view. So when will we see the likes of Synology, Drobo and competitors serve object storage and not just use it as a back-up target?

I think it will happen but the question is, will Microsoft, Apple etc beat the object storage vendors to the punch and integrate it into the operating system?

Good Enough Isn’t?

One of the impacts of the global slowdown has been that many companies have been focussing on services and infrastructure which is just good enough. For some time now, many of the mainstream arrays have been considered to be good enough. But the impact of SSD and Flash may change our thinking and in fact I hope it does.

So perhaps Good Enough Isn’t Really Good Enough? Good Enough is only really Good Enough if you are prepared to stagnate and not change; if we look at many enterprise infrastructures, they haven’t really changed that much over the past 20 years and the thinking behind them has not changed dramatically. Even virtualisation has not really changed our thinking because, despite the many pundits and bloggers like me who witter on about service thinking and Business alignment, for many it is still just hot-air.

There appears to be a lack of imagination that permeates our whole business; if a vendor turns up and says ‘I have a solution which can reduce your back-up windows by 50%’, the IT manager could think ‘Well, I don’t have a problem with my back-up windows; they all run perfectly well and everyone is happy…’. What they don’t tend to ask is ‘If my back-up windows are reduced by 50%, what can I do with the time that I have saved; what new service can be offered to the Business?’

Over the past few years, the focus has been on Good Enough; we need to get out of this rut and start to believe that we can do things better.

As storage people, we have been beaten up by everyone with regards to cost and yet I still hear it time and time again that storage is the bottle-neck in all infrastructures; time to provision, performance, and capacity; yet we are still happy to sit comfortably talking about ‘Good Enough Storage’.

Well, let me tell you that it isn’t ‘Good Enough’ and we need to be a lot more vocal in articulating why it isn’t and why doing things differently would be better; working a lot closer with our customers in explaining the impact of ‘Good Enough’ and letting them decide what is ‘Good Enough’.

Big Answers Need Big Data?

At an SNW briefing session today, X-IO (Xiotech) talked a lot of sense about Big Data; in fact it was almost the most sense that I have heard spoken about Big Data in a long time. The fact is that most Big Data isn’t really that big and the data-sets are not huge; there are exceptions but most big data-sets that many companies will use can be measured in a few terabytes and not the tens or hundreds of terabytes that the big storage vendors want to talk about.

Sentiment data which can be derived from social networking; these are not necessarily big data sets. A tweet for example is 140 characters, so roughly 140 bytes…a terabyte is 1 099 511 627 776 bytes; we can store a lot of tweets in a terabyte and within that data, there is a lot of information that can be extracted.
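Just how many tweets is 'a lot'? A quick back-of-the-envelope sketch, assuming one byte per character and ignoring all metadata and storage overhead (both simplifications of mine), with a function name that is purely illustrative:

```python
TWEET_BYTES = 140            # one byte per character (simplifying assumption)
TERABYTE_BYTES = 2 ** 40     # 1 099 511 627 776 bytes, the figure used above


def tweets_per_terabyte(tweet_bytes: int = TWEET_BYTES,
                        capacity_bytes: int = TERABYTE_BYTES) -> int:
    """Whole tweets that fit in the given capacity, ignoring any overhead."""
    return capacity_bytes // tweet_bytes


if __name__ == "__main__":
    print(f"{tweets_per_terabyte():,} tweets per terabyte")
```

That works out at a little under eight billion tweets per terabyte, which rather makes the point about not-so-big data.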

In fact, there are probably some Big Answers in that not so Big Data but we need to get rid of the noise; in order to do this, we need to be able to process this data differently and directly. The most important thing that the storage can do is to vanish and become invisible; allow data processing to be carried out in the most natural way and not require various work-arounds which hide the deficiencies of the storage.

If your storage vendor spends all their time talking about the bigness of data; then perhaps they might be the wrong vendor.

Wellies!

I was watching the iPhone 5 announcement with a sinking feeling; I am at the stage where I am thinking about upgrading my phone and have been thinking about coming back to Apple and I really wanted Apple to smash the ball over the pavilion and into the car-park (no baseball metaphors for me). But they didn’t, it’s a perfectly decent upgrade but nothing which has made my mind up for me.

I am now in the situation where I am considering another Android phone, an iPhone or even the Lumia 920 and there’s little to choose between them; I don’t especially want any of them, they’ll all do the job. I just want someone to do something new in the smartphone market but perhaps there’s nothing new to do.

And so this brings me onto storage; we are in the same place with general purpose corporate storage; you could choose EMC, NetApp, HDS, HP or even IBM for your general purpose environment and it’d do the job. Even price-wise, once you have been through the interminable negotiations, there is little between them. TCO? You choose the model which supports your decision; you can make it look as good or bad as you want. There’s not even a really disruptive entrant to the market; yes, Nexenta are getting some traction but there’s no big market swing.

I don’t get the feeling that there is a big desire for change in this space. The big boys are packaging their boring storage with servers and networking and trying to make it look interesting and revolutionary. It’s not.

And yet, there are more storage start-ups than ever before but they are all focused around some very specific niches and we are seeing these niches becoming mainstream or gaining mainstream attention.

SSD and flash-accelerated devices aimed at the virtualisation market; there’s a proliferation of these appearing from players large and small. These are generally aimed at VMware environments; once I see them appearing for Hyper-V and other rivals, then I’ll believe that VMware is really being challenged in the virtualisation space.

Scalable bulk storage; be it Object or traditional file protocols; we see more and more players in this space. And there’s no real feeling of a winner or a dominant player; this is especially true in the Object space where the lack of or even the perceived lack of a standard is hampering adoption by many who would really be the logical customers.

And then there is the real growth where the exciting stuff is happening; this is the likes of Dropbox, Evernote and others; this is really where the interesting stuff is happening, it is all about the application and the API access. This is kind of odd; people seem to be willing to build applications, services and apps around these proprietary protocols in a way that they are unwilling to do with the Object Storage vendors. Selling an infrastructure product is hard, selling an infrastructure product masquerading as a useful app…maybe that is the way to go.

It is funny that some of the most significant changes in the way that we will do infrastructure and related services in the future are being driven from completely non-traditional spaces…but this kind of brings me back round to mobile phones; Nokia didn’t start as a mobile company and who knows, perhaps it’ll go back to making rubber boots again.

Start-Ups Galore

Recently it seems that there are more storage start-ups than ever before; be it flash-based storage, object storage, storage aimed at virtual environments, cloud storage, storage as software, storage appliances; it seems that every day more and more press releases announcing yet another innovation in the storage space hit my email address.

How many of these are truly innovative? Not so many, I guess, but it seems that the storage start-up industry is in rude health. It seems that the barrier to entry into the market has significantly dropped and that the introduction of commodity-based hardware and software has really changed things.

And yet we still see the doom merchants predicting the end of the storage administrator and to be fair, a few years ago, I might have been in agreement but the sheer diversity of storage infrastructures, big data growth and just general growth lead me to feel that the storage administrator role still has life. Yes, the role will change and evolve much as storage has evolved and it may become more virtualisation focussed but there will still be storage specialists and there will probably be as many as ever.

I am going to do my bit to ensure that the role of the ‘Storage Bod’ continues and encourage the diversity which will drive more complexity; I am a judge for the Tech Trailblazers awards, so if you are a new storage start-up and your product can further drive the complexity into the storage environment, you should enter. But if your product is really simple, just works and makes lives easier, please don’t bother….we want the environment to stay complex and a black-art.

Of course I am probably in the minority and some of the judges will be looking for more sensible things, so I guess start-ups with products both complex and simple should probably enter. There are some good prizes, some great sponsors and excellent judges (well, better qualified than me anyway).

As I say, the barrier to entry to the market seems to have fallen somewhat but some extra cash and help is always handy.

Patience is a Virtue?

Or is patience just an acceptance of latency and friction? A criticism oft made of today’s generation is that they expect everything now and this is a bad thing but is it really?

If a bottle of fine wine could mature in an instant and be as good as a ’61, would this be a bad thing? If you could produce a Michelin quality meal in a microwave, would it be a bad thing?

Yes, today we do have to accept that such things take time but is it really a virtue? Is there anything wrong with aspiring to do things quicker whilst maintaining quality?

We should not just accept that latency and friction in process is inevitable; we should work to try to remove them from the way that we work.

For example, change management is considered to be a necessary ITIL process but does it have to be the lengthy bureaucratic process that it is? If your infrastructure is dynamic, surely your change process should be dynamic too? If you are installing a new server, should you have to raise a change:

1) to rack and stack
2) to configure the network
3) to install the operating system
4) to present the storage
5) to add the new server to the monitoring solution etc, etc

Each of these is an individual change raised by a separate team. Or should you be able to do this all programmatically? Now obviously in a traditional data-centre, some of these require physical work but once the server has been physically commissioned, there is nothing there which should not be able to be done programmatically and pretty much automatically.
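To make the idea a little more concrete, here is a minimal sketch of what 'one change, done programmatically' might look like once the server is racked. Everything in it is hypothetical: the step functions are stand-ins that only return a log message, and a real version would call whatever network, build, storage and monitoring tooling you actually have.

```python
from typing import Callable, Dict, List

Server = Dict[str, str]
Step = Callable[[Server], str]


# Hypothetical stand-ins for real tooling; each just reports what it would do.
def configure_network(server: Server) -> str:
    return f"network configured for {server['name']}"


def install_operating_system(server: Server) -> str:
    return f"operating system installed on {server['name']}"


def present_storage(server: Server) -> str:
    return f"storage presented to {server['name']}"


def add_to_monitoring(server: Server) -> str:
    return f"{server['name']} added to monitoring"


# Steps 2 to 5 from the list above, expressed as one ordered pipeline.
PIPELINE: List[Step] = [
    configure_network,
    install_operating_system,
    present_storage,
    add_to_monitoring,
]


def provision(server: Server, steps: List[Step] = PIPELINE) -> List[str]:
    """Run every step in order as a single change and return an audit log."""
    return [step(server) for step in steps]


if __name__ == "__main__":
    for entry in provision({"name": "web01"}):
        print(entry)
```

The audit log is the interesting part; it is the record that the change process wants anyway, produced as a side-effect of doing the work rather than as a separate piece of paperwork.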

And so it goes for many of the traditional IT processes; they simply introduce friction and latency to reduce the risk of the IT department smacking into a wall. This is often deeply resented by the Business who simply want to get their services up and running, it is also resented by the people who are following the processes and then it is thrown away in an emergency (which happens more often than you would possibly expect ;-) ).

This is not a rant against ITIL; it was a tool for a more sedate time but in a time when Patience is no longer really a virtue…do we need a better way? Or perhaps something like an IT Infrastructure API?

Don’t throw away the rule-book but replace it with something better.

P.S. Patience was actually my grand-mother; she had her vices but we loved her very much.

Not Special

As we grow up, there are various times in our lives when we realise that we are not as special as we always believed that we were; or certainly that we are less important than we thought. This can be the arrival of a younger sibling or the birth of a child; these sorts of events can affect us greatly and the feelings resulting from them can be quite painful but it is all a necessary part of growing, learning who we are and changing our perspective.

And as it is for people, so it is for Business and Business function; the moment you believe that you are special as a matter of right, something is going to come along and disrupt that centre.

Internal IT functions have for a long time believed that they are special; we all know that they are not. But so do many Businesses and other functions; I’ve lost track of the number of times that someone has tried to convince me that they don’t have to follow a process because they are special. And yet we find ourselves kow-towing to that attitude all the time; internally and externally we find ourselves making exceptions to rules…whether it is to the Mega-Corporation who does not want to pay tax or the Senior Manager who believes that they should not have to follow the internal IT policy.

However, I do believe that we should embrace difference; the department that wants to work differently because it supports their processes, they should be supported. You change the rules but don’t make exceptions; if the rules don’t work, don’t ignore the rules but change them. And at times, don’t be afraid to tear up the rule book and come up with a completely new set of rules; pick up that ball and run with it.

I look around at the moment and I see so many people and companies trying to put in exceptions and workarounds to fit their business models and activities; trying to foreclose on the potential disruption that is coming…believing that they are special; from banking to broadcast…when they might be better tearing up their play-book and starting again.

No-one believed that you could win a major Football tournament without strikers, Spain showed that you can…you just have to play differently.

Meltdown

The recent RBS systems meltdown and the rumoured reasons for it are a salutary reminder to all as to how much we are all reliant on the continued availability of core IT systems; these systems are pretty much essential to modern life. Yet arguably the corporations that run these systems have become incredibly cavalier and negligent about these systems; their maintenance and long-term sustainability even in supposedly heavily regulated sectors such as Banking is woeful.

There is an ‘It Ain’t Broke, So Don’t Fix It’ mentality that has led to systems that are unbelievably complex and tightly coupled; this is especially true of those early adopters of IT technologies such as the Banking sector.

I spent my early IT years working for a retail bank in the UK and even twenty years ago, this mentality was prevalent and dangerous; code that no-one understood sat at the core of systems, wrappers written to try to hide the ancient code meant that you needed to be half-coder, half-historian to stand a chance of working out exactly what it did.

If we add another twenty years to this, twenty years of rapid change where we have seen the rise of the Internet, 24 hour access to information and services, mobile computing and a financial collapse; you have almost a perfect storm. Rapidly changing technology coupled with intense pressure on costs has led to under-investment on core infrastructure whilst Business chases the new. Experience has oft been replaced with expedience.

There is simply no easy Business Case that flies that justifies the re-writing and redevelopment of your core legacy applications, even if you still understand them; well, there wasn’t until last week. If you don’t do this and if you don’t start to understand your core infrastructure and applications; you might well find yourself in the same position that the guys in RBS have.

Systems that have become too complex and are hacked together to do things that they were never supposed to do; systems which if I’m being generous were developed in the 80s but more likely the 70s trying to cope with the demands of the 24 hour generation; systems which are carrying out more processing in realtime and yet are at their heart, batch systems.

If we continue with this route, there will be more failures and yet more questions to be answered. Dealing with legacy should no longer be ‘It Ain’t Broke, So Don’t Fix It’ but ‘It Probably Is Broke, You Don’t Know It…yet!’ Look at your Business; if it has changed out of all recognition, if your processes and products no longer resemble those of twenty years ago, it is unlikely that IT systems designed twenty years ago are fit for purpose. And if you’ve stuck twenty years’ worth of sticking plaster on them to try and make them fit for purpose; it’s going to hurt when you try to remove the sticking plaster.

This is not a religious argument about Cloud, Distributed Systems, Mainframe but one about understanding the importance of IT to your Business and investing in it appropriately.

IT may not be your Business but IT makes your Business…you probably wouldn’t leave your offices to fall into disrepair, patching over the cracks until they fall down…don’t do the same to your IT.

The Last of the Dinosaurs?

Myself and Chris ‘The Storage Architect’ Evans were having a Twitter conversation during the EMC keynote where they announced the VMAX 40K; Chris was watching the live-stream and I was watching the Chelsea Flower Show; from Chris’ comments, I think that I got the better deal.

But we got to talking about the relevance of the VMAX and the whole bigger is better thing. Every refresh, the VMAX just gets bigger and bigger, more spindles and more capacity. Of course EMC are not the only company guilty of the bigger is better hubris.

VMAX and the like are the ‘Big Iron’ of the storage world; they are the choice of the lazy architect, the infrastructure patterns that they support are incredibly well understood and text-book but do they really support Cloud-like infrastructures going forward?

Now, there is no doubt in my mind that you could implement something which resembles a cloud or let’s say a virtual data-centre based on VMAX and its competitors. Certainly if you were a Service Provider with aspirations to move into the space; it’s an accelerated on-ramp to a new business model.

Yet just because you can, does that mean you should? EMC have done a huge amount of work to make it attractive; an API to enable you to programmatically deploy and manage storage allows portals to be built to encourage a self-service model. Perhaps you believe that this will allow light-touch administration and the end of the storage administrator.

And then myself and Chris started to talk about some of the realities; change control on a box of this size is going to be horrendous; in your own data-centre, co-ordination is going to be horrible but as a service provider? Well, that’s going to be some interesting terms and conditions.

Migration: in your own environment, migrating a petabyte array in a year means migrating 20 terabytes a week, more or less. Now, depending on your workload, year-ends, quarter-ends and known peaks, your window for migrations could be quite small. And depending how you do it, it is not necessarily non-service impacting; mirroring at the host level means significantly increasing your host workload.
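The sums are simple enough but worth seeing, because the weekly rate climbs quickly once you start losing weeks to freezes. A rough sketch, assuming a decimal petabyte of 1,000 terabytes; the twelve blackout weeks in the second call are an illustrative figure of mine, not a measured one:

```python
PETABYTE_TB = 1000  # decimal terabytes in a petabyte (assumption)


def weekly_migration_tb(array_tb: float, usable_weeks: int) -> float:
    """Terabytes that must move each usable week to empty the array in time."""
    if usable_weeks <= 0:
        raise ValueError("need at least one usable week")
    return array_tb / usable_weeks


if __name__ == "__main__":
    # Every week of the year available for migration.
    print(round(weekly_migration_tb(PETABYTE_TB, 52), 1))
    # Twelve weeks lost to year-end, quarter-ends and known peaks (illustrative).
    print(round(weekly_migration_tb(PETABYTE_TB, 52 - 12), 1))
```

With all 52 weeks available you need a little over 19 terabytes a week; lose twelve of them to change freezes and it becomes 25.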

As a service provider, you have to know a lot about workloads that you don’t really influence and don’t necessarily understand. As a service provider customer, you have to have a lot of faith in your service provider. When you are talking about massively-shared pieces of infrastructure, this becomes yet more problematic. You are going to have to reserve capacity and capability to support migration; if you find yourself overcommitting on performance, i.e. you make assumptions that peaks don’t all happen at once, you have to understand the workload impact of migration.

I am just not convinced that these massively monolithic arrays are entirely sensible; you can certainly provide secure multi-tenancy but can you prevent behaviours impacting the availability and performance of your data? And can you do it in all circumstances, such as code-level changes and migrations?

And if you’ve ever seen the back-out plan for a failed Enginuity upgrade; well the last time I saw one, it was terrifying.

I guess the phrase ‘Eggs and Baskets’ comes to mind; yet we still believe that bigger is better when we talk about arrays.

I think we need to have some serious discussion about optimum array sizes to cope with exceptions and when things go wrong. And then some discussion about the migration conundrum. Currently I’m thinking that a petabyte is as large as I want to go and as for the number of hosts/virtual hosts attached, I’m not sure. Although it might be better to think about the number of services an array supports and what can co-exist, both performance-wise and availability window-wise.

No, the role of the Storage Admin is far from dead; it’s just become about administering and managing services as opposed to LUNs. Yet, the long-term future of the Big Iron array is limited for most people.

If you as an architect continue to architect all your solutions around Big Iron storage…you could be limiting your own future and the future of your company.

And you know what? I think EMC know this…but they don’t want to scare the horses!