
January 20th, 2012:

Meaningless Metrics!

Recently we’ve had a bit of a senior management reshuffle and, as a result, people are in ‘prove that their teams are valuable’ mode again. Inevitably this means producing metrics to show why your team is brilliant, so my manager came along and asked me to produce some metrics about our storage estate.

So off I went and produced some pretty graphs showing storage growth, the increase in the number of technologies supported and some other things that I thought might show why my team does an excellent job. One of the problems with running a very good team is that they tend to have relatively low visibility: they don’t cause problems and things don’t often break. Most of the time, people don’t know that we are here.

Anyway, as is the way of these things, the usual comment came back: how many terabytes per head do we manage, and what is the industry average? Now, with over two petabytes of active data per head under management, I could claim that my team is great, but to be honest no-one really knows what the industry average is, and would it be meaningful anyway? I’ve seen anything from 50TB to a petabyte quoted, with a figure of 150-200TB per head cited most often; so my team could be good, great or downright amazing (it’s somewhere between the last two).
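If you want to see quite how crude the metric is, here is a minimal sketch of the calculation; the team size and total capacity below are illustrative assumptions rather than our actual numbers, the only grounded bits being the rough two-petabytes-per-head and 150-200TB figures above.

```python
# Illustrative sketch only: team size and total capacity are assumed figures,
# not our real ones; the point is how little the calculation actually tells you.
team_size = 4
active_capacity_tb = team_size * 2 * 1024   # roughly two petabytes per head, in TB

tb_per_head = active_capacity_tb / team_size
quoted_range_tb = (150, 200)                # the industry figures most often quoted

print(f"{tb_per_head:.0f} TB per head against a quoted norm of "
      f"{quoted_range_tb[0]}-{quoted_range_tb[1]} TB")
# A single ratio like this ignores churn, technology mix and how much of the
# day-to-day management is automated, which is rather the point of this post.
```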

However, this is all fairly meaningless, and it becomes more meaningless the more the industry changes. For example, we are managing what is closer to a big-data environment; big-data environments have large infrastructures but, if I am being honest, they are not especially hard to manage.

We rely on a fair amount of automation and standardisation; the applications often do a lot of the storage-management work themselves and, although the storage infrastructure grows, it tends not to change massively. Allocations are large but relatively static: once storage is allocated, it does not move around a lot. We make a lot of use of clustered file-systems and most of the work we do is non-disruptive. We add nodes in, and even if a node fails it tends not to take everything with it; we can live with a node down for weeks, because the applications are resilient and the services can generally cope with failures.

We have our busy times but it all generally runs pretty smoothly; most of our time is spent working out how we can make it run even more smoothly and how to improve the service, which in my opinion is exactly how it should be. The best support teams look busy but not stressed; hero cultures are not where it’s at.

So I’ve given my boss a figure, but I am really not sure that it has a lot of value. Lies, Damn Lies and Metrics!