
Meaningless Metrics!

Recently we’ve had a bit of a senior management reshuffle and, as a result, people are in ‘prove that their teams are valuable’ mode again. Inevitably this means producing metrics to show why your team is brilliant, so my manager came along and asked me to produce some metrics about our storage estate.

So off I went and produced some pretty graphs showing storage growth, the increase in the number of technologies supported and some other things that I thought might show why my team does an excellent job. One of the problems with running a very good team is that it tends to have relatively low visibility; they don’t cause problems and things don’t often break. Most of the time, people don’t know that we are here.

Anyway, as is the way of these things, the usual comment comes back: how many terabytes per head do we manage, and what is the industry average? Now, with over two petabytes of active data per head under management, I could claim that my team is great, but to be honest no-one really knows what the industry average is, and would it be meaningful anyway? I’ve seen everything from 50TB to a petabyte quoted, with 150-200TB the figure cited most often; so my team could be good, great or downright amazing (it’s somewhere between the last two).

However, this is all meaningless, and it becomes more meaningless the more the industry changes. For example, we are managing what is closer to a big-data environment; big-data environments have large infrastructures but, if I am being honest, they are not especially hard to manage.

We rely on a fair amount of automation and standardisation, and applications often do a lot of the storage management themselves; although the storage infrastructure grows, it tends not to change massively. Allocations are large but relatively static: once storage is allocated, it does not move around a lot. We make a lot of use of clustered file-systems and most of the work we do is non-disruptive. We add nodes in, and even if a node fails, it tends not to take everything with it; we can live with a node down for weeks because the applications are resilient and services can generally cope with failures.

We have our busy times, but it generally runs pretty smoothly; most of our time is spent working out how we can make it run even more smoothly and how we can improve the service, which in my opinion is exactly how it should be. The best support teams look busy but not stressed; hero cultures are not where it’s at.

So I’ve given my boss a figure but I am really not sure that it has a lot of value. Lies, Damn Lies and Metrics!


3 Comments

  1. Hi Martin,
    So it seems you run one of the few teams not affected by the drawbacks of easy management interfaces. Congratulations 🙂 I’m interested in what metrics you actually used then. Latency? SLA violations? Duration of provisioning? Comparing money spent over the years? Which metrics would have meaning, in your opinion?
    Cheers seb

    1. Martin Glassborow says:

      What metrics? Data growth expressed in terms of hours of HD stored; the number of different technologies, to show that complexity is increasing and not simplifying at present; and a few others. Very high level and easily relatable for a layman.

  2. brerrabbit says:

    I agree wholeheartedly with your post (as is often the case), and this one is particularly timely for me as I’m preparing a justification for a major new purchase. Part of what I’m trying to capture in the justification is how one of our goals in selecting a solution ought to be minimising any increase in the complexity of our environment, given that my staff have their hands full already.

    I’ve seen a few of the same values for storage environments that you mentioned, x TB/FTE, and I agree that they are worthless. There is absolutely no context to that number. There are at least two aspects of storage environments that are completely ignored:

    1) It assumes that all storage shops have exactly the same feature set / level of complexity in play, and

    2) It assumes that all storage shops do the same jobs and do them equally well.

    We’ve all encountered or heard of storage shops that are fairly straightforward … some shared storage for the virtualization environment or some SQL clusters, maybe a NAS to manage … while others push their operations to the extremes of complexity with multiple SAN types, elaborate replication schemes, integrated snapshots with data protection software, etc.

    Likewise, some storage shops are extremely aggressive with code updates, failover testing, performance analysis or tuning, capacity management or the like, while others are content to leave things alone as long as the data is available (probably so that the storage admins can spend time on the 12 other things that they’ve been tasked with “managing”).

    The whole implication of the TB/FTE metric is that it somehow measures… competence? Efficiency? “Company XYZ is managing 900 TB/FTE, why are our numbers so low?!”

    Questions like that are great! And important. I would love to be able to objectively evaluate my team, I imagine a lot of storage admins would. But clearly we need something more than one isolated number.
