Just a quick thought…

Jun 26th, 2010

by Martin Glassborow.

Everything can break; you need to remind yourself of this whenever a vendor is talking to you.

Over the past year, I have personally come across the following:

1) A triple disk failure in a RAID6 environment which resulted in data loss

2) Data-loss due to a bug on an array

3) Data-loss due to an application bug

4) Data-loss due to failed back-ups

Of these, only the first three are in any way partially excusable. Hardware will fail and software inevitably has bugs; we would hope that, in general, all scenarios are tested, but anyone who has been involved in testing knows that sometimes things get concessioned and sometimes they are simply missed.

But the last one is pretty much inexcusable; failed back-ups should be caught and fixed before they become a problem. The only acceptable SLA/OLA for a back-up environment is 100%; if you are willing to accept that there is a chance that you might lose that data, perhaps you shouldn't be backing it up in the first place.

Posted in: Storage.

8 Comments

Chris M Evans says:

June 26, 2010 at 11:55 am

Martin
Hence the point of replication of multiple copies; all depends on how valuable that data is. If it is the life of your company, 2/3/4 copies is worth the investment.

Martin G says:

June 26, 2010 at 12:18 pm

And the number of stories I can tell you about messed-up replication: disks added to the primary but not to the replicas, etc.
Manage, monitor and audit… often the missing processes after implementation.

sanaddict says:

June 26, 2010 at 3:02 pm

Data can be unavailable, but there is no justification for losing the data! Completely agree.

Michael S. says:

June 26, 2010 at 9:50 pm

Agreed! Nothing is “Set it and forget it” (SIFI) and failed backups should always be addressed. Secondly, whatever is making you need to do constant recoveries should be addressed.

Martin G says:

June 26, 2010 at 10:18 pm

I don’t see anything which talks about constant recoveries but, to be honest, in any reasonably sized estate with a reasonable number of users, you expect to be recovering files in some manner on a regular basis.

Andrew Miller says:

June 27, 2010 at 1:43 am

I’d really be curious to hear the details around the triple disk failure… what storage array, RAID technology, rebuild times, all that.
Not looking to start a bashing session on whichever vendor… but just really curious to know the details (as some vendors do have stuff to make triple disk failure less likely, and I'm curious what technologies were in play).

Dominic Cody says:

June 28, 2010 at 12:22 pm

I have seen the triple disk failure where I currently work; in fact, the same array has single disk failures every week and the vendor can't explain why. For info, the array is an HP XP24000 series array.

Digger says:

June 28, 2010 at 5:29 pm

Spill the beans – what was the array with the triple disk failure, what was the array with the bug (and what was the bug), what was the application and what was the back up software?
Not really relevant to your point (the above could happen with any array / software) just curious…
