Failure is an Option…

I really like this comment by Ian on his blog, part of his questions and responses to Val. It displays a different approach to engineering a service architecture: instead of focusing on the components, you focus on the service and how you make the service available.

"I see nothing relating to technology more aligned to a 'recovery
orientated architecture' where 'tin/software' failure is expected, and
as such component availability as a requirement is reduced in priority
(in favour of price point) and the architecture of the service deals
automatically with the regular failure of components."

Perhaps part of the Cloud is like routing for services and we simply design services that route round failure. I'm not sure exactly where this takes us but it's an interesting and powerful idea. 
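
To make the idea a little more concrete, here is a minimal sketch of a service call that routes round failed components, assuming a service is backed by several interchangeable replicas. The names `call_with_reroute`, `AllReplicasFailed`, `flaky_replica` and `healthy_replica` are hypothetical and invented purely for illustration; this is not taken from Ian's post or from any particular product.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised only when every replica has failed, i.e. the service itself is down."""


def call_with_reroute(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in turn and return the first successful result.

    A failing replica is treated as routine: it is skipped and the
    request is retried on the next one. (Hypothetical helper.)
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # component failure is expected here
            errors.append(exc)
    raise AllReplicasFailed(errors)


# Hypothetical stand-ins for cheap components that may or may not be up.
def flaky_replica() -> str:
    raise ConnectionError("replica down")


def healthy_replica() -> str:
    return "ok"


result = call_with_reroute([flaky_replica, healthy_replica])
```

In this sketch the caller only ever sees a failure when every replica is down, so the availability that matters is that of the service rather than of any single component.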


  1. Martin,
    This notion of service availability decoupled from hardware and even application instance availability is something that I believe in.
    I’ve been noodling about service availability for some time and its impact on storage. I believe that the net effect will be to dramatically transform storage infrastructure.
    The specific post on services can be found here…
    The summary of the DAS disruption can be found here:
    And the impact on storage

  2. Bas Raayman says:

    Actually, this idea is already being implemented at the bare-metal level, isn’t it? Take hardware RAID on the CISC: it’s all based on the concept of allowing a service to continue. We have that at the hardware level, and have had it for years.
    In my opinion, in a “cloud” environment we just need people to pick up on that idea, but it’s not that new if you ask me.

  3. Martin G says:

    There are very few new ideas; for example, for many years we have been building web servers which scale horizontally and which, in the event of failure, just ignore the one that failed, retry the transaction, and off you go.
    But can you design entire services and architectures which, instead of having complex failover requirements, actually tolerate failures by routing round them? Can I build fault tolerance in at a different level?
