Storagebod

Simple Scalability

As more and more organisations move into petascale environments, driven by big data, explosions in unstructured data, day-to-day growth and generally poor data management, the ability to manage at scale is becoming increasingly important.

Now, from a vendor point of view, there has been a focus on getting to scale, but that is less than half the story: if that scale is hard to implement and manage, hard-pressed data-management teams are going to start looking elsewhere. Managing at scale needs to be as easy and seamless as managing a single array or filer.

Implementation and expansion need to be quick and painless; expansion that requires significant effort is a major show-stopper for many scalable implementations. I need to be able to add capacity to my systems to support I/O or data growth, but it needs to be transparent and non-disruptive; it needs to be automatic in its optimisation. Quite frankly, no-one has the time to re-layout a multi-petabyte environment manually, with the almost inevitable disruption that brings.
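One common technique behind this kind of transparent expansion is consistent hashing, where adding a node automatically rebalances only a small fraction of the data rather than forcing a full re-layout. The sketch below is purely illustrative (the `ConsistentHashRing` class and node names are invented for this example, not any vendor's implementation):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash of a string key onto the ring's keyspace.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Maps objects to storage nodes. Adding a node moves only
    roughly 1/N of the objects, and only onto the new node."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes       # virtual nodes per physical node, for even spread
        self._ring = []            # sorted list of (hash, node) points on the ring
        for n in nodes:
            self.add_node(n)

    def add_node(self, node: str):
        # Place several virtual points for the node around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def locate(self, obj: str) -> str:
        # An object belongs to the first ring point at or after its hash.
        h = _hash(obj)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {f"obj{i}": ring.locate(f"obj{i}") for i in range(1000)}

ring.add_node("node-d")            # expand capacity
after = {k: ring.locate(k) for k in before}

moved = sum(1 for k in before if before[k] != after[k])
print(f"objects relocated after expansion: {moved} of 1000")
```

The point of the design is that every object that relocates moves onto the new node; nothing shuffles between the existing nodes, which is what makes the expansion non-disruptive in principle.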

Petascale computing almost always comes with a 24x7x365 availability requirement; Big Data analysis often involves long-running jobs.

But these huge environments bring other challenges as well. You will have large files, small files and tiny files spread throughout your systems; access characteristics differ too: some access will be random, some sequential, and in some cases you might find both. Some files will have a single user and some will have hundreds. Yet the data-management team will want to manage all of these in a consistent and seamless manner and, again, with a minimum of intervention.

Let’s think about the impact of a self-service environment where teams can spin up new environments at will; the data-management team will have little control over the files, and the types of files, that these applications create. The provisioning tool may ask questions such as “will you produce large files or small files?”, but in an agile environment, the answer given yesterday may not reflect the reality of the code written today.

This all leads us to a key requirement and feature for anyone who wants to sell petascale data-management and storage tools: ‘Simple Scalability’. Yes, it is important that a system is fast, but it is equally important that it is simple to support and manage throughout its life-cycle.

Let’s not kid ourselves: as we move to petascale and beyond, these environments are going to have life-spans which far outstretch those of our current SAN environments, because the practical reality of migrating petabytes of data, stored in a single system and accessed by many services, is going to drive this.

So the next time you are benchmarketing a system, ask yourself: is it really practical, or is it just a ‘My Dad is bigger than your Dad’ playground argument?

 


  • https://twitter.com/mastachand Marc Villemade

    Martin,
[Disclaimer: I work for Scality, where simplifying storage management at petascale is our motto]

Thanks for this post. You’re totally right that petabyte scale nowadays most likely involves painful operations. Another big issue is that a lot of SAN environments are locked to the hardware, which creates a situation where, when the hardware goes EOL, a painful migration has to happen. And at that scale, it’s practically impossible. In a year, when we won’t be talking petascale but exascale, it will be even more unmanageable.

Getting away from hardware lock-in is pretty much key at this scale. I thought I’d mention that as well.

Again, spot on in your post. Thanks for writing it.

    -marc