
Seeding the Media Cloud – Part 2

In Part 1, I described GPFS, focusing on its ability to scale massively; but scale alone doesn't make a Cloud, whatever anyone would have you believe.

GPFS has two features which could allow it to become almost Cloud-like.

Firstly, the move to Linux allows it to run on commodity hardware. GPFS doesn't care what its back-end disk is: it could be direct-attached SATA, V-MAX attached via a SAN, iSCSI or EFDs. As long as the storage appears as a block device, GPFS should be able to use it.

As for the network, as long as it's IP then, as far as I can tell, GPFS doesn't care either. The faster the better, obviously!

So you can scale-out on fairly cheap hardware.

But it is GPFS's ability to move data around for you which could enable the most Cloud-like attributes.

GPFS, like other IBM storage products, supports the concept of storage pools. A storage pool is simply a group of storage devices, usually with similar characteristics; performance and reliability come to mind. A file system must consist of at least one storage pool and may contain up to eight; this includes the ability to define a storage pool external to GPFS, such as TSM, which allows data to be migrated to tape.
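To make this concrete, here is a hedged sketch of how pool membership is declared: each disk is assigned to a pool when it is brought into the file system. The stanza format below is from later GPFS releases (earlier versions used colon-separated disk descriptors), and the device names, NSD names and file system name `gpfs0` are all made up for illustration.

```shell
# Sketch only: each NSD is tagged with the storage pool it belongs to.
# Device paths, NSD names and 'gpfs0' are hypothetical.
cat > /tmp/disks.stanza <<'EOF'
%nsd: nsd=nsd1 device=/dev/sdb usage=dataAndMetadata failureGroup=1 pool=system
%nsd: nsd=nsd2 device=/dev/sdc usage=dataOnly failureGroup=2 pool=SATA
EOF

# Create a file system spanning both pools, mounted at /gpfs
mmcrfs gpfs0 -F /tmp/disks.stanza -T /gpfs
```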

There is also the concept of a fileset, which is basically a sub-tree of the GPFS global namespace; it allows administrative operations to be scoped to a portion of the filesystem. Each fileset has its own root directory, and all files belonging to the fileset are accessible only via that root directory. Please note, filesets do not provide secure multi-tenancy today, but conceivably they could become the foundation of a secure multi-tenancy capability.
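Creating a fileset is a two-step affair: you create it, then link it into the namespace at a junction point, which becomes its root directory. A quick sketch, where the file system name `gpfs0`, the fileset name `finance` and the junction path are my own examples:

```shell
# Create a fileset within the file system 'gpfs0' (names are assumptions)
mmcrfileset gpfs0 finance

# Link it into the global namespace; /gpfs/finance becomes the fileset's root
mmlinkfileset gpfs0 finance -J /gpfs/finance
```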

GPFS includes a policy engine which allows it to manage file data automatically using a set of rules. This lets you control the initial placement of files at creation time, and also move those files between pools over time.

Placement rules are obviously run when a file is created.

Migration rules are run from the command line on demand, but normally via a job scheduler such as cron.

The rules are written in an SQL-like language and can act on a number of attributes, depending on whether it is a placement rule or a migration rule; placement rules can work on user, group, fileset/sub-directory or filename.

So you could do something like

Rule 'mp3' SET POOL 'SATA' WHERE UPPER(NAME) LIKE '%MP3'

Rule 'db' SET POOL 'FC' FOR FILESET ('database')

Rule 'sox' SET POOL 'Encrypted-disk' REPLICATE (2) FOR FILESET ('finance')
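Placement rules like the ones above only take effect once installed into the file system. A hedged sketch of how that is done, again assuming a file system called `gpfs0`; note that it is good practice to end the policy with a default rule, because a file that matches no placement rule cannot be created:

```shell
# Install a placement policy into 'gpfs0' (file system name is an assumption)
cat > /tmp/placement.pol <<'EOF'
RULE 'mp3' SET POOL 'SATA' WHERE UPPER(NAME) LIKE '%MP3'
RULE 'default' SET POOL 'system'
EOF

# '-I test' would validate the policy without activating it
mmchpolicy gpfs0 /tmp/placement.pol -I yes
```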

Migration rules run on things like Last Modified Time, Last Accessed Time and File Size, as well as all of the attributes available to placement rules.

So you can do things like

Rule 'logfiles' MIGRATE TO POOL 'sata' WHERE UPPER(name) LIKE '%LOG'

Rule 'core' DELETE WHERE UPPER(name) LIKE 'CORE'
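Migration and deletion rules like these are evaluated when you run the policy engine over the file system, which is where cron comes in. A sketch of a nightly run; the policy file path and the file system name `gpfs0` are my own inventions:

```shell
# Crontab entry (sketch): run the migration policy nightly at 02:00.
# '-I yes' performs the moves; '-I test' would only report what it would do.
0 2 * * * /usr/lpp/mmfs/bin/mmapplypolicy gpfs0 -P /etc/gpfs-migrate.pol -I yes
```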

You can also carry out actions based on a date; so you could move all the end-of-year reporting files onto your fastest disk before they are needed, and then back off again the following week.
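A sketch of what such date-driven staging might look like; the fileset and pool names are assumptions, and the date functions shown are from the policy language's SQL-like built-ins (check your release's documentation for the exact set):

```shell
# Hedged sketch: stage year-end files onto fast disk in late December,
# then move them back in mid-January. All names here are hypothetical.
cat > /tmp/yearend.pol <<'EOF'
RULE 'stage-up'   MIGRATE TO POOL 'FC'   FOR FILESET ('year-end') WHERE DAYOFYEAR(CURRENT_DATE) > 350
RULE 'stage-down' MIGRATE TO POOL 'SATA' FOR FILESET ('year-end') WHERE DAYOFYEAR(CURRENT_DATE) < 14
EOF
```

Run through mmapplypolicy on demand or from cron, as with any other migration rules.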

IBM in their typical fashion call this ILM and hence everyone ignores it because ILM is deeply unsexy. But they have the basis of a simple but powerful policy engine. And yet they don't tell anyone about it.

So do IBM have a storage cloud? No, not yet, but they are close. The policy engine is simply not powerful enough yet, but it could evolve into something more; and it has been around for long enough that you should be able to trust your data to it.

And it does have some other nice features, like replication, snapshots, multi-clusters, GPFS over a WAN. I just wish IBM would make it easier for you guys to play with it but if you've got an IBM account manager, hassle them for an evaluation copy. 

If they try to tell you how complex it is…they're living in the past. It's much easier than it used to be, and you can build yourself a small virtual cluster quickly and easily in the virtualisation environment of your choice. I'd suggest building on top of CentOS 5, but just remember to edit the right files to get it to pretend to be proper Red Hat.

As I rediscover this well-hidden IBM product, I'll be sure to share my findings.


One Comment

  1. Rob says:

    “And it does have some other nice features, like replication, snapshots, multi-clusters, GPFS over a WAN. I just wish IBM would make it easier for you guys to play with it but if you’ve got an IBM account manager, hassle them for an evaluation copy.”
    To avoid or fix hot-spotting, mmrestripefs with -b
    http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs32.basicadm.doc/bl1adm_mmrestr.html
    -b
    Rebalances all files across all disks that are not suspended, even if they are stopped. Although blocks are allocated on a stopped disk, they are not written to a stopped disk, nor are reads allowed from a stopped disk, until that disk is started and replicated data is copied onto it. The mmrestripefs command rebalances and restripes the file system. Use this option to rebalance the file system after adding, changing, or deleting disks in a file system.
