Google for the Infrastructure

I've been thinking about FAST, and especially FAST v2, but not entirely from a storage point of view. FAST v2, and indeed any automated storage tiering product, has some interesting uses beyond storage and could be the basis for a whole new way of managing IT as a service. In fact, it finally enables storage and beyond to be managed as a service. BTW I'm going to use FAST as shorthand for any automated storage tiering product, so please don't take this as only being about EMC.

In order for FAST to work, it needs to gather and react to a lot of information from the array itself. In fact, for FAST to be truly useful, it needs to gather, store and react to a lot of information about what is going on in the array.

Take a typical corporate accounting application: most of the time it is pretty quiet and not performance-intensive, but at certain times of the year it becomes a very intensive workload. During these times, you might want all of it on the fastest, most performant tier. FAST will react to a sudden increase in workload and move the application when it sees demand increase, but will FAST be able to move it quickly enough? So perhaps we need to give the array some hints as to when to prime the load?
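To make the idea of a hint a bit more concrete, here is a minimal sketch in Python. It is not how FAST or any real array accepts hints; the TieringHint type, its fields and the two-day lead time are all assumptions made up for illustration. It only shows the logic of priming ahead of a known busy period rather than reacting once the load arrives.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TieringHint:
    """Hypothetical hint: keep `application` on the fast tier for a date range."""
    application: str
    start: date         # first day of the known busy period
    end: date           # last day of the known busy period
    lead_days: int = 2  # assumed time the array needs to relocate the data


def should_prime(hint: TieringHint, today: date) -> bool:
    """True if the data should already be on (or moving to) the fast tier today."""
    prime_from = hint.start - timedelta(days=hint.lead_days)
    return prime_from <= today <= hint.end


# Illustrative dates only: a year-end close for the accounting application.
year_end = TieringHint("accounting", date(2010, 3, 29), date(2010, 4, 9))
print(should_prime(year_end, date(2010, 3, 27)))  # True: two days before the peak
print(should_prime(year_end, date(2010, 3, 1)))   # False: still the quiet period
```

The lead time is the interesting knob: it stands in for however long the array actually needs to move the data, which is exactly the thing a purely reactive policy cannot buy back.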

These sorts of peaks are very predictable and we know when they will happen, but not all peaks are quite as predictable; or at least we don't think they are. FAST will be gathering stats all the time and, by analysing this data, it might be able to do the predictive analysis a lot quicker and spot things that we can't, or at least don't have the time for. It may pick up on relationships between applications: application X runs hot at a certain time, which causes application Y to become busy some period later. For example, certain types of activity may cause a reporting job to be run at a later date.
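One simple way such a relationship might be spotted is a lagged correlation: shift one application's workload history against the other's and see which delay lines them up best. The sketch below is only that, a sketch. The sample figures are made up, the function names are mine, and I'm not suggesting FAST does anything of the sort internally.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lag(leader, follower, max_lag):
    """Return (lag, correlation) for the delay at which follower best tracks leader."""
    best = (0, pearson(leader, follower))
    for lag in range(1, max_lag + 1):
        # Compare leader[t] with follower[t + lag].
        r = pearson(leader[:-lag], follower[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Made-up hourly IOPS samples: app_y repeats app_x's spike three samples later.
app_x = [10, 10, 90, 95, 10, 10, 10, 10, 10, 10, 10, 10]
app_y = [5, 5, 5, 5, 5, 45, 48, 5, 5, 5, 5, 5]
lag, r = best_lag(app_x, app_y, max_lag=6)
print(lag)  # 3
```

A high correlation at a non-zero lag is only a prompt to go and look; it doesn't prove that X causes Y, but it is the sort of thing nobody has time to hunt for by hand.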

You see, from our storage infrastructure we can start to gather a lot of information about our whole estate. But EMC could go further: they have things like nLayers and Smarts to leverage, and they could start to pull information from VMware and do a whole lot of analysis on it. NetApp have SANscreen; HP have a zillion tools, as do IBM.

Once you've got that information, you need to start turning it into something the business understands, so that you can sit with the business and do what-if modelling, and show conflicts and clashes where multiple services are demanding the same high-performance infrastructure at the same time. Perhaps the business owner needs to prioritise or purchase more infrastructure. Perhaps they need less; perhaps they can shift some stuff into the Public Cloud and just pull it back when they need to.
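As a rough illustration of the what-if conversation, the sketch below takes a list of forecast demands for the fast tier and flags the months in which the services between them want more than is installed. The services, the terabyte figures and the find_conflicts function are all hypothetical; real input would come from whatever the array and the surrounding tools have gathered.

```python
from collections import defaultdict


def find_conflicts(demands, fast_tier_capacity_tb):
    """Return {month: [services]} for months where fast-tier demand exceeds capacity.

    demands is an iterable of (service, month, fast_tier_tb) tuples.
    """
    by_month = defaultdict(list)
    for service, month, tb in demands:
        by_month[month].append((service, tb))

    conflicts = {}
    for month, entries in by_month.items():
        total = sum(tb for _, tb in entries)
        if total > fast_tier_capacity_tb:
            conflicts[month] = sorted(service for service, _ in entries)
    return conflicts


# Made-up forecast of fast-tier demand per service, in TB.
demands = [
    ("accounting", "2010-03", 8),
    ("reporting", "2010-03", 6),
    ("web", "2010-03", 3),
    ("accounting", "2010-04", 8),
    ("web", "2010-04", 3),
]

print(find_conflicts(demands, fast_tier_capacity_tb=12))
# {'2010-03': ['accounting', 'reporting', 'web']}

# What if we bought more fast tier? No clashes remain.
print(find_conflicts(demands, fast_tier_capacity_tb=20))
# {}
```

Re-running the same check with a different capacity, or with one service's demand moved to another month, is the whole of the what-if: the business owner can see whether prioritising, buying more or shifting a workload makes the clash go away.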

So FAST could be rather more than just a way of optimising your storage infrastructure; if you data-mine this information in the same way Google data-mine statistics, you can find out a lot of stuff which you didn't realise and probably completely change the way you look at your infrastructure.

So when EMC talk about FAST being a foundational technology, they aren't wrong. Actually, like Virtual Provisioning, it is so important that it should be free! They could even fund this by getting rid of half their account managers; FAST could literally sell itself.

2 Comments

Len Rosenthal says:

December 8, 2009 at 8:54 pm

The key to intelligent tiering and overall better utilization of storage and SAN infrastructure is monitoring and measuring actual performance and usage on a real-time basis. FAST is a good start, but, as you correctly point out, it currently only looks at the arrays. Tools like VirtualWisdom from Virtual Instruments (www.virtualinstruments.com) provide real-time and historical performance (READ/WRITE latency across the SAN) and detailed usage info by looking across infrastructure domains including application, VM, server, HBA, FC switch and LUN. A more comprehensive approach will lead to better tiering and utilization decisions and lower costs.

the storage anarchist says:

December 9, 2009 at 3:00 pm

Len –
While indeed valuable to have an end-to-end view, I’ve yet to see a “comprehensive view” that can change the physical storage and/or cache allocations applied by ANY storage array. Such dynamic relocation of data can’t easily be performed from outside the array, because the “outside” view is merely a list of virtual LBAs that bear no identity of their physical location.
The best I can imagine is that applications (and tools such as you describe) could provide hints about observed issues and/or the intended (future) use of specific LBA ranges. FAST (and implementations like it) might then use these hints to prestage data onto appropriate tiers in advance of demand.
But as is evidenced by Symmetrix DRAM caching algorithms (only as an example), detailed application knowledge is not necessary to achieve high hit rates – much (and I mean VERY much) can be derived from the arriving I/O patterns and historical observations.
