
On the Open Road with FCoE

Intel's announcement of Open FCoE running on its 10 GbE products is yet another manifestation of the 'Infrastructure as Software' trope; Open FCoE has officially been part of the Linux kernel since November, and Intel's announcement brings Windows Server into the fold.

This announcement means that you can run iSCSI, NAS and FCoE on a single adapter. 'Big deal,' I hear you cry, especially if you work for one of the traditional HBA vendors: you can already do this using a CNA. But Intel's announcement is important because Intel is currently the top-selling vendor of 10 GbE cards, and this will allow you to use cards which may already be shipping in your standard server build to run FCoE if you so wish.

Obviously, doing FCoE in software does come at a cost, namely increased CPU utilisation, but that has not really hampered the iSCSI software stacks. And, like iSCSI, I suspect we will find that the predominant FCoE deployments utilise a software implementation, and I expect that other 10 GbE adapters will come to be supported by the Open FCoE stack.
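
For the curious, this is roughly what 'FCoE as software' looks like on Linux today: the Open-FCoE userland (lldpad plus fcoe-utils) turns a plain 10 GbE port into something the OS treats much like an FC HBA in a handful of steps. The sketch below is an illustration only; the interface name eth2 is made up, and service names, config paths and flags vary between distributions and fcoe-utils versions, so treat it as a sketch rather than a recipe.

    # Rough sketch only: software FCoE on a plain 10 GbE NIC using the
    # Open-FCoE userland (lldpad + fcoe-utils). The interface 'eth2' is an
    # assumption, as are the exact service names and flags on your distro.
    import shutil
    import subprocess

    IFACE = "eth2"  # hypothetical 10 GbE interface facing an FCoE-capable switch

    def run(cmd):
        # Echo each step before running it so the sequence is easy to follow.
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # 1. lldpad handles DCB/DCBX; enable DCB and flag the FCoE application
    #    priority on the interface.
    run(["service", "lldpad", "start"])
    run(["dcbtool", "sc", IFACE, "dcb", "on"])
    run(["dcbtool", "sc", IFACE, "app:fcoe", "e:1"])

    # 2. fcoemon reads a per-interface config file; start from the shipped template.
    shutil.copy("/etc/fcoe/cfg-ethx", "/etc/fcoe/cfg-" + IFACE)

    # 3. Start the FCoE service and create the virtual FC interface on the NIC.
    run(["service", "fcoe", "start"])
    run(["fcoeadm", "-c", IFACE])   # create the FCoE instance

    # 4. From here the fabric looks much like any other FC HBA to the OS.
    run(["fcoeadm", "-i"])          # interface and login state
    run(["fcoeadm", "-t"])          # discovered targets

Once the instance is up, LUNs arrive through the normal SCSI stack; the NIC just moves Ethernet frames and everything FC-shaped happens in software, which is rather the point.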

One place where I will take issue with many of the commentators is this feeling that we will no longer have separate Storage and Network infrastructures; it is extremely likely that we will continue to have some kind of separation between the two, whether physical or logical. I do expect that in many cases we will continue to have 'storage ports', 'backup ports' and 'network ports'; this separation will be due both to the political realities of most Infrastructure teams and to the simple fact that it makes sense. You don't necessarily want your back-up traffic or storage traffic stomping all over your transactional traffic, or even each other; yes, QoS can help, but it just might be simpler to keep them separate.

Yes, Ethernet has pretty much won, and Open FCoE makes that victory even more certain, but don't expect storage networks, or FC itself, to go away any time soon; 10 GbE ports are still relatively expensive and I'm not seeing a massive appetite for rip-and-replace at present. The long march to the converged network will be a long march indeed, but tools like Open FCoE might shorten it slightly.

I guess the big question for many of us is when will VMware see a software implementation of FCoE? 


4 Comments

  1. Martin,
    I understand that management is the reason why customers separate storage, network and backup ports. The challenge is that servers can’t power enough 10Gb adapters to allow for the continued physical separation of these functions. Regardless of protocol, the number of 10Gb adapters needs to be much lower than what was previously deployed with 1Gb Ethernet plus Fibre Channel. Also, the cost of the new-generation adapters will continue to be much higher until more multi-port LOM solutions are available.

  2. Martin G says:

    Most servers/workloads struggle to flood Fibre Channel cards, but that still does not prevent us from having multiple storage network adapters in the same server; tape, for example. I’m not sure it is that different with 10 GbE, but I agree that we will see a certain amount of network convergence, just not the amount many are predicting.

  3. InsaneGeek says:

    I so agree with your post on this. I have vendors constantly making the pitch about how much money we can save by combining the data and access networks. My experience over the years makes me want to run screaming in terror. My FC environment is the most stable, solid environment I have; it never has an issue. It’s not that way because of any magical FC properties, but because it’s such a simple environment. My next most stable environments are the private networks we use for iSCSI and for Oracle RAC interconnects. Again, they never get touched, never get thought about, they just work; there are two switches there and they aren’t uplinked to the rest of the world (outside of management).
    Now the regular IP Ethernet network, that has constant “oddness” occurring; it’s extremely dynamic and variable. In one location we have probably 20-30 VLANs, multiple load balancers, firewalls, trunks, channels, etc. Not because Ethernet in itself forces complexity upon us, but because our internal needs do; and I suspect that most Ethernet networks of any size tend to be complex. This tends to be where the rubber hits the road on badness.
    I have some very recent examples of this. A network port is configured as a trunk port (carrying lots of VLANs); what happens if someone puts a server in that port that doesn’t have an OS on it? The system powers up, tries to PXE boot, times out and reboots itself, repeating the cycle over and over. What actually happened was a constant spanning tree event every 80 seconds or so, preventing anybody from talking on *all* the VLANs (the trunk port was participating in spanning tree). Another was when some new Cisco 10Gig DFC cards started dropping MAC addresses from their CAM tables, so the switch would randomly flip all the ports in the network to flooding. You could actually run tcpdump and catch unicast traffic, turning our very expensive switches into network hubs and causing massive packet loss because of network port overruns. We had a Cisco case open for over a month trying to figure it out, eventually finding a VLAN with a lower bridge priority (I don’t know why it only came into effect when we added the new DFC line cards).
    These two examples can generally be handled by applications, but filesystems don’t like this. Delay a filesystem write for a while and see if it freaks out. Take hundreds of servers (maybe throw in virtualization and jump it up to thousands?) and think about what kind of a fun day you’d have trying to get all of them back up and happy after their filesystems have shut down or flipped to read-only mode because they couldn’t write for 30 seconds while rapid spanning tree kicked in.
    I will add that having “A” and “B” networks with full physical separation and load-balancing software, like most FC implementations have, would help reduce this issue, but network people aren’t doing that configuration as a standard. They bring in redundant network connections from different switches, but the switches are physically connected together, the host runs LACP or active/passive connections, and the host shares one IP across both connections. For full physical Ethernet separation to ride through the problems I had above, you’d need separate host IPs and networks and some SCSI-layer load-balancing software. None of the FCoE vendors are pitching this, and the network people don’t like the idea of physically separate, identical network infrastructure. When that becomes the standard for infrastructure, then I’ll be willing to share block storage with general traffic on an Ethernet link.

  4. Andrew Fidel says:

    FC is not immune to such incidents; I’ve had buffer credit exhaustion issues, FLOGI storms, etc. on FC that all caused fabric disruptions. As you point out, one difference is that most FC networks have two completely separate fabrics, so such issues on one fabric won’t usually affect the other.
