CONNECTING tens, hundreds or even thousands of compute server nodes together into clusters for high performance computing (HPC) purposes has always had the interconnects between nodes as one of its biggest challenges.
Providing maximum bandwidth between all nodes - both node to node and also all to all nodes at the same time - is already a big enough challenge, requiring very high speed switches or exotic 3D or 4D torus, hypercube and such topologies. But then, equally importantly - and unlike in a typical commercial datacentre - the latency between nodes is just as important, since many HPC apps have some amount of dependency between the threads running on different nodes.
And then, each interconnect must have as much of its own intelligence and protocol handling capability as possible, so as not to disturb the CPUs that, no matter how many of them there are, are always expected to run at close to 100 per cent busy in this kind of system. Finally, all of that has to be supported by the applications, where the user can't recompile the code himself, and be affordable - all of which is not easy to achieve.
There are quite a few proprietary interconnects with varying levels of performance. HPC users will be familiar with Myrinet from Myricom, which was the mainstay of cluster connections in the early part of the past decade, Qsnet from Quadrics right there in Bristol, which offered the highest bandwidth and lowest latency with native shared memory capability, thus avoiding the need for message passing, and the Dolphin - now Numascale - switchless torus connect, with some of the protocol capabilities similar to Qsnet.
However, even though they may be faster or more feature rich, these interconnects have been pushed aside by 10 Gigabit Ethernet - 10Gb Ethernet or 10GE for short - as well as Infiniband, or IB. 10GE is obviously a faster version of the existing Ethernet, preserving critical full application compatibility through the TCP/IP stack while providing higher bandwidth, but not really lowering latency.
Infiniband, originally created by Intel, has become a quasi-standard supported by a group of network product vendors and provides very high bandwidth, up to 40Gbps in each direction in the QDR version. Its latency is also much lower than that of Ethernet, going even below 2 microseconds for remote message send on the fastest adapter cum switch combinations, or some 10 times better than the usual 10GE before the latter's 'acceleration'. IB has very decent application support in high performance computing these days. However, its protocol stack is fattened by its envisioned need to act as a common fabric for everything from storage access to networking and clustering, which naturally increases CPU load and latency.
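To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simple model, where one-way message time is fixed latency plus message size divided by peak bandwidth, and it ignores encoding overhead, congestion and CPU cost. The 2 microsecond and 40Gbps figures come from the paragraph above; the 20 microsecond figure for unaccelerated 10GE is merely inferred from the 'some 10 times' remark, and the function and table names are made up for illustration.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbps):
    """One-way message time in microseconds: latency + size / bandwidth."""
    bits = message_bytes * 8
    # 1 Gbps moves 1000 bits per microsecond.
    return latency_us + bits / (bandwidth_gbps * 1000.0)


# (latency in microseconds, peak bandwidth in Gbps).
# The 10GE latency is an assumption derived from "some 10 times" the IB figure.
LINKS = {
    "Infiniband QDR": (2.0, 40.0),
    "10GE, unaccelerated": (20.0, 10.0),
}


def compare(message_bytes):
    """Return the modelled one-way time per link for a given message size."""
    return {
        name: transfer_time_us(message_bytes, latency, bandwidth)
        for name, (latency, bandwidth) in LINKS.items()
    }


if __name__ == "__main__":
    for size in (64, 4096, 1024 * 1024):
        times = compare(size)
        ratio = times["10GE, unaccelerated"] / times["Infiniband QDR"]
        print(f"{size:>8} bytes: 10GE takes {ratio:.1f}x as long as IB")
```

Under these assumptions, small messages are dominated almost entirely by latency, so the gap stays close to the full tenfold, while for megabyte-sized transfers it shrinks towards the fourfold difference in raw bandwidth. That is why latency, not bandwidth, is the figure that matters for tightly coupled HPC codes exchanging many small messages.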
So, if you really want a common single interconnect architecture for your datacentre or supercomputer, 10GE might make more sense, since all applications you might ever think of run on it anyway. Well, if it's to happen in a supercomputer or financial data system, or even in a large database facility, the latency problem has to be solved, not unlike in the equally latency bound massive multiplayer gaming configuration, just on a vastly different scale.
Among the vendors coming in, Intel was there as well with Neteffect. It put together the Iwarp Neteffect PCIe x4 adapter with local processing power and remote direct memory access (RDMA) capability, basically providing direct app to app communication, bypassing the OS and a lot of TCP/IP protocol baggage, too. The latest adapter, discussed at IDF 2010, provides latency comparable to mainstream Infiniband at the scale of a few microseconds, but with all the standard environments and software running, no questions asked. And, 10Gbps peak bandwidth itself seems to be good enough per node for most current high performance uses.
Now, Intel, the creator of Infiniband, has seemingly distanced itself from its own 'child' over the past few years. And promoting these improved 10GE adapters will of course take a chunk of the IB market. 10GE, no matter how much accelerated, will always be 'a little slower' than IB, so why support it instead of IB? Well, up to now Intel didn't elect to support interconnects better than IB such as QsNet or NumaScale either, so what's the surprise?
There seems to be a hint of a plan in all this, and you have to look no further than Intel's competitor AMD and its Hypertransport Consortium. Last year it announced the High Node Count Hypertransport specification, together with a new cabling scheme that enables Hypertransport CPU links to be brought out directly to other nodes at near full bandwidth using Infiniband physical cabling, but not the Infiniband interconnect protocol or controllers. This, of course, enables even faster, ultra low latency, very scalable direct interconnection into large NUMA-like shared memory systems, without all the headaches of a message passing only approach.
With AMD doing that, Intel already has Optical quick path interconnect (QPI) in the works, and the QPI 2 update, expected for the Sandy Bridge CPU generation early next year, should allow for external QPI linking not dissimilar to the updated Hypertransport capability.
So, assuming things happen as expected, Infiniband will, at that point, start getting squeezed between a rock and a hard place. From the top, the much faster, more direct and more elegant - without the protocol overhead - direct QPI and HTX connections linking server nodes into near seamless shared memory machines just like LEGO bricks, and quite possibly even running one OS together if needed. And, from the bottom, a little slower, but equal or cheaper priced and guaranteed 100 per cent compatible with applications, 10Gb Ethernet and its successors.
Infiniband could still fight a price for performance battle against both approaches, but it seems to me that, with adapters like Iwarp Neteffect, Intel has taken its position firmly at the 10Gb Ethernet side. µ