Supercomputing clusters need room to talk
16 Nov 2007 | 15:18 GMT
Breaking the Gigglebit Ethernet bottleneck
YOU HAVE TWO ways of dealing with large computational problems. One, the classical approach since ENIAC, is to build a single super system, as big as the technology and/or budget allow, and be done with it. The second, in fashion for the past decade, is to take dozens, hundreds or now thousands of generic servers, sometimes with specially optimised configurations, and put them together in a huge cluster.
The cluster approach has the advantage of being cheaper overall, easier to service and less risky, since the downtime from occasional node failures can be tolerated in a large cluster with spare resources. Clusters are also perfect for centralised "throughput supercomputing" in large HPC datacentres, the sort of institutions where hundreds of users submit many small jobs to run in parallel on a single, managed resource. Each job may be serial or multi-threaded, spread across one or more systems.
However, if you want to run just a few very large problems on such a cluster, and the many threads within those problems have a nasty habit of yakkety-yakking to each other, you've got a problem - generic interconnects like Gigabit Ethernet become a bottleneck.
Even its faster and way dearer cousin, 10 Gigabit Ethernet, still has a latency problem - the gap between inter-node and intra-node latencies is almost like the gap between hard disks and RAM.
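To see why more bandwidth alone does not cure chatty jobs, here is a minimal sketch of the usual latency-plus-bandwidth estimate for a single message. The 50 microsecond latency, the assumption that both Ethernet flavours share it, and the function names are illustrative guesses for this example, not measurements or figures from the vendors.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbit_s):
    """Estimate one message's transfer time in microseconds.

    Simple model: fixed per-message latency plus time on the wire.
    """
    bits = message_bytes * 8
    # 1 Gbit/s moves 1000 bits per microsecond.
    wire_time_us = bits / (bandwidth_gbit_s * 1000.0)
    return latency_us + wire_time_us


# ASSUMED, illustrative figures only: same latency, ten times the bandwidth.
GIGE = {"latency_us": 50.0, "bandwidth_gbit_s": 1.0}
TEN_GIGE = {"latency_us": 50.0, "bandwidth_gbit_s": 10.0}


def speedup(message_bytes):
    """How many times faster the assumed 10GbE link moves one message."""
    slow = transfer_time_us(message_bytes, **GIGE)
    fast = transfer_time_us(message_bytes, **TEN_GIGE)
    return slow / fast


if __name__ == "__main__":
    for size in (64, 1024, 1_000_000):
        print(f"{size:>9} bytes: 10GbE is {speedup(size):.2f}x faster")
```

Under these assumed numbers, a megabyte-sized transfer gets most of the tenfold gain, while a 64-byte message barely speeds up at all, because the fixed latency swamps the time spent on the wire. Small, frequent messages are exactly what tightly coupled threads exchange.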
Infiniband, in its various speeds (SDR, DDR, QDR) and controller incarnations, is faster, but still far from the required performance levels. UK-designed Quadrics QsNet is among the fastest and takes a 'virtual shared remote memory' approach in hardware, but interested users have been awaiting the next iteration, QsNet III.
At this year's SuperComputing 2007 conference in Reno, Nevada, Quadrics said QsNet III is finally expected this coming year. Its 64-bit network processor pushes so much throughput over its two channels that it requires a PCI-E x16 (v2 at that) connection, and it has seven 64-bit CPU cores on its chip - just like the Cell CPU. This delivers not just message passing between nodes, but virtual global shared memory between them, and a bit more real speed.
Sun Microsystems showed off the world's largest InfiniBand interconnect switch, the DataCenter Switch 3456, where the number represents the total ports on this gigantic switch unit - imagine 3,456 computers connected at once, all able to communicate simultaneously. Usually, you'd need a fat tree network of three dozen smaller switches to get this done.
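The "three dozen" figure can be sanity-checked with a little arithmetic. The sketch below counts the switches in a non-blocking two-tier fat tree; the 288-port building block is an assumption made for this example (the article does not name the size of the smaller switches), and the function name is hypothetical.

```python
def two_tier_fat_tree(hosts, switch_ports):
    """Count (leaf, spine) switches in a non-blocking two-tier fat tree.

    Each leaf switch gives half its ports to hosts and half to uplinks,
    so every host-facing port has matching bandwidth towards the spine.
    """
    down_ports = switch_ports // 2
    leaves = -(-hosts // down_ports)        # ceiling division
    uplinks = leaves * down_ports           # one uplink per host-facing port
    spines = -(-uplinks // switch_ports)    # spine ports all face the leaves
    return leaves, spines


if __name__ == "__main__":
    # ASSUMPTION: 288-port switches as the smaller building block.
    leaves, spines = two_tier_fat_tree(hosts=3456, switch_ports=288)
    print(leaves, spines, leaves + spines)  # prints "24 12 36"
```

With that assumed switch size, the tree needs 24 leaf switches plus 12 spine switches, and on top of the 36 boxes there are 3,456 inter-switch cables that a single chassis keeps internal.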
Finally, the first 40 Gigabit Ethernet solutions are shyly appearing, even though 10 Gigabit Ethernet still hasn't got out of the standardisation doldrums between CX4, Cat7 and optical cabling choices. By that, we mean the failure of 10GE to appear on any server mainboards, for instance. Mobo vendors may be unwilling to make the wrong connector choice, or to saddle themselves with multi-connector daughtercards. And that latency problem is still there too.
On the other hand, through the efforts of chip vendors like Fujitsu and Fulcrum, 10GE switches have broken the US$300 per port barrier, becoming affordable for corporate consolidation environments too - more on this as we look at some specific products soon. µ
© 2007 Incisive Media Investments Ltd.