Mon 12 May 2008

Edited by Paul Hales

Published by Incisive Media Investments Ltd.

Supercomputing clusters need room to talk

Breaking the Gigglebit Ethernet bottleneck

YOU HAVE TWO ways of dealing with large computational problems. One, the classical approach since ENIAC, is to build a single super system, as big as the technology and/or budget allow, and be done with it. The second option, in fashion for the past decade, is to put dozens, hundreds and now thousands of generic servers, sometimes with specially optimised configurations, together in a huge cluster.

The cluster approach has the advantage of being cheaper overall, easier to service and less risky: occasional node failures can be tolerated in a large cluster with spare resources, so downtime losses are smaller. Clusters are also perfect for centralised "throughput supercomputing" in large HPC datacentres, institutions where hundreds of users submit many small jobs to run in parallel on a single, managed resource. Each job may be serial or multi-threaded, spread across one or more systems.

However, if you want to run just a few very large problems on such a cluster, and the many threads within those problems have a nasty habit of yakkety-yakking to each other, you've got a problem - generic interconnects like Gigabit Ethernet become a bottleneck.

Even its faster and way dearer cousin, 10 Gigabit Ethernet, still has a latency problem - the gap between inter-node and intra-node latencies is almost like the gap between hard disks and RAM.
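To see why a fatter pipe alone does not cure chatty jobs, here is a minimal sketch of the usual latency-plus-bandwidth cost model for a single message. The function names and the round figures in the note below are illustrative assumptions, not measurements of any real interconnect.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to deliver one message: fixed latency plus time on the wire."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def latency_share(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is pure latency."""
    total = transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


if __name__ == "__main__":
    # Assumed, made-up figures: 50 microseconds of latency on both links,
    # 1 Gbit/s versus 10 Gbit/s of raw bandwidth, one 64-byte message.
    latency = 50e-6
    for name, bandwidth in (("1 Gbit/s", 125e6), ("10 Gbit/s", 1.25e9)):
        t = transfer_time(64, latency, bandwidth)
        share = latency_share(64, latency, bandwidth)
        print(f"{name}: {t * 1e6:.2f} us total, {share:.1%} of it latency")
```

With those assumed numbers, a 64-byte message spends about 99 per cent of its time waiting on latency at 1Gbit/s, and moving to 10Gbit/s barely changes the total. Only lower latency helps threads that exchange many small messages.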

Infiniband, in its various speeds (SDR, DDR, QDR) and controller incarnations, is faster, but still far from the required performance levels. UK-designed Quadrics QsNet is among the fastest and has a 'virtual shared remote memory' approach in hardware, but interested users have been awaiting the next iteration, QsNet III.

At this year's SuperComputing 2007 conference in Reno, Nevada, Quadrics said QsNet III is finally expected this coming year. Its 64-bit network processor pushes so much throughput over its two channels that it requires a PCI-E x16 (v2 at that) connection, and it has seven 64-bit CPU cores on its chip - just like the Cell CPU. This delivers not just message passing between nodes but virtual global shared memory between them, and a bit more real speed.

Sun Microsystems showed off the world's largest InfiniBand interconnect switch, the DataCenter Switch 3456, where the number represents the total ports on this gigantic switch unit - imagine 3,456 computers connected at once, all able to communicate simultaneously. Usually, you'd need a fat tree network of three dozen smaller switches to get this done.
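As a back-of-the-envelope check on that switch count, the sketch below sizes a two-level, non-blocking fat tree. The 288-port building block is our assumption (the figure is not stated above), as are the function name and the half-down, half-up port split on each leaf switch.

```python
import math


def two_level_fat_tree(hosts, radix):
    """Return (leaf_switches, spine_switches) for a non-blocking two-level fat tree.

    Assumes every switch has `radix` ports and each leaf uses half of its
    ports for hosts and the other half for uplinks to the spine layer.
    """
    down = radix // 2                      # host-facing ports per leaf
    up = radix - down                      # uplink ports per leaf
    leaves = math.ceil(hosts / down)
    if leaves > radix:
        # Each spine needs at least one port per leaf to reach all of them.
        raise ValueError("too many hosts for a two-level tree of this radix")
    spines = math.ceil(leaves * up / radix)
    return leaves, spines


if __name__ == "__main__":
    leaves, spines = two_level_fat_tree(hosts=3456, radix=288)
    print(leaves, "leaf +", spines, "spine =", leaves + spines, "switches")
```

Under those assumptions, `two_level_fat_tree(3456, 288)` gives 24 leaf and 12 spine switches, 36 boxes in all, which squares with the three dozen quoted above.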

Finally, the first 40Gigabit Ethernet solutions are shyly appearing, even though 10Gigabit Ethernet still hasn't got out of the standardisation doldrums, stuck between CX4, Cat7 and optical cabling choices. By that, we mean that 10GE has yet to appear on any server mainboard, for instance. Mobo vendors may be unwilling to make the wrong connector choice, or to saddle themselves with multi-connector daughtercards. And that latency problem is still there too.

On the other hand, through the efforts of chip vendors like Fujitsu and Fulcrum, 10GE switches have broken the US$300 per port barrier, becoming affordable for corporate consolidation environments too - more on this as we look at some specific products soon. µ

Comments

still not good

Hi!

No Matter what throughput a cluster uses, it is still not going to be enough.

Just wait until HT3 slots, and QuickPath slots appear on mobos.

They will be MUCH faster, and - there is no need to compare the latency when we are talking direct memory access....

So when these slots appear on mobos, THAT will be the time when clusters will see a performance jump.

The only thing i am still wondering about is when will the chip companies realize, that they will need to integrate lots of these on the chip itself........

Like about 9 for a 8xxx opteron or
5 for the upcoming 4S Nehalem chip....

bei..

Les
posted by : Laszlo Balogh, 16 November 2007

HTX?

I thought that several server motherboard manufacturers already included HTX slots on the board and I know of at least one HTX to Infiniband producer as well.

I would think that combo would far outstrip what 10GE is capable of in a clustered environment.
posted by : Dave, 16 November 2007

Very Cool

Combine this story with the release of Corning Optics bendable glass fiber cable and the
micro-Beowulf cluster idea.

The uWolf clusters yield a GigaFlop and cost around $2000 USD.
Household Supercomputing seems a reality.
CAD, CAE and animation have just become
much more plausible for the average joe.
posted by : Idgaf, 17 November 2007