The Inquirer

Taiwanese supercomputing becomes realité, verité

Computex 2005 post mortem
Wed Jun 08 2005, 10:04
OVER THE YEARS, the most developed Chinese "province", also known as Taiwan ROC, has had its fingers in nearly every type of computer hardware, from palmtops to servers. At this Computex, it set its sights for the first time on the last frontier: supercomputers. It is on a small scale and through the back door, but yes, the capability is there.

The starting point was the Iwill booth. Iwill is a high-end server and workstation board vendor with a checkered delivery history, but it has a generally gung-ho attitude towards new product types (for example, being first to announce a customised 1U quad-socket Opteron board, or an eight-socket modular 5U Opteron server) and growing strength in pushing those new products forward.

That last product, an 8-socket server (16-way with dual-core CPUs), is interesting for a few reasons. First, it uses multichannel HTX-Pro (a direct HyperTransport daughtercard link) for all its I/O, so the I/O card is swappable: today's card might carry, say, four PCI-X and two PCI-E x16 slots, and tomorrow it could be exchanged for one with two HTX slots and a mix of PCI-X and PCI-E slots.

So what? Well, a few things come to mind. For a start, that monster packs quite a wallop in each box: 128GB of DDR400 registered ECC RAM and, with the upcoming 2.4 GHz dual-core Opterons, 76.8 GFLOPS of peak computing power.
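The 76.8 GFLOPS figure is a theoretical peak. A minimal sketch of the arithmetic follows, assuming (our assumption, not stated in the article) that each Opteron core can retire two double-precision floating-point operations per clock cycle:

```python
# Theoretical peak of one 8-socket box (a ceiling, not a benchmark result).
# Assumption: each core retires 2 double-precision FLOPs per clock cycle.
SOCKETS = 8
CORES_PER_SOCKET = 2   # dual-core Opteron
CLOCK_GHZ = 2.4        # the upcoming 2.4 GHz part
FLOPS_PER_CYCLE = 2    # assumed per-core rate


def peak_gflops(sockets: int, cores_per_socket: int,
                clock_ghz: float, flops_per_cycle: int) -> float:
    """Peak GFLOPS = sockets x cores x clock (GHz) x FLOPs per cycle."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


print(peak_gflops(SOCKETS, CORES_PER_SOCKET, CLOCK_GHZ, FLOPS_PER_CYCLE))
```

That is 8 x 2 x 2.4 x 2 = 76.8; sustained application performance will of course be lower.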

Then there is the I/O. What do you do with so many parallel PCI-X channels, even in the current version of the box? One good use is to interconnect many of these boxes tightly, over parallel channels.

What for? Just to pass messages to each other? Not necessarily. What about global shared virtual memory across the whole cluster of boxen? Each box still runs its own copy of the OS (so standard OSes can be used, with no headaches trying to build a single system image across the boxes), yet an application compiled for 'shmem' sees one large virtual memory space across the whole cluster, without having to partition it.
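One simple way to picture such a single address space, as an illustration only: each box exports an equal-sized window of its memory and the windows are stacked in node order, so any global address splits into an owning node and a local offset. The layout and the function names below are hypothetical; real SHMEM libraries address remote data as a (processing element, symmetric address) pair, which plays the same role as the (node, offset) pair here.

```python
from typing import Tuple

# Hypothetical layout: every box exports the same-sized window of RAM,
# and the windows are concatenated in node order into one flat space.
NODE_WINDOW_BYTES = 128 * 2**30  # 128 GB per box, the figure quoted in the article


def to_global(node: int, offset: int) -> int:
    """Map (node, local offset) to a flat cluster-wide address."""
    if not 0 <= offset < NODE_WINDOW_BYTES:
        raise ValueError("offset lies outside the node's window")
    return node * NODE_WINDOW_BYTES + offset


def to_local(global_addr: int) -> Tuple[int, int]:
    """Split a cluster-wide address into (owning node, offset on that node)."""
    return divmod(global_addr, NODE_WINDOW_BYTES)


def is_remote(global_addr: int, my_node: int) -> bool:
    """True if the access must cross the interconnect instead of hitting local RAM."""
    owner, _ = to_local(global_addr)
    return owner != my_node


addr = to_global(5, 4096)
print(to_local(addr), is_remote(addr, my_node=0))
```

A local access is an ordinary load or store; a remote one has to be carried out by the interconnect (in hardware, in the QsNet case described here), which is what lets the application ignore where the data physically lives.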

Four channels of Quadrics QsNet II, each on its own PCI-X bus with dedicated bandwidth, can then connect up to 4,096 of these boxes into such a global shared-memory computer. With latencies of around a microsecond and over three gigabytes per second of sustained bandwidth, many CPUs can communicate simultaneously with many others across a shared virtual memory space, while the network controller handles all the communication in hardware without bothering the CPUs. The same network can also serve a parallel file system that pushes the same amount of data per second from ultrawide RAID arrays built from standard SATA or SCSI disks on (again, standard) Taiwan-made controllers.
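How traffic might be spread over four independent channels can be sketched as round-robin striping: a large transfer is cut into fixed-size chunks and each chunk goes out on the next rail, so all four PCI-X buses carry data at once. This is a generic multi-rail illustration, not Quadrics' actual implementation; the 4 KB chunk size and the function names are assumptions.

```python
from typing import List

RAILS = 4  # one QsNet II adapter per PCI-X bus, as described in the article


def stripe(message: bytes, rails: int = RAILS, chunk: int = 4096) -> List[List[bytes]]:
    """Sender side: deal fixed-size chunks round-robin onto per-rail queues.

    The 4096-byte chunk size is a hypothetical choice for illustration.
    """
    queues: List[List[bytes]] = [[] for _ in range(rails)]
    for i, start in enumerate(range(0, len(message), chunk)):
        queues[i % rails].append(message[start:start + chunk])
    return queues


def reassemble(queues: List[List[bytes]]) -> bytes:
    """Receiver side: chunk i sits on rail i % rails at position i // rails."""
    rails = len(queues)
    total = sum(len(q) for q in queues)
    return b"".join(queues[i % rails][i // rails] for i in range(total))


data = bytes(range(256)) * 100          # 25,600 bytes -> 7 chunks
per_rail = stripe(data)
print([len(q) for q in per_rail], reassemble(per_rail) == data)
```

With each rail on its own bus, the aggregate bandwidth can in principle approach four times that of a single rail, as long as host memory keeps up.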

OK, yes, QsNet is a product from Old Blighty itself, a descendant of the Transputer legacy, but everything else, racks included, comes from the other beautiful island, Taiwan. With four simple, slim eight-port QsNet II switches (one per channel) and eight of these Iwill boxen in a local Taiwanese rack, you have yourself a 64-bit supercomputer with 64 sockets and 128 cores (256 cores once the quad-core Opterons arrive next year), over half a teraflop of computing power and a full terabyte of memory, seen as a single pool by shmem-compiled code! It is a well-balanced solution for both CPU-bound and memory-bound codes, and for all flavours of message-passing, shared-memory and SMP applications, both within each node and across all the nodes. All of it could fit in one rack (if the rack is deep and tall enough), at a per-CPU cost no higher than that of a normal server...
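The headline numbers follow from multiplying the per-box figures. A sketch, using the same assumed 2.4 GHz clock and two FLOPs per core per cycle as the per-box estimate; the quad-core parts' clock speed is not known, so for them only the core count is meaningful:

```python
# Totals for the eight-box, four-switch configuration described in the article.
# Assumptions: 2.4 GHz clock and 2 double-precision FLOPs per core per cycle.
BOXES = 8
SOCKETS_PER_BOX = 8
RAM_GB_PER_BOX = 128
QSNET_RAILS = 4


def cluster_totals(cores_per_socket: int = 2,
                   clock_ghz: float = 2.4,
                   flops_per_cycle: int = 2) -> dict:
    sockets = BOXES * SOCKETS_PER_BOX
    cores = sockets * cores_per_socket
    return {
        "sockets": sockets,                                 # 64
        "cores": cores,
        "peak_gflops": cores * clock_ghz * flops_per_cycle,
        "ram_gb": BOXES * RAM_GB_PER_BOX,                   # 1024 GB = 1 TB
        "switches": QSNET_RAILS,                            # one switch per channel
        "ports_per_switch": BOXES,                          # each box uses one port on each switch
    }


print(cluster_totals())                    # dual-core Opterons
print(cluster_totals(cores_per_socket=4))  # quad-core: only the core count is meaningful
```

The dual-core case works out to 614.4 GFLOPS of theoretical peak, comfortably over half a teraflop, and 8 x 128 GB = 1,024 GB, a full terabyte; eight boxes also exactly fill an eight-port switch on each channel.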

What are the next steps? Well, whether or not the good ship Itanic really runs aground for good (a possibility not to be entirely dismissed, at least not by those of us who saw how Alpha was murdered, or rather buried alive), HyperTransport has a growing chance of becoming the industry-standard chipset-level interconnect, something along the lines of a low-latency, full-CPU-bandwidth, direct-connect VESA local bus from the i486 days of yore. And HyperTransport might just give this island the final push into the highest-end computing heavens.

So, with next-generation high-speed, low-latency cluster interconnect controllers becoming mainboard-ready, some of them might opt for HyperTransport. The PathScale InfiniPath adapter running in an HTX slot, shown on the same Iwill booth, is a good example of the speedup HT gives, even though InfiniBand is not exactly supercomputer material yet, being still more of a storage interconnect. Imagine what HT would do for a high-performance shared-memory network!

Or, more precisely, imagine on-board native multi-channel shared-memory links à la QsNet, using HyperTransport on future 8-socket Opteron nodes, providing teraflop- and terabyte-class, simple yet well-balanced, single-rack supercomputers built from inexpensive standard components, with standardised power supply, cooling and console control. How's that for the high-end commodity market? µ
