The Inquirer
Comments
MPI tends to do better

The problem is synchronization of large shared memory multithreaded programs. MPI programmers tend to do better, because MPI queues decentralize contention. It's the same reason communist central planning doesn't work as well as a free market.
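The "decentralized queues" point can be made concrete with a minimal sketch. It assumes Python threads as stand-ins for MPI ranks and `queue.Queue` as a stand-in for point-to-point messages, so it is an illustration and not MPI itself. Each rank owns a private mailbox that exactly one neighbour writes to. No single global lock is contended by every thread. The function name `pipelined_sum` is hypothetical.

```python
import queue
import threading
from typing import List


def pipelined_sum(chunks: List[List[int]]) -> int:
    """Sum chunks of data using one private mailbox per rank, MPI-style.

    Rank r computes its local sum, receives the running total from rank r-1,
    and sends the new running total to rank r+1. Every mailbox has exactly
    one producer and one consumer, so no lock is shared by all ranks.
    """
    n = len(chunks)
    # mailboxes[r] is written only by rank r-1 (or by the driver when r == 0)
    # and read only by rank r; mailboxes[n] collects the final result.
    mailboxes = [queue.Queue() for _ in range(n + 1)]

    def rank(r: int) -> None:
        local = sum(chunks[r])                  # purely local work, no contention
        running = mailboxes[r].get()            # "receive" from the left neighbour
        mailboxes[r + 1].put(running + local)   # "send" to the right neighbour

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    mailboxes[0].put(0)  # seed the pipeline
    for t in threads:
        t.join()
    return mailboxes[n].get()


print(pipelined_sum([[1, 2], [3, 4], [5]]))
```

The final accumulation here runs serially along the pipeline. A real MPI code would call `MPI_Reduce`, whose implementations typically combine values in a tree.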

posted by: john1p, 10 November 2008
Vista?

Don't be stupid... NOTHING runs Vista smoothly, because Vista is such low-quality code.

Vista can't thread properly on 2 cores, let alone hundreds...

ROFL

posted by: 99flake, 08 August 2008
Re: New parallel programming languages already on the horizon

The HPCS project began in 2003, and Sun's Fortress was dropped from HPCS in Phase III of the program.

See:
http://www.hpcwire.com/features/Suns_Fortress_Language_Parallelism_by_Default.html

I do not see any REAL new parallel programming languages on the horizon.


posted by: Ami, 01 August 2008
MPI Today, ??? Tomorrow

Bravo, nice conversation starter!

It has become clear that MPI, while effective and useful, presents a barrier to wider adoption of massively parallel systems. The PGAS languages may fix that: it is clear that something must be done. 

On the other hand, today's commodity interconnects aren't very good at moving small chunks of data (latency and overhead are high), don't perform well for random communication patterns, and frequently aren't scalable with respect to cost and reliability. And none of those problems have anything to do with the processors, except that faster processors place additional stress on the communication fabric, which is already overburdened in many clusters.
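The small-message problem follows from the usual latency-plus-bandwidth ("alpha-beta") cost model. Every message pays a fixed latency and software overhead, and only the remainder scales with message size. The sketch below uses purely hypothetical figures of 2 µs per message and 1 GB/s of link bandwidth. These are not measurements of any real interconnect.

```python
# Hypothetical interconnect parameters (illustrative only, not measured):
LATENCY_S = 2e-6       # alpha: fixed latency + software overhead per message
BANDWIDTH_BPS = 1e9    # beta: peak link bandwidth in bytes per second


def transfer_time(nbytes: int, latency: float = LATENCY_S,
                  bandwidth: float = BANDWIDTH_BPS) -> float:
    """Alpha-beta model: time = fixed per-message cost + size / bandwidth."""
    return latency + nbytes / bandwidth


def effective_bandwidth(nbytes: int, latency: float = LATENCY_S,
                        bandwidth: float = BANDWIDTH_BPS) -> float:
    """Bytes per second actually delivered for a message of this size."""
    return nbytes / transfer_time(nbytes, latency, bandwidth)


for size in (8, 1024, 1 << 20):
    fraction = effective_bandwidth(size) / BANDWIDTH_BPS
    print(f"{size:>8} bytes: {fraction:6.1%} of peak bandwidth")
```

With these assumed numbers, an 8-byte message uses well under 1% of the link, while a 1 MiB message gets over 99%. Fine-grained, random communication is therefore dominated by latency, not bandwidth.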

Make no mistake, I like MPI -- the company I work for built a machine around the concept. But we've noticed that much of what you build for effective message passing is really important for the PGAS type programming models as well. (Conversely, if your fabric isn't very good at MPI, it probably won't deliver very good PGAS performance.)

(For more, take a look at http://www.bigNcomputing.org )


posted by: Matt Reilly, 31 July 2008
Do we get zero-point energy and cold fusion too?

A multi-petabyte, multi-petaflop shared memory system built from a few thousand COTS-grade servers sounds great in *theory*. However, mixing a very large shared memory environment with COTS hardware is akin to using a road flare to peer inside a gas can. Shared memory systems do not react well to spontaneous loss of memory space, something that would happen fairly often with 10,000 COTS-grade nodes strung up with IB or another hi-perf interconnect.

Unless a very robust fault tolerance scheme could be designed in (mirrored node address space, redundant MPIO, etc.), you would be better off saving several million quid and just beating your head against the wall. Either way the result would be the same, so you might as well save the money.

posted by: Jeff Johnson, 30 July 2008
Write Your Multithreaded Software....

It's interesting that so many well-written comments popped up on this subject. Here's My lesser:

Arrghhh....Great White on drawing Board. My idea is, How about Multi Thread Software, Something Microsoft Research is Conferencing NOW. This is Much more Than Telecommunications Giant, especially name; Battlefield Mobi goes well with potential demands, In fact it's more DMV tracker. Yet how multithreaded can DMV Program Go?
Searching Out Great White Spermer ain't easy, especially in Montana, where they'd thunked I stated that horrible term Spammer. Yet, Take this critter to Next level, Glacial & GO figure: Dunnington. Arrghhhh, with ALL Silver of West Glacier on Battlefield Mobi to Look Part of Pirate (HappyFace Pirate, Of Course), Them Deck Hands Be Amused. Perhaps Military Satellite System with Navigation & instant info/cross communications, worldwide. That'll Get Cloud Blower, Out There.
Getting My Ahab Poon in Wineseller & use left side for Video, right side for Screams & Laughter with my final output one BIG Val. Arrghhhh. Give me Super Mobi & give that Machine my dic, BET it's worth bitty piece of eight. NEXT Battle:

Petaflop of spermers Vs. Petaflop of Nehalem. I, Ahab, still conquer. It's Dunnington Thing that Has all female Crew scared.

First Know Multi Threaded Commentos!!!

Via CRAY. Thanks Seymour, I Likes Name & I likes It White.
TS drashek

posted by: Sperm_Whaler, 29 July 2008
Not that simple

First, optimizing message passing is much easier than optimizing shared memory. Then there is the dirty memory issue. Oh, and let's see: if a node fails, the global memory breaks and the whole house of cards comes tumbling down. Ever calculate the MTBF for 10,000 of anything?

Second, if you talk to most HPC users about your great idea, you will hear something like the following: "Sigh, been there, done that, got the T-shirt, it don't work".
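The MTBF question can be answered on the back of an envelope under three assumptions. Node failures are independent. They occur at a constant rate (exponential failure times). The whole shared-memory image dies when any one node does. The 5-year per-node MTBF below is a hypothetical figure for illustration, not a vendor number.

```python
import math

HOURS_PER_YEAR = 24 * 365
NODE_MTBF_HOURS = 5 * HOURS_PER_YEAR   # hypothetical: 5 years per COTS node


def system_mtbf(node_mtbf_hours: float, nodes: int) -> float:
    """MTBF of a system that fails when ANY node fails.

    With independent, constant-rate failures the per-node failure rates add,
    so the system MTBF is the node MTBF divided by the number of nodes.
    """
    return node_mtbf_hours / nodes


def survival_probability(job_hours: float, node_mtbf_hours: float, nodes: int) -> float:
    """Probability that no node fails during a job of the given length."""
    return math.exp(-job_hours / system_mtbf(node_mtbf_hours, nodes))


mtbf = system_mtbf(NODE_MTBF_HOURS, 10_000)
print(f"system MTBF: {mtbf:.2f} hours")
print(f"P(24-hour job survives): {survival_probability(24, NODE_MTBF_HOURS, 10_000):.4f}")
```

Under these assumptions, five years per node becomes a failure roughly every four and a half hours across 10,000 nodes. A day-long job then has well under a 1% chance of finishing without one.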

posted by: deadline, 29 July 2008
Analysis of... what?

This article is full of random facts, but doesn't seem to actually have a purpose or a conclusion. Did you mean to suggest a technology that would replace MPI?

posted by: Tim, 29 July 2008
Read Jack Dongarra's paper

http://www.netlib.org/utk/people/JackDongarra/PAPERS/adv-comp-darpa-08.pdf

The conclusion I have come to is that once again Intel is behind the curve. As Jack notes in the above paper, Linpack is now 15 years old and functionally obsolete as a supercomputing benchmark. HPCC is the current benchmark suite, and it measures 5 other benchmarks that have equal importance. Using Linpack alone, which has been replaced by Global HPL, is somewhat akin to timing a race horse with a sundial.

That is compounded by what appear at this time to be several ISA incompatibilities between Intel's 64-bit architecture and X10, Fortress, and Chapel. Mike Wolfe gave an excellent paper at SC'06 on the compiler incompatibilities and on how Intel-specific optimizations will cause IEEE 754-compliant systems to crash. So software for Intel processors lacks transferability to other machines, which is something DARPA requires.

In Los Alamos studies, doubling the CPU or memory speed raises the heat dissipation load by a factor of 8. Power consumption of DDR3 is a real issue. Based on this summer's power prices, IBM's $1/watt is too conservative. Annual costs are headed for $1.5/watt based on the latest Western Area Power Administration price projections through March 2009, and June hit $135/MWh at the generator. The last numbers verified by a DOE National Lab show 12 MFLOPS/watt for Intel, 18 MFLOPS/watt for AMD and 100+ for the Blue Gene/L and /P series. These figures include peripherals and support services like AC (see http://www.cs.berkeley.edu/~samw/research/papers/ipdps08.pdf).

To be credible, it is going to have to demonstrate competitive performance in HPCC, not just Linpack. It is also going to have to meet the DARPA interoperability standards and deliver much better power efficiency. That means performance at lower clock speeds, because a frequency increase generates a much more than proportional (roughly cubic) power increase.
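Two pieces of arithmetic behind the power argument can be checked directly. First, drawing 1 W continuously for a year is 8.76 kWh. At the quoted $135/MWh, the raw energy bill alone is therefore about $1.18 per watt per year, before cooling and other overheads. Second, the "doubling raises heat by 8" figure corresponds to cubic scaling. That follows from dynamic CMOS power being proportional to V²f, if one assumes (a simplification) that supply voltage must rise in proportion to frequency.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_cost_per_watt(price_per_mwh: float) -> float:
    """Energy cost of drawing 1 W continuously for one year, in $ per watt."""
    mwh_per_watt_year = HOURS_PER_YEAR / 1e6  # 1 W * 8760 h = 0.00876 MWh
    return price_per_mwh * mwh_per_watt_year


def relative_dynamic_power(freq_ratio: float) -> float:
    """Dynamic power ~ C * V**2 * f. Assuming V must scale linearly with f
    (a simplifying assumption), power grows with the cube of the frequency ratio."""
    return freq_ratio ** 3


print(f"${annual_cost_per_watt(135):.2f} per watt-year")
print(relative_dynamic_power(2.0))
```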

posted by: Ed Hinders, 29 July 2008
Not so fast ...

Actually, MPI is still far better on "true" yet non-uniform shared memory machines, like SGI Altix or multi-socket Opterons. It really, really, really helps for the application to allocate its data only in nearby memory, and not in memory several hops away across the network switch. Even Opterons work much better with MPI, because each process allocates its data near its own CPU rather than near the other ones.

One cannot abstract away where the memory location is while keeping the performance, and so the programmer might as well be doing it him/herself.
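The locality argument can be put in numbers with a weighted average. If a fraction p of memory accesses hit local memory, the mean access time is p * t_local + (1 - p) * t_remote. The 100 ns local and 300 ns remote latencies below are hypothetical round numbers. They are chosen only to show the shape of the effect.

```python
def mean_access_ns(local_fraction: float, local_ns: float = 100.0,
                   remote_ns: float = 300.0) -> float:
    """Average memory access time when a given fraction of accesses stay local.

    local_ns and remote_ns are hypothetical latencies for illustration.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


for p in (1.0, 0.9, 0.5, 0.0):
    print(f"{p:4.0%} local -> {mean_access_ns(p):.0f} ns average")
```

An MPI code keeps each rank's data inside its own process, so it tends to get p close to 1 without any extra effort from the programmer.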

posted by: hpc_user, 29 July 2008
do you really think so?

yah, like no one ever thought of using networks to virtualize shared memory before. but you seem to have dropped a few decimals in your estimates: memory is around 50ns, but networks are around 20x that. yes, modern networks make net-shared-memory (an ooold idea) more tolerable, but it still means pretty amazingly low performance with whole pages flying around willy-nilly. it's also worth remembering that current interconnects go to great lengths to avoid having to frig the MMU all the time, but net-sh-mem is hardly anything _but_ an MMU frigger.
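The "whole pages flying around" point can be sketched with the figures in the comment: 50 ns for local memory and roughly 20x that (about 1 µs) for the network. If the unit of sharing is a page, touching one remote 8-byte word drags a full page across the wire. The 4 KiB page size and 1 GB/s link rate are assumptions. The model also ignores the page-fault and MMU overheads mentioned above, so it is optimistic.

```python
MEMORY_NS = 50.0              # local memory latency quoted in the comment
NETWORK_NS = 20 * MEMORY_NS   # "networks are around 20x that" -> 1000 ns
PAGE_BYTES = 4096             # assumed page size
LINK_BYTES_PER_NS = 1.0       # assumed link rate: 1 GB/s = 1 byte per ns


def remote_page_fetch_ns(page_bytes: int = PAGE_BYTES) -> float:
    """Time to pull one page over the network: latency plus transfer time.
    Ignores fault handling and MMU updates, so real costs would be higher."""
    return NETWORK_NS + page_bytes / LINK_BYTES_PER_NS


def traffic_amplification(bytes_used: int, page_bytes: int = PAGE_BYTES) -> float:
    """Bytes moved per byte actually needed when sharing is page-granular."""
    return page_bytes / bytes_used


print(f"remote page fetch: {remote_page_fetch_ns():.0f} ns "
      f"({remote_page_fetch_ns() / MEMORY_NS:.0f}x a local access)")
print(f"traffic amplification for one 8-byte word: {traffic_amplification(8):.0f}x")
```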

but really, is your article just a teaser for ScaleMP?

posted by: mark hahn, 29 July 2008
Re:could it run Crysis and 3Dmark06 on Vista?

Yes, but not at the same time

;-)

posted by: Pascal Monett, 29 July 2008
New parallel programming languages already on the horizon

There already are a bunch of PGAS (partitioned global address space) languages (e.g., co-array Fortran, UPC) that could potentially displace MPI, but vendor support for them has been weak. 
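In a PGAS language, every element of a shared array has a home process. The runtime (or the programmer) must turn a global index into an owner and a local offset. The sketch below shows that arithmetic for a simple block distribution in which the first n mod P processes get one extra element. The distribution rule and the function name `block_owner` are assumptions for illustration. Real PGAS languages define their own layouts; UPC's default, for instance, is round-robin.

```python
from typing import Tuple


def block_owner(global_index: int, n: int, nprocs: int) -> Tuple[int, int]:
    """Map a global index into an n-element shared array to (owner, local offset).

    Block distribution: the first n % nprocs processes hold one extra element,
    so block sizes differ by at most one.
    """
    if not 0 <= global_index < n:
        raise IndexError("global index out of range")
    base, extra = divmod(n, nprocs)
    big = base + 1            # size of the first `extra` blocks
    cutoff = extra * big      # first global index that falls in a smaller block
    if global_index < cutoff:
        return divmod(global_index, big)
    owner, offset = divmod(global_index - cutoff, base)
    return extra + owner, offset


# 10 elements over 4 processes -> block sizes 3, 3, 2, 2
print([block_owner(i, 10, 4) for i in range(10)])
```

Knowing the owner is what lets a PGAS program tell a cheap local access from an expensive remote one.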

More promising, perhaps, is DARPA's sponsorship of 3 new languages - X10 from IBM, Chapel from Cray, and Fortress from Sun. See http://www.hpcwire.com/features/17883329.html . At least 1 of these could become a new "standard". 

Otherwise, I don't get your "cause and effect" relationships here. MPI didn't popularise Infiniband, and virtual shared-memory doesn't enable faster interconnects. If anything, it's the other way round on both counts.

I wouldn't get too carried away by virtualization here. Memory will still be physically distributed, and data will still have to move across interconnects. HPC programmers will ignore those realities at their peril.

posted by: Enda, 29 July 2008
Supercomputing is solving the real problems now!

MPI and other message-based models are still very good, but fast emulation of shared memory has not evolved at the same pace.

We are starting to see a pattern where memory will come in a lot of different types, ranging from the tiny internal shared memory in a Cell CPU to quite slow but massive PCIe flash storage cards.

This means very complex and fast system designs all over the place.

posted by: V, 29 July 2008
hmmm...coincidence

Was just reading about X10 yesterday (IBM never could name things).

It has an even better model. Memory can be shared, but the language explicitly distinguishes between local and remote (so that horrendous microsecond wait doesn't have to remove the peta from your flops).

posted by: Richard Henderson, 29 July 2008
But will it be able to run Crysis?

And what score will it get in 3Dmark06?

Will it be able to run Vista smoothly too?

A real challenge, could it run Crysis and 3Dmark06 on Vista?

posted by: interested_party, 29 July 2008
good

MPI is a very unintuitive standard and programming model. It is about time something better was put in its place.

posted by: Eugen, 29 July 2008

Supercomputing moves beyond MPI
