The Inquirer-Home

AMD turns up the heat on Nvidia's GPGPUs

Analysis: The master is about to face an old pretender
Tue Aug 17 2010, 16:11

GRAPHICS CARDS are no longer just graphics cards thanks to Nvidia, but the firm that brought graphics chips to the server room is for the first time about to face some serious competition.

In the past five years we here at The INQUIRER have called Nvidia many things, but the accolade of high performance computing (HPC) innovator is also applicable. The company's focus on producing general purpose graphics processing units (GPGPUs) has lowered the cost barrier to HPC, allowing small companies, researchers and even hobbyists access to serious computing power.

So it seemed Nvidia dropped a bit of a clanger when it revealed that the number of cores on its Tesla board would decrease and the thermal design power (TDP) would be higher than first reported. That was followed by The INQUIRER revealing that staunch Nvidia supporter, Silicon Graphics International (SGI), was going to offer another vendor's GPGPU accelerator boards. After this, it became obvious that Nvidia had finally come up against competition.

Nvidia's dominance of the GPGPU accelerator market wasn't down to chance, or even to the firm's own actions. The truth is, AMD simply didn't take using GPUs for HPC seriously. Perhaps it thought that its Opteron chips could cut the mustard or maybe it was just a lack of vision, but either way it let Nvidia take the HPC lead. Now it seems, however, that both firms agree that GPGPUs combined with standard x86 CPUs are the only way to enable exa-scale computing.

The soap opera running alongside GPGPU development has been Nvidia's insistence on publicly going after Intel. Speaking to Nvidia makes it blatantly obvious that the firm needs Intel more than Intel needs the GPU designer. According to Nvidia's Tesla product line manager Sumit Gupta, all the firm wants to do is "get people to use the GPU". The only problem with that is that a CPU is required, as Gupta readily admits.

In Nvidia's recent press slides, it uses Tesla boards paired with Intel Xeon chips to demonstrate the performance gains of a CPU/GPU combination. So the question is, why bother attacking the devil if you have to dance with it? Of course Nvidia could promote AMD's CPUs instead of Intel's, but now that AMD has bought ATI we're not sure even global warming can stop hell from freezing over before that happens.

Nvidia's spat with Intel is an amusing sideshow at best. The more immediate problem is that at long last AMD is taking GPGPU computing seriously. For Nvidia, a company that has bet the farm on a chip that was geared towards GPGPU right from the start, it is clearly worrying that the stigma of low performance per Watt has been attached to its Fermi architecture.

Being fair to Nvidia, it does perform very well in the Green 500, a list that uses figures from the Top 500 list to calculate MFLOPS/Watt. The fourth place ranking of the Dawning Nebulae cluster is impressive, while the 57 per cent gap in performance per Watt between the Nvidia Tesla cluster and the three top ranked IBM Cell clusters is easily explained, according to Gupta. "It's all down to the size of the cluster, in bigger clusters the interconnects consume considerable power."

That explanation might seem a bit too simple, but there are publicly available figures to back up Gupta's claim. The Top 500 states that the 'greenest' supercomputer, QPACE SFB TR Cluster, comprises 4,608 cores, while the Dawning Nebulae has an astonishing 120,640 cores, which breaks down to 4,640 Nvidia GPGPUs each mated with two hexa-core Intel X5650 2.66 GHz 'Westmere' chips. To highlight the potential of GPGPUs, the Nvidia cluster posted just over 492 MFLOPS/Watt, nearly 100 more than the top placed Xeon only cluster.
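The figures quoted above can be sanity checked with a little arithmetic. The sketch below uses only numbers from the preceding paragraphs. The variable names are illustrative, and the assumption that the cores left over after counting the Xeons are spread evenly across the GPUs is ours, not something the Top 500 entry spells out.

```python
# Figures quoted in the article for the Dawning Nebulae cluster.
TOTAL_CORES = 120_640        # core count as listed by the Top 500
GPU_BOARDS = 4_640           # Nvidia GPGPUs in the cluster
CPUS_PER_GPU = 2             # two Xeon X5650 chips per GPU
CORES_PER_CPU = 6            # the X5650 is a hexa-core part

cpu_cores = GPU_BOARDS * CPUS_PER_GPU * CORES_PER_CPU
gpu_cores = TOTAL_CORES - cpu_cores

# Assumption: whatever is left over is spread evenly across the GPUs.
cores_per_gpu, remainder = divmod(gpu_cores, GPU_BOARDS)

print(f"CPU cores: {cpu_cores}")
print(f"Cores attributed to GPUs: {gpu_cores}")
print(f"Per GPU: {cores_per_gpu} (remainder {remainder})")

# Green 500 efficiency figures quoted in the article.
NEBULAE_MFLOPS_PER_WATT = 492   # "just over 492"
CELL_LEAD = 0.57                # the 57 per cent difference to the top Cell clusters

implied_cell = NEBULAE_MFLOPS_PER_WATT * (1 + CELL_LEAD)
print(f"Implied top Cell figure: about {implied_cell:.0f} MFLOPS/Watt")
```

The sums come out evenly: 55,680 Xeon cores plus 14 cores per GPU. That would fit the list counting each GPU multiprocessor, rather than each individual stream processor, as a core, though that reading is an inference from the arithmetic.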

So what about the heat? It's a case of matching the best of the worst. AMD's top end Firestream 9370 has a 225W TDP that Nvidia, after a little goading from The INQUIRER, said was the correct TDP of its top end Tesla M2070 board. Initially, as we reported, it had declared that the TDP of the Tesla M2070 was 247W, a figure it has since corrected.

The biggest problem for Nvidia is that AMD is able to offer a 150W TDP single slot board in the shape of the Firestream 9350. While it might not win any benchmarks outright, it does require significantly less power which should make it viable in a wide array of situations. Nvidia has told us that it doesn't have a similar board at this time, though it sees its Quadro line as a halfway house between consumer Geforce cards and full blown Tesla boards.

As for reasons why Tesla boards have such a perceived high power draw, one aspect could be the deployment of ECC memory. Gupta is adamant that ECC is "vital for acceptance in HPC" while AMD's director of stream computing Patricia Harrell says it's something AMD simply hasn't needed.

According to Harrell, the need for ECC is mitigated by testing done in AMD's labs prior to shipping boards. Equally important, she claims, is that should AMD incorporate ECC support it would "lose performance per watt benefit". Harrell adds that it is a "reasonable assumption" that enabling ECC results in a higher power draw, a claim that is borne out by looking at published research papers. Meanwhile Nvidia claims that ECC is not only vital but has "negligible impact" on power usage.

When the latest Top 500 list appeared, it was the Nvidia cluster that stole the headlines, not just because it signalled the dawn of GPGPUs in HPC but because its performance per Watt compared to the number one cluster, Jaguar, was tremendous. GPGPUs have arrived and even AMD squeezed in on another Chinese cluster, Tianhe-1, which uses ATI Radeon HD 4870 cards. That seemingly has gotten Nvidia a bit hot under the collar.

At times it was hard to miss the sheer disdain in Gupta's voice when he was talking about AMD. The passion in his words was palpable and it was as if Gupta felt offended that the hard work he and his team did was not replicated by AMD. More than once Gupta referred to AMD as a company that has made "zero investment in GPGPUs".

The reason for this was simple, said Gupta. "GPGPUs are at the lowest priority" because AMD is "compelled to sell CPUs". Gupta continued his attack on AMD by saying that the firm is "completely torn internally" between selling its old cash cow, the x86 CPU, and the future of HPC, GPGPUs.

Not surprisingly, AMD's Harrell flatly denied this claim of internal strife, saying that the chip designer is "supportive of GPGPUs". She deftly batted away Gupta's point about attachment to the x86 architecture by saying that such an argument is "typical for a firm without an x86 business".

Harrell echoed Gupta's view that GPGPUs are "critical for success" in HPC and that AMD does not see GPGPUs as a replacement for its Opteron CPUs. On the subject of internal conflict, Harrell said that recently AMD's x86 server chip division merged with its GPGPU division, and she maintained that it, like Nvidia, sees the need for the two architectures to co-exist.

While Gupta's claim of AMD's 'zero investment' in GPGPU design is clearly an exaggeration, there is something to be said for AMD's tentative steps into the market. For independent observers it is obvious that greater competition in the market will not only increase innovation but will also result in standards for both hardware and software being set sooner. Even Harrell admits that industry standards are not moving fast enough, but the battle is not over raw chip speed but rather the development environment and specifically the language itself.

AMD is betting the server farm on OpenCL, an open language that according to Gupta is missing key functionality. Gupta points to OpenCL as a language that has been "over hyped by AMD" and is bereft of features such as recursion and pointers. These, among other things, said Gupta, are barriers to the adoption of OpenCL in HPC. But Harrell denied that AMD's support for OpenCL is hurting the firm, and said that its higher level, cross platform functionality has instead proven popular among its clients. As a foil to Gupta's earlier zero investment claim, Harrell said that AMD is "investing heavily in making OpenCL succeed".
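To see why the absence of recursion matters to developers, consider what happens to a naturally recursive algorithm in a kernel language that forbids it: the programmer has to manage the stack by hand. The following is a minimal sketch, written in Python rather than OpenCL C purely to illustrate the rewrite. The tree layout and function names are hypothetical.

```python
# A tiny binary tree stored as (value, left, right) tuples; None marks a missing child.
tree = (1, (2, (4, None, None), None), (3, None, (5, None, None)))

def sum_recursive(node):
    """The natural formulation: the function calls itself on each child."""
    if node is None:
        return 0
    value, left, right = node
    return value + sum_recursive(left) + sum_recursive(right)

def sum_iterative(root):
    """The same traversal without recursion, using an explicit stack."""
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        value, left, right = node
        total += value
        # Defer the children instead of calling ourselves on them.
        stack.append(left)
        stack.append(right)
    return total

print(sum_recursive(tree), sum_iterative(tree))
```

Both versions return the same total. The second is simply more work to write and maintain, which is the kind of friction Gupta is pointing to.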

To Nvidia it is seemingly a source of annoyance that AMD is trying to paint itself firmly in the OpenCL camp, and Gupta said that AMD has "no credible OpenCL strategy". He went even further by stating outright that "they [AMD] don't support OpenCL", claiming that there are "no production OpenCL drivers from AMD". Harrell retorted by pointing to AMD's developer site. However, Nvidia clarified its point by saying, "Nvidia has the only conformant, publicly available, production OpenCL GPU drivers." It claims that while AMD's drivers are conformant, AMD does not include them within the standard driver download.

It would be easy to paint Nvidia and Gupta as Green Goblins trying hard to undermine OpenCL, but Gupta openly admitted that he doesn't care which language succeeds, whether it be Nvidia's own 'closed' CUDA or OpenCL. "We don't care what software is run on GPGPUs as long as it's an Nvidia GPU," said Gupta. It should also be noted that both AMD and Nvidia are members of the Khronos Group, the consortium that oversees the development of OpenCL, though one must wonder what is said at their meetings.

When asked what is stopping AMD from being able to run CUDA applications on its GPU boards, Gupta simply replied, "nothing". Gupta's straight answer can, surprisingly, be taken at face value because theoretically AMD could create a CUDA compliant driver that could run code on its GPUs. Of course there are licensing issues and the rather small matter of company pride at stake, but in theory it could be done.

For Harrell the problem isn't technological but rather ideological. She said, "CUDA is not running as an industry standard" and that Nvidia has "total control over the language". The problem for AMD is that while that may be true and the firm might assume the moral high ground, Nvidia and consequently CUDA are fast becoming the de facto standard in HPC and academia.

CUDA might not be open, or even a standard, but history tells us that such technicalities never stopped other languages from attaining widespread popularity. Being policed by IBM didn't stop Fortran from still being the numerical language, half a century after it first appeared. Even with Sun Microsystems' best efforts to create a cumbersome 'framework' and employ licensing peculiarities, Java's popularity has managed to surpass C. It has happened before and it's looking like history will repeat itself.

There are parallels between Java and CUDA proliferation, through universities offering courses on CUDA development. These are students who will be graduating with CUDA, not OpenCL, development skills and taking them into industry. Just as years of computer science graduates were force fed Java development at the expense of C, Nvidia - thanks to AMD and others not taking GPGPU seriously - might end up with armies of coders who can exploit its hardware better than that of its competitors.

A quick look at what's coming out of academic research should dispel any misconceptions one might have as to how well Nvidia has done in this area. If you think GPGPUs are merely used for fancy graphics rendering or boring heavy duty matrix manipulations that appear in the annals of graphics conferences such as Siggraph, then you're in for a surprise.

Later this month at the ACM Sigcomm conference, widely revered as the top networking conference, a paper entitled 'Packetshader: a GPU-accelerated software router' will be presented. The researchers show how a Geforce GTX480 can cope with shifting packets around. Before you laugh at the notion of one of the most power hungry graphics cards being used as a router, the authors conclude, "We believe that the increased power consumption is tolerable, considering the performance improvement from GPUs."

So while AMD and others are betting on OpenCL, Nvidia has not only got the jump but has hedged its bets by supporting both CUDA and OpenCL. Actually, Nvidia proudly boasts about its support for Java, Python, Fortran and Directcompute.

According to Gupta this wide range of support will mean that Nvidia will remain popular among developers. As for OpenCL, Gupta forecasts it being overtaken by Microsoft's Directcompute. He even suggests that OpenCL might get the same pummelling that OpenGL did against DirectX. Though it's hard to see that happening given the support OpenCL has, one can't doubt that, at this stage of the battle at least, Nvidia not only has the high ground but controls the heavy artillery.

Nvidia deserves credit for not only lowering the cost of HPC but achieving a lot in a short space of time. However some of that credit should also be taken by AMD, which has seemingly stood by and let Nvidia get such a formidable grip on the industry. Even Harrell admits that AMD still needs to do more with its software and even with marketing.

For AMD, its current crop of Firestream cards, which are about to be released, represents one last chance to put up a real fight in the HPC market. If it doesn't, it is likely that Nvidia and CUDA will never look back. µ

Comments
Green Computing???

You want green computing then buy an Abacus.

With rare earth mining, zero recycling and heat losses computing ain't green.

It is what it is.

Nvidia is at a disadvantage with gpu hpc. The mid range sale point of earlier designs will be replaced by the entry point for Sandy Bridge and Fusion.

The development costs have to increase due to that market loss.

AMD does not suffer that deficiency, they simply use the slightly older design in the next product cycle refresh. The Top Radeon core will be a Fusion core within two years. And they'll sell millions of them.

posted by : rv, 27 January 2011
Drashek time Inverter

"Yet, IT Can be easily Done with New Drashek time Inverter"

That is the funniest thing I've heard all day!

posted by : ChemicalSoup, 07 October 2010
Good Article

Although Nvidia has a lead, I still do not count AMD out. The HPC market at the lower end is very much in growth. Since the prices have come down, developers might even be able to dabble in both and play against the strengths. Sometimes cornering the market is not necessary for all. It is important for Nvidia as they need new markets because they have been shut out of one of their biggest besides the GPU, the chipset.

posted by : Kode, 27 September 2010
Need for ECC or not

There is a paper about GPU memory errors: "Haque IS and Pande VS. Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU. In Proceedings of 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing (CCGrid 2010), pp 691-696."
It does point out a need for better memory error checking and memory error recovery procedures, because GPU memory errors are more the norm than the exception. In that light it can be argued that having ECC is a good thing.

There is a tool or two for the GPU memory testing: https://simtk.org/home/memtest/

posted by : A.B., 21 August 2010
Prophetic?

CUDA might not be open, or even a standard, but history tells us that such technicalities never stopped other languages from attaining widespread popularity. Being policed by IBM didn't stop Fortran from still being the numerical language, half a century after it first appeared. Even with Sun Microsystems' best efforts to create a cumbersome 'framework' and employ licensing peculiarities, Java's popularity has managed to surpass C. It has happened before and it's looking like history will repeat itself.
- Some may believe the above is the grail of future prophetic wisdom. NOT. Totally disregarding CUDA, the paragraph merely states the progression of languages from Fortran to C to Java, each offering advantages over the previous or for specific tasks. If at all you believe language use defines the better language, the web site TIOBE.com keeps tabs on language usage. BTW, CUDA, which was introduced in '07, compared to OpenCL ('08), is not in the top 100. I recommend you check the use of OpenCL/GL in your cell phone.
asH

posted by : asH, 21 August 2010
ati stream

How many popular applications use ATI's Stream?

ONE ( ATI FoldingHome, at half the speed of any NVIDIA card ).

How many use CUDA, PhysX or NVIDIA's OpenCL? A lot ( Octane, Arion, Photoshop, xNormal, VRay, LuxRender, Badaboom, Nero, Roxio, PowerDVD, Lightworks, PantaRay, Just Cause 2, etc... ).

posted by : sfutu, 19 August 2010
nope

AMD's OpenCL SDK is completely immature.
AMD simply does not deliver for non-game apps. And, for games, their tessellator is ridiculously slow.

CUDA and NVIDIA are absolutely superior for GPGPU.

posted by : gogo, 19 August 2010
Jokes

"says it's something AMD simply hasn't needed"
The point of ECC is that it is a customer requirement and not anything to do with the manufacturer.
"the need for ECC is mitigated by testing done in AMD's labs prior to shipping boards".
Sure, it's good to know AMD has eliminated faults caused by alpha particles from the process and packaging they are using.
ECC is needed in larger systems and fault tolerant embedded systems where the probability of flipped bits and the consequences of failing devices cause unacceptably high risk. The marketing people at AMD should be perfectly aware of this.

posted by : Anonymous Coward, 18 August 2010
China Set Night on Fire....Boarded theUNION Jack.

Moore intresting Facts. Take test strip roll & mark 1/100th of inch for 100 miles 1,200 x 5280= 6 million x 100 =600 million strokes thru machine in 1 sec = 100 x (1x 528,000 x 3,600) = about 1/10th speed of light. Now WE Know How Fast Av Computer Is Stroking FIRE. About as fast as average Brain.

Since WE Know ONE IS Lonelyest Number. Getting Mary Is Like Stepping Back IN Time. About 800 Trillion Years saving Light Hydrogen that surrounds Milky Way. thats Long time to wait to get back here, to theINQ.

Yet, IT Can be easily Done with New Drashek time Inverter. Using Above Art as Command Center, titan V Rocket with Sodium Peroxide wound around Chromium Peroxide, Sodium Starts Long Journey, Then electric ring of firing pins on thruster nozzle Bumps explosion back into vortex & fuel area, AfterBurning Chromium in one gasp, NuClear Fire & small force holds whole together, while forward thrust approachs light speed instantly. See You Then.

posted by : CHINA WHITE...., 18 August 2010
Change Cuda to OpenCL

"theoretically AMD could create a CUDA compliant driver that could run code on its GPUs"

You can also turn it around: what is stopping Nvidia from making it easy to convert to OpenCL so it works on both GPUs?

I vote for OpenCL, we already have enough standards that are under the control of one company.

posted by : kedas, 17 August 2010
Wat?

a ... a... a decent article on the Inq? Surely you jest?

posted by : eimaiosatanas, 17 August 2010
OpenCL or Cuda

Where are those Universities that teach you only CUDA and not both CUDA/OpenCL? Around here we spend equal time with both languages.

posted by : Gunggel, 17 August 2010
nvidia Fanbois

Nvidia misses OpenGL 4.0 promises

http://www.semiaccurate.com/2010/04/12/nvidia-misses-opengl-40-promises/

Nvidia fanbois

posted by : asH9, 17 August 2010
where do they get you guys

The truth is, AMD simply didn't take using GPUs for HPC seriously. Perhaps it thought that its Opteron chips could cut the mustard or maybe it was just a lack of vision, but either way it let Nvidia take the HPC lead.

AMD wasn’t able to compete in the HPC arena because the parts were not in place- their APU platform required a meld between hardware and widely accepted Open(X)platform- that didn’t happen until this year...do your research!!
asH

posted by : asH, 17 August 2010
very good article!

nicely written and summarizes all the companies mentalities.

last time i checked (a couple of months back), amd doesn't seem to be that much interested in gpgpus. just look at their forum support for stream/opencl (there is hardly anyone from their support replying)

posted by : aj, 17 August 2010
packer?

surely packetshader in place of packershader...

posted by : anon, 17 August 2010