GRAPHICS CARDS are no longer just graphics cards thanks to Nvidia, but the firm that brought graphics chips to the server room is for the first time about to face some serious competition.
In the past five years we here at The INQUIRER have called Nvidia many things, but the accolade of high performance computing (HPC) innovator is also applicable. The company's focus on producing general purpose graphics processing units (GPGPUs) has lowered the cost barrier to HPC, allowing small companies, researchers and even hobbyists access to serious computing power.
So it seemed Nvidia dropped a bit of a clanger when it revealed that the number of cores on its Tesla board would decrease and the thermal design power (TDP) would be higher than first reported. That was followed by The INQUIRER revealing that staunch Nvidia supporter, Silicon Graphics International (SGI), was going to offer another vendor's GPGPU accelerator boards. After this, it became obvious that Nvidia had finally come up against competition.
Nvidia's dominance of the GPGPU accelerator market wasn't down to chance or even to the firm's own actions. The truth is, AMD simply didn't take using GPUs for HPC seriously. Perhaps it thought that its Opteron chips could cut the mustard or maybe it was just a lack of vision, but either way it let Nvidia take the HPC lead. Now, however, it seems that both firms agree that GPGPUs combined with standard x86 CPUs are the only way to enable exa-scale computing.
The soap opera running alongside GPGPU development has been Nvidia's insistence on publicly going after Intel. Speaking to Nvidia, we found it blatantly obvious that the firm needs Intel more than Intel needs the GPU designer. According to Nvidia's Tesla product line manager Sumit Gupta, all the firm wants to do is "get people to use the GPU". The only problem with that is that a CPU is required, as Gupta readily admits.
In Nvidia's recent press slides, it uses Tesla boards paired with Intel Xeon chips to demonstrate the performance gains of a CPU/GPU combination. So the question is, why bother attacking the devil if you have to dance with it? Of course Nvidia could promote AMD's CPUs instead of Intel's, but since AMD bought ATI we're not sure even global warming can stop hell from freezing over before that happens.
Nvidia's spat with Intel is an amusing sideshow at best. The more immediate problem is that at long last AMD is taking GPGPU computing seriously. For Nvidia, a company that has bet the farm on a chip that was geared towards GPGPU right from the start, it is clearly worrying that the stigma of low performance per Watt has been attached to its Fermi architecture.
Being fair to Nvidia, it does perform very well in the Green 500, a list that uses figures from the Top 500 list to calculate MFLOPS/Watt. The fourth place ranking of the Dawning Nebulae cluster is impressive, while the 57 per cent jump in performance per Watt between the Nvidia Tesla cluster and the three top ranked IBM Cell clusters is easily explained, according to Gupta. "It's all down to the size of the cluster, in bigger clusters the interconnects consume considerable power."
That explanation might seem a bit too simple, but there are publicly available figures to back up Gupta's claim. The Top 500 states that the 'greenest' supercomputer, the QPACE SFB TR Cluster, comprises 4,608 cores, while the Dawning Nebulae has an astonishing 120,640 cores, which breaks down to 4,640 Nvidia GPGPUs each mated with two hexa-core Intel X5650 2.66 GHz 'Westmere' chips. To highlight the potential of GPGPUs, the Nvidia cluster posted just over 492 MFLOPS/Watt, nearly 100 MFLOPS/Watt more than the top placed Xeon only cluster.
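For readers who want to check the sums, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above. The constant and function names are our own, and the 14 'cores' per Tesla board is an assumption on our part (one per streaming multiprocessor), not something the Top 500 figures quoted here spell out.

```python
# Figures quoted in the article.
GPUS = 4_640                 # Nvidia GPGPUs in Dawning Nebulae
CPUS_PER_GPU = 2             # Xeon X5650 chips mated with each GPGPU
CORES_PER_CPU = 6            # hexa-core
TOTAL_CORES = 120_640        # core count listed for Dawning Nebulae
NEBULAE_MFLOPS_PER_WATT = 492    # "just over 492", rounded down
CELL_JUMP = 0.57             # 57 per cent jump quoted for the IBM Cell clusters

# Assumption (not stated in the article): each Tesla board is counted
# as 14 "cores", one per streaming multiprocessor.
ASSUMED_CORES_PER_GPU = 14


def cpu_cores(gpus=GPUS):
    """x86 cores contributed by the Xeon chips."""
    return gpus * CPUS_PER_GPU * CORES_PER_CPU


def implied_cores_per_gpu(total=TOTAL_CORES, gpus=GPUS):
    """Cores left over per GPGPU once the x86 cores are subtracted."""
    return (total - cpu_cores(gpus)) / gpus


def implied_cell_mflops_per_watt():
    """Rough efficiency of the top Cell clusters implied by the 57% jump."""
    return NEBULAE_MFLOPS_PER_WATT * (1 + CELL_JUMP)


if __name__ == "__main__":
    print("x86 cores:", cpu_cores())
    print("implied cores per GPGPU:", implied_cores_per_gpu())
    print("implied Cell MFLOPS/Watt:", round(implied_cell_mflops_per_watt()))
```

Under that assumption the listed core count splits cleanly into x86 cores and GPGPU 'cores', which suggests how the 120,640 figure was arrived at, though we would treat it as a plausibility check rather than gospel.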
So what about the heat? It's a case of matching the best of the worst. AMD's top end Firestream 9370 has a 225W TDP that Nvidia, after a little goading from The INQUIRER, said was the correct TDP of its top end Tesla M2070 board. Initially, as we reported, it had declared that the TDP of the Tesla M2070 was 247W, a figure it has since corrected.
The biggest problem for Nvidia is that AMD is able to offer a 150W TDP single slot board in the shape of the Firestream 9350. While it might not win any benchmarks outright, it does require significantly less power which should make it viable in a wide array of situations. Nvidia has told us that it doesn't have a similar board at this time, though it sees its Quadro line as a halfway house between consumer Geforce cards and full blown Tesla boards.
As for reasons why Tesla boards have such a perceived high power draw, one aspect could be the deployment of ECC memory. Gupta is adamant that ECC is "vital for acceptance in HPC" while AMD's director of stream computing Patricia Harrell says it's something AMD simply hasn't needed.
According to Harrell, the need for ECC is mitigated by testing done in AMD's labs prior to shipping boards but, equally important, she claims that should AMD incorporate ECC support it would "lose performance per watt benefit". Harrell adds that it is a "reasonable assumption" that enabling ECC results in a higher power draw, a claim that is borne out by looking at published research papers. Meanwhile Nvidia claims that ECC is not only vital but has "negligible impact" on power usage.
When the latest Top 500 list appeared, it was the Nvidia cluster that stole the headlines. That was not just because it signalled the dawn of GPGPUs in HPC but because its performance per Watt compared to the number one cluster, Jaguar, was tremendous. GPGPUs have arrived and even AMD squeezed in on another Chinese cluster, Tianhe-1, which uses ATI Radeon HD 4870 cards. That seemingly has gotten Nvidia a bit hot under the collar.
At times it was hard to miss the sheer disdain in Gupta's voice when he was talking about AMD. The passion in his words was palpable and it was as if Gupta felt offended that the hard work he and his team did was not replicated by AMD. More than once Gupta referred to AMD as a company that has made "zero investment in GPGPUs".
The reason for this was simple, said Gupta. "GPGPUs are at the lowest priority" because AMD is "compelled to sell CPUs". Gupta continued his attack on AMD by saying that the firm is "completely torn internally" between selling its old cash cow, the x86 CPU, and the future of HPC, GPGPUs.
Not surprisingly, AMD's Harrell flatly denied this claim of internal strife, saying that the chip designer is "supportive of GPGPUs". She deftly batted away Gupta's point about attachment to the x86 architecture by saying that such an argument is "typical for a firm without an x86 business".
Harrell echoed Gupta's view that GPGPUs are "critical for success" in HPC and that AMD does not see GPGPUs as a replacement for its Opteron CPUs. On the subject of internal conflict, Harrell said that recently AMD's x86 server chip division merged with its GPGPU division, and she maintained that it, like Nvidia, sees the need for the two architectures to co-exist.
While Gupta's claim of AMD's 'zero investment' in GPGPU design is clearly an exaggeration, there is something to be said for AMD's tentative steps into the market. For independent observers it is obvious that greater competition in the market will not only increase innovation but will also result in standards for both hardware and software being set sooner. Even Harrell admits that industry standards are not moving fast enough, but the battle is not over raw chip speed but rather the development environment and specifically the language itself.
AMD is betting the server farm on OpenCL, an open language that according to Gupta is missing key functionality. Gupta points to OpenCL as a language that has been "over hyped by AMD" and is bereft of features such as recursion and pointers. These, among other things, said Gupta, are barriers to the adoption of OpenCL in HPC. But Harrell denied that AMD's support for OpenCL is hurting the firm, and said that, on the contrary, its higher level, cross platform functionality has proven popular among its clients. As a foil to Gupta's earlier zero investment claim, Harrell said that AMD is "investing heavily in making OpenCL succeed".
To Nvidia it is seemingly a source of annoyance that AMD is trying to paint itself firmly in the OpenCL camp, and Gupta said that AMD has "no credible OpenCL strategy". He went even further by stating outright that "they [AMD] don't support OpenCL", claiming that there are "no production OpenCL drivers from AMD". Harrell retorted by pointing to AMD's developer site. However Nvidia clarified its point by saying, "Nvidia has the only conformant, publically available, production OpenCL GPU drivers." Nvidia claims that while AMD's drivers are conformant, AMD does not include them within the standard driver download.
It would be easy to paint Nvidia and Gupta as Green Goblins trying hard to undermine OpenCL, but Gupta openly admitted that he doesn't care which language succeeds, whether it be Nvidia's own 'closed' CUDA or OpenCL. "We don't care what software is run on GPGPUs as long as it's an Nvidia GPU," said Gupta. It should also be noted that both AMD and Nvidia are members of the Khronos Group, the consortium that oversees the development of OpenCL, though one must wonder what is said at their meetings.
When asked what is stopping AMD from being able to run CUDA applications on its GPU boards, Gupta simply replied, "nothing". Gupta's straight answer can, surprisingly, be taken at face value because theoretically AMD could create a CUDA compliant driver that could run code on its GPUs. Of course there are licensing issues and the rather small matter of company pride at stake, but in theory it could be done.
For Harrell the problem isn't technological but rather ideological. She said, "CUDA is not running as an industry standard" and that Nvidia has "total control over the language". The problem for AMD is that while that may be true and the firm might assume the moral high ground, Nvidia and consequently CUDA are fast becoming the de facto standard in HPC and academia.
CUDA might not be open, or even a standard, but history tells us that such technicalities never stopped other languages from attaining widespread popularity. Being policed by IBM didn't stop Fortran from still being the numerical language, half a century after it first appeared. Even with Sun Microsystems' best efforts to create a cumbersome 'framework' and employ licensing peculiarities, Java's popularity has managed to surpass C. It has happened before and it's looking like history will repeat itself.
There are parallels between the proliferation of Java and that of CUDA, with universities now offering courses on CUDA development. Their students will be graduating with CUDA, not OpenCL, development skills and taking them into industry. Just as years of computer science graduates were force fed Java development at the expense of C, Nvidia - thanks to AMD and others not taking GPGPU seriously - might end up with armies of coders who can exploit its hardware better than that of its competitors.
A quick look at what's coming out of academic research should dispel any misconceptions one might have as to how well Nvidia has done in this area. If you think GPGPUs are merely used for fancy graphics rendering or boring heavy duty matrix manipulations that appear in the annals of graphics conferences such as Siggraph, then you're in for a surprise.
Later this month at the ACM Sigcomm conference, widely revered as the top networking conference, a paper entitled 'PacketShader: a GPU-accelerated software router' will be presented. The researchers show how a Geforce GTX480 can cope with shifting packets around. Before you laugh at the notion of one of the most power hungry graphics cards being used as a router, the authors conclude, "We believe that the increased power consumption is tolerable, considering the performance improvement from GPUs."
So while AMD and others are betting on OpenCL, Nvidia has not only got the jump but has hedged its bets by supporting both CUDA and OpenCL. Actually, Nvidia proudly boasts about its support for Java, Python, Fortran and Directcompute.
According to Gupta this wide range of support will mean that Nvidia will remain popular among developers. As for OpenCL, Gupta forecasts it being overtaken by Microsoft's Directcompute. He even suggests that OpenCL might get the same pummelling that OpenGL did against DirectX. Though it's hard to see that happening given the support OpenCL has, one can't doubt that, at this stage of the battle at least, Nvidia not only has the high ground but controls the heavy artillery.
Nvidia deserves credit for not only lowering the cost of HPC but achieving a lot in a short space of time. However some of that credit should also be taken by AMD, which has seemingly stood by and let Nvidia get such a formidable grip on the industry. Even Harrell admits that AMD still needs to do more with its software and even with marketing.
For AMD, its current crop of Firestream cards, which are about to be released, represents one last chance to put up a real fight in the HPC market. If it doesn't, it is likely that Nvidia and CUDA will never look back. µ