AT LONG LAST Nvidia's long-awaited next-generation high-end DX11 parts have surfaced. The Fermi-based GeForce GTX480 - still a bit rare, though - and GTX470 are out to compete against the incumbent single-GPU performance leaders, the AMD ATI Radeon HD5870 and HD5850. You've seen the avalanche of benchmarks all around the web, and most of them show these latest Nvidia-powered graphics cards taking the lead, although usually not by much.
The wider memory buses - 384 bits wide on the GTX480 and 320 bits on the GTX470, compared to 256 bits on the ATI parts - help in attaining higher memory performance, as well as more bandwidth for the computational tasks in which Nvidia's GPGPU chips are expected to excel.
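As a rough illustration of why bus width matters, peak memory bandwidth is the bus width in bytes multiplied by the effective data rate. The sketch below takes the bus widths from the paragraph above. The effective GDDR5 data rates are assumptions based on commonly quoted reference-card figures, not numbers from this article, and partner cards may differ. The function and table names are purely illustrative.

```python
# Peak theoretical memory bandwidth = (bus width in bytes) x (transfers per second).
# Bus widths come from the article; the data rates (MT/s) are ASSUMED
# reference-card figures and are not stated in the article.

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: float) -> float:
    """Return peak bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9

# name: (bus width in bits, assumed effective data rate in MT/s)
CARDS = {
    "GTX480": (384, 3696),
    "GTX470": (320, 3348),
    "HD5870": (256, 4800),
    "HD5850": (256, 4000),
}

if __name__ == "__main__":
    for name, (bits, rate) in CARDS.items():
        print(f"{name}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
```

Under these assumed rates the GTX480 comes out at roughly 177GB/s against roughly 154GB/s for the HD5870. The wider bus more than makes up for a lower per-pin data rate, but the result depends heavily on the memory clocks each vendor ships.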
Now really, how important are these results in this current round of benchmark leapfrogging? Let's see.
First, ATI is expected to release sped-up parts on both the single and dual GPU fronts. Let's tentatively call them the HD5890 and HD5990, although the final names may differ. Count on anything from a 10 per cent average speedup on the single GPU part, resulting mostly from the GPU clock jump from 850MHz to 950MHz, to more than 15 per cent on the dual GPU card, thanks to resolved power and cooling issues there.
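To put the single-GPU estimate in context, here is a minimal sketch of the clock arithmetic. It assumes, as an idealisation, that performance scales at best linearly with core clock. The function name is illustrative only.

```python
# Upper bound on the gain from a core clock bump, ASSUMING performance scales
# linearly with GPU clock. Memory-bound workloads will typically scale less.

def clock_gain_percent(old_mhz: float, new_mhz: float) -> float:
    """Percentage increase in core clock from old_mhz to new_mhz."""
    return (new_mhz / old_mhz - 1.0) * 100.0

if __name__ == "__main__":
    # Prints: 11.8% higher core clock
    print(f"{clock_gain_percent(850, 950):.1f}% higher core clock")
```

An 850MHz to 950MHz jump is a clock increase of just under 12 per cent, so an average real-world speedup of about 10 per cent is plausible if memory clocks do not rise by the same proportion.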
Second, many vendors like Asus, Gigabyte, Sapphire and XFX might offer pre-overclocked cards based on the existing HD5870 and HD5970 that attain these clock speeds by default anyway. Couple that with Eyefinity-enabled versions supporting 2GB of video memory per GPU for that extra oomph in games and benchmarks, and the updated cards should even out the battle with Nvidia on the single-GPU front.
On the dual-GPU front, a possible GeForce GTX490, consisting of two GTX470-class GPUs joined together, is still far away, probably awaiting another tuned, lower-power GPU stepping. In the meantime, the HD5970 and its expected higher-clocked successor should rule the roost among the dual-GPU cards.
Talking of another GPU stepping from Nvidia, there's a further reason to look for a useful update by Computex time, two months from now. Wafer yields didn't allow Nvidia to enable all 512 shader cores on the GTX480, forcing it to limit the card to only 480 shaders. A yield or stepping improvement might allow Nvidia to release an 'updated' card, call it GTX485 for instance, with all 512 cores turned on. However, whether or not that will be possible for Nvidia is still a very big 'if' at the moment. Such a card could also have a 3GB memory option.
In the meantime, if there are useful yields of chips with all 512 cores, you're most likely to see them in the most expensive OpenGL professional 3-D cards for workstations and visualisation clusters. A Quadro FX5900 6GB card comes to mind, expected sometime in April.
So, in summary, the performance match between Nvidia and ATI at the high end will see another round later this spring, with - we can hope - an interesting and competitive benchmark battle between the two graphics rivals' updated chippery. If that happens it should help push prices down a bit, too. The current bunch of comparisons will therefore most likely be very short-lived. That won't dissuade the hardcore enthusiasts from getting their own 'newest and fastest' card first, but I'd personally hold off for the updated GPUs, due around May or so.
Another competitive aspect that's often overlooked is that, looking back at the CUDA and OpenCL experience and the associated programming difficulties, Nvidia has made a major step forward with these Fermi GPU parts. You can program them directly in C or Fortran, rather than having to use obscure special approaches. I'm not sure yet how efficient all of that will be, but, for now, being able to tap nearly a TeraFLOPS of double-precision IEEE standard - that is, usable for most applications - floating-point power in a much easier way can mean a lot, so that many more PC programs will be able to take advantage of the new performance resource. Now, even Excel spreadsheets can easily be GPU-accelerated.
On the other hand, Nvidia could have kept a 512-bit wide memory bus for this purpose, simply to allow for more onboard GPU memory, up to 8GB on Quadro or Tesla and 4GB on the GeForce cards. The previous GeForce and Quadro generation had this advantage due to its wide buses. The problem with GPU computing is that it can be very, very fast, as long as the code and data sit in the local GPU RAM. Once the GPU has to go across the PCIe lanes to access main memory, the performance penalty can be over an order of magnitude, negating the benefit of GPU acceleration. This problem will remain until GPUs link directly to CPUs via QPI or HyperTransport rather than over slow PCIe links with high latency.
At that point, assuming a memory mapping and management structure compatible with the system CPU, the GPU can, through direct inline C++ and Fortran coding, become a co-processor to the CPUs, just as Intel's old x87 was to the x86 before floating-point was integrated into every processor. ATI, as part of AMD, will have an advantage there as it will have AMD's HyperTransport access by default, but the problem then will be that this will limit ATI to a small market share.
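The size of that penalty can be sketched with a toy bandwidth-only model. Both bandwidth figures below are assumptions rather than numbers from this article: about 177.4GB/s for the local memory of a 384-bit card at an assumed 3696MT/s, and 8GB/s as the theoretical one-way peak of a PCIe 2.0 x16 link. Latency and protocol overhead are ignored, so real-world penalties could be larger. All names are illustrative.

```python
# Toy model: time to stream a working set once from local GPU memory versus
# over PCIe from main memory. Both bandwidth figures are ASSUMPTIONS:
#   177.4 GB/s - peak local bandwidth of a 384-bit bus at an assumed 3696 MT/s
#     8.0 GB/s - theoretical one-way peak of a PCIe 2.0 x16 link
# Latency and protocol overhead are ignored.

LOCAL_GBS = 177.4
PCIE_GBS = 8.0

def stream_time_ms(size_gb: float, bandwidth_gbs: float) -> float:
    """Milliseconds needed to move size_gb gigabytes at bandwidth_gbs GB/s."""
    return size_gb / bandwidth_gbs * 1000.0

def pcie_penalty(local_gbs: float = LOCAL_GBS, pcie_gbs: float = PCIE_GBS) -> float:
    """How many times slower the PCIe path is than local GPU memory."""
    return local_gbs / pcie_gbs

if __name__ == "__main__":
    size_gb = 1.0
    print(f"local: {stream_time_ms(size_gb, LOCAL_GBS):.1f} ms")
    print(f"PCIe:  {stream_time_ms(size_gb, PCIE_GBS):.1f} ms")
    print(f"penalty: about {pcie_penalty():.0f}x")
```

With these assumed figures, streaming 1GB takes under 6ms from local memory and 125ms over PCIe, a gap of roughly 22 times, which is consistent with the 'over an order of magnitude' estimate.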
Nevertheless, we will be bringing you our own tests of the new graphics cards as they come along, hopefully including the customised higher-performance versions of the GTX480 and GTX470. Gainward and EVGA are two vendors to look at here in the Nvidia space, besides the usual Asus and Gigabyte. At the very least, the graphics card competition has turned interesting again. µ