WE'VE HEARD REPORTS that the upcoming Nvidia GTX680, the very first Kepler 'GK104' GPU, will beat all and sundry in everything, including AMD's top of the range Radeon HD 7970, despite the latter's new GCN architecture, 50 per cent wider memory bus and 50 per cent larger memory capacity.
After all, look at the impressive block diagram. With all the brand new compute-oriented shaders and such, it does leave one impressed:
According to specifications leaked by Techpowerup, the complicated hierarchy starts with the Gigathread Engine, which marshals all the unprocessed and processed information between the rest of the GPU and the PCI-Express 3.0 system interface. Below this are four graphics processing clusters (GPCs), each holding one common resource, the raster engine, and two streaming multiprocessors (SMs). Only this time, innovation has gone into redesigning the SM, and it is now called the SMX. Each SMX has one next-generation Polymorph 2.0 engine, an instruction cache, 192 CUDA cores, and other first-level caches. So four GPCs of two SMXs each make eight SMXs, and eight SMXs of 192 CUDA cores each amount to the 1536 CUDA core count.
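For readers who want to check the sums, here is a minimal sketch of that hierarchy arithmetic. It uses only the leaked figures quoted above (four GPCs, two SMXs per GPC, 192 CUDA cores per SMX); the constant and function names are our own illustrative choices, not anything from Nvidia or Techpowerup.

```python
# Leaked GK104 figures as quoted in the text; all names here are illustrative.
GPCS = 4                  # graphics processing clusters
SMX_PER_GPC = 2           # streaming multiprocessors (SMX) per GPC
CUDA_CORES_PER_SMX = 192  # CUDA cores per SMX


def total_smx(gpcs=GPCS, smx_per_gpc=SMX_PER_GPC):
    """Number of SMX units on the whole chip."""
    return gpcs * smx_per_gpc


def total_cuda_cores(gpcs=GPCS, smx_per_gpc=SMX_PER_GPC,
                     cores_per_smx=CUDA_CORES_PER_SMX):
    """Total CUDA cores: SMX count multiplied by cores per SMX."""
    return total_smx(gpcs, smx_per_gpc) * cores_per_smx


if __name__ == "__main__":
    print("SMX units:", total_smx())
    print("CUDA cores:", total_cuda_cores())
```

Run as is, this gives eight SMX units and the 1536 CUDA cores of the leaked specification.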
There are four raster units amounting to 32 ROPs, eight geometry units each with a tessellation unit, and some third-level cache. There's a 256-bit wide GDDR5 memory interface at 6GHz declared throughput, and as noted it's a third narrower than that of the top end AMD Radeon HD 7970.
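To put the memory figures in perspective, here is a rough sketch of the peak bandwidth arithmetic. It assumes that '6GHz declared throughput' means an effective data rate of 6Gbit/s per pin, which is our reading and not something the leak spells out. The function name is hypothetical, and the 384-bit figure for the Radeon HD 7970 is simply what the 50 per cent wider bus mentioned earlier implies.

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits / 8 is the number of bytes moved per transfer;
    effective_rate_gbps is the per-pin data rate in Gbit/s (an assumption
    about what the quoted '6GHz' figure means).
    """
    return bus_width_bits / 8 * effective_rate_gbps


GTX680_BUS_BITS = 256    # from the leaked specification
GTX680_RATE_GBPS = 6.0   # assumed reading of '6GHz declared throughput'
HD7970_BUS_BITS = 384    # 50 per cent wider than 256 bits

gtx680_bandwidth = peak_bandwidth_gb_s(GTX680_BUS_BITS, GTX680_RATE_GBPS)

# The same gap expressed both ways round.
narrower_fraction = 1 - GTX680_BUS_BITS / HD7970_BUS_BITS  # GTX680 vs HD 7970
wider_fraction = HD7970_BUS_BITS / GTX680_BUS_BITS - 1     # HD 7970 vs GTX680

if __name__ == "__main__":
    print(f"GTX680 peak bandwidth: {gtx680_bandwidth:.0f} GB/s")
    print(f"GTX680 bus is {narrower_fraction:.0%} narrower")
    print(f"HD 7970 bus is {wider_fraction:.0%} wider")
```

Under that assumption the GTX680 tops out at 192GB/s, and the 'third narrower' and '50 per cent wider' descriptions turn out to be the same gap seen from opposite sides.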
As The INQUIRER hasn't recently gotten Nvidia cards for review, I used a bit of spare time here in sunny Shenzhen, where the all-time March high of 29C had hit us just a day before. It was a sweaty ordeal taking a public bus to a funny factory place nearly 10 miles away, in a booming city of 15 million, twice the size of Greater London, but it was worth it....
Since almost anything, including the world's newest GPUs, can be found in Shenzhen, I had a quick look at an - unidentified, obviously, for the vendor's protection - GTX680 2GB card in that factory for just half an hour. I was shown some 3Dmark 11 and similar benchmark results, but being a compute boffin, I ran the Sandra 2012SP2 benchmark that I carry around on a USB stick to check GPGPU compute performance in floating point, especially double precision.
Remember, this card is supposed to be 'the crown winner' for Nvidia, since it couldn't make the bigger GK100 die on time, and all the effort was put into tuning it to the hilt to try to win against AMD, which has led the performance pack for the past year or so. Therefore, I thought I'd get some good compute performance results here, too - in particular since AMD has enabled double precision floating-point even in the mid-range Radeon HD 7870 GPU, a first in this market segment, not to mention the high end Radeon HD 7970 model. I had run the same benchmark before on both the Radeon HD 7870 and the Radeon HD 7970, on reference clock versions - really underclocked ones, which AMD could push up by another 20 per cent anytime.
Here is the result:
Wow! The claim of beating the HD7970 vanishes into thin air, it seems. Nvidia's new GPU is beaten by the Radeon HD 7970 by an order of magnitude here in double precision floating-point, and by nearly a factor of two in ordinary single precision floating-point. One is speechless here. Even the Radeon HD 7870 with its restricted double precision floating-point still outperforms the GTX680 by a noticeable margin in this department, as you can see here. Only the Radeon HD 7850 is substantially slower.
One might ask, why bother? Well, compute GPU performance can't rely on tweaked drivers, application detection workarounds and other such shortcuts. It is pure, raw processing ability that defines a GPU's usability for general purpose computing. After all, Nvidia created the GPGPU market and the CUDA programming environment. This situation not only badly hurts its prestige in this area but also means that a, say, GK110 'real Kepler high end' follow-on needs to be delivered soon. GPU compute matters commercially, too: Nvidia's compute optimised cards like Tesla sell for thousands apiece, even though they are based on essentially the same dies as high end consumer GPUs.
As for the other aspects, I was shown that it is quite close to the Radeon HD 7970 in Full HD gaming performance, except where the memory bandwidth limitations of its third narrower bus make it lose to AMD in high anti-aliasing and highly textured scenarios. One interesting, and rather negative, observation over Chinese tea from the hardware guy in charge concerned PCB and component quality on the reference boards, something I'd leave for later when more boards are seen. If this problem is really there, though, it could affect overclocking chances rather badly.
What then? We need Nvidia to be a strong competitor with a good product line from top to bottom, to avoid further attempts to acquire it. The one from Intel some time ago almost succeeded, save for these same Chinese saving Nvidia's stock price at the last moment with a truly huge order of, guess what, expensive GPU compute cards for their Tianhe multi-petaflop supercomputer clusters in Tianjin, right next to the capital, Beijing. So, we can't say that the Chinese don't help US companies survive, even if led by a Taiwanese.
For that, Nvidia needs a true performance leading world class GPU, one that will drive a 'waterfall effect' to help the sales of its other GPUs too. Remember that Intel will also greatly improve its integrated graphics, with near doubling in the Ivy Bridge generation, followed by another massive jump in Haswell. And these on-chip GPUs will support DX11 Compute, among other things. It's not a good idea to be squeezed from both top and bottom.
So, aside from a few gaming benchmark tests, the GK104 die in the GTX680 is not exactly the cure for Nvidia's ills or a true performance leader at the moment. Nvidia urgently needs the GK110 die, especially since AMD can really easily crank up its GPUs by well over 30 per cent across the board right away - count the 20 per cent frequency jump plus driver improvements, and there you are. And did we mention the dual GPU Radeon HD 7990 followed by the Sea Islands? It's an exciting year for watching the GPU market. µ