
ATI's Stream Computing - a good reason to get bought

Comment Terraflip, gigaflop
Friday, 6 October 2006, 19:31
WHILE ATI AND NVIDIA lock horns over the ultimate trophy in 3DMarks or game frame rates, another aspect of GPU performance has now gained recognition: how well the chips run floating-point (FP) intensive non-graphics applications, from molecular simulation and financial modeling to object interaction physics.

Why move it to the GPU? Well, it can now do the job nearly as well as proprietary FPGA-based accelerators, often at just one-tenth of the cost. Also, 64-bit pixel precision pipelines should ultimately lead to full 64-bit FP processing capability.

Since the standard code used for such apps involves a lot of branching and conditional program flow, ATI is at present at an advantage over Nvidia, with a more complex architecture that caters for such applications. In some cases, the routines and datasets in question can comfortably fit within the GPU's on-board memory - but in others, you need all the memory you can get, for instance when operating on large arrays.
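Whether a dataset "comfortably fits" is simple arithmetic. The sketch below is a rough illustration only: the 512 MB on-board memory figure and the array sizes are assumptions picked for the example, not numbers quoted by ATI, and it ignores the memory taken by the framebuffer, textures and the routines themselves.

```python
# Rough check of whether a dense single-precision array fits in GPU memory.
# ASSUMPTION: 512 MB of on-board memory; this figure is illustrative only.
ONBOARD_BYTES = 512 * 1024**2
BYTES_PER_FLOAT32 = 4


def array_bytes(rows: int, cols: int, bytes_per_element: int = BYTES_PER_FLOAT32) -> int:
    """Return the storage needed for a dense rows x cols array."""
    return rows * cols * bytes_per_element


def fits_on_board(rows: int, cols: int, capacity: int = ONBOARD_BYTES) -> bool:
    """True if the array alone fits within the assumed on-board memory."""
    return array_bytes(rows, cols) <= capacity


if __name__ == "__main__":
    for n in (4096, 8192, 16384):
        mib = array_bytes(n, n) / 1024**2
        print(f"{n} x {n} float32: {mib:.0f} MiB, fits on board: {fits_on_board(n, n)}")
```

Under these assumptions a single 8192 x 8192 array still fits, while a 16384 x 16384 one needs twice the assumed capacity and would have to live in system memory.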

Now, enter HyperTransport 3.0 and a future ATIMD GPU connected directly to it via the Torrenza platform. It can still have its pool of dedicated fast, wide memory for graphics use, but at the same time it can access the whole system memory at CPU-like speed via HT 3.0, with low latency and, if need be, cache coherency. In this case, the GPU would become a true coprocessor, in nearly the same fashion as the 8087 was to the 8086 some 25 years ago.

Is it a coincidence then that ATI's "Stream Computing" initiative, enabling ATI GPUs to work in concert with CPUs to solve complex computational problems, comes exactly at the time of the company being acquired by AMD? After all, scientists, engineers and financial wizards can hardly resist the roughly 360 peak GFLOPs of an ATI X1950XTX, when a comparably priced Intel Core 2 Duo E6700 at 2.66GHz gives you 'only' 42 peak GFLOPs in single precision and half that in double precision - and the current top Athlon64 CPUs manage half of the Intel figure.
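For readers wondering where a CPU peak figure like that comes from, here is a back-of-the-envelope sketch. The formula (cores x clock x flops per cycle) is the usual way to quote peak throughput; the per-cycle values of 8 single-precision and 4 double-precision flops per core are our assumption for this class of SSE-equipped CPU, and the 360 GFLOPs GPU number is simply taken from the text above.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak throughput in GFLOPs: cores x GHz x flops per cycle."""
    return cores * clock_ghz * flops_per_cycle


# ASSUMPTION: 8 single-precision / 4 double-precision flops per cycle per core.
cpu_sp = peak_gflops(cores=2, clock_ghz=2.66, flops_per_cycle=8)
cpu_dp = peak_gflops(cores=2, clock_ghz=2.66, flops_per_cycle=4)

# Peak figure for the X1950XTX as quoted in the article, not derived here.
gpu_sp = 360.0

# Ratio of quoted GPU peak to calculated CPU peak, single precision.
ratio = gpu_sp / cpu_sp

if __name__ == "__main__":
    print(f"CPU single precision: {cpu_sp:.1f} GFLOPs")
    print(f"CPU double precision: {cpu_dp:.1f} GFLOPs")
    print(f"GPU / CPU peak ratio: {ratio:.1f}x")
```

These are peak numbers on both sides; what an actual application sees depends on how well it keeps either chip fed.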

Now you don't even need to go for sadomasochistic assembly language programming of your (non)GPU if you use the PeakStream number-crunching programming interface, which provides a plethora of function calls for common GPU-based math operations to make coding easier. The company's solution covers far more than game physics, enabling you to use the R580 series GPUs as accelerated FPUs. Also, Stanford University has a new distributed computing application that uses ATI GPUs for disease research computation - far from the GPU's intended use, isn't it?

FPGAs and dedicated accelerators will be most directly affected - not only are these products far more expensive, they are also more cumbersome and, for users outside the US / EU, often declared controlled export items - unlike plain vanilla GPUs.

Now, both R600 and G80 are expected to break the half-teraflop mark in peak processing power per chip - and, quite possibly, to widen the range of apps that can be accelerated on them too. At the same time, there are no major per-core FPU performance boosts planned beyond what today's Intel Core2 and tomorrow's AMD K8L can do.
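Taking the half-teraflop figure at face value, a quick calculation shows how many such chips a given peak target would need. This is only a sketch: it assumes exactly 500 peak GFLOPs per chip and perfect scaling across chips, neither of which is promised by ATI or Nvidia, and the one-teraflop and one-petaflop targets are chosen for illustration.

```python
import math

# ASSUMPTION: exactly 500 peak GFLOPs per chip; the chips are only
# "expected to break" this mark, so treat it as a conservative floor.
PEAK_GFLOPS_PER_CHIP = 500.0


def chips_needed(target_gflops: float, per_chip_gflops: float = PEAK_GFLOPS_PER_CHIP) -> int:
    """Smallest number of chips whose combined peak reaches the target.

    Assumes perfect scaling, which real multi-GPU setups do not achieve.
    """
    return math.ceil(target_gflops / per_chip_gflops)


if __name__ == "__main__":
    print("Chips for 1 teraflop peak:", chips_needed(1_000.0))
    print("Chips for 1 petaflop peak:", chips_needed(1_000_000.0))
```

On paper, then, a dual-GPU desktop reaches a teraflop and a cluster of a couple of thousand chips reaches a petaflop; sustained performance will be lower.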

In summary, GPUs may be the ticket to bringing teraflop computing power to the desktop (SLI'd G80s or Xfired R600s will surely do the trick - if the app uses them), and petaflop power to mid-sized supercomputing clusters - affordably. With Torrenza, AMD holds the advantage right now in implementing these early, unless Intel decides to bring CSI to the X86 platform a bit earlier for tightly coupled co-processing - yet again after some 20 years. µ
