My expectation is that the 8800GTX OC at 630MHz should easily hit just over 600 GFLOPS, while R600 may end up a little faster in peak FP throughput. Of course, you already know that Pat Gelsinger forced his demo crew to push Intel's Terascale chip to 2 TFLOPS (peak) this week in Beijing.
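The 600 GFLOPS figure is simple arithmetic: peak throughput is the number of shader ALUs times the shader clock times the FLOPs each ALU can issue per clock. The sketch below assumes the commonly quoted G80 numbers (128 stream processors, a 1.35 GHz stock shader clock, and three FLOPs per clock from a MADD plus a MUL), none of which appear in the text above, and the helper names are hypothetical. Because G80 clocks its shaders separately from the core, the 630MHz figure is presumably the core clock, so instead of guessing the overclocked card's shader clock, the code solves for the shader clock that 600 GFLOPS would require.

```python
# Peak-throughput arithmetic for a GPU shader array. A sketch: the G80
# figures below are assumptions based on commonly quoted specifications.

def peak_gflops(units: int, shader_clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak = number of ALUs x shader clock (GHz) x FLOPs per clock."""
    return units * shader_clock_ghz * flops_per_clock


def clock_for_target(target_gflops: float, units: int, flops_per_clock: int) -> float:
    """Shader clock (GHz) needed to reach a given peak, solving the same formula."""
    return target_gflops / (units * flops_per_clock)


G80_SPS = 128            # stream processors on a full G80 (8800 GTX)
G80_FLOPS_PER_CLK = 3    # MADD (2 FLOPs) + MUL (1 FLOP) per SP per clock
STOCK_SHADER_GHZ = 1.35  # stock 8800 GTX shader clock (core runs separately)

stock = peak_gflops(G80_SPS, STOCK_SHADER_GHZ, G80_FLOPS_PER_CLK)
needed = clock_for_target(600.0, G80_SPS, G80_FLOPS_PER_CLK)

if __name__ == "__main__":
    print(f"stock 8800 GTX peak:         {stock:.1f} GFLOPS")
    print(f"two G80s combined:           {2 * stock:.1f} GFLOPS")
    print(f"shader clock for 600 GFLOPS: {needed:.4f} GHz")
```

Run as a script, it reports 518.4 GFLOPS for a stock card, 1036.8 GFLOPS for a pair, and a required shader clock of 1.5625 GHz for 600 GFLOPS.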
But hold on a second: all of these figures are for 32-bit, single-precision FP, and that much is confirmed. As we all know, most scientific and technical apps prefer, if not insist on, standard 64-bit, double-precision FP.
32-bit FP may be enough for graphics processing, but if you want to go the GPGPU route (which by itself is an oxymoron, according to Intel), you need 64-bit FP. After all, the new GPUs all have very wide internal paths that could handle even 128-bit data if needed, so widening the FPUs shouldn't pose a problem.
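To make the precision gap concrete: IEEE 754 single precision (binary32) carries a 24-bit significand, roughly 7 decimal digits, while double precision (binary64) carries 53 bits, roughly 16 digits. The sketch below uses only the Python standard library and emulates single precision by round-tripping values through `struct`'s 4-byte `'f'` format; that emulation and the helper name `to_f32` are illustrative choices, not how any GPU API exposes its arithmetic. It shows two classic failure modes: integers beyond 2^24 stop being exact, and a relative change of 1e-8 disappears entirely.

```python
import struct
import sys


def to_f32(x: float) -> float:
    """Round a Python float (IEEE 754 binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Gap between 1.0 and the next representable value in each format.
EPS32 = 2.0 ** -23              # binary32: 24-bit significand, ~7 decimal digits
EPS64 = sys.float_info.epsilon  # binary64: 2**-52, ~16 decimal digits

# 1) Integers above 2**24 are no longer exact in single precision.
big = 2 ** 24 + 1
big32 = to_f32(float(big))  # rounds (ties-to-even) to 16777216.0
big64 = float(big)          # still exact: 16777217.0

# 2) A relative change of 1e-8 is below half of EPS32, so it vanishes.
delta = 1e-8
diff32 = to_f32(to_f32(1.0 + delta) - 1.0)  # 1 + 1e-8 rounds back to 1.0, leaving 0.0
diff64 = (1.0 + delta) - 1.0                # about 1e-8, off only in the last bits

if __name__ == "__main__":
    print(f"epsilon     f32={EPS32:.3e}  f64={EPS64:.3e}")
    print(f"2**24 + 1   f32={big32:.1f}  f64={big64:.1f}")
    print(f"(1+1e-8)-1  f32={diff32!r}  f64={diff64!r}")
```

For graphics, an error in the seventh digit of a pixel's color is invisible; for a simulation that subtracts nearly equal quantities, it can wipe out the answer.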
Now imagine, say, a simulation where a researcher runs a 'pilot stage' computation on a few hundred cases on the GPGPU, and then selects a few final ones for full-precision CPU processing.
If the rough 'pilot' results aren't affected by the precision, then by all means 32-bit FP is fine; after all, currently shipping twin G80s will give you over a teraflop of single-precision FP. But if GPGPU vendors want wider acceptance as FP coprocessors for the many apps that really need it, they had better take care of 64-bit FP capability.
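The pilot-then-refine workflow described above can be sketched as a screening pass in single precision followed by a re-evaluation of the finalists in double precision. Everything here is hypothetical: `objective` is a stand-in scoring function (a truncated sine series), the 300 cases and the choice to keep 5 are arbitrary, and the "GPU" pass is only emulated by rounding every intermediate to binary32 via `struct`. The point is the structure: the cheap 32-bit pass ranks every case, and only the finalists are recomputed at full precision.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (emulated 32-bit pass)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def objective(x: float, single: bool, terms: int = 200) -> float:
    """Hypothetical per-case score: sum of sin(i*x)/i for i = 1..terms.

    With single=True every intermediate is rounded to binary32;
    with single=False everything stays in binary64.
    """
    r = to_f32 if single else float  # float() is an identity on Python floats
    xr = r(x)
    total = 0.0
    for i in range(1, terms + 1):
        term = r(r(math.sin(r(i * xr))) / i)
        total = r(total + term)
    return total


def pilot_then_refine(cases, keep):
    """Rank all cases by the cheap 32-bit score, then re-score the top `keep` in 64-bit."""
    ranked = sorted(cases, key=lambda c: objective(c, single=True), reverse=True)
    finalists = ranked[:keep]
    return [(c, objective(c, single=False)) for c in finalists]


cases = [0.01 * k for k in range(1, 301)]  # 300 hypothetical cases in (0, 3]
results = pilot_then_refine(cases, keep=5)

if __name__ == "__main__":
    for case, score in results:
        print(f"case x={case:.2f}  double-precision score={score:.12f}")
```

The screen is only trustworthy when the candidates it separates differ by more than the 32-bit rounding error; if they don't, the ranking itself needs 64-bit arithmetic.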