THE GREEN GOBLIN showed off a 1U rackmount box stuffed to the gills with GPUs this week, as Nvidia continues to claim that its graphics chips can largely replace CPUs for high performance computing (HPC) applications.
The company said the Tesla S1070 is a so-called 'GPU computer' and claimed that it offers higher performance with lower power consumption than CPU-only systems. Each Tesla box contains up to four GPUs and multiple units can be configured into computing clusters incorporating onboard communications links and data storage, Infiniband switches and cabling.
Nvidia offers a preconfigured Tesla cluster of four S1070 boxes that it said is capable of delivering up to 16 teraflops, or one teraflops per GPU in the cluster.
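The 16 teraflops figure follows from simple multiplication, assuming each of the four boxes is fully populated with four GPUs. A minimal sketch of that arithmetic is below; the function name and parameter names are illustrative and are not taken from Nvidia. The result is a claimed peak, not a measured rate.

```python
def cluster_peak_teraflops(boxes, gpus_per_box, teraflops_per_gpu):
    """Return the claimed peak throughput of a cluster, in teraflops."""
    return boxes * gpus_per_box * teraflops_per_gpu


# Four S1070 boxes, each assumed to hold four GPUs,
# at Nvidia's claimed one teraflops per GPU.
print(cluster_peak_teraflops(boxes=4, gpus_per_box=4, teraflops_per_gpu=1.0))  # prints 16.0
```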
Nvidia said that the French bank BNP Paribas has installed two Tesla clusters.
We're sceptical that much useful programming infrastructure exists to take advantage of Nvidia's GPU-based compute resources, and suspect that initial applications that might be using such GPU HPC systems are very likely hand-coded, in an FPGA assembly language dialect, by elves. µ
L'Inq
Information Week
Ever heard of CUDA? It's an extension of C.
Ever heard of compilers like HMPP or PGI?
No, I guess.
They produce CUDA or PTX code out of C code.
No assembler.
elves :-)
hehe, know the feeling
Do companies say "gee, we shouldn't use that app because it was *hand assembled*."
it's amazing to me how the inq is so dead set against nvidia trying to enter the general application market by pushing their gpu compute tech. maybe we should just let intel tell us what we should compile for.
Saw a recent demo of these to researchers at a UK uni. It was going swimmingly until they mentioned that you'll need to recode everything in CUDA (C with proprietary extensions) and, while they pay lip service to OpenCL, they actually don't give a fig about it.
Sheeesh, and you think it ain't easy being green.
Hey Inqas,
It looks like total ignorance is becoming a trademark for your publications.
You've never heard of the number of universities teaching programming in CUDA, which is a dialect of the C language. You discovered for yourself a "new" (just two years old) S1070 box.
Well done guys! It is already time to surprise your readers with an article about Mr Gates' multitasking Windows 1.0...
We got sent a couple of machines by NV for evaluation, for free. We found they only worked well on embarrassingly parallel problems (i.e. didn't scale at all), and ran like crap in double precision mode. Which meant that none of the groups could use them. After nine-ish months of trying to find something that would run well on them, we needed the rack space so we shipped them back to NV. No-one seems to miss them.
I don't see GPGPU being anything other than a niche when it comes to HPC for a long time - all the stuff that us HPC guys want is a waste of chip space for normal GPU work, chip space that NV/ATI would much rather use for more shaders.
And then there's the software issue - to get anywhere near the rated performance out of the card requires a huge amount of work. Sure, you can take an existing C implementation, hack at it for a month, and get something that compiles and runs using CUDA ... except it'll usually get slapped silly by a quad-core Nehalem. Even assuming your HPC application is one of the few that is embarrassingly parallel and can run in single precision mode, you'll need man-months if not man-years of effort to get a CUDA implementation working well.
From what I've seen, PowerXCell is a far more likely platform for future general-purpose HPC needs, rather than trying to kludge in GPUs. There's just too much of a disparity between rasterization of triangles and most HPC algorithms.
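For readers unfamiliar with the term used in the comment above, here is a minimal sketch in plain Python (not CUDA) of what separates an embarrassingly parallel computation from one with a serial dependency. The function names are hypothetical and chosen only for illustration. In the first function every output element depends only on its own inputs, so the elements can be computed in any order, or all at once by separate GPU threads. In the second, as written, each step needs the result of the previous one.

```python
import random


def scale_and_add(a, x, y):
    """Embarrassingly parallel: out[i] depends only on x[i] and y[i]."""
    out = [0.0] * len(x)
    order = list(range(len(x)))
    random.shuffle(order)  # visiting order is irrelevant, so the work can be split freely
    for i in order:
        out[i] = a * x[i] + y[i]
    return out


def running_total(x):
    """Serial as written: each step needs the total from the step before."""
    out = []
    total = 0.0
    for value in x:
        total += value
        out.append(total)
    return out
```

The shuffle is there only to demonstrate that order does not matter for the first function; a real GPU port would assign one index to each thread instead.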
@dave "enter the general application market"
I loled, general application = highly parallel data loads? Sure GPUs can be fast at handling certain types of processing but it seems like a niche area.
Ever since nVidia made the bold claim that the GPU was going to replace the CPU as the most important component (performance-wise) in a computer, there has been nothing but laughter at them.
nVidia seems to believe that any and all computer applications can be easily converted to (or will be converted to in the near future) work on parallel systems. The simple fact is that many important areas, most notably virtualization (which is becoming more important in server farms), do not benefit from parallelization at all, only raw processor speed.
Cynic wrote:
you'll need man-months if not man-years of effort to get a CUDA implementation working well.
<<
Have you tried getting a couple of us Elves on your team?
Hence the expression in man-years :) IIRC the standard conversion is 1 man-year = 1 elf-month, though you do have to take into account that they all bugger off to the north pole for a few months of the year.