I don't know if I qualify as a real HPC user, but what I need is hardware support for higher-precision computations. All the single- and double-precision performance increases are meaningless to me when I have to rely on code and tricks that are an order of magnitude slower than hardware. 128-bit would be nice; 256-bit precision would be better.
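For readers wondering what those "code and tricks" typically look like: one common approach is double-double arithmetic, which represents a number as the unevaluated sum of two doubles. The minimal Python sketch below is illustrative only, not any particular library's API, and the names `two_sum`, `quick_two_sum` and `dd_add` are hypothetical. It shows that even a single addition takes about a dozen hardware floating-point operations, which is one reason software extended precision lags so far behind native hardware.

```python
from typing import Tuple

# A double-double value is the unevaluated sum hi + lo of two IEEE doubles,
# giving roughly 106 bits of significand instead of 53.
DD = Tuple[float, float]


def two_sum(a: float, b: float) -> DD:
    """Knuth's TwoSum: s = fl(a + b) and e is the exact rounding error, so a + b == s + e."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a: float, b: float) -> DD:
    """Renormalise a pair, assuming |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x: DD, y: DD) -> DD:
    """Add two double-double numbers (the simple variant that sums the low words directly)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


plain = 1.0 + 1e-20                      # the tiny term vanishes in a single double
wide = dd_add((1.0, 0.0), (1e-20, 0.0))  # the tiny term survives in the low word
print(plain, wide)
```

Production libraries use more careful variants, but the cost pattern is the same.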
I've been saying for years that Nvidia should buy VIA.
NVIADIA ;)
However, a new possibility appears to be coming of age.
All of the ingredients are there, maturing like wine, till they might just be able to make a nice meal of it.
ARM.
Sure, it's not that fast, but it's a CPU architecture with lineage. Developers have knowledge of it, and it has market share.
Now, team Nvidia up with some developers to create an all-encompassing open API, plus the suite to go with it, that competes directly with DirectX. Think OpenGL + sound + input.
Who benefits?
Every smartphone maker.
Apple OS X + iPhone. (Games on a Mac?! *gasp*)
Google Android.
Nintendo.
Want convergence?
Make the API.
Let's face it. Most of us don't need what the new x86 chips have to offer anymore.
Of course, Nvidia could simply add ARM into their core design. This would be especially interesting for netbooks that are already turning towards ARM.
If the chipset itself were a processor, it could operate seamlessly in low-power ARM mode and switch to x86 mode when needed, or, on budget models, not have the option of x86 at all.
With virtualisation having already been mostly mastered, there's nothing stopping this.
Forget virtualising OSes. Imagine Alt-Tabbing between CPU architectures!
This is already somewhat in motion.
There are ARM-based NICs that can download torrents while the x86 motherboard is off.
This opens up another interesting possibility.
Security.
The ARM core would be invulnerable to x86 viruses. ARM mode could provide fool-proof virus scanning, firewall, etc.
What if, instead of virtualising the browser to protect the OS, you just get the browser to open in the ARM OS?
Nvidia have a great opportunity here.
With phones, consoles and netbooks already using ARM or heading that way, they could use it to flog their graphics/chipsets.
With all that extra heat, we'll end up with fewer CPU cores or lower clock speeds. And how do you arrange them once you start to scale up, anyway? One large GPU on the side + 4 CPU cores in a square? 2x2 with one of the cores a GPU? 4x2 with two of the cores GPUs? 3/3? We may end up sacrificing cores for GPU die space.
What if I want high CPU power, but no GPU power at all? Won't all the high end consumer CPUs eventually have GPUs in them? And will we end up with a driver headache when the big idea of hybrid card/built-in/GPU-on-chip power comes around?
Hello? Can someone tell me how Intel's Larrabee fits into this picture and when it was supposed to be out?
Tell me about the heat. How do you deal with it? More power and more GFLOPS mean more heat, so how would these chips run at a 40°C ambient temperature?
I tried a Stream app (ATI's GPU math acceleration platform) the other day and got 550 GFLOPS on my 'old' HD4850.
The application I'm referring to:
http://galaxy.u-aizu.ac.jp/trac/note/wiki/Astronomical_Many_Body_Simulations_On_RV770#DemoProgram
Just to show you what a GPU can do.
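To give a feel for what that demo computes: a direct-summation many-body code evaluates every pairwise gravitational interaction, an O(N²) loop with no data-dependent branching, which is exactly the kind of work GPUs are good at. The sketch below is a hypothetical plain-Python version, not the linked program's code. The softening length `eps` and the `FLOPS_PER_INTERACTION` constant are assumptions, and GFLOPS figures for N-body codes depend on which flop-counting convention is used.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Assumed cost of one pairwise interaction. GPU N-body write-ups commonly count
# about 20 floating-point operations, but conventions differ between codes.
FLOPS_PER_INTERACTION = 20


def accelerations(pos: List[Vec3], mass: List[float],
                  eps: float = 1e-3, G: float = 1.0) -> List[Vec3]:
    """Direct O(N^2) summation of softened gravitational accelerations."""
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    acc: List[Vec3] = []
    for xi, yi, zi in pos:
        ax = ay = az = 0.0
        # No i == j branch: with eps > 0 the self-term has dx = dy = dz = 0 and
        # adds exactly zero, so every body runs the same straight-line loop.
        for (xj, yj, zj), mj in zip(pos, mass):
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            s = G * mj / (r2 * math.sqrt(r2))
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc


def gflops(n: int, seconds: float) -> float:
    """GFLOPS for one force evaluation over n bodies (n * n interactions)."""
    return FLOPS_PER_INTERACTION * n * n / seconds / 1e9


bodies = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(accelerations(bodies, [1.0, 1.0]))
```

On a GPU, each thread would typically own one body `i` and run the inner loop, which is why this workload scales so well across hundreds of simple cores.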
I want my jaggies as smooth as butter on my 37-inch screen before this stuff. >.<
You're putting a great many forks in the road, Nebojsa Novakovic.
Shouldn't someone be concentrating on how best to execute a present code-stream on any given system configuration?
I'll try it this way and then that way until I get the right way. But will it still be the right way tomorrow?
The breakthrough will be an optimising comparator-translator and compiling operations architecture.
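The "try it this way and then that way" approach resembles empirical autotuning: run several equivalent implementations on the actual machine, keep the fastest, and re-run the search whenever the hardware or the inputs change. A minimal hypothetical sketch in Python; the `autotune` helper and the summation candidates are illustrative choices, not an existing tool's API:

```python
import functools
import math
import timeit
from typing import Callable, Dict, Sequence

Impl = Callable[[Sequence[float]], float]


def autotune(candidates: Dict[str, Impl], sample: Sequence[float],
             repeats: int = 5, number: int = 10) -> str:
    """Benchmark equivalent implementations on a representative input; return the fastest one's name."""
    if not candidates:
        raise ValueError("need at least one candidate")
    reference = None
    best_name, best_time = "", math.inf
    for name, fn in candidates.items():
        result = fn(sample)
        # Only compare speed between implementations that agree on the answer.
        if reference is None:
            reference = result
        elif not math.isclose(result, reference, rel_tol=1e-9, abs_tol=1e-12):
            raise ValueError(f"{name} disagrees with the other candidates")
        # Best-of-N timing reduces noise from other processes.
        elapsed = min(timeit.repeat(functools.partial(fn, sample),
                                    number=number, repeat=repeats))
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def sum_loop(xs: Sequence[float]) -> float:
    total = 0.0
    for x in xs:
        total += x
    return total


data = [float(i) for i in range(10_000)]
choice = autotune({"builtin_sum": sum, "explicit_loop": sum_loop, "fsum": math.fsum}, data)
print("fastest on this machine:", choice)
```

The winner can differ between machines and interpreter versions, which is the point: the choice is measured rather than assumed.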
Is there any word yet on the positronic brain headers found in Roswell?
Nebojsa, replace Charlie? Hmm. CPU parts are like a toy chest: pull out a heart, some lungs and a little larynx, and you're in business.
Obviously, if the CPU were a sex toy, it'd be perfected by now. OK, here's how, for real.
Teleport today's chips back in time, say 50 years. That turns today's chips into better chips, as the timeline is sped up. Repeat until ULTEE' RULES.
drashek
@LeeE
I agree, which is exactly why M-Space makes me excited.
I think there is a middle road. While the many-small-unbranchy-processors route is the most efficient for certain problem sets, there is a trade-off to be made in terms of development effort for those types of processor.
The current trend seems to be towards moving GPU processors to a certain level of programmability, where they become less challenging to code for, but retain their parallel advantage.
In other words, we're currently at a point where it makes sense to spend transistors on increasing "codeability" rather than on pure parallel-graphics performance. This has the advantage of bringing huge performance increases to certain types of problem that would otherwise never have been coded for these parallel architectures at all.
So, yes, there is a convergence, but only to a point. The monolithic core isn't going anywhere either: we'll always have some branchy, serial code.
The question is how much.
Convergence between general purpose CPUs and GPUs is largely irrelevant because they deal with different 'problems' and it is those 'problems' that cannot be unified or converged.
Most of what the OS and typical desktop applications do is intrinsically sequential and cannot be effectively parallelised. Many cores only make sense when you're running workloads suited to parallelisation, and for that type of job you don't need a full instruction set. Indeed, incorporating a full instruction set in the MPP hardware in GPUs is not only pointless but would require more silicon real estate, lowering the number of 'useful' cores that can be implemented.
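This argument is essentially Amdahl's law (a standard result, not something the comment cites): if only a fraction p of a program's run time can be parallelised, the speedup on n cores is 1 / ((1 − p) + p/n), which can never exceed 1 / (1 − p) however many cores are added. A small Python sketch; the function name `amdahl_speedup` is just an illustrative choice:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` of the run time scales with `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Even with 256 cores, a half-serial workload gets less than a 2x speedup.
for p in (0.5, 0.9, 0.99):
    print(p, [round(amdahl_speedup(p, n), 1) for n in (4, 16, 256)])
```

For mostly sequential desktop code (small p), extra cores buy almost nothing, while highly parallel workloads (p close to 1) are where many simple cores pay off.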
Being able to use the MPP hardware in GPUs for _any_ type of MPP problem, instead of just graphics rendering, is useful, but adding all the extra stuff so that each core becomes a general-purpose CPU is counterproductive.
Fusion of the two at this point could only possibly result in severely worsened performance.