WHEN LOOKING at those Nehalem slides - and Opteron slides a few years before - those of us with longer memories couldn't help recalling similar Alpha EV7 and EV8 stuff presented a decade earlier.
All the nice integrated multi-channel memory controllers, four memory-speed links to talk to other CPUs north, south, east and west, and immense scalability as a result. Ask those who happen to have used the mighty HP AlphaServer ES47 with such stuff - we had two of these in our old labs for a few months five years ago, and only now are other systems matching their STREAM numbers.
This past IDF, Intel announced its next major X86 instruction set facelift - the Advanced Vector Extensions, or AVX. What's the big deal, you may ask? After all, there were countless MMX and SSE rounds till this day.
Well, it is a nice 3-operand, RISC-like approach, so you can finally do A+B=C in a single opcode: AMD's proposed SSE5 is supposed to go along this line as well. Later, with fused multiply-adds, this could even become A*B+C=D. Then, you get a more efficient instruction format with a lot of baggage (and length) reduced - again, one of the major problems of X86 compared with efficient fixed-opcode-length RISCs. No need to mention the good ship Itanic here, it can have more instruction FORMATS than some RISCs have instructions - that's how 'elegant' it is.
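To make the A+B=C point concrete, here is a toy sketch of why a non-destructive 3-operand form saves an instruction. It is only a model under stated assumptions: the register names and helper functions are hypothetical, they are not real SSE, SSE5 or AVX mnemonics, and instruction counts on real hardware depend on the compiler and on whether a source value is still needed afterwards.

```python
# Toy register-file model. All names here are hypothetical illustrations,
# not real SSE/AVX instructions.

def move(regs, dst, src):
    """Register copy: dst = src. Counts as one instruction."""
    regs[dst] = regs[src]
    return 1


def add_two_operand(regs, dst, src):
    """Destructive 2-operand add: dst = dst + src (old dst is overwritten)."""
    regs[dst] = regs[dst] + regs[src]
    return 1


def add_three_operand(regs, dst, src1, src2):
    """Non-destructive 3-operand add: dst = src1 + src2."""
    regs[dst] = regs[src1] + regs[src2]
    return 1


def fused_multiply_add(regs, dst, a, b, c):
    """4-operand multiply-add: dst = a * b + c.

    Note: a hardware FMA rounds once; this Python expression rounds twice,
    so it models the data flow only, not the exact numerics.
    """
    regs[dst] = regs[a] * regs[b] + regs[c]
    return 1


def c_from_a_plus_b_two_operand(regs):
    """C = A + B while keeping A and B intact: needs a copy first."""
    count = move(regs, "c", "a")
    count += add_two_operand(regs, "c", "b")
    return count


def c_from_a_plus_b_three_operand(regs):
    """C = A + B in a single instruction."""
    return add_three_operand(regs, "c", "a", "b")
```

In this model the 2-operand path issues two instructions (a copy, then an add) to preserve both inputs, while the 3-operand path issues one.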
Then, AVX doubles the SSE register length to 256 bits - doubling the amount of data fitting in and, matched with doubled data paths, providing twice the FP throughput per clock in the Sandy Bridge CPU some two years from now. And, one day maybe, you could fit two quad-precision 128-bit FP numbers into each of these registers. Marvellous!
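The register-width arithmetic is easy to check. The short sketch below simply divides register width by element width; the function name `lanes` is made up for illustration, and the quad-precision row reflects the "one day maybe" speculation in the text rather than any announced feature.

```python
SSE_REGISTER_BITS = 128  # register width before AVX, per the text
AVX_REGISTER_BITS = 256  # doubled register width, per the text


def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


if __name__ == "__main__":
    for name, bits in (("single (32-bit)", 32),
                       ("double (64-bit)", 64),
                       ("quad (128-bit)", 128)):
        print(name,
              "SSE:", lanes(SSE_REGISTER_BITS, bits),
              "AVX:", lanes(AVX_REGISTER_BITS, bits))
```

So a 256-bit register holds four doubles where a 128-bit one holds two, which is where the "twice the FP throughput per clock" figure comes from, provided the data paths are doubled as well.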
But then, are these innovations really that new? From the turn of the century, there was something called EV9 - a 21464 Alpha CPU somewhere in 2005. The thing was proposed to have one (possibly two) 8-way superscalar EV8 cores, each multithreaded of course. And a dedicated vector engine with 16 MB L3 cache. Now, that was to be an interesting beast for a general-purpose CPU: a 1024-bit wide monster, with matching L3 cache width, and 32 1024-bit wide vector registers (yeah, four kilobytes of numbers in there).
The thing would have achieved 16 parallel 64-bit DP FP mul-adds per clock then, and the humongous register space coupled with ultrawide cache and 16 RDRAM channels would have ensured quite a high practical performance rate, too, something on the order of 100++ DP GFLOPs per core.
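As a rough sanity check on those figures, the sketch below works out the peak rate implied by 16 mul-adds per clock, counting each mul-add as two floating-point operations. The clock frequency is not given in the text, so the code solves for the clock that a 100 GFLOPs peak would require rather than assuming one; the function names are hypothetical.

```python
VECTOR_WIDTH_BITS = 1024  # vector engine width, per the text
ELEMENT_BITS = 64         # double-precision element
FLOPS_PER_FMA = 2         # one multiply plus one add

# 1024 / 64 = 16, matching the 16 parallel mul-adds quoted in the text.
FMAS_PER_CLOCK = VECTOR_WIDTH_BITS // ELEMENT_BITS


def peak_gflops(clock_ghz, fmas_per_clock=FMAS_PER_CLOCK):
    """Theoretical peak DP GFLOPs at a given clock (an upper bound only)."""
    return clock_ghz * fmas_per_clock * FLOPS_PER_FMA


def clock_needed_ghz(target_gflops, fmas_per_clock=FMAS_PER_CLOCK):
    """Clock in GHz at which the theoretical peak equals target_gflops."""
    return target_gflops / (fmas_per_clock * FLOPS_PER_FMA)


if __name__ == "__main__":
    print("FLOPs per clock:", FMAS_PER_CLOCK * FLOPS_PER_FMA)
    print("Clock for a 100 GFLOPs peak (GHz):", clock_needed_ghz(100.0))
```

That is 32 DP FLOPs per clock, so a 100 GFLOPs peak corresponds to a clock of a little over 3 GHz; a sustained rate at that level would need a higher clock still.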
Too bad we all know what happened to the Alpha and its original owner anyway, and that's not gonna change unless, say, Nvidia discovers that Alpha also had the world's best real-time X86 code translator for Windoze, the FX!32, making it the only remaining high-end candidate for the firm's CPU in the absence of an AMD buyout. Is Nvidia willing to do a fabless EV9 feeder for their Geforces?
Back to today, you can see that many "brand new" things in the new and upcoming X86 processors from both sides do somehow trace their start to the Alpha. And yeah, Intel AVX is great news, and will go a long way towards gradually moving the application base towards a more elegant and efficient instruction set - not exactly a RISC yet, but a step in the right direction anyway. µ
I complain about the Alpha murder, and Intel's marketing machine, but people are most often like cattle or horses - flaps on their eyes direct them to the everyday crap product advertised on TV. If only Compaq hadn't bought DEC and then merged with HP, selling it off to Intel... who took 10 years to understand that the Alpha from 10 years back is superior to their current products... and 10 years back I still had a Pentium 120MHz... compare it to what the average Joe has in his PC today.. ;-(
who's gonna tell you when
it's too late
who's gonna tell you things
aren't so great
you can't go on
thinking nothing's wrong
who's gonna drive you home tonight

who's gonna pick you up
when you fall
who's gonna hang it up
when you call
who's gonna pay attention
to your dreams
who's gonna plug their ears
when you scream

you can't go on
thinking nothing's wrong
who's gonna drive you home tonight

who's gonna hold you down
when you shake
who's gonna come around
when you break

NVIDIANVIDIANVIDIA
You need a good card-spanking, dood!
Meet us on Sandy bridge, cuz it's on!
Nebojsa, you're too Whirlwind without enough specifics. Here's Ultie Advanced Imagination Victors:

Alpha was 1998, mainly promoted due to onboard refrigerator, it was so HOT. Alpha is also a development stage & many a company has adopted the name, a meaningless number.

Mainly you're comparing 5 years ago, Pent.IV & Bart.on to today, with hopes till better skies being quashed? I believe in SSE5, whatever it is & I feel every High Speed Single Core unit (5 y.o) should be Trashed Heaped. End of story.

Machines are more stable today than ever & more complexity takes time. Several years just to get through proprietary ownership to public tests, then a little longer to retail. It's a compound thing: the more there is, the faster change is made & the better it gets, sooner.
Now ask me about the weather? Anyone?
drashek
It's been a fascinating 10 years watching the processor manufacturers finally realize that the wizards at DEC really knew what the heck they were doing. That Intel and AMD are finally reaching the same levels of performance that DEC anticipated years ago shows what a sorry mess both companies are in.

Now if Microsoft could see their way to developing an OS that is a worthy successor to OpenVMS.... but I figure the universe will come to an end before that happens...
Quote from the IEEE paper:
---------
Tarantula adds to the Alpha ISA new architectural state in the form of 32 vector registers (v0..v31) and their associated control registers: vector length (vl), vector stride (vs), and vector mask (vm). Each vector register holds 128 64-bit values. The vl register is an 8-bit register that controls the length of each vector operation. The vs register is a 64-bit register that controls the stride between memory locations accessed by vector memory operations. The vm register is a 128-bit register used in instructions that operate under mask.
-----------------
Thus 32 vector registers of 8192 bits each (128 x 64 bits), not 1024 bits.

cheers

Alex
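For anyone who wants to check the register arithmetic in the quoted passage, here is a short sketch. The constants are taken from the quote; the variable names are made up for illustration, and the comparison row uses the 32 x 1024-bit figure from the article.

```python
# Figures from the quoted passage.
VECTOR_REGISTERS = 32         # v0..v31
ELEMENTS_PER_REGISTER = 128   # each register holds 128 values
ELEMENT_BITS = 64             # each value is 64 bits
VL_REGISTER_BITS = 8          # vector length register
VM_REGISTER_BITS = 128        # vector mask register

register_bits = ELEMENTS_PER_REGISTER * ELEMENT_BITS   # bits per register
register_bytes = register_bits // 8                    # bytes per register
register_file_bytes = VECTOR_REGISTERS * register_bytes

# Register file size implied by the article's 32 x 1024-bit figure.
article_file_bytes = VECTOR_REGISTERS * 1024 // 8

# An 8-bit vl can encode values up to 255, enough to express length 128.
vl_max_value = 2 ** VL_REGISTER_BITS - 1
vl_covers_full_register = vl_max_value >= ELEMENTS_PER_REGISTER

# A 128-bit mask gives exactly one mask bit per element.
one_mask_bit_per_element = VM_REGISTER_BITS == ELEMENTS_PER_REGISTER

if __name__ == "__main__":
    print("bits per register:", register_bits)
    print("register file (KiB):", register_file_bytes // 1024)
    print("article's figure (KiB):", article_file_bytes // 1024)
```

On the quoted numbers the vector register file comes to 32 KiB, against the four kilobytes mentioned in the article.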