But hold on, most of its designers have spread to other places like AMD, Intel, Sun, and a zillion startups big and small, so the core of the team would need to be built from scratch. That's a tough job, but still possible: watch how Intel is rebuilding its teams for its next generation X86 cores.
Also, what about software support? Yes, Windows NT on Alpha was the nicest, most stable and most virus-free Windows ever, and Digital Unix was arguably one of the best around, ever. With a fresh start, all this machinery would have to be restarted from scratch, without a clear path to success.
Let's put aside those very valid questions. What if there really is a will to get Alpha back into the changed market, with POWER being the last major workstation/server RISC platform, with Itanium's uncertain future, and the god-awful X86 architecture marching full-speed towards dominating the 64-bit space as well - no thanks to AMD, this time around? What sort of chip would it have to be to have a good chance of success, if any?
EV9 -> EV8 : A Blast From The Past
We all remember the EV8, a lucky number in my Far Eastern realm, also known by its official name 21464 - eh, that's quite an unlucky number anywhere from Singapore to Beijing. In summary: a 90 nanometre, eight-way, highly efficient out-of-order superscalar core with 4 FP ops/cycle, four-way simultaneous multithreading, 4 MB of on-chip L2 cache, and eight PC1066-PC1333 Rambus channels per CPU for up to 20 GB/s of memory bandwidth per chip, plus directory-based ccNUMA with multiple direct processor-link channels - think HyperTransport on steroids - to provide glueless 512-way SMP-class systems.
This chip, originally scheduled for 2003 on a 0.13µ process, would wipe the floor with most of its competition, single or dual core, if out today. But wait, there was something else, called EV9 - quite a bit of whose design concept came from the sunny Catalonian seaside, in Barcelona. Give them a chance to design a CPU, and they might come up with something as outstanding as those Gaudi buildings there. So, let's look at that design, and see whether, if (and BIG IF, still) there ever was an Alpha re-launch soon, this could be rechristened as 'lucky EV8' without that 21464 moniker.
Barcelonian works
A 2002 paper called "Tarantula: A Vector Extension to the Alpha Architecture" by Catalonia Polytechnic University in Barcelona and the ex-DEC Compaq Alpha team in Massachusetts is the one I'm talking about - you can read a copy here.
In summary, this was the proposed vectorised follow-on to EV8, one of the two directions considered - the other one was a dual-core EV8 with larger cache and even more Rambus channels, of course. The 'Tarantula' EV9 flavour was a very big spider compared to the 'Aranha' EV8, a small spider in Spanish, I guess?
How big? Well, if it were out, as I said, its bite could be lethal to anything from Opteron to the Itanic ship. The EV9 adds a humongous, ultrafast yet devastatingly simple floating-point vector unit to the original Alpha architecture, with its own set of 32 really long vector registers - yes, really long: each register is 8192 bits (a whole kilobyte!), which is more than most complete register files of today's 64-bit X86 processors. They are that long so that each can pack 128 64-bit values, and there are 16 FP ALUs operating on each of them - in parallel!
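The register arithmetic above is easy to check. Here is a minimal Python sketch that uses only the figures quoted in this article. The rate of one element per ALU per cycle is an assumption made for illustration, not something stated here, and all names are hypothetical.

```python
# Figures quoted in the article for the EV9 "Tarantula" vector unit.
VECTOR_REGISTERS = 32         # vector registers
ELEMENTS_PER_REGISTER = 128   # 64-bit values packed into each register
ELEMENT_BITS = 64
FP_ALUS = 16                  # FP ALUs working on a register in parallel


def register_bits() -> int:
    """Size of one vector register in bits."""
    return ELEMENTS_PER_REGISTER * ELEMENT_BITS


def register_file_bytes() -> int:
    """Size of the whole vector register file in bytes."""
    return VECTOR_REGISTERS * register_bits() // 8


def cycles_per_full_vector(elements_per_alu_per_cycle: int = 1) -> int:
    """Cycles to sweep one full register through the ALUs.

    ASSUMPTION: each ALU retires `elements_per_alu_per_cycle` elements
    per cycle; the article does not give this rate.
    """
    per_cycle = FP_ALUS * elements_per_alu_per_cycle
    return -(-ELEMENTS_PER_REGISTER // per_cycle)  # ceiling division


if __name__ == "__main__":
    print("bits per register:", register_bits())
    print("register file bytes:", register_file_bytes())
    print("cycles per full vector:", cycles_per_full_vector())
```

Under that assumption, one instruction on a full 128-element register keeps the 16 ALUs busy for several cycles, which is where the "devastatingly simple" part comes from: a single instruction issue feeds a lot of arithmetic.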
To facilitate fast handling of large data sets, the EV9 "Tarantula" vector unit reads its data directly from a large 16 MB L2 cache, without disturbing the CPU core's L1 caches. The L2 cache here is very wide, able to feed all the vector ALUs at their peak FP rate - very helpful for demanding FP tasks. That L2 cache is in turn fed by 32 on-chip PC1066-PC1333 Rambus channels, again per CPU, giving you anything from 66 GB/s to 83 GB/s peak memory bandwidth per CPU, or 10 to 13 times that of current quad-CPU Itanium systems.
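To put those bandwidth figures in perspective, the sketch below works backwards from the totals quoted above to a per-channel figure and to the ratio against an Itanium baseline. The 6.4 GB/s baseline is an assumption (it is not given in this article); it is used here only because it reproduces the "10 to 13 times" range. Function and constant names are hypothetical.

```python
# Totals quoted in the article for the EV9 memory system.
CHANNELS = 32
TOTAL_GBPS_LOW = 66.0    # with PC1066 Rambus
TOTAL_GBPS_HIGH = 83.0   # with PC1333 Rambus

# ASSUMPTION: memory bandwidth of a current quad-CPU Itanium system,
# in GB/s. Not stated in the article; chosen for illustration.
ASSUMED_ITANIUM_GBPS = 6.4


def per_channel_gbps(total_gbps: float, channels: int = CHANNELS) -> float:
    """Bandwidth each channel must deliver to reach the quoted total."""
    return total_gbps / channels


def ratio_to_baseline(total_gbps: float,
                      baseline_gbps: float = ASSUMED_ITANIUM_GBPS) -> float:
    """How many times the baseline bandwidth the quoted total represents."""
    return total_gbps / baseline_gbps


if __name__ == "__main__":
    for total in (TOTAL_GBPS_LOW, TOTAL_GBPS_HIGH):
        print(f"{total:.0f} GB/s total: "
              f"{per_channel_gbps(total):.2f} GB/s per channel, "
              f"{ratio_to_baseline(total):.1f}x the assumed baseline")
```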
At that time, there was no clear decision on whether to have multiply-add in there, but if it were designed now, it would surely be in the vector unit... besides supercomputing, of course, anything from Photoshop, 3-D visualisation, high-end gaming and multimedia (did anyone say 70 mm cinema-quality 3840x2400 movie encoding in real time by the CPU?) would benefit from well over 100 GFLOPs of double-precision sustained FP power in such a chip. Compared to it, a 2GHz Montecito in 2006 would give you 16 GFLOPs peak FP power if both CPU cores run to the fullest.
As you can see at the end of the paper, the whole thing was supposed to be pretty power efficient, in the same league as current dual-core Pentium Extreme Editions. I have to say that, after looking at this, the Extreme moniker doesn't suit those Pentiums anymore. Most importantly, its target process was 65 nanometres, and its target year was - 2006!
Back to the future, then...
Well, let's assume then that we got extra lucky and have a chance to look at this wonderful design again in today's light. At first glance, what would be changed? Well, as Intel has shown us, you can always make chips bigger - Madison 9M and Montecito are good examples... so let's see what an architecture like Alpha with very compact cores - where the whole EV68 chip was smaller than the Merced CPU core die area alone in the same process - can do with that. First of all, no fight between the dual-core and vector groups - do both. Two 'old' EV8 cores, plus the vector unit attached to just one of them, are in. Each core has its own 2 MB L2 cache this time (small, narrow L2 caches have the much lower latencies that matter for scalar code) and is just two-way multithreaded instead of four-way. The big, high-latency cache shared by the vector unit and both cores becomes L3 in this case and, in sync with Montecito/Montvale expectations for the 65 nm process (which now might not happen), we put that cache at 32 MB. To feed it, we can use 16 FB-DIMM channels with DDR2-667 (around 83 GB/s bandwidth) for heaps of cheap (future) standard server RAM, or 16 channels of Rambus XDR RAM if we want even more bandwidth and open pages in RAM, but at a price.
For scaling, instead of proprietary Alpha links, we can stick with a cache-coherent version of the coming HyperTransport 3 protocol - hah, even the 32-bit 1200 MHz HT 2 gives you 19.2 GB/s bandwidth per link, and there would be at least four of those, plus a dedicated slower I/O HT link - all based on standard stuff, not proprietary, again. After all, those EV7 Alpha links surely 'inspired' the AMD Opteron and HT designs.
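The 19.2 GB/s per-link figure can be reproduced with a few lines of Python. This is a rough sketch assuming double-data-rate signalling and counting both directions of the link together, in decimal units (1 GB/s = 1000 MB/s); those assumptions are what it takes to arrive at the number quoted above. The function name is hypothetical.

```python
def ht_link_mb_per_s(width_bits: int, clock_mhz: int,
                     both_directions: bool = True) -> int:
    """Peak HyperTransport link bandwidth in MB/s (decimal).

    ASSUMPTIONS: two transfers per clock (double data rate), and the
    headline figure sums both directions of the link.
    """
    transfers_per_clock = 2
    per_direction = clock_mhz * transfers_per_clock * width_bits // 8
    return per_direction * 2 if both_directions else per_direction


if __name__ == "__main__":
    link = ht_link_mb_per_s(width_bits=32, clock_mhz=1200)
    print("per link:", link / 1000, "GB/s")
    print("four links:", 4 * link / 1000, "GB/s")
```

Counted one direction at a time, each link delivers half the headline number, which is worth keeping in mind when comparing it with the memory bandwidth figures.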
This way, we get both extraordinary multi-core superscalar execution for the usual code (integer, business etc), and a really nasty yet well-fed vector FP monster with capabilities that make SSE3 or Altivec look positively puny. It may have made little commercial sense in 2002, but now a lot of multimedia and even gaming code can use SIMD parallelism very well, so there is a wider user base. As a side effect, it could - if the business side matches the techies - capture the high-performance computing market across the board. In other markets, such a CPU would be a clear performance leader in absolutely everything... just like its glorious predecessors were last century.
Time to wake up
So OK, this is just one possibility based on a lot of work that already happened some time ago... As you can see, if it had not been murdered then, Alpha could have continued its performance leadership till now and beyond. AMD might be doing Alphas now, too, rather than playing with the X86 till hell freezes over - after all, Dirk Meyer fathered the EV6 Alpha.
I have no idea how or if some of these Alpha-specific vectorisation and other trinkets could be applied to other architectures; neither Itanic nor X86 seems suitable for this. I also don't know if HP has the resources and/or goodwill and commitment to even discuss this internally, unless the new boss says so and maybe finds a mighty well-heeled external partner to bring about the resurrection.
AMD is probably out of the question: they bowed out when they had a chance in June 2001 to save the Alpha, and the X86 mindset has penetrated them to the core - pun intended.
One obvious party is the Chinese Government - China loved Alpha, and they still keep Eckhard's Compaq Tru64 Unix for Alpha source code over there. For them, outside software support is not critical: they have enough of their own applications to make the platform production-viable.
Politically, what's the big deal? Lenovo can have IBM's PC business, and China oil co's might even buy the Taleban's ex business partners Unocal soon. So why not do something good for mankind and bring back to life something that can at least remind us how gross the X86 architecture is?
Talking about X86, the last candidate is its original author, Intel. Looking at the mined seas that the Itanic is sailing through, no thanks to its architecture, maybe a new chip - something along the lines of "Intel Lucky EV8", with a good instruction translator from the IA-64 and IA-32/64 instruction sets but otherwise looking a lot like the above - would be a good fresh start away from all those Opteron headaches, while their X86 things take care of themselves.