It is not just large, medium and small clusters where the Opteron has landed quite a few wins worldwide, but also many individual workstations and servers, especially now that the quad-CPU platform is becoming more ubiquitous.
AMD's upcoming February price list shows a major cut in pricing for the high-end quad-CPU 84x-series Opterons. That allows for many more quad-CPU 64-bit servers as well as, why not, fast quad-Opteron workstations with up to 32 GB of linearly addressable DDR400 memory at 25.6 GB/s aggregate bandwidth, plus two AGP 8X slots if you wish.
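The 25.6 GB/s figure is simple arithmetic, assuming each Opteron drives its own dual-channel (128-bit) DDR400 controller at 400 million transfers per second. A minimal sketch; the function name is purely illustrative:

```python
# Peak memory bandwidth of a quad-Opteron box, assuming each CPU has its own
# 128-bit (dual-channel) DDR400 memory controller.

def ddr_bandwidth_gbs(mega_transfers_per_sec: float, bus_width_bits: int) -> float:
    """Peak bandwidth of one memory interface in decimal GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9

per_cpu = ddr_bandwidth_gbs(400, 128)  # DDR400 on a 128-bit interface
sockets = 4
aggregate = per_cpu * sockets          # four controllers working in parallel

print(f"per CPU: {per_cpu:.1f} GB/s, quad-Opteron total: {aggregate:.1f} GB/s")
```

Note that the aggregate is a peak number: it is approached only when each CPU works mostly out of its own local memory, since remote accesses have to travel over HyperTransport to another CPU's controller.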
Yes, in theory you can attach two AGP tunnels - one on each CPU - with separate bandwidth pipes, just like AlphaServer ES47 "Marvel" has two independent AGP buses, one on each EV7 Alpha. Remember, Opteron is an EV7 Alpha concept brought into the high-end PC market.
The Nvidia nForce Pro chipset series now helps deliver even more efficient AGP 8X, plus high-speed, low-latency built-in Gigabit Ethernet over an internal HyperTransport link. While nForce Pro production boards are not out yet, the AMD 8111/8131/8151 chipset combo does just fine in the meantime, providing AGP 8X and multiple PCI-X 133 buses if needed, a marked contrast to the clumsy, feature-poor chipsets for the AthlonMP a few years ago.
Most importantly, not only does Opteron have few heat dissipation problems thanks to its proper heat spreader, it also seems to generate very little heat in the first place. Quad-Opteron blades or superdense bricks are possible, with low-voltage Opterons possibly drawing far less than 30 W while running close to full rated Opteron speeds.
More needed
Still, we need a few more things to happen here. First of all, the SSE2 unit in Opteron has to be "fixed" throughput-wise. Since AMD will have to update that portion anyway in the next round to incorporate the Prescott SSE3 instructions, it may as well use the opportunity to double the 64-bit (double-precision) SSE2 FPU throughput to two operations every cycle, rather than two operations every two clock cycles as now. The Pentium 4/Xeon also does it every two cycles, but its clock runs almost 50% faster, remember?
With double the SSE2 throughput, a 2.5 GHz Opteron, say, would reach 10 GFLOPS peak per CPU versus 5 GFLOPS right now. The real application improvement would be smaller: after all, we all have to wait for memory to load operands and store results once in a while, and per-clock efficiency would drop somewhat as well (register-renaming resources come to mind from tuning the Linpack GFLOPS benchmarks). Still, it would help Opteron qualify for more supercomputing wins.
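Peak throughput is just clock speed times double-precision operations per cycle. The flops-per-cycle values below (2 now, 4 after the proposed fix) are inferred from the 5 and 10 GFLOPS figures quoted above rather than taken from AMD documentation; treat this as a back-of-the-envelope sketch:

```python
def peak_gflops(clock_ghz: float, dp_flops_per_cycle: int) -> float:
    """Theoretical peak double-precision GFLOPS for a single CPU."""
    return clock_ghz * dp_flops_per_cycle

# Assumption: the current Opteron sustains 2 DP flops per cycle through SSE2,
# and doubling the SSE2 throughput would raise that to 4.
current = peak_gflops(2.5, 2)
doubled = peak_gflops(2.5, 4)
print(current, doubled)  # 5.0 10.0
```

These are ceilings; memory stalls and the per-clock losses mentioned above keep real codes such as Linpack below them.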
Then, it would be a good idea to start thinking about large-cache Opterons for the high-end, low-volume, high-price market, along the lines of the long-forgotten "Mustang" AthlonMP with 2 MB cache, or for that matter the XeonMP and P4 Extreme Edition. Large caches clearly bring decent benefits, so why not have an "Opteron Optimum" or "Opteron Premium"? Oh, and speaking of names, "Pentium Premium" would have sounded so good on the P4 XE ... especially given the pricing...
With a 90 nm process, AMD should have no major problem producing an "Opteron Optimum" with a straight 4 MB of on-chip L2 cache (or 1 MB L2 plus 4 MB L3 with a somewhat higher access time). A large cache could also boost performance in a dual-core Opteron flavour, where each core has its own 1 MB L2 cache but both share a larger L3 cache and the non-blocking memory bus. While one core accesses memory, the other can read from the L3 cache without contention.
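Why an L3 with a higher access time can still pay off is easiest to see with the textbook average memory access time (AMAT) model. This is a simplified sketch: every latency and miss rate below is a hypothetical placeholder, not a measured Opteron figure, and the model ignores the dual-core contention effects described above:

```python
def amat(levels, memory_latency_ns):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_time_ns, local_miss_rate), innermost cache first.
    Every access that reaches a level pays its hit time; the local miss
    rate is the fraction of those accesses that continue further out.
    """
    time = memory_latency_ns
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time

# Hypothetical numbers for illustration only.
l2_only = amat([(5.0, 0.10)], memory_latency_ns=80.0)
l2_l3 = amat([(5.0, 0.10), (15.0, 0.40)], memory_latency_ns=80.0)
print(f"L2 only: {l2_only:.1f} ns, L2 + L3: {l2_l3:.1f} ns")
```

With these made-up inputs the slower L3 still cuts the average access time, because it catches most L2 misses before they reach DRAM; with a low memory latency or a poor L3 hit rate the benefit shrinks.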
Itanic fires salvos from its cannons?
Besides the usual problems of porting and optimising software for the Itanium, even Intel has started to nod approvingly whenever memory bus contention is mentioned as the Itanium's major problem. After all, large cache or not, a quad-CPU Opteron 848 system has 25.6 GB/s of memory bandwidth, compared to just 6.4 GB/s in a quad-Itanium 2. In large scientific or technical applications with huge data sets, even a 6 MB cache will be thrashed often, so main memory access has to be sped up.
Before delivering the Madison 9M, which might run as fast as 1.67 GHz on a 667 MHz FSB (an 800 MHz FSB might not be doable there yet), Intel is expected to release, in the next few months, updated versions of the existing Madison Itanium 2 with either 533 MHz or, possibly, 667 MHz FSBs. After all, if 667 MHz works fine, why bother wasting time qualifying 533 MHz as well? Clocks may reach as high as 1.6 GHz, but probably not beyond that.
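A rough sketch of what those FSB upgrades would buy, assuming (as the 6.4 GB/s figure above implies) that all four Itanium 2 CPUs share one 128-bit front-side bus. The constant names are illustrative, and the outputs are peak numbers only:

```python
FSB_WIDTH_BYTES = 16      # assumption: 128-bit Itanium 2 front-side bus
QUAD_OPTERON_GBS = 25.6   # 4 x 6.4 GB/s, one DDR400 controller per Opteron

def fsb_bandwidth_gbs(mega_transfers_per_sec: float) -> float:
    """Peak bandwidth of one shared front-side bus in decimal GB/s."""
    return mega_transfers_per_sec * 1e6 * FSB_WIDTH_BYTES / 1e9

for fsb in (400, 533, 667):
    shared = fsb_bandwidth_gbs(fsb)
    per_cpu = shared / 4  # all four CPUs contend for the same bus
    print(f"{fsb} MHz FSB: {shared:.1f} GB/s total, {per_cpu:.1f} GB/s per CPU "
          f"({shared / QUAD_OPTERON_GBS:.0%} of quad-Opteron)")
```

Even at 667 MHz the shared bus tops out well below half of the quad-Opteron aggregate, which is why faster FSBs alone are unlikely to close the gap.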
Also, Itanium seems to suffer from the same problem that unjustly plagued a platform still dear to me, Alpha, for most of its painful life: a lack of ported or committed software applications. Back then, Intel won out over Alpha in the Windows NT Workstation market because of legacy compatibility, despite poorer performance (no, I didn't mention any vendor pressure or similar tactics). Isn't it ironic that Itanic might now be beaten by the Opteron in the very same way? After all, Itanic does have somewhat better floating-point scores, but Opteron's legacy compatibility looks like just what the doctor ordered for risk-averse IT managers' hearts.