That's kinda disappointing, knowing that there are many users in need of combined computation and visualisation with fast 3-D graphics AND lotsa fast, big memory to handle the tasks ranging from 3-D and prepress to molecular modelling and quantum chemistry. Of course, they need a proper OS (that means UNIX or Linux) to run all that stuff reliably.
Except for HP's zx2000 and zx6000 offerings, based on McKinley and now Madison, there are no Itanium2 3-D workstations in the market, and IBM has finally come out with a decent (i.e. both CPU and 3-D are decently fast) POWER4+ 3-D station, the IntelliStation 275.
Then we have AMD Opteron - proper dual-Opteron workstations are due any moment now, using AMD, VIA, and probably most often, Nvidia chipsets. A dual 2 GHz 246 Opteron is, well, pretty much the leader in SPECint and memory bandwidth, and decent in SPECfp - however, it still lags somewhat behind dual Xeon 3.06 and dual Madison in quite a few 3-D and rendering tasks. That may change soon as 64-bit Opteron-optimised apps come to bridge the speed gap and bring the 64-bit benefits, probably on Linux before the 64-bit Windows XP.
So, that's pretty much the landscape - right in the middle of a chaotic "restructuring exercise".
Jobs rides into town
Then, the Mac cometh and screws up everything (for some other big guys in the ring right now). Well, we've all
seen Apple's PR kit and zillions of interesting comments all over the place. Let's start with unusual GCC-based SPEC
benchmarks (usually, every vendor compiles and tunes SPEC using the best compiler they can find for their platform,
never the average-speed GCC). Then comes the 64-bit platform, but MacOS X is not (yet) 64-bit. Most importantly,
with the exception of some Opterons, these are the cheapest dual-CPU 64-bit workstations in history, at US$ 3K for a base
dual 2 GHz config, and less than US$ 10K Apple web price for a fully stuffed 8 GB RAM, 2x250 GB HD, Radeon 9800 Pro 3-D
OpenGL (remember, this card is faster in 3-D than most proprietary 3-D cards on UNIX workstations) and 23-inch
1920x1200 UXGA-W HDTV CinemaHD LCD!!!
After all, MacOS X is a UNIX (although a somewhat unusual and "modified" one, to say the least), which makes these the cheapest 64-bit dual-CPU commercial UNIX workstations. So what in there is really worth a look for a high-end workstation user?
Choice innards
So, the PowerPC 970 (or Power4 Lite, or G5, or call it whatever you want) yields were better than expected, and
there is a 2 GHz dual-CPU system right in the first line-up. In fact, I expect to see up to 2.4 GHz not far off down
the line for the initial 64-bit PowerMac generation.
Put those GCC-based SPEC results aside for the moment: if you're, say, building a cluster of PowerMacs for a visualisation supercomputer or a computing engine, what parameters will you look at? Well, the raw peak power of each 2 GHz G5 is impressive for such a cheap desktop: 2 mul-adds (4 FP ops) per cycle give you 8 peak GFLOPS (16 GFLOPS in a dual-CPU system), with the L2 cache able to stream in operands fast enough to pretty much sustain that rate. After all, this is a "lite" IBM POWER4+ processor...
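For those who like to check the sums, here is a minimal Python sketch of that peak figure. The function and constant names are made up for illustration, and it only multiplies out the numbers quoted above (two mul-add units, two FP ops per mul-add, 2 GHz clock); real sustained throughput will of course be lower.

```python
# Back-of-the-envelope peak FP throughput, using only the figures in the text.
# All names here are illustrative, not taken from any Apple or IBM document.

FMA_UNITS = 2      # two mul-add units per CPU (from the article)
OPS_PER_FMA = 2    # one multiply plus one add per mul-add


def peak_gflops(clock_ghz, cpus=1):
    """Theoretical peak GFLOPS: clock (GHz) x FP ops per cycle x CPU count."""
    fp_ops_per_cycle = FMA_UNITS * OPS_PER_FMA
    return clock_ghz * fp_ops_per_cycle * cpus


single = peak_gflops(2.0)         # one 2 GHz G5
dual = peak_gflops(2.0, cpus=2)   # the dual-CPU PowerMac

print(f"single CPU: {single} GFLOPS, dual: {dual} GFLOPS")
```

This prints 8.0 and 16.0 GFLOPS, matching the peak numbers quoted above.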
Don't forget the Altivec-compatible engine with its own 32 x 128-bit register set that can add another four single-precision FP ops every cycle, and a scalable 64-bit (2 x 32-bit full-duplex) point-to-point path from each CPU (instead of a shared FSB) that always operates at half the CPU speed, running faster as the CPU gets faster. So, each 2 GHz G5 has its own 1 GHz, 8 GB/s "FSB" directly to the chipset, just like AthlonMP or Alpha were doing before - at slower speeds, of course.
The chipset then acts as a kind of crossbar switch to facilitate parallel transfers between different units without FSB-characteristic collisions. In the initial implementation, there is a simple 128-bit DDR400 (two channels combined) memory path with up to 8 DIMMs for 8 GB of directly addressable RAM. And of course, there is a HyperTransport link to two PCI-X buses for slots (one 133 MHz 1 GB/s slot, and two 100 MHz 800 MB/s slots on the second bus) plus one PCI-X bus for on-board peripherals, and, most importantly, a separate feed to the AGP 8X Pro bus, to round out the full-fledged 3-D capability.
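The bandwidth figures for these paths all come from the same simple sum: bus width in bytes times transfers per second. A rough Python sketch follows, with hypothetical names; the 64-bit PCI-X width and the 32-bit, 533 MT/s AGP 8X figures are standard-spec assumptions added here rather than numbers from Apple's kit.

```python
# Peak bandwidth = (bus width in bytes) x (million transfers per second).
# Widths and rates for the CPU path and memory come from the text; the PCI-X
# and AGP entries are assumptions based on the usual specs for those buses.

def bandwidth_gb_s(width_bits, mega_transfers_per_s):
    """Peak bandwidth in GB/s (decimal), ignoring protocol overhead."""
    return width_bits / 8 * mega_transfers_per_s / 1000


links = {
    "G5 CPU path (2 x 32-bit, 1 GHz)": bandwidth_gb_s(64, 1000),
    "DDR400 memory (128-bit)":         bandwidth_gb_s(128, 400),
    "PCI-X 133 MHz slot (64-bit)":     bandwidth_gb_s(64, 133),
    "PCI-X 100 MHz bus (64-bit)":      bandwidth_gb_s(64, 100),
    "AGP 8X (32-bit, 533 MT/s)":       bandwidth_gb_s(32, 533),
}

for name, gb_s in links.items():
    print(f"{name:34s} {gb_s:6.2f} GB/s")
```

Note that the 8 GB/s on the CPU path is the total for both directions, and that the single 128-bit DDR400 memory path (6.4 GB/s) is already slower than one CPU's link, let alone two.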
The secondary things, like SATA, dual-speed Firewire and USB 2.0, DVD-RW and other gadgets add up to form what could be the industry's leading 64-bit workstation, if OS X gets a quick 64-bit upgrade. The combination of high CPU throughput, good memory bandwidth, high capacity, lotsa software on a pleasant OS, and reasonable price is there.
Drawbacks
Firstly, the design is not as aggressive as it could be - if you look at Nvidia Nforce2 as a good example, a
graphics and I/O intensive workstation needs to have a memory bus faster than its CPU buses, due to the need to feed
long data streams to its fast graphics and I/O paths in parallel with the CPUs.
That means that, using the present DDR400 technology, you need to feed 16 GB/s total on two 1 GHz CPU paths, 2 GB/s on AGP 8X, and 3 GB/s on the PCI-X buses, altogether over 20 GB/s peak load, or an eight-channel DDR400 requirement to have spare capacity for any occasion. Of course, that may not be feasible, but a good compromise for a fast 3-D and imaging station would be a quad-channel or dual 128-bit bus DDR400 memory subsystem which combines twice the bandwidth (12.8 GB/s) of the current platform with twice the capacity (16 DIMMs for 16 or 32 GB RAM) via doubled channels. This may improve the real SMP app performance quite a bit, too, due to less competition for a single 128-bit DDR400 path.
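The budget above can be tallied in a few lines of Python. This is only a sketch of the arithmetic, with invented names; the demand figures are the rounded ones from the text, and rounding the channel count up to a power of two is an assumption about why eight channels, rather than the bare minimum, is the number quoted.

```python
import math

# One 64-bit DDR400 channel: 8 bytes x 400 MT/s = 3.2 GB/s peak.
DDR400_CHANNEL_GB_S = 3.2

# Peak demand per consumer in GB/s, as quoted in the article.
demand = {
    "two CPU paths": 2 * 8.0,
    "AGP 8X":        2.0,
    "PCI-X buses":   3.0,
}

total = sum(demand.values())
min_channels = math.ceil(total / DDR400_CHANNEL_GB_S)

# Assumption: memory channels come in powers of two, hence "eight-channel".
practical_channels = 1 << (min_channels - 1).bit_length()

print(f"peak demand: {total} GB/s")
print(f"channels needed: {min_channels}, rounded up to {practical_channels}")

# How much of the peak load each memory configuration could cover.
for channels in (2, 4, 8):
    supply = channels * DDR400_CHANNEL_GB_S
    print(f"{channels} channels: {supply:.1f} GB/s, {supply / total:.0%} of peak")
```

On these numbers the current dual-channel setup covers about 30% of the theoretical peak load, the proposed quad-channel compromise about 61%, and only the eight-channel setup exceeds it.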
Secondly, the standard AGP 8X Pro slot should allow way more high-end 3-D card choices - Nvidia's follow-on to Quadro FX2000, and 3DLabs Wildcat cards, are the obvious examples.
Thirdly, the box is a bit bulky: a slimline 1U or 2U rackmountable workstation for visualisation clusters could be a hit - just couple it with, say, Quadrics QsNet2 interconnect for fast 64-bit shared memory, and a good 3-D output aggregation device, and you could have a nasty real-time 3-D visualisation supercomputer in a rack...
Finally, combining a fully 64-bit MacOS X with a 64-bit native Linux distro for these boxes in, say, a dual-boot fashion would help a lot to spread the platform further, but that would obviously require some "opening" of the platform by Apple. After all, there already is 64-bit POWER4 Linux.
Repercussions
Well, the new 64-bit PowerMac may emerge to be an even more dangerous competitor to the nascent Itanium
workstation push than the Opteron. Firstly, no one can doubt pure 64-bitness of the POWER4 architecture transplanted
into the G5, and the applications from usual suspects will come anyway now that the Mac is back in the high-performance
game - after all, Mac could have more workstation and multimedia application support than all UNIX workstation
platforms combined. And, as seen in the previous section, there is still plenty of unused performance at the platform
level to up the ante further even without major clock speed improvements.
Intel may really have to give a major push to the Itanium platform with a more revolutionary, maybe Alpha-inspired, part and still at the same time offer a price-competitive 64-bit follow-on to the Pentium/Xeon family. After all, IBM can play with multiple 64-bit platforms and even segment them nicely, so why couldn't Intel?
The IBM POWER5 platform will still be faster, more complex and more expensive than the G5, but there will be workstation, server and blade-level AIX products incorporating the new "lite" CPU as well - don't expect them to be as cheap as a Mac, but the extra volumes will help IBM pSeries become more competitive.
On the other hand, Opteron and Mac might end up not just complementing each other, but being designed and made in the same IBM facility at East Fishkill - is it part of IBM's grand plan to take revenge on Intel and Microsoft, two behemoths IBM literally gave life to? Also, note that both platforms rely on HyperTransport, which can actually be used as a very high performance I/O path on its own, without the need for something like PCI Express...
In a year's time, the next rev of G5, while officially not related to POWER5, is expected to have similar multithreading and improved FP throughput capabilities, while still keeping the low power of the current family. Combined with the compiler and software improvements, the roadmap might give us the fastest desktop platform around - but will Apple make full use of it, or screw up a great opportunity as they did many times before? µ