WE'VE REVIEWED a spread of Intel's Nehalem Xeon CPUs here over the past few months. Our coverage has ranged from the W3580 as a single-CPU workstation twin of the Core i7 975XE to the dual-CPU W5590 running at the same speed as the W3580 but doubling the total system core and memory channel count. We also took a look at a variety of mainboards from Tyan, Supermicro and Asus along the way.
But what were the real feature and performance improvements in those new-generation processors versus their immediate Xeon 5400 series predecessors and the related mainboard platforms, and what could have been done better in both cases?
What I'll do in this installment is compare the features, setup and overall performance of the choice high-end platforms from both generations, using the top-end Tyan S5397 mainboard with the i5400 Seaburg chipset for the "Harpertown" 5400 Xeons and the Tyan S7010 mainboard with the i5520 Tylersburg chipset for the "Gainestown" 5500 series Xeons.
The older Xeon 5400 series in its last incarnation - the Xeon X5482 and X5492 processors and the multiplier-unlocked QX9775 - consisted of 45nm dual-die quad-core CPUs with a total of 12MB of on-package cache and an FSB speed of 1600MHz. The i5400 chipset had a dedicated FSB to each of the CPUs, as well as four FBD-800 memory channels to feed them all - not to forget a huge 24MB cache snoop filter in the chipset to track all of the L2 cache contents. So total bandwidth was never a problem here, but latency was. The FB-DIMMs, while based on the good idea of buffering the DRAM loads away from the memory controllers and parallelising read and write requests, added a bit too much latency and heat.
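The point that total bandwidth was never the problem can be checked with some back-of-the-envelope arithmetic. The sketch below is only an illustration, not a measurement. It assumes an 8-byte (64-bit) data path per FSB and per FB-DIMM channel read path, neither of which is stated above, and the helper name `peak_gb_per_s` is made up for this example.

```python
# Rough theoretical peak bandwidth for the dual-Xeon i5400 platform.
# Assumption (not from the article): each FSB and each FB-DIMM channel's
# read path moves 8 bytes per transfer.

def peak_gb_per_s(mega_transfers_per_s, bytes_per_transfer, links=1):
    """Peak bandwidth in decimal GB/s for a number of identical links."""
    return mega_transfers_per_s * bytes_per_transfer * links / 1000

# One dedicated FSB1600 link per CPU, two CPUs.
fsb_total = peak_gb_per_s(1600, 8, links=2)

# Four FBD-800 memory channels behind the chipset (read direction only).
fbd_read_total = peak_gb_per_s(800, 8, links=4)

print(f"Two FSB1600 links:     {fsb_total:.1f} GB/s")
print(f"Four FBD-800 channels: {fbd_read_total:.1f} GB/s (reads)")
```

Under these assumptions the four memory channels can, on paper, supply as much as the two front-side buses can carry, which fits the observation that latency rather than raw bandwidth was the weak spot.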

If there had been an updated Seaburg with four channels of standard ECC DDR2 or DDR3 memory replacing the FB-DIMMs, memory-related benchmarks for these Xeons would have looked far better. In fact, it would have kept AMD at bay even in some memory-intensive apps prior to the Nehalem arrival.
The Tyan S5397 mainboard is the maxed-out incarnation of the i5400 chipset platform for these dual Xeons, with everything you can think of included - two full PCIe x16 v2 slots for GPUs, plus one x4 slot for storage, on top of 64-bit PCI-X and 32-bit PCI slots. A whopping 16 FB-DIMM 800 memory slots, good for up to 128GB of RAM if you dare use 8GB modules, are supplemented by a four-disk onboard SAS RAID controller on top of the usual SATA ports. Dual Gigabit ports, plus old but proven serial ports and equally old integrated graphics just in case, round off the feature set on this Extended ATX sized board.
The usual Intel server CPUs, like the X5492 3.4GHz FSB1600 Xeon and the X5470 3.33GHz FSB1333 Xeon, worked fine on the mainboard. The X5470 E-stepping CPU is itself a fantastic overclocker on any board that supports FSB speed override, thanks to its 10X multiplier. On my Skulltrail mainboard, now dead and forever awaiting repairs, this CPU was doing regular dual-CPU duty at 4GHz and FSB1600 every day, rock stable at 1.36 volts.
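The overclocking arithmetic behind that 10X multiplier is simple enough to sketch. The snippet below assumes the FSB is quad-pumped, so the base clock is a quarter of the FSB rating. That detail is not spelled out above, and the function name `core_clock_mhz` is purely illustrative.

```python
# Core clock = multiplier x base clock.
# Assumption (not from the article): the FSB is quad-pumped, so the
# base clock is the FSB rating divided by four.

def core_clock_mhz(multiplier, fsb_rating_mt_s):
    """Return the CPU core clock in MHz for a given multiplier and FSB rating."""
    base_clock_mhz = fsb_rating_mt_s / 4
    return multiplier * base_clock_mhz

stock = core_clock_mhz(10, 4000 / 3)    # FSB1333 is nominally 1333.33 MT/s
overclocked = core_clock_mhz(10, 1600)  # FSB1600 override

print(f"X5470 at FSB1333: {stock:.0f} MHz")
print(f"X5470 at FSB1600: {overclocked:.0f} MHz")
```

With a fixed 10X multiplier, raising the bus from FSB1333 to FSB1600 is what takes the chip from its 3.33GHz rating to the 4GHz mentioned above, a 20 per cent overclock.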
Speaking of the Skulltrail, I tried the QX9775 processor - basically the multiplier-unlocked X5482 Xeon - in the Tyan board but, even after a BIOS update, it didn't work. That's a pity, because the Asus Z7S-WS workstation board did recognise it fine. I really wanted to see if the multiplier scaling on the Tyan board would give us a dual 4GHz 8-core 128GB RAM monster suitable for either large model engineering and scientific analysis or maybe gigantic World of Warcraft army fight simulations. Anyway, the configuration I actually tested included just 16GB of RAM in eight low-power 1.5 volt Nanya ECC DIMMs.
And regarding the BIOS update, can Tyan, Supermicro and other vendors stop relying on ancient floppy-based BIOS updates run from a DOS prompt (with a request to disable HIMEM.SYS, no less)? It is oh so 1984. First, many systems don't even have floppy drives today. Second, a simple password-protected flash update utility within the BIOS, like the one found on Asus boards, could make life much easier while also helping to prevent users from doing stealth BIOS updates behind the sysadmin's back.
This board will support ATI Multi GPU Crossfire, but not Nvidia Quadro SLI (or GeForce SLI, for that matter), as Nvidia only approves Quadro SLI with 'certified system vendor configurations'. I tried it with two Quadro FX5800 4GB cards and no, it didn't work. Maybe Tyan and Supermicro can convince the Nvidia Quadro team to enable an 'unsupported SLI' mode here?

On the other side of the table was the Tyan S7010 mainboard, Tyan's dual Nehalem Xeon workstation entry, albeit with only a single GPU slot. It carried dual Intel Xeon W5590 3.33GHz processors, for a total of 8 cores, with the 3.46GHz Turbo Mode that does work, for readers who have been mentioning conflicting Intel information about Turbo not working there. If you enable HyperThreading, you've got 16 threads to play with, too. A whopping 48GB of RAM - 12 ECC server DIMMs of 4GB each - runs beautifully at DDR3-1333 speed across all DIMMs, giving us over 40GBps in Sandra memory bandwidth results, the record to this day. The whole thing is cooled by the powerful yet silent Asetek dual LCLC hermetically sealed liquid cooling setup, which could easily support some overclocking if the board enabled it.
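To put that 40GBps Sandra figure in context, here is a rough sketch of the theoretical ceiling for this memory setup. It assumes three DDR3 channels per socket and 8 bytes per transfer per channel. Those figures are assumptions for this illustration, not something measured here, and the result is a paper number only.

```python
# Theoretical peak memory bandwidth for the dual W5590 setup.
# Assumptions (not from the article): three DDR3 channels per socket,
# 8 bytes per transfer per channel.
SOCKETS = 2
CHANNELS_PER_SOCKET = 3
DATA_RATE_MT_S = 1333        # DDR3-1333
BYTES_PER_TRANSFER = 8

peak_gb_s = (SOCKETS * CHANNELS_PER_SOCKET
             * DATA_RATE_MT_S * BYTES_PER_TRANSFER / 1000)

measured_gb_s = 40           # "over 40GBps" in Sandra, so a lower bound
efficiency = measured_gb_s / peak_gb_s

total_ram_gb = 12 * 4        # 12 DIMMs of 4GB each

print(f"Theoretical peak: {peak_gb_s:.1f} GB/s")
print(f"Measured share:   at least {efficiency:.0%} of peak")
print(f"Installed RAM:    {total_ram_gb} GB")
```

Under these assumptions the Sandra result works out to a bit over 60 per cent of the paper maximum.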
It's a pity that Tyan doesn't have a board with the same 12-DIMM memory setup but full dual-GPU capability. Since these CPUs, as mentioned before, may be unlocked, I'd also like to see manual clock multiplier, base clock and memory latency setting options in the BIOS setup.
Now you've seen the big benchmark tables here before, so I'll just update the overall impression. In a near apples-to-apples comparison, that is, the same or similar CPU speeds, HyperThreading off, and default settings, the new Xeon W5590 is just slightly faster than the X5492 in pretty much everything except memory bandwidth, where its results are roughly quadruple.
However, the X5492 is some 20 per cent faster than the W5590 in Sandra cryptographic benchmarks, mostly due to the internal cache latency increase in the newer CPUs. This difference will only be corrected when the pin-compatible 32nm Gulftown-EP Westmere-based 6-core Xeon with 12MB cache and, hopefully, certified DDR3-1600 server memory, enters the fray after the year end. Its AES instruction set extensions will speed up cryptography jobs by an order of magnitude.
Much the same story repeats in Cinebench testing where, without HyperThreading, the difference between the old and new Xeons is academic. The render routine fits nicely into the cache and, knowing that the peak floating-point throughput of both cores is similar, the net ray tracing speed ends up similar, too.
In terms of actual use, like messing with large AutoCAD models or Photoshop images, the Nehalem-based platform feels somewhat zippier due to the memory bandwidth benefits. The more memory-dependent a job is, the more benefit there is from the new CPUs. Conversely, the more cache-bound or rich in tight code loops the job is, and the less thread interaction there is across multiple cores, the less benefit there is from Nehalem, as the larger and faster L2 caches of the older processors will suffice or, in some cases, will actually make them faster than the newer CPUs.
In summary, the new Xeons are record holders in the workstation arena. AMD's Istanbul does manage to beat them in Linpack tests, and only Linpack it seems, though Magny Cours should do far better. The old 5400 series Xeons, however, are only a tad slower and keep up the pace pretty well. So, if you chance upon good offers for these machines, do consider them seriously. The systems and boards are just as capable, and FB-DIMM memory is going at fire sale prices now. But your power bills might be a bit higher, as the older Xeons were slightly more power-hungry. µ