The Inquirer-Home

Harpertown benchmarks show a monster in the making

Core Wars: Barcelona needs a big speed bump to compete
Tue Sep 18 2007, 23:21
JUST A FEW DAYS prior to the annual IDF autumnal jam, and amid the fallout from AMD's Barcelona launch, we got hold of yet another test setup from Intel. This one was more interesting than usual: the first 45nm "Penryn" generation, dual-socket Harpertown platform, based on a SuperMicro X7DWA mainboard in a black SuperMicro casing with a slim, elongated 650W power supply, which can take a second, redundant PSU as an option.

For some reason, while other vendors like Asus and Tyan can also be very quick with the new design spins each round - and despite SuperMicro's recent AMD support too - Intel usually sticks with SuperMicro for the initial workstation and server test platforms.

The initial range of FSB1600 Penryns will include CPUs at up to 3.2GHz, the Xeon X5482 parts. What we got, though, was a notch below: a 3GHz FSB1600 configuration with two 12MB cache quad-core bins put together. As you can see, Intel uses half-clock multipliers in Penryns, just like AMD uses them in Barcelona.
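The half-clock multiplier falls out of simple arithmetic. Here is a minimal sketch, assuming the FSB rating is quad-pumped (so FSB1600 means a 400MHz base clock); the function name is ours, for illustration only.

```python
def core_multiplier(core_mhz, fsb_rating):
    """Return the core clock multiplier for a given FSB rating.

    Assumption: the FSB is quad-pumped, so the base clock is
    the rated FSB speed divided by four.
    """
    base_clock_mhz = fsb_rating / 4
    return core_mhz / base_clock_mhz


# The 3GHz FSB1600 part we tested, and the 3.2GHz top bin.
print(core_multiplier(3000, 1600))  # half-clock multiplier
print(core_multiplier(3200, 1600))  # whole multiplier
```

Under that assumption, the 3GHz bin needs a 7.5x multiplier while the 3.2GHz part sits on a plain 8x.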

While we were somewhat upset at not getting the highest-speed 3.2GHz grade yet, the 3GHz spin was fine for a direct, clock-for-clock comparison against the few-months-old V8 platform with dual 3GHz X5365 65nm Clovertown processors on an FSB1333 Greencreek chipset. In a wee window of opportunity before the San Francisco plane took off, we managed to put a few comparative benchmarks together - some of these show the fine-print differences between the two successive generations.

More will follow after IDF. Our second round of tests will add the Linux runs and further comparative benchmarks to complete the picture, as well as the expected product pricing and the choices initially available.

The combo server/workstation platform was equipped to the hilt: eight FB-DIMMs provide 16GB RAM. Finger-burning hot? Not this time, even with just a guided airflow path from the CPU fans - the DIMMs are of that 1.5V low-voltage, yet low-latency, Nanya DDR2-800 CL5 variety that we saw at the last Computex. These DIMMs also seem to contribute to the unexpected major performance increases in some areas.

alt='harpsys'

An Adaptec SAS controller with twin 146GB 15K rpm Seagate drives served as the storage, while an Nvidia Quadro FX4600 took care of the workstation graphics, sitting in one of the two PCI-E x16 v2 (double speed) slots. As mentioned before, Seaburg provides for two full PCI-E x16 v2 slots while still having several spare PCI-E I/O lanes on the side.

We installed both WinXP 32 and WinXP 64 for the first test round. Linux and (on a separate HDD) Vista64 will be added to the mix for further benchmarks.

After getting the Barcelona benchmark update from friends who had it early, and going by our own previous 3GHz Clovertown runs, we expected the new Intel configuration to be slightly faster than the Clovertown, and slightly faster than the Barcelona too.

And, while some tests showed only minor speedups, there were a couple of truly surprising major jumps, especially in the critical area of memory access, where Intel's high-end workstation and server platforms have recently been far weaker than AMD's.

Here are the initial results, including a few that compare it vs Clovertown:

alt='harperbench'

You can also see the Bapco SysMark 2007 Preview 32-bit round, as well as the Cinebench R10 64-bit and SpecOPC ViewPerf 10 64-bit scores, in the screenshots. We'll be re-running the last two with a Leadtek Quadro FX5600 as well.

alt='harpbapco'

alt='harpview'

As you can see, the Sandra memory throughput more than doubled. The Seaburg chipset reduces the latency somewhat compared to Greencreek, but it also has a far more efficient cache snoop buffer and, more importantly, an optimised FB-DIMM memory controller that finally makes good use of FBD's concurrent read-and-write capabilities.

alt='cine10-harp3gfx4600'

alt='harpsancpu'

alt='harpsanmem'

The result is astonishing: while the Clovertown/Greencreek combo managed less than half of AMD's memory benchmark performance, the Harpertown/Seaburg combo is within 20 per cent of the fastest dual Socket F Opteron platforms. So, AMD is still somewhat faster, and Barcelona cores add a bit more thanks to their memory controller optimisations. But the difference is now small enough - combined with triple the effective cache capacity on Harpertown, after allowing for Barcelona's exclusive cache - to push memory aside as Intel's application performance bottleneck.

It also helps on Linpack 64-bit FP throughput tests - last month, during the Oregon lab test runs, we hit 77 GFLOPs Rmax out of 102 GFLOPs Rpeak on the dual-socket 3.2GHz Harpertown, compared to 65 GFLOPs Rmax out of 96 GFLOPs Rpeak on the 3GHz Clovertown. This is a speedup of nearly 20 per cent for a clock difference of under 7 per cent... see screenshot.
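For those who want to check the sums, here is a quick sketch of the Linpack arithmetic using only the figures quoted above. The one assumption is ours: four double-precision FLOPs per core per cycle, which is what reproduces the quoted Rpeak values for eight cores.

```python
def rpeak_gflops(cores, ghz, flops_per_cycle=4):
    """Theoretical peak in GFLOPs.

    Assumption: 4 double-precision FLOPs per core per cycle,
    chosen because it matches the quoted Rpeak figures.
    """
    return cores * ghz * flops_per_cycle


harpertown_rpeak = rpeak_gflops(8, 3.2)   # quoted as 102 GFLOPs
clovertown_rpeak = rpeak_gflops(8, 3.0)   # quoted as 96 GFLOPs

# Linpack efficiency: measured Rmax as a fraction of Rpeak.
harpertown_eff = 77 / harpertown_rpeak
clovertown_eff = 65 / clovertown_rpeak

# Relative gains of Harpertown over Clovertown.
rmax_gain = 77 / 65 - 1
clock_gain = 3.2 / 3.0 - 1

print(f"Harpertown efficiency: {harpertown_eff:.1%}")
print(f"Clovertown efficiency: {clovertown_eff:.1%}")
print(f"Rmax gain: {rmax_gain:.1%}, clock gain: {clock_gain:.1%}")
```

On these numbers, Harpertown turns roughly three quarters of its theoretical peak into delivered Linpack throughput, against about two thirds for Clovertown, which is where the extra gain beyond the clock bump comes from.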

alt='linpackharp30000'

These scores are important for another reason. Intel's first dual-socket, high-end desktop Skulltrail round uses this same chipset and four channels of FBD, but with further speed optimisations like T1 command rate, less RAS overhead and, well, overclocking on both the FSB and, yes, the FB-DIMMs. If the results of Intel's joint work with the likes of OCZ, Corsair and Kingston give us, say, CL4 FBD running at DDR2-1000 or above, combined with dual FSB2000+ off 4+GHz dual water-cooled Harpertowns, we'll be in for some truly interesting scores yet again. Let's see how the Skulltrail does later today here at IDF then.

As for the CPU scores, again the same story as on the Caneland - these are the record scores on the dual-socket platforms for now, with slight jumps over the Clovertown. Our Charlie will be up soon with our own Barcelona numbers, so brace for a round of interesting comparisons right after IDF.

Note the Cinebench R10 scores here - for the first time on this benchmark, the CPU scaling on eight cores has passed six times, while Povray v3.7 shows its usual near eight times for eight cores. In summary, after this quick pre-IDF look, this is the fastest dual-socket desktop/workstation/server/HPC node yet.
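To put the six-times figure in perspective, here is a small sketch that turns a speedup on N cores into a parallel efficiency and, assuming Amdahl's law applies (our assumption, not something the benchmark reports), an implied parallel fraction of the workload. The function names are ours.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def amdahl_parallel_fraction(speedup, cores):
    """Parallel fraction p implied by Amdahl's law.

    Solves speedup = 1 / ((1 - p) + p / cores) for p.
    Assumption: the workload follows Amdahl's simple model.
    """
    return (1 - 1 / speedup) / (1 - 1 / cores)


# The six-times threshold on eight cores mentioned for Cinebench.
print(f"Efficiency: {parallel_efficiency(6, 8):.0%}")
print(f"Implied parallel fraction: {amdahl_parallel_fraction(6, 8):.1%}")
```

Under that model, six times on eight cores means about 95 per cent of the work runs in parallel, which shows how little serial or memory-bound time it takes to hold back eight-core scaling.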

Far more important is the newfound overall system balance, or yin/yang harmony if you will, where "yang" is the raw masculine CPU and cache power, and "yin" is the soft, feminine memory bandwidth and throughput, which feeds the CPU consistently well, without the major bottlenecks seen before. Call it a good engagement on the way to a happy marriage, which I guess is the first batch of Nehalems nine months from now. Oh boy. Anyway, AMD's Opteron generation always had an advantage in this respect - that advantage is still there, but now it is "minor" instead of "major".

Should AMD be truly, madly, deeply concerned about Intel's new baby? In my mind, yes it should. Harpertown has a clear overall CPU/FPU speed advantage, fixing some loopholes where AMD beat the Clovertown before, combined with vastly improved memory throughput to close the last critical gap.

If AMD manages to pull off the (possible new stepping) miracle and get, say, a 2.8GHz Barcelona out of the door in measurable volumes by year-end, that would be a coup and a welcome event for all of us. Whoa, AMD will stay alive, no Intel monopoly for now! The hacks, ANALysts, PRs and the rest of the gang of that ilk can keep their jobs, as there will be more things to write about.

Humour - or relief - aside, I believe, based on the Harpertown's initial results, AMD needs a 2.8GHz Barcelona part to have a strong, balanced competitive position against a 3.2GHz Harpertown.

With benchmarks we saw at friends' places in Taipei, Barcelona is still slightly faster clock-for-clock in a few benchmark types, whether it is SPECfp or some database and non-SSE3/4 compression routines, as well as memory benchmarks. However, the current Oct/Nov shipment expectations, with 2.2GHz Barcelona vs 3.2GHz Harpertown, more than negate Barca's slight per-clock advantage.

On top of all this, hidden sources' silent whispers say that Intel may be able to go towards 4GHz on the Penryn generation real soon now - as in, by year-end - even with air cooling if it wishes. After all, this is the 45nm process tuning test baby prior to the Nehalem's expected launch around Computex 2008.

So, Intel could reasonably quickly shoot out a 3.6GHz or even faster, TDP-busting but still production-grade Harpertown part if the need arises. Will the same scenario repeat on the desktop? We wish AMD the best of luck - they will need it against Chipzilla's new 45nm weaponry. µ

 
