A consumer is a shopper who is sore about something - Harold Coffin
I put together a system as shown in the photo: a Xeon X3220 quad-core 2.4 GHz FSB1066 CPU with 2 x 4 MB L2 cache - a workstation equivalent of the Q6600 - cooled by Thermaltake's "Chinese Fan" copper fan. Asus' heatpipe system was additionally ventilated by its own vertical fan on the VRM side, as well as by one directional fan that I leaned against the north bridge to stream air directly between the copper fins.
The 2 x 1GB Corsair Dominator 6400CL3 DIMMs were cooled by Corsair's own fans - the alternate choice, Geil, was on its own fan-wise - and the Asus EAH2900XT R600XT board of course had its big ATIMD cooling system in place as well.
With a new process for the P35 chipset, I expected better performance at lower NB/SB voltages. At the BIOS boot stage, things went very well at the start - the frequencies achieved were a bit higher than Nforce 680i or Intel P965. The X3220 booted to the BIOS nicely at 3.5GHz CPU/FSB1750 - with memory set at DDR2-875 CL4 in this case. The CPU temp was around 60 C in BIOS hardware monitor after 5 mins of idleness. I tried 3.6 GHz FSB1800, but it didn't power up at all.
For the memory tests, I could get the system to power up at 3.33GHz CPU/FSB1667 and memory at DDR2-833 CL3-3-3-5, but it couldn't boot Windows, whether I used Corsair or Geil memory - which both work fine on Nforce at this setting. The fastest high-FSB speed at which Windows booted on this first P35 was 3.2 GHz CPU/FSB1600. OK, so the RAM worked as DDR2-800 CL3-3-3-5 at 2.3 volts for Corsair and 2.25 volts for Geil, surely not bad.
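The clock combinations quoted so far all follow from the same arithmetic: the FSB figure is the quad-pumped rate, so the base clock is a quarter of it, and the CPU clock is the base clock times the multiplier. Here is a minimal sketch, assuming an 8x multiplier (inferred from 3.2 GHz at FSB1600, not stated outright) and a 1:1 memory ratio, where the DDR2 rating is twice the base clock. The function name is purely illustrative.

```python
def derive_clocks(fsb_rating, multiplier=8):
    """Return (base clock MHz, CPU clock GHz, DDR2 rating at a 1:1 ratio).

    Assumptions: fsb_rating is the quad-pumped figure (e.g. 1600),
    and the 8x multiplier is inferred from the quoted clocks.
    """
    base_mhz = fsb_rating / 4              # quad-pumped bus -> base clock
    cpu_ghz = base_mhz * multiplier / 1000
    ddr2_rating = base_mhz * 2             # double data rate, memory in sync
    return base_mhz, cpu_ghz, ddr2_rating


for fsb in (1600, 1667, 1750, 1800):
    base, cpu, ddr = derive_clocks(fsb)
    # The DDR2 label is truncated, the way module ratings are usually named.
    print(f"FSB{fsb}: base {base:.1f} MHz, CPU {cpu:.2f} GHz, DDR2-{int(ddr)}")
```

Under these assumptions FSB1750 lands on 3.5 GHz with DDR2-875, and FSB1600 on 3.2 GHz with DDR2-800, matching the settings described.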
Now comes the fun part - the first benchmarks. I ran Sandra 2007 SP1 on WinXP 64, and Sandra XI SP1 on WinXP32. The CPU results were as expected, but the memory ones were sorely disappointing.
A configuration which, at the same clock and FSB on the Nforce 680i Asus Striker Extreme, gives me in excess of 8300 MB/s Sandra bandwidth and 57 ns latency (and some more, a linear 4% increase, at FSB1667 reliably), gave just 6850 MB/s or so on the brand new P35! That's really bad for FSB1600 and DDR2-800 CL3-3-3-5. I'd get the same speed on the old 975X at FSB1220 and DDR2-610 CL3-2-2-5!
Then I checked the latency benchmark. Compared to the 57 ns Nforce 680i numbers, here, with the same FSB and memory, we had 76 ns. Of course, that's not as bad as the 118 ns on the dual-Xeon FB-DIMM Greencreek chipset, but it is still some 35% slower than Nvidia. To give it a further chance, I went in with a trial of the Everest Ultimate Edition benchmark, running memory read and latency - see the results underneath.
Then I changed the memory timings to DDR2-1000 CL4-4-4-6, while keeping the FSB1600. On Nforce 680i, you'd hardly get anything this way, as the chipset is well optimised to use the fully matched in-sync bandwidth of DDR2-800 dual channel with FSB1600. The extra latency and sync overhead would eat up that little bit of bandwidth advantage.
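The "matched" argument can be checked with back-of-envelope peak figures: the FSB is 64 bits wide, so it moves 8 bytes per transfer, and each DDR2 channel is 64 bits wide as well. A rough sketch using theoretical peaks only, not measured results; the function names are illustrative.

```python
BYTES_PER_TRANSFER = 8  # 64-bit data bus, for both the FSB and a DDR2 channel


def fsb_peak_mb_s(fsb_rating):
    """Theoretical FSB peak in MB/s from the quad-pumped rating (MT/s)."""
    return fsb_rating * BYTES_PER_TRANSFER


def ddr2_peak_mb_s(ddr_rating, channels=2):
    """Theoretical DDR2 peak in MB/s for the given number of channels."""
    return ddr_rating * BYTES_PER_TRANSFER * channels


fsb = fsb_peak_mb_s(1600)
for ddr in (800, 1000):
    mem = ddr2_peak_mb_s(ddr)
    # Whatever the memory can deliver, the CPU only sees what the FSB carries.
    usable = min(fsb, mem)
    print(f"FSB1600 {fsb} MB/s, dual DDR2-{ddr} {mem} MB/s, ceiling {usable} MB/s")
```

On paper, FSB1600 and dual-channel DDR2-800 both peak at 12.8 GB/s, while DDR2-1000 offers 16 GB/s that the same FSB cannot carry, which is why little gain would be expected from the faster memory setting.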
But not on the P35! The speedup experienced wasn't close to the Nforce numbers, but still comparatively huge: 7530 MB/s memory bandwidth and 69 ns total latency - this is a 10% gain! This is the first time I have seen such behaviour on a very high speed FSB, above 1500 MHz. Of course, keep in mind it's still far less than the Nforce numbers with either memory setting. Quite an oddity.
It looked like one of two possibilities: either the Asus BIOS is early and grossly unoptimised (possible, but unlikely), or Intel changed the chipset design, so that the P35 memory controller itself is no longer optimised for matched in-sync bandwidth situations (like FSB1600 with low-latency DDR2-800 CL3) but prefers arrangements, even async ones, where high-latency high-MHz RAM is used. So, basically, it seems to be a DDR3 optimisation approach overall, even when using (still faster) DDR2 DIMMs? I'd like to know if so, and why - after all, we can all see that the DDR3 ramp-up is not that quick, and good DDR2 optimisation is still important for the next year or so.
After being alerted to the problem, Asus offered to supply the brand new BIOS v310, which arrived just a few hours prior to you reading this story. I eagerly went ahead to spend my evening messing with the board again. Guess what: for the same settings at DDR2-800 CL3-3-3-5, the performance was a little higher. Memory bandwidth in Sandra was now 7030 to 7035 MB/s, while latency went down to 71 ns, a whole 7% or so improvement over the old BIOS. Still, the overall lag vs Nforce 680i at the same frequencies and memory was some 20% - not exactly good. Everest Ultimate benchmarks confirmed the same story: Memory Read at 8034 MB/s with the new BIOS vs 7622 MB/s under the old BIOS, and Memory Latency of 58 ns vs 63 ns previously - and vs 10693 MB/s and 46 ns on the Nforce-based Striker Extreme at the same clock.
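To put the Sandra figures quoted so far side by side, here is a small sketch that expresses each P35 result relative to the Nforce 680i numbers. The values are the ones given in the text (with 8300 MB/s being a lower bound for the Nforce board, and 7030 MB/s the lower end of the new-BIOS range); the helper name is illustrative.

```python
def pct_change(reference, value):
    """Percentage change of value relative to reference."""
    return (value - reference) / reference * 100.0


# (bandwidth in MB/s, latency in ns) at FSB1600, DDR2-800 CL3-3-3-5
SANDRA = {
    "P35, old BIOS": (6850, 76),
    "P35, BIOS v310": (7030, 71),
    "Nforce 680i": (8300, 57),  # "in excess of 8300", so a lower bound
}

ref_bw, ref_lat = SANDRA["Nforce 680i"]
for name, (bw, lat) in SANDRA.items():
    print(f"{name:15s} bandwidth {pct_change(ref_bw, bw):+6.1f}%  "
          f"latency {pct_change(ref_lat, lat):+6.1f}%")

# Effect of the BIOS update alone, old BIOS as the reference.
old_bw, old_lat = SANDRA["P35, old BIOS"]
new_bw, new_lat = SANDRA["P35, BIOS v310"]
print(f"BIOS v310 vs old: bandwidth {pct_change(old_bw, new_bw):+.1f}%, "
      f"latency {pct_change(old_lat, new_lat):+.1f}%")
```

A negative bandwidth figure and a positive latency figure both mean the P35 is behind the reference.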
In summary, the Asus P5K Deluxe is a good board, with top-notch features and, more importantly, an excellent heat-pipe system. Even after the BIOS update, I feel the chipset memory controller design is a problem if you tend to use the in-sync, low-latency, matched-bandwidth memory-to-FSB approach, like my favourite DDR2-833 CL3-3-3-5 timing for FSB1667. In such cases, Asus' own Striker Extreme is still the best choice - Bearlake feels really Bear-like compared to it, performance-wise.
However, if you have got hold of some of those new Corsair, Geil or G.Skill DDR2-1200 CL4-4-4 modules, and are willing to constantly fry them at above 2.4 volts to keep that performance, then this P5K Deluxe or its upcoming enthusiast sibling, the Asus Blitz Extreme, may be worth checking out. It does seem that the P35 loves that kind of async, high-bandwidth, who-cares-about-latency memory - not unlike the DDR3 approach, is it?
I hope that Intel does make the X38 memory controller more flexible. DDR3 will take some time to get off the ground, and I feel a good year will pass before we have really good DDR3 parts - read DDR3-1600 CL6, or better. In the meantime, no thanx to the AMD AgenaFX, DDR2 will continue to evolve speed-wise for a while.
If the smooth FSB2000-and-above capability rumoured for the X38 materialises, it may make sense, besides full DDR3 optimisations for new-generation low-voltage memory, to let it also make full use of upcoming DDR2-1000++ CL4-3-3-5 modules, allowing us for the first time to break the 10 GB/s mark in Sandra memory results on an Intel x86 PC platform - AMD did it last year, when AM2 showed up.