I have never seen a major general-purpose CPU chip get a substantial design overhaul, increased caches and new instructions AND get moved to a supposedly much faster process (after all, it is a big jump from 130 nm to 90 nm!) - and end up slower per clock, hotter and more power-hungry than its predecessor. And, to top it all, without any initial clock speed advantage...
What is really the problem with Prescott? Why did this happen, forcing even mighty Intel to abruptly change its plans last year and rush out a repackaged XeonMP as the Pentium 4 Extreme Edition - which now seems poised to be the Intel performance desktop CPU for a big part of this year?
Performance losses
It seems that, across most PC benchmarks and usual test apps, Prescott's performance at the same clock ranges
from 5% faster to 15% slower than the normal Northwood core, and from no faster to roughly 25% slower than the
Extreme Edition.
The doubled L1 and L2 caches, better branch-prediction logic and faster multiply ops should help - why don't they?
Smoking the (deep) pipes - staggering stages?
Even accounting for all the "current leakages" and other widely discussed problems, it was appalling to see that the pipeline had to be deepened by over half just to keep the clock speed - from an already deep 20 stages to a staggering 31 stages! A higher branch misprediction penalty is one outcome; the other is slower performance in software where instructions must wait for the result of a previous instruction before they can proceed... 11 more stages is 11 more clock cycles to wait, in that case.
Supposedly, the extra stages should let the Prescott core scale up much better GHz-wise in the future; however, that future seems to be called Tejas, not Prescott, anyway. If the current Socket 478 Prescott is pretty much limited to 3.6 GHz, a frequency quite obtainable with the Pentium 4 Extreme too, then Intel could just as well have simply moved the Northwood core to 90 nm first, and given it the extra 1 MB L2 to fight the Athlons better.
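To put a rough number on the deeper-pipeline cost described above, here is a back-of-the-envelope sketch. It assumes a mispredicted branch costs about one full pipeline refill, and the base CPI, branch fraction and misprediction rate are purely illustrative guesses, not measurements of either core. It also deliberately ignores Prescott's improved branch predictor and bigger caches, so treat it as an upper-bound illustration rather than a model of the real chip.

```python
# Back-of-the-envelope model: effective cycles per instruction (CPI)
# when each mispredicted branch costs roughly one pipeline refill.
# All workload numbers below are illustrative assumptions, not measurements.

def effective_cpi(base_cpi, branch_fraction, mispredict_rate, pipeline_stages):
    """Return base CPI plus the average misprediction stall per instruction."""
    # Simplification (assumption): a flush costs about as many cycles
    # as the pipeline has stages.
    penalty_cycles = pipeline_stages
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

BASE_CPI = 1.0          # assumed ideal CPI with no mispredictions
BRANCH_FRACTION = 0.20  # assumed: one instruction in five is a branch
MISPREDICT_RATE = 0.05  # assumed: 5% of branches are mispredicted

northwood = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)
prescott = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 31)
slowdown = prescott / northwood - 1.0

print(f"20-stage CPI: {northwood:.2f}")
print(f"31-stage CPI: {prescott:.2f}")
print(f"Per-clock slowdown from depth alone: {slowdown:.1%}")
```

With these assumed inputs the 31-stage pipeline comes out roughly 9% slower per clock, which at least sits in the same ballpark as the losses seen in the benchmarks. Plug in a lower misprediction rate for the deeper pipeline to see how much the better predictor would have to claw back.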
Test bed for something
Well, Intel didn't do it - why? Among the possible explanations, one surfaces clearly - major extra work in the
core that is not publicised right now. Just as the Willamette core of Pentium 4 already had "test bed" HyperThreading
logic in it, which only became "official" late in the Northwood core's life, so the supposedly huge extra logic in
Prescott could be a "test bed" for some major upcoming core extension which may officially surface in, say, the Tejas CPU.
Is it SSE3? No - that stuff is already there, pretty much useless till the software starts to use the new opcodes. Large caches? They don't explain the extra CPU core transistors. Well, the only other major explanations could be either dual cores, or 64-bit core extensions.
Were the three years since AMD first officially announced the X86-64 initiative, with its registers, instruction set and other attributes, enough for Intel to implement a fully compatible extension in its X86 cores? I believe so, and I also believe that whatever "hidden" 64-bit extensions there may be in Prescott, if any, would be more or less along the AMD64 line. There could be some "extras" or, so to speak, "elements" of course, but there is no reason to believe that Intel would have tempted fate by devising a completely different set of 64-bit extensions when a nice one was right out there, free for the picking. Oh yes, egos get bruised, and a certain famed "good 64-bit mammoth cruise ship" takes an iceberg-style hit, but if the money gets made at the end of the day, why worry?
Nocona's better luck
Besides having a good two quarters of extra time to "sort out issues" and tune the benchmarks better, Prescott's
workstation/server variety, Nocona, will have a few things going for it compared to its desktop flavour.
First, it is expected to come out with a clear frequency lead over the current XeonDP: at least 3.6 GHz versus the current 3.2 GHz Xeon. Second, the major 50% FSB jump, from 533 to 800 MHz, and low-latency DDR2 400 memory will help those memory benchmarks. Third, it will not have the Socket 478 to 775 migration problems, since the new 604-pin socket (not compatible with the XeonDP ones, by the way) will stay put for the next few Xeon rounds, at least.
Finally, the supposed "good" numbers Prescott gets on SPEC ViewPerf workstation tests, pretty much the only tests where it beats the Northwood clock-for-clock, might just indicate where the core fits best, after all - 3D CAD & visualisation, anyone?
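For a feel of what the FSB jump mentioned above means in peak bandwidth terms, here is a quick sketch. It assumes the usual 64-bit (8-byte) Pentium 4 family data bus and treats the quoted 533 and 800 MHz figures as effective transfer rates. These are theoretical peaks only, and say nothing about what Nocona will actually sustain.

```python
# Peak theoretical front-side bus bandwidth = transfers per second x bus width.
# Assumption: 64-bit (8-byte) data bus; 533/800 taken as effective MT/s.
BUS_WIDTH_BYTES = 8

def peak_bandwidth_gb_s(mega_transfers_per_s):
    """Return peak bandwidth in GB/s (decimal) for a given transfer rate."""
    return mega_transfers_per_s * 1e6 * BUS_WIDTH_BYTES / 1e9

current_xeon = peak_bandwidth_gb_s(533)
nocona = peak_bandwidth_gb_s(800)
gain = nocona / current_xeon - 1.0

print(f"533 FSB: {current_xeon:.2f} GB/s")
print(f"800 FSB: {nocona:.2f} GB/s")
print(f"Peak bandwidth gain: {gain:.0%}")
```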
Give it some credit
In practice, to give it some credit, running Prescott on some of the existing boards was not a major problem -
while it couldn't run on the very first version of the Intel D875PBZ Bonanza board, the board's recent version handles
it well. So do the MSI 875P Neo and Gigabyte 8KNXP - both Canterwood-based - once their BIOSes are updated.
Right now, I am putting the Prescott through some actual, practical CPU-intensive apps, not just benchmark codes, to see how far things go.