These quad cores have been controversial from the start, and pose a lot of questions for the technology buyer. At the end of the day, it comes down to one thing: are they fast? The answer is an unequivocal "yes, but".
Let's kick off with what is launching today. There are five parts: the Xeon X5355 at 2.66GHz, E5345 at 2.33GHz, E5320 at 1.86GHz and the E5310 at 1.66GHz. The odd man out is the non-Xeon QX6700, an Exscream Edition chip running at 2.66GHz and aimed at gamers.
All the parts that end in a 5 have a 1333MHz FSB, those that end in a 0 have a 1066FSB. The X parts have a 120W TDP, the Q-prefixed part comes in at 130W, and the run of the mill E cores have an 80W TDP. For four cores that is nothing short of incredible, 2.33GHz Woodcrest cores at 20W? The mere thought of that would have had people howling with laughter 18 months ago.
The first bombshell Intel dropped is the price, and to me it is by far the most compelling part of this story. Conroe and Woodcrest are very fast CPUs, no question there, they have brought Intel back into the price/performance game. The real question in many people's heads was what sort of a premium would Clovertown be at, and would Intel price it to negate the price/performance advantage of the added cores?
When this slide was shown to us a few weeks ago, jaws dropped. What Intel is saying is that bin for bin, Clovertown will be priced equally to Woodcrest. The price premium in this case is zero, Intel is trying to put the boot in big time.
OK, not exactly zero, the game it is playing is bin for bin with a little fudging of what bin means. The top bin of Woodcrest is 3.0GHz and the top bin of Clovertown is 2.33GHz, the 2.66GHz part is a Bin+1 or Extreme in Core4P parlance. The X5355 is $1172, E5345 $851, E5320 $690, E5310 $455 and the QX6700 is priced at $999 as all Exscream products should be.
That binning quibble aside, you are getting two Woodcrests for the price of one, a really hard thing to argue against. There are some apps that will do better on dual cores because of the added 667MHz, but for most server apps, Clovertown will come out better. This is where the "Yes, but...." part comes in.
With price out of the way, we focus on performance, and there are a lot of ups and downs here, though mostly ups. For the rest of this article, I will assume the applications thread fairly decently unless I specifically state otherwise. A single threaded app will do better on Woodcrest because of clock, the end. The real action lies in the strengths and weaknesses of the dual dual die Clovertown.
The basic architecture is the same as Woodcrest, two FSBs running at 1333MHz if the chip can pull that, on the Blackford chipset. There are four channels of FBD memory running at hopefully half the FSB clock for optimal performance.
The first potential gotcha comes from that FSB. Is it enough? The answer is it depends. Intel is in many cases FSB bound for performance, and there is little it can do to fix it in the short term. Woodcrest was supposed to come out on a 1066FSB but due to one engineering miracle or another, it pulled off 1333. This gave it a decent performance boost.
The problem is that with Clovertown, it didn't raise the FSB, so it is probably leaving a bunch of performance on the table. If Intel could pull off a 1600 or 2132FSB you know it would, and the gains would be tangible. Instead it has large high speed caches and gets to the same point by paying for things in silicon real estate.
As you can see, the performance loss from a single bus step ranges from 7-9% on memory light Int apps to a high of 23% for memory intensive FP apps. The real world comparison between a Woodcrest 3.0 and a Clovertown 2.66 shows that things are not nearly as negative as that one dimension would lead you to believe, that chart does not take into account the doubling of cores. This one does, and the numbers are very different.
Look at what it says. On the left, you have single threaded apps where raw clock speed rules and threading matters little. In the middle, you have mostly HPC and FP heavy apps where a lot of bandwidth is needed and Clovertown is choked by the FSB or lack thereof in addition to scaling losses. On the right you have Int code that is far less FSB sensitive and can run out of the cache much more often. SunGard comes very close to perfect scaling (2.66 * 2 / 3.00 ≈ 1.77).
What this says is that Clovertown has strengths and weaknesses, but almost all apps showed a 25% speed boost or more, and only 3 of 17 lost performance with a quad core part. If you assume linear scaling for the 2.33GHz Clovertowns, you go from about a 10% benefit to about 55% on the high end. The three that lose performance will lose more of course.
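For anyone who wants to check the back-of-the-envelope scaling math, here is a minimal sketch using the rounded clock figures quoted in the text. The function name and the assumption that performance scales linearly with clock and core count are illustrative only; real applications will land below these ceilings.

```python
def ideal_speedup(new_ghz, old_ghz, core_ratio=2.0):
    """Best-case speedup if performance scaled linearly with clock and core count."""
    return new_ghz * core_ratio / old_ghz

WOODCREST_TOP_GHZ = 3.00  # top Woodcrest bin, per the text

# Clovertown has twice the cores of Woodcrest, so core_ratio stays at 2.
for clovertown_ghz in (2.66, 2.33):
    s = ideal_speedup(clovertown_ghz, WOODCREST_TOP_GHZ)
    print(f"Clovertown {clovertown_ghz:.2f}GHz vs Woodcrest "
          f"{WOODCREST_TOP_GHZ:.2f}GHz: {s:.2f}x ideal")
```

The 2.33GHz line is where the "about 55% on the high end" figure comes from.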
What this means is that even with the bottlenecks, the raw horsepower of Clovertown shows through and beats Woodcrest in the vast majority of cases. Of course, if you are buying one for your own apps, test and retest, but odds are good that you will see a decent gain.
Here is where the price part of the price performance mix comes in. With Clovertown at a zero price premium and a performance gain of 10+%, why would you not buy it? Power I hear you say? Well, compared to a 3.0GHz Woodcrest, there is no power penalty for a 2.33GHz Clovertown. That is the good part.
The bad part is nothing the end user will ever care about, more of a technical curiosity. If you look at the power situation, Woodcrest is a 65W part up to and including the 2.66GHz bin. 3.0GHz ups that to 80W, so you can be pretty sure that 2.66 Woodys are running very close to that 65W limit, and 3.0 parts have a bit of headroom.
65W * 2 dice is 130W, pretty close to the 120W envelope of the X5355. To make the high end parts, not much cherry picking will be required. Now if you assume that the 2.33GHz parts are hugely more efficient than the 2.66s and instead of being at 65W they are at 55W, and you take two of them, you have a 110W draw.
While 110W is less than 120W, it is a lot more than the 80W they are specced for. Can you say cherry picking? I can see the E5345 being the part in the shortest supply, it is constrained in a lot of ways that the SKUs above and below it are not.
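The power argument is just addition, and a rough sketch makes the gap obvious. The per-die wattages are the ones used in the text (65W for Woodcrest up to 2.66GHz, a generous guessed 55W for a 2.33GHz die), and summing two dice is the same naive model the text uses, not a measured figure. The function names are made up for illustration.

```python
DICE_PER_PACKAGE = 2  # Clovertown is two dual-core dice in one package

def package_draw(per_die_w, dice=DICE_PER_PACKAGE):
    """Naive package power: per-die TDP times the number of dice."""
    return per_die_w * dice

def over_budget(per_die_w, tdp_w):
    """Watts by which the naive sum exceeds the package TDP (negative means headroom)."""
    return package_draw(per_die_w) - tdp_w

# X5355: two 65W-class dice against a 120W envelope
print("X5355 gap:", over_budget(65, 120), "W")
# E5345: two hypothetically 55W dice against an 80W envelope
print("E5345 gap:", over_budget(55, 80), "W")
```

The bigger the gap, the more selective Intel has to be about which dice go into that SKU.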
Where do things end up? Is Clovertown a clean kill over Woodcrest? Certainly not, but in most cases it is a worthy upgrade even before cost is taken into account. If you look at the numbers from 2CPU, you will see this clearly illustrated. On the SunGard test, you get amazingly close to the theoretical maximums in the real world.
In the 3DStudioMax test, you get a complete mixed bag of performance. The odd thing about these numbers is they are generated from the same program but have wildly varying results. Using the same code with different data sets, if you look at Woodcrest vs Clovertown, they swap the lead on almost every subtest. It just goes to show that there is no clear answer to the question of which CPU is better, they are both good at different things.
There is a more pertinent question out there for the IT management crowd, not which one should I pick up, but should I bother to upgrade at all? We conducted two tests with a Clovertown 2.33/1333 and a PD 3.6GHz, a compile of FreeBSD 6.2Beta3 and some encryption tests, both under BSD.
# date ; make -j 8 buildworld > /dev/null ; date
Wed Nov 1 12:16:27 CST 2006
Wed Nov 1 12:30:30 CST 2006
14 minutes 3 seconds on the Clovertown System.
# date ; make -j 2 buildworld > /dev/null ; date
Wed Nov 1 13:50:52 CST 2006
Wed Nov 1 14:31:18 CST 2006
40 minutes 26 seconds on a PD.
On an unoptimized system with no tweaked anything, just a clean install of FreeBSD 6.2Beta3, it compiled itself in 1/3 the time of a PD that had more than 50% higher clock speed. You could also look at it through the cores lens, 4x the cores, 3x the speedup. Not bad at all.
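If you want to check the stopwatch math, the sketch below derives the elapsed times and the speedup straight from the date stamps in the two runs above. The helper names are my own, and the parser simply throws away the weekday and the timezone label, which is safe here only because both stamps in each run share the same zone.

```python
from datetime import datetime

def parse_date_output(line):
    """Parse `date` output like 'Wed Nov 1 12:16:27 CST 2006', ignoring weekday and zone."""
    _weekday, month, day, clock, _zone, year = line.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")

def elapsed_seconds(start, end):
    """Whole seconds between two `date` stamps taken in the same timezone."""
    return int((parse_date_output(end) - parse_date_output(start)).total_seconds())

clovertown = elapsed_seconds("Wed Nov 1 12:16:27 CST 2006", "Wed Nov 1 12:30:30 CST 2006")
pd = elapsed_seconds("Wed Nov 1 13:50:52 CST 2006", "Wed Nov 1 14:31:18 CST 2006")

print(f"Clovertown: {clovertown}s, PD: {pd}s")
print(f"Build speedup: {pd / clovertown:.2f}x")
print(f"PD clock advantage: {3.6 / 2.33:.2f}x")
```

The build speedup lands a little under 3x, which is where the "1/3 the time" figure comes from.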
People who are much more in tune with how compilers work tell me that there are several points in the compile process that are more or less completely single threaded. With this in mind, it makes the speedup of Clovertown more impressive.
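The usual way to put numbers on that point is Amdahl's law, which is my framing rather than anything the compiler folks handed me. A minimal sketch follows. The parallel fractions are hypothetical, not measured from the FreeBSD build, and the model ignores the per-core speed difference between the two chips.

```python
def amdahl_speedup(parallel_fraction, resource_ratio):
    """Amdahl's law: overall speedup when only part of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / resource_ratio)

CORE_RATIO = 4  # the "4x the cores" comparison from the text

# Hypothetical parallel fractions, for illustration only
for p in (0.80, 0.90, 0.95, 1.00):
    print(f"{p:.0%} parallel -> {amdahl_speedup(p, CORE_RATIO):.2f}x")
```

Even a small serial portion eats a noticeable share of the theoretical 4x, so getting close to 3x on a real build is respectable.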
How about single threaded apps? Is Clovertown worth it then? In a comparison against Woodcrest, the answer is no, but against a PD based Xeon, yeah, it is. Remember, the PD is at a 50% clock speed advantage on these tests.
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
CT - Many salts: 2215K c/s real, 2218K c/s virtual
CT - Only one salt: 1888K c/s real, 1888K c/s virtual
PD - Many salts: 874444 c/s real, 879944 c/s virtual
PD - Only one salt: 777856 c/s real, 780294 c/s virtual
Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
CT - Many salts: 72035 c/s real, 72143 c/s virtual
CT - Only one salt: 69869 c/s real, 70392 c/s virtual
PD - Many salts: 30054 c/s real, 30243 c/s virtual
PD - Only one salt: 29644 c/s real, 29737 c/s virtual
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
CT - Raw: 8344 c/s real, 8331 c/s virtual
PD - Raw: 8485 c/s real, 8552 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
CT - Raw: 332 c/s real, 332 c/s virtual
PD - Raw: 425 c/s real, 427 c/s virtual
Benchmarking: Kerberos AFS DES [48/64 4K]... DONE
CT - Short: 309708 c/s real, 309708 c/s virtual
CT - Long: 912941 c/s real, 914310 c/s virtual
PD - Short: 183552 c/s real, 184127 c/s virtual
PD - Long: 491980 c/s real, 493523 c/s virtual
Benchmarking: NT LM DES [128/128 BS SSE2-16]... DONE
CT - Raw: 12341K c/s real, 12360K c/s virtual
PD - Raw: 6590K c/s real, 6611K c/s virtual
Thanks to Paul (redacted) for running these benches for me
As you can see, the Clovertown won four of six, tied one and lost one. The wins tended to be by large margins however. In any case for compute heavy tasks like encryption, it is fairly algorithm dependent, but for the most part Clovertown is head and shoulders above the older cores.
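To show how the tally falls out of the numbers, here is a small sketch that turns the "real" c/s figures from the listing into Clovertown/PD ratios. Two assumptions are mine: "K" is read as a factor of 1000, and anything within 5% counts as a tie. Change the band and the scorecard can shift.

```python
TIE_BAND = 0.05  # assumption: ratios within 5% of 1.0 count as a tie

# (Clovertown c/s, PD c/s) "real" figures from the listing; "K" read as x1000
RESULTS = {
    "Traditional DES": [(2215e3, 874444), (1888e3, 777856)],
    "BSDI DES": [(72035, 30054), (69869, 29644)],
    "FreeBSD MD5": [(8344, 8485)],
    "OpenBSD Blowfish": [(332, 425)],
    "Kerberos AFS DES": [(309708, 183552), (912941, 491980)],
    "NT LM DES": [(12341e3, 6590e3)],
}

def mean_ratio(rows):
    """Average Clovertown/PD ratio across a benchmark's sub-results."""
    return sum(ct / pd for ct, pd in rows) / len(rows)

def verdict(ratio, band=TIE_BAND):
    """Classify a ratio as a Clovertown win, a tie, or a loss."""
    if ratio > 1 + band:
        return "win"
    if ratio < 1 - band:
        return "loss"
    return "tie"

tally = {"win": 0, "tie": 0, "loss": 0}
for name, rows in RESULTS.items():
    ratio = mean_ratio(rows)
    tally[verdict(ratio)] += 1
    print(f"{name:18s} {ratio:5.2f}x  {verdict(ratio)}")
print(tally)
```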
At the end of the day, there are almost no downsides to Clovertown if your code threads. There are scenarios both real and contrived that make it fall over and lose to Woodcrest, but they are few and far between. If you have a good handle on your code, or there is a newer version coming that threads well, by all means, buy the Clovertown.
With no price premium, Intel has thrown down the gauntlet for AMD. It has put out a quad core part, 'true' or not, that simply raises the performance bar by a fair margin at little or no power cost. Unless your code does not play well, go for the Clovertown, software will only grow into it over the lifetime of the server. For the next six months, Intel's biggest challenge will be to make enough of them. µ