I'm astounded you guys [analysts] tolerate their [Intel's] margin collapse - W.J. Sanders III
In the end, what I ended up with was far from comprehensive, far from complete, but there was enough data there to make it worthwhile to share. The subjects were current servers, dual and quad cores from AMD and Intel. All four boxes were made by Colfax and supplied by AMD. If you want to read something into this, there may be follow-ups to this that do not involve Colfax hardware. Think soon.
Staring at four dual core servers and a bunch of spare parts, I started tinkering. The initial players were all based on Supermicro chassis of various types. I had an Opteron @ 2.6GHz and a Woodcrest at 3.0, both with a pair of striped Seagate 160GB drives. Next to them were a Clovertown box at 2.33GHz and a Barcelona system at 2.0 with a pair of 1.9HE chips beside it; each had a single Seagate 400GB SATA drive.
It is also worth noting that all four of the systems had identical Ablecom PWS-702A-R1 700W power supplies. The Barcelona and Clovertown boxes had a redundant PSU but that was disabled for this test. The Woody and Opteron are about as directly comparable as you can get, as are the Clovertowns and Barcelonas. The two groups are going to be off by whatever the drives draw, a few watts at most, but the HDs are barely touched in this testing.
What was I trying to measure? Power used with different memory configs. If you are testing hardware, beware: this is about the most annoying thing you can do, and those damnable DIMM slots will tear your fingers up before you reach the halfway point. There are better ways to spend a weekend, but sadly I did not take my own advice.
The plan was to pull out the trusty Extech 380803 power meter and measure each box under idle and CPU loaded conditions with different memory configs. Idle was just that, and to pound on the CPUs, I used Valve's Map Compilation Benchmark, a piece of software that builds part of the Citadel level. It is about the best tool I have run into that will push as many threads as you need for enough time to get a decent reading.
It is fairly small so disk access should be minimal, but sadly memory access should be minimal as well. It is not the optimal tool for what I am doing, but this benchmarking session was far from precisely planned so it evens out. If you have suggestions for other software to run next time, drop me a line.
In the end, the map compilation did what it needed to do, and pegged all of the CPUs for a decent length of time. Only the peak wattage was measured; I was simply looking for a number, not total work done.
Memory was set to 4, 8, 12 and 16G, corresponding to 1, 2, 3, and 4 DIMMs per channel. For AMD based machines, ECC Registered DDR2-667 was used, specifically ATP AL28K72L8BHE6S, and Intel boxes had ATP AP28K72S8BHE6S FBD-667 sticks. All were 1G per DIMM, and I had a lot of them.

All testing was done on MS Server 2003R2 patched to current. Machines were booted with 4G of the applicable memory and sat on the desktop with no apps running. Idle power was measured when the power stopped fluctuating. The Valve Map Compilation Benchmark was run and the peak wattage used was recorded.
In the end, a picture is worth a thousand words, so here are the results. For those that want the full data set, write me and I'll send it along. Speeds are in GHz, RAM in GB, Idle and Load in Watts and Time in seconds to complete the run.

The numbers show several things quite clearly, and they all relate to one critical component, FB-DIMMs. AMD boxes at idle run about 150W +/- 10W. The Intel boxes start out in the low 200s and go to the low 300s once you hit 16G. Under load, the Opteron and the Barcelona 2.0 are around 270 watts, plus or minus a bit, and the 1.9HE about 30W less. Both the Woodcrest and Clovertown servers ranged from 300 to 400W depending on memory footprint.
On raw power consumption alone, this is a clear win for AMD, but a space shuttle burns more fuel than a Vespa. If you are trying to get across town, you might want the Vespa even though you will have a hard time getting to orbit on it. Raw power consumption is not everything, which is why the real number people pay attention to is performance per watt.
A good way to look at it is if you had a big problem to do that mimicked the Valve workload, you would be drawing max power for long periods of time. If one CPU was faster than the other, you could do the same amount of work in the same time with fewer CPUs. If you look at the time column, you will see that the benchmark is very Intel friendly; their CPUs run it in around 65-80% of the time it takes AMD to complete the task.
A decent measure of performance as a whole would be (time spent) * (power consumed), basically the load power times the time taken. This gives us energy spent to do the requested work. Looking at this angle, you can see some interesting patterns emerge.
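As a rough sketch of that metric, the snippet below multiplies load power by run time to get energy per run, where lower is better. The wattages and times are hypothetical placeholders chosen only to show the arithmetic (a slower box at 270W against a faster one at 350W that finishes in 70% of the time); they are not rows from the results, and the function name is made up for this example.

```python
def energy_joules(load_watts: float, run_seconds: float) -> float:
    """Energy spent on one benchmark run: load power times run time."""
    return load_watts * run_seconds


# Hypothetical figures for illustration only, NOT measurements from this test.
slower_box = energy_joules(load_watts=270.0, run_seconds=100.0)
faster_box = energy_joules(load_watts=350.0, run_seconds=70.0)

# Lower is better: less energy burned for the same piece of work.
print(f"slower box: {slower_box:.0f} J, faster box: {faster_box:.0f} J")
```

With numbers like these, the hungrier box still comes out ahead because it finishes soon enough to make up for the extra watts.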

In performance, Woodcrest beats the Opteron hands down everywhere. Barcelona trounces both of those, and the HE Barcelona is ahead of the normal variant everywhere. Clovertown beats Barcelona in most cases, but as memory footprints climb, the added wattage of the AMB takes its toll. At the 12G step, Clovertown is running neck and neck with the 2.0 Barcelona and both lose to the 1.9HE. At 16G, Clovertown loses to both, somewhat handily.
This test is very artificial, completely non-comprehensive, and in general, quite flawed, but it does point out some interesting things. AMD is a clear winner on idle power, no question there. Intel, in this test, is the clear performance winner. When you look at performance per watt, they are much more equal. If AMD puts out a 2.5GHz Barcelona in the same power band, things might be fairly scary for Intel, but right now, in all but the most dense memory configurations, Intel takes the crown.
If you pick a different benchmark, things will change a lot; this one favors Intel in a big way. You could also look at this through the lens of time: if you have an hour to do a job, and one CPU takes 45 min to run it and the other 30, total energy used would be (time load) * (power load) + (time idle) * (power idle). In this case, things might swing a bit more towards AMD, but I will leave it to the reader to analyze the numbers.
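For readers who want to try that fixed-window view, here is a minimal sketch of the formula. The 45 and 30 minute run times come from the example above; the wattages are assumed round values picked from the approximate ranges quoted earlier (about 270W load and 150W idle for one box, 350W load and 250W idle for the other), not exact rows from the data. The function name is invented for the example.

```python
def window_energy_wh(window_min: float, run_min: float,
                     load_watts: float, idle_watts: float) -> float:
    """Energy in watt-hours over a fixed window:
    (time load) * (power load) + (time idle) * (power idle)."""
    if run_min > window_min:
        raise ValueError("the run does not fit inside the window")
    idle_min = window_min - run_min
    # Watt-minutes divided by 60 gives watt-hours.
    return (run_min * load_watts + idle_min * idle_watts) / 60.0


# Assumed illustrative wattages, NOT exact measurements from this test.
slow_low_power = window_energy_wh(60, 45, load_watts=270, idle_watts=150)
fast_high_power = window_energy_wh(60, 30, load_watts=350, idle_watts=250)

print(f"45 min run: {slow_low_power:.0f} Wh, 30 min run: {fast_high_power:.0f} Wh")
```

With these assumed inputs, the slower box with the lower idle draw uses less energy over the hour, which is the swing described above. Plug in the real numbers for your own configuration before drawing conclusions.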
One other thing to note is the deltas with the additional RAM. For Opterons, the range is 17W at idle, 21W under load. Woodcrest brings this to 99W and 100W respectively. Barcelona at 2.0 has a 13W/15W delta, and it is 13W/16W for the 1.9HE part. Clovertown is at 96W/98W.
FB-DIMMs are killing Intel. Any advantage they have on the CPU front is totally wasted by the memory wattage. The Valve benchmark gains absolutely zero from the added memory. A benchmark that stresses the memory subsystem more will probably narrow the gap a bit, but taking it from 10x to 5x isn't much of a thing for Intel to jump up and down about. I guess I know what next weekend will be spent doing...
In the end, there are more than enough ways to look at the problem of performance, take your pick. The most important bit is not the raw numbers but knowing your workload. With that in mind, you can not only test in sane ways, but get realistic numbers. Without a defined workload, numbers are just that.
In the end, what I did all weekend was to take a specific look at a slice of the server market. You have the raw numbers, play with them and let me know what you find. Have fun. µ
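To put those deltas on a per-stick basis, the sketch below divides each one by the number of DIMMs added. It assumes the step from 4G to 16G means 12 extra 1G DIMMs, as the memory setup implies, and that the cost is spread evenly across them, which is a simplification. The dictionary keys and function name are made up for the example.

```python
# Idle and load deltas in watts going from 4G to 16G, as quoted above.
DELTAS_W = {
    "Opteron 2.6": (17, 21),
    "Woodcrest 3.0": (99, 100),
    "Barcelona 2.0": (13, 15),
    "Barcelona 1.9HE": (13, 16),
    "Clovertown 2.33": (96, 98),
}

# Assumption: 4G -> 16G with 1G sticks means 12 DIMMs were added.
ADDED_DIMMS = 12


def per_dimm_watts(delta_watts: float, added_dimms: int = ADDED_DIMMS) -> float:
    """Average extra watts per added DIMM, assuming an even spread."""
    return delta_watts / added_dimms


for box, (idle_delta, load_delta) in DELTAS_W.items():
    print(f"{box}: {per_dimm_watts(idle_delta):.2f} W/DIMM idle, "
          f"{per_dimm_watts(load_delta):.2f} W/DIMM load")
```

On that basis the registered DDR2 sticks cost between one and two watts apiece, while the FB-DIMMs land at roughly eight.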