AS EXPECTED, the last week of March saw both Intel and AMD unveil what will be their highest-end x86 CPU product lines for this year: the Nehalem EX Xeon 7500 series from Intel, and the Magny Cours Opteron 6100 series from AMD. On top of that, there's the Westmere EP Xeon 5600 series that Intel launched just two weeks ago.
After all the launch buzz, how do the new platforms stack up against each other? Before we get into the Nehalem EX benchmarking, here is an overview of the new parts from both vendors.
At the top of each line, with all cores enabled, sit a single-die, 8-core, 16-thread 2.26GHz part in the Xeon 7500 series and a dual-die, 12-core, 12-thread 2.3GHz part in the Opteron 6100 series. The Nehalem EX has a much larger L3 cache: 24MB for 8 cores on one die, versus a total of 12MB for 12 cores across two dies on Magny Cours. In practice the latter is only 10MB, as 2MB (1MB per die) is reserved for HyperTransport Assist, which filters cache snoops coming from other sockets. So, if your application has large chunks of code and critical data that can stay in a big cache rather than going out to much slower memory, the latest Intel server chip might offer a substantial speedup.
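To put the cache numbers on a per-core footing, here is a quick back-of-the-envelope sketch using only the figures quoted above. It assumes the usable L3 is split evenly across cores, which is a simplification: the Nehalem EX L3 is shared by all eight cores, so a single busy thread can use far more than its "share", while on Magny Cours each die's cores share only that die's own L3.

```python
# Usable L3 per core for the top Xeon 7500 and Opteron 6100 parts,
# using the article's figures (MB, all cores enabled).
XEON_7500 = {"cores": 8, "l3_total_mb": 24, "l3_reserved_mb": 0}
# 1MB per die (2MB total) is reserved for HyperTransport Assist on Magny Cours.
OPTERON_6100 = {"cores": 12, "l3_total_mb": 12, "l3_reserved_mb": 2}


def usable_l3_per_core(chip: dict) -> float:
    """L3 left for code and data, divided evenly per core (a simplification)."""
    usable_mb = chip["l3_total_mb"] - chip["l3_reserved_mb"]
    return usable_mb / chip["cores"]


if __name__ == "__main__":
    for name, chip in [("Xeon 7500", XEON_7500), ("Opteron 6100", OPTERON_6100)]:
        print(f"{name}: {usable_l3_per_core(chip):.2f} MB of L3 per core")
```

That works out to 3MB per core on Nehalem EX against well under 1MB per core on Magny Cours, roughly a 3.6x difference.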
Here are the estimated parameters for the new product lines:
The Nehalem EX has higher raw memory bandwidth, with four serial memory channels per socket, each going to a buffer chip that drives two DDR3-1066 memory channels. That makes a total of eight DDR3-1066 channels per socket, but with the buffer chip's latency overhead added. Magny Cours has a total of four direct DDR3-1333 memory channels spread across two dies, with no buffers in between adding latency. However, keep in mind that it's not really four channels, but "two plus two" channels. If a thread running on one die wants to access memory across all four channels, it can reach the two channels attached to its own die pretty fast. For the other two, though, it has to hop over the slower HyperTransport link, so those accesses will be slower, possibly with extra wait time on top if threads on the other die are hammering the same memory.
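The raw bandwidth claim is simple arithmetic: channels times transfer rate times 8 bytes per 64-bit DDR3 transfer. The sketch below computes the theoretical DRAM-side peak only; it ignores the serial links to the Nehalem EX buffer chips, which may cap real throughput, and says nothing about latency. The second function models the "two plus two" effect as a weighted average of local and remote access latency. The latency values in the usage lines are hypothetical placeholders, not measurements of either chip.

```python
# Theoretical DRAM-side peak bandwidth per socket, plus a toy NUMA latency model.
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Peak GB/s (10^9 bytes/s) for `channels` DDR3 channels at `mts` MT/s."""
    return channels * mts * 1e6 * BYTES_PER_TRANSFER / 1e9


def average_latency_ns(local_ns: float, remote_ns: float,
                       local_fraction: float = 0.5) -> float:
    """Mean access latency when `local_fraction` of accesses hit the local die's
    channels; 0.5 models addresses interleaved evenly over "two plus two"."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


nehalem_ex = peak_bandwidth_gbs(channels=8, mts=1066)   # 4 buffers x 2 DDR3-1066
magny_cours = peak_bandwidth_gbs(channels=4, mts=1333)  # 2 + 2 DDR3-1333 channels

if __name__ == "__main__":
    print(f"Nehalem EX peak:  {nehalem_ex:.1f} GB/s per socket")
    print(f"Magny Cours peak: {magny_cours:.1f} GB/s per socket")
    # Hypothetical latencies, for illustration only.
    LOCAL_NS, REMOTE_NS = 100.0, 150.0
    print(f"Interleaved average: {average_latency_ns(LOCAL_NS, REMOTE_NS):.0f} ns")
```

On paper that gives roughly 68GB/s per socket for Nehalem EX against roughly 43GB/s for Magny Cours.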
Nehalem EX, with its high-end enterprise focus, also seems to have more enterprise RAS (reliability, availability and serviceability) features than Magny Cours, although AMD has improved on this front too. Keep in mind that many of these features only really matter on larger servers with 4 or 8 sockets, which is where Nehalem EX is really aimed. After all, Magny Cours is limited to four sockets. More on this, of course, in the system tests.
Both vendors have also announced lower SKUs in both high-end CPU lines, with some combination of lower frequencies, fewer cores and, of course, less dosh payable per CPU. So, both platforms have cheaper offerings that can bring the many-core enterprise goodness to smaller enterprises too.
The image below shows the Dell PowerEdge R810 quad Nehalem EX test system, with 128GB of RAM across 32 channels, in a chassis just 2U (3.5 inches) thick.
In the absence of a new 8-socket-capable offering, AMD seems to be aiming Magny Cours at the dual Xeon Westmere line, such as the X5680 we reviewed here within the last two weeks.
AMD's approach is simple: try to match a pair of top-end X5680 Westmere Xeons with four Magny Cours chips, a total of eight dies against two, at similar price and performance. That assumes application performance scales linearly with more cores or threads, which it mostly doesn't, despite all the work that has gone into multithreading. A program might, for instance, scale well up to, say, 8 threads and then taper off or even stop scaling altogether. Remember Cinebench R10?
Keep in mind that 12 cores at 3.46GHz will still be quite a bit faster in the real world than 24 cores at 2.3GHz in most cases. The exceptions are virtualisation or datacentre work, with many smaller threads and processes competing for attention, and specific supercomputing tasks where performance really does scale even to thousands of cores.
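One way to see why fewer, faster cores usually win is Amdahl's law, a standard scaling model swapped in here rather than anything either vendor quotes. The sketch multiplies clock speed by the Amdahl speedup for a given parallel fraction p. It assumes both chips do the same work per clock per core, which is a big simplification, and it does not model hard thread caps of the kind that make some benchmarks stop scaling abruptly.

```python
# Amdahl's law: speedup on n cores when a fraction p of the work parallelises perfectly.
def amdahl_speedup(n_cores: int, parallel_fraction: float) -> float:
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def relative_throughput(n_cores: int, clock_ghz: float,
                        parallel_fraction: float) -> float:
    """Clock x Amdahl speedup; assumes identical work per clock per core."""
    return clock_ghz * amdahl_speedup(n_cores, parallel_fraction)


if __name__ == "__main__":
    for p in (0.90, 0.95, 0.99):
        fast = relative_throughput(12, 3.46, p)  # 12 cores at 3.46GHz
        wide = relative_throughput(24, 2.30, p)  # 24 cores at 2.3GHz
        winner = "12 x 3.46GHz" if fast > wide else "24 x 2.3GHz"
        print(f"p={p:.2f}: 12-core {fast:.1f}, 24-core {wide:.1f} -> {winner}")
```

Under these assumptions the 12-core setup stays ahead at p = 0.90 and p = 0.95, and the 24-core setup only pulls ahead once around 99% of the work runs in parallel, the kind of workload the exceptions above describe.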
Now, datacentres full of zillions of cheap blades and skinny 1U servers, built on generic designs with margins so thin that Taiwanese vendors compare them to desktop PCs for how little money they make, might not be the best place to make shedloads of money on every CPU sold. We're talking a couple of hundred bucks up to a maximum of around $1,000 per socket, compared to 3D workstations and SMP servers, where per-socket CPU prices run between $1,000 and $3,000.
So, AMD will likely push on price here to hold on to its mainstream server market share despite slower per-core performance, while Intel focuses on mopping up the more profitable high end: workstations, SMP servers and other premium segments. Keep in mind that the "Lisbon" single socket 2.8GHz to 3GHz six-core DDR3 server part from AMD, expected this month, should help it a bit in some of these segments, where some loyal customers might decide to stick with AMD if the performance gap versus Intel is not too large.
One plus, if you're one of those rare server buyers who actually upgrade CPUs during the lifetime of the machine: AMD's Magny Cours and Lisbon, both in new sockets, promise upgradeability a year from now to the next-generation Bulldozer-based Interlagos (16 cores, dual die) and Valencia (8 cores, single die) respectively. Nehalem EX, for its part, will get Westmere EX around the same time as a socket-compatible upgrade.
On the other hand, as hinted before, Magny Cours and Lisbon Opterons may end up cornered between the high-clocked Westmere EP and the many-core, many-thread Nehalem EX, each stronger in its own domain.
In the meantime, we'll look at how the initial systems actually compare in performance, power and features, as well as how the two big vendors play the market positioning and sales game against each other. Coming soon, the Dell R810 will be our Nehalem EX Easter Bunny of sorts. µ