EARLIER THIS MONTH, Intel announced the last members of its 32nm Westmere processor generation, the high end Westmere EX or Xeon E7 series. Like their 45nm predecessors, the Nehalem EX chips, they pack many cores, humongous amounts of cache, four memory channels and four QPI links to other CPUs, all on one die. That allows eight CPUs in one system directly, or up to 256 CPUs using custom controllers, as in the SGI Altix UV.
There are some records worth noting in the new Xeon E7 generation, and not all of them are down to the process shrink. These are the first general purpose CPUs to put 10 full, not simplified, cores on one die. They are also the first single die CPUs with 30MB of shared L3 cache, on top of the private 256KB L2 cache within each CPU core. Furthermore, since this is Westmere, AES instruction acceleration is on board, so a four socket, 40 core box with 128GB of RAM is probably one of the best password cracking or decryption machines around.
Looking at the outside system infrastructure, Intel hasn't changed anything. The memory is still DDR3-1066, with support for 1.35V DIMMs, although it shouldn't have been too hard to validate the 1333 speed grade for more bandwidth sensitive workloads.
Recent supercomputing benchmark tests on the 256 socket SGI Altix UV system here in Singapore showed that, after an upgrade from Nehalem EX to Westmere EX at the same 2.66GHz frequency but with eight cores per socket instead of six, there was no per core slowdown, even in memory bound applications. This suggests that the existing memory system had enough buffering to feed the additional cores anyway. So the scalability is there even with more cores per socket.
If we look at the top end model, the 10 core 2.4GHz Xeon E7-8870 with 30MB of L3 cache sitting on a wide ring bus, the processor can deliver a theoretical peak of 96 double precision GFLOPS per socket, or 384 GFLOPS in a typical quad socket box with a few terabytes of RAM. These systems aren't mainly aimed at HPC but rather at enterprise commercial use, with all the redundancy, error correction and reliability features that come with them, perhaps approaching those of mainframes. Even the four socket E7-4870 version that we will review soon has the very same features and capabilities, just limited to 'only' 40 cores and 1TB of RAM in a small single box.
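For readers who want to check the peak figures above, here is a minimal sketch of the arithmetic. It assumes four double precision floating point operations per core per clock cycle (one 128-bit SSE add plus one 128-bit SSE multiply), a figure that is not stated in this article. The function and constant names are made up for illustration.

```python
# Back-of-envelope peak throughput: cores x clock (GHz) x FLOPs per cycle.
# ASSUMPTION: 4 double precision FLOPs per core per cycle (128-bit SSE,
# one add and one multiply issued per clock); not a figure from the article.
DP_FLOPS_PER_CYCLE = 4


def peak_gflops(cores: int, clock_ghz: float,
                flops_per_cycle: int = DP_FLOPS_PER_CYCLE) -> float:
    """Return the theoretical peak in GFLOPS for one socket."""
    return cores * clock_ghz * flops_per_cycle


# Xeon E7-8870: 10 cores at 2.4GHz, as quoted in the text.
per_socket = peak_gflops(cores=10, clock_ghz=2.4)
quad_socket = 4 * per_socket

print(f"per socket:  {per_socket:.0f} GFLOPS")
print(f"quad socket: {quad_socket:.0f} GFLOPS")
```

These are paper numbers only. Sustained throughput on real code depends on how well the workload vectorises and how hard it leans on memory.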
And that's where much of the new chip's impact comes from. Here we have a combination of a full capability, high core count processor with a high bandwidth, high capacity memory system, good scaling up to 80 cores in a simple, single box without extra hooks, and a quite decent reliability, availability and serviceability (RAS) feature set too. And all of that runs the usual, love it or hate it, x86 instruction set. So this could upset the apple carts of quite a few other high end platforms.