LOOKING AT the workstation and server markets these days, you will notice three distinct segments based on the number of sockets.
At the low end are single-socket systems, currently with up to eight cores, whose overall architecture looks much like a PC. The most popular mid-range segment is of course the dual-socket machines, which in many ways represent the ideal price/performance choice these days.
Then we come to the 'cream of the crop', at least core count-wise. The four-socket category is more exclusive and pricey, with a substantially different platform: more sockets, often more but slower cores per socket, higher memory capacity and, usually, higher latency too.
At present, the Westmere-based Xeon E7 4800 and 8800 series, for four and eight socket systems, respectively, rule the roost from Intel's point of view. These 10-core CPUs support up to 2TB of RAM per quad CPU board right now, although only at DDR3-1066 speed and with added latency from the memory buffer chips positioned between each CPU memory channel and the DIMMs. They are at a disadvantage in core speed, though, at up to 2.4GHz versus 2.9GHz in Xeon E5 servers, and in core generation, with the older Westmere in the E7s against the newer Sandy Bridge in the E5s.
On the AMD side, the 'Interlagos' Opteron 62xx series processors are used. They have dual dies per socket to provide eight core pairs per socket, for a total of 32 core pairs and a terabyte of memory per four-socket board. The G34 socket for Interlagos and the follow-on Abu Dhabi processors doesn't support eight socket operation, however.
There is ongoing pressure to increase per-core performance in this segment as well, and to offer more than just a 'super reliable' server. How about a super workstation with many cores and multi-GPU capability for, say, 3D rendering and analysis?
Intel already has an answer ready to launch in a couple of months - the Xeon E5-4600 series. It has the same architecture, platform chipset and TDP as the dual socket E5-2600, plus support for four socket configurations with up to 32 cores and 1.5TB of RAM, using the presently highest capacity 32GB ECC DIMMs at up to DDR3-1600 speed across 16 memory channels. So, the core speed as well as memory bandwidth and latency match those of dual-socket systems.
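For readers who want to check the arithmetic behind those memory figures, here is a rough back-of-envelope sketch. The three DIMMs per channel is our assumption (it is what makes the 1.5TB figure work out), and the peak bandwidth is a theoretical number that assumes DDR3-1600 holds with every slot populated, which is not confirmed here.

```python
# Back-of-envelope check of the quad-socket E5-4600 memory figures.
SOCKETS = 4
CHANNELS_PER_SOCKET = 4       # 16 channels across four CPUs, as quoted
DIMMS_PER_CHANNEL = 3         # assumption: three DIMM slots per channel
DIMM_GB = 32                  # largest ECC DIMM quoted in the article
TRANSFERS_PER_SEC = 1600e6    # DDR3-1600 = 1600 million transfers/s
BYTES_PER_TRANSFER = 8        # 64-bit data path per DDR3 channel

channels = SOCKETS * CHANNELS_PER_SOCKET
capacity_gb = channels * DIMMS_PER_CHANNEL * DIMM_GB
capacity_tb = capacity_gb / 1024

# Theoretical peak only; sustained bandwidth will be lower.
peak_gb_per_sec = channels * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9

print(f"{channels} channels, {capacity_gb} GB ({capacity_tb} TB)")
print(f"theoretical peak bandwidth: {peak_gb_per_sec:.1f} GB/s")
```

The same sums with half as many sockets give the dual-socket E5-2600 figures, which is the point: per socket, nothing changes.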
With only two QPI links per CPU, however, the four sockets can be wired only as a plain 'square' without diagonal links, so CPUs at opposite corners of the square need an extra hop through a neighbour to communicate. This could affect performance in tasks with heavy inter-CPU communication or with I/O attached to a different CPU.
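To put a number on the extra hop, here is a small sketch that counts link hops between sockets with a breadth-first search. The 'square' matches the two-links-per-CPU layout described above; the fully connected layout is a hypothetical comparison only, and the socket numbering is ours.

```python
from collections import deque

def hops(links, src):
    """Breadth-first search: link hops from src to every CPU."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cpu = queue.popleft()
        for neighbour in links[cpu]:
            if neighbour not in dist:
                dist[neighbour] = dist[cpu] + 1
                queue.append(neighbour)
    return dist

def average_remote_hops(links):
    """Mean hop count over all ordered pairs of distinct CPUs."""
    n = len(links)
    total = sum(d for s in links for t, d in hops(links, s).items() if t != s)
    return total / (n * (n - 1))

# Two links per CPU: a square with no diagonals.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
# Hypothetical comparison: every CPU linked directly to every other.
full = {c: [o for o in range(4) if o != c] for c in range(4)}

print("square:", hops(square, 0), average_remote_hops(square))
print("full:  ", hops(full, 0), average_remote_hops(full))
```

In the square, one of every three remote accesses crosses two links instead of one. How much that costs in practice depends on the workload, and hop count alone says nothing about link bandwidth.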
The Xeon E5-4600 will provide up to 160 PCIe v3 lanes across four CPUs, enabling four dual-GPU cards plus dedicated PCIe SSD and interconnect cards to be used, all at full bandwidth without any bridges required, and with the DDIO direct I/O latency-hiding benefits unique to the E5 series at the moment. For all practical purposes, a July 2012 timeframe E5-4690 2.9GHz quad-socket 32-core system will be faster in most tasks than the current E7-4870 2.4GHz quad socket 40-core system. The latter will still support much higher total memory capacity, more QPI links - four per CPU, for higher inter-processor bandwidth - and of course many more RAS - reliability, availability and serviceability - features.
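As a rough illustration of that lane budget, the sketch below tallies a hypothetical card mix against the 160 lanes. The card counts for the SSDs and interconnects, and all the slot widths, are our assumptions rather than a real board layout, and the sum ignores which CPU each slot hangs off.

```python
# Rough PCIe lane budget for a quad-socket E5-4600 system.
TOTAL_LANES = 160             # figure quoted in the article
SOCKETS = 4
lanes_per_cpu = TOTAL_LANES // SOCKETS

# Hypothetical card mix: (name, lane width, count). Widths are assumptions.
cards = [
    ("dual-GPU card", 16, 4),
    ("PCIe SSD", 8, 2),
    ("interconnect card", 8, 2),
]

used = sum(width * count for _, width, count in cards)
spare = TOTAL_LANES - used

print(f"{lanes_per_cpu} lanes per CPU")
for name, width, count in cards:
    print(f"{count} x {name} at x{width}: {width * count} lanes")
print(f"used {used} of {TOTAL_LANES}, {spare} spare")
```

Even with four x16 graphics cards the total stays well inside the budget, which is why no PCIe bridge chips are needed in a configuration like this.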
Soon after, hopefully this summer, AMD is expected to unveil 'Abu Dhabi', its 'Piledriver' core based replacement for the Bulldozer-based Interlagos chips. These chips will use the same G34 socket, meaning they'll still be limited to four-channel DDR3 memory and to a total of eight core pairs per socket across two dies. The hope there is that the higher core count should somewhat make up for the speed disadvantage versus the comparable Intel processors. 'Abu Dhabi' should also make a decent offering for heavily multithreaded jobs where memory bandwidth isn't of utmost importance.
Unfortunately, AMD canned its plans for a new G2012 socket with an expected three channel or wider memory per die in that generation, and we'll have to wait for a new 'Steamroller' core based processor generation, likely in late 2013, to see dual-socket and quad-socket servers with higher memory and I/O bandwidth using such a new socket.
Either way, 2013 rather than 2012 will be the year of changes at both the platform chipset and CPU level for AMD-based workstations and servers, with the 'Terramar' dual die and 'Sepang' single die processors likely the first.