DURING COMPUTEX, as written here, we saw the impressive upcoming Nehalem-EX, the eight-core high-end server monster CPU with 24MB cache and four memory channels, all on one die! We mentioned the possibility of a Skulltrail-type or otherwise extreme multi-processor desktop then, but to be clear, Intel never committed to such a product; it's simply a possibility.
The huge die features eight (8) cores that can handle 16 threads with multithreading, and has 24MB of shared L3 cache, useful in server applications. Don't forget the four (4) full speed QPI links, compared to just one in the Core i7 (Nehalem UP) and two in the Xeon 5500 (Nehalem-EP for DP).
Frequency-wise, with all of this stuff on a square-inch sized die, the likely limit will be around 2.66GHz at most. If you were to overclock it, what would you get? Definitely well above 3GHz, but at the price of 180W or more TDP per socket. But then, in a dual-socket configuration you'd have a 16-core, 32-thread, 48MB cache machine able to address half a terabyte of RAM by your deskside.
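For those who like to check the sums, here is a minimal sketch of the dual-socket arithmetic. The per-socket figures come straight from the text above; the `SocketSpec` and `aggregate` names are purely illustrative and not any Intel tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SocketSpec:
    """Per-socket figures as quoted in the article (hypothetical helper type)."""
    cores: int
    threads_per_core: int
    l3_cache_mb: int


def aggregate(spec: SocketSpec, sockets: int) -> dict:
    """Multiply the per-socket resources by the number of sockets."""
    return {
        "cores": spec.cores * sockets,
        "threads": spec.cores * spec.threads_per_core * sockets,
        "l3_cache_mb": spec.l3_cache_mb * sockets,
    }


# Nehalem-EX: 8 cores, 2 threads per core, 24MB shared L3 per die.
NEHALEM_EX = SocketSpec(cores=8, threads_per_core=2, l3_cache_mb=24)

dual = aggregate(NEHALEM_EX, sockets=2)
print(dual)  # {'cores': 16, 'threads': 32, 'l3_cache_mb': 48}
```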
Well, last week our sources both in Taiwan and close to Intel confirmed that several vendors expect to have Nehalem-EX 16-core DP desktops, in fact super workstations, at the chip's launch towards year-end. But soon after, the early-2010 32nm Gulftown refresh of the Nehalem-EP is expected, most probably with six (6) cores and 12MB cache per die, not to mention a likely 3.6GHz top clock range for the workstation parts.
So why bother with the expensive, huge, hot Nehalem-EX as a desktop or workstation at all? Besides very large memory capacity for, say, EDA chip design and simulation work, the machine has far more SMP and I/O scalability than the Nehalem-EP: more memory channels, plus the ability to use those extra QPI links for faster, more parallel transfers between CPUs, and of course multiple I/O bridges accessible simultaneously by both CPU chips for nearly unlimited PCIe expandability. Take a look at the diagram:
As you can see, many multi-GPU cards, plus PCIe based SSD and RAMdisk arrays, and 10 Gigabit Ethernet or faster connections could then all go in the system, with dedicated bandwidth for each card plus only one QPI hop to each CPU.
The Nehalem-EP system will be faster per core and per thread, and will have somewhat lower memory latency, but overall bandwidth - including inter-CPU and memory bandwidth, and especially I/O bandwidth - will be lower. So, depending on the type of task, you will be able to pick your favourite here.
You will have a choice: both Nehalem-EP and Nehalem-EX workstations-cum-extreme-desktops should be there to play with. The more heavily threaded, cache and memory-bound apps will perform better on the Nehalem-EX biggie, while Nehalem-EP will be more cut out for uber-gaming and usual workstation apps. Neither will be a slouch: we're talking about 200Gflops class machines here, mind you.
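As a rough sanity check on that 200Gflops figure, here is a back-of-the-envelope sketch of theoretical peak throughput. It assumes four double-precision floating-point operations per core per clock (one 128-bit SSE add plus one multiply), which is our assumption and not something Intel has stated for these parts. The 3.2GHz overclock is likewise just an illustrative stand-in for "well above 3GHz", and `peak_gflops` is a made-up helper name.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int = 4) -> float:
    """Theoretical peak: cores x clock (GHz) x flops per core per cycle.

    flops_per_cycle=4 is an assumption (SSE add + multiply, double precision).
    """
    return cores * clock_ghz * flops_per_cycle


# Dual-socket Nehalem-EX: 16 cores at the likely 2.66GHz ceiling.
ex_stock = peak_gflops(cores=16, clock_ghz=2.66)

# Same box at an assumed 3.2GHz overclock (illustrative figure only).
ex_overclocked = peak_gflops(cores=16, clock_ghz=3.2)

# Dual-socket Gulftown: 2 x 6 cores at the expected 3.6GHz top clock.
ep_gulftown = peak_gflops(cores=12, clock_ghz=3.6)

for name, value in [
    ("Nehalem-EX DP, 2.66GHz", ex_stock),
    ("Nehalem-EX DP, 3.2GHz (assumed overclock)", ex_overclocked),
    ("Gulftown DP, 3.6GHz", ep_gulftown),
]:
    print(f"{name}: {value:.1f} Gflops peak")
```

Under those assumptions, both machines land in the 170 to 205 Gflops range on paper, which squares with the "200Gflops class" label. Real applications will of course see less than the theoretical peak.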
And, after all, many of the latest games are well multithreaded.... µ