The Inquirer

Intel vs AMD multicore battle looms

Comment: 12 cores in the next 12 months
Thu Apr 30 2009, 16:49

AMD CELEBRATED its Opteron chip's sixth birthday in a fitting way: it brought the six-core Istanbul release forward to June, and it announced Magny-Cours, a 12-core part that combines two Istanbul dies in one package, due half a year later.

At the same time, Intel's single-die, eight-core, 16-thread Beckton Nehalem EX should arrive a little before Magny-Cours, followed by native 32nm six-core Westmere DP chips early next year.

How well will they stack up against each other? Well, core for core and clock for clock, Istanbul will still trail Nehalem in most cases, whether you look at CPU-, cache- or memory-bound benchmarks. For now, the only thing that seems to beat Nehalem clock for clock is Intel's own older Penryn, and only in some tight loops and cryptography jobs.

It looks like, apart from HT Assist for NUMA and the two additional cores, Istanbul brings no other improvements over Shanghai. Even the L3 cache is still 6MB, a capacity that may turn out to be a tad too tight for six cores to share.

The Magny-Cours combination in its gigantic G34 socket might prove more interesting, though: two six-core dies, each with both of its DDR3 channels exposed, plus two HT3.1 links from each die routed out of the package as well.

What happens to the other two HT channels on each die? Well, since both dies sit on the same MCM substrate, it is easy either to join two of those channels into one 64-bit wide HT channel, or to keep them as two parallel standard HT channels that can carry multiple transactions simultaneously, which is what you would expect 12 cores talking to each other to need. At the same time, being only a few millimetres apart on the substrate could let AMD clock those two internal HT3 links much faster than the external ones, up to twice as fast, provided AMD allows different speeds on different HyperTransport links, of course. The result, if it is done: very good internal NUMA latency and bandwidth.
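To put rough numbers on the bandwidth side of that, here is a back-of-envelope Python sketch. It assumes the external links run at the HyperTransport 3.1 top clock of 3.2GHz (6.4 GT/s, double-pumped) over standard 16-bit links; the function name and the doubled internal clock are illustrative assumptions, not announced Magny-Cours link speeds, and latency is not modelled at all.

```python
# Back-of-envelope HyperTransport bandwidth, per direction.
# ASSUMPTION: external links run at the HT 3.1 maximum of 3.2GHz,
# double-pumped to 6.4 GT/s; actual Magny-Cours link speeds are unknown.


def ht_link_gb_per_s(gigatransfers: float, width_bits: int = 16) -> float:
    """Per-direction bandwidth in GB/s = transfers per second x bytes per transfer."""
    return gigatransfers * width_bits / 8


EXTERNAL_GTS = 6.4  # assumed external HT3.1 transfer rate, GT/s

one_link = ht_link_gb_per_s(EXTERNAL_GTS)
# Pairing two internal links, ganged or run in parallel, doubles raw bandwidth.
two_links = 2 * one_link
# Hypothetical: the on-package links clocked twice as fast as the external ones.
two_fast_links = 2 * ht_link_gb_per_s(2 * EXTERNAL_GTS)

print(f"one external x16 link:     {one_link:.1f} GB/s per direction")
print(f"two internal links:        {two_links:.1f} GB/s per direction")
print(f"two links at double clock: {two_fast_links:.1f} GB/s per direction")
```

Under these assumptions, pairing the links and doubling their clock would each double the raw die-to-die bandwidth, so together they would quadruple it compared with a single external link.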

Would it beat Beckton Nehalem EX? Not so easily. Unless AMD implements further incremental core and cache improvements in Magny-Cours, which is highly unlikely, Becktons should have both per-core and clock-for-clock performance advantages, although Intel could still give some of that away by playing really conservative with clock frequencies. Also, Beckton has a humongous 24MB L3 cache and four DDR3 channels for eight cores, compared with 12MB of L3 cache and four DDR3 channels for 12 cores in Magny-Cours. So, based on current expectations, cache- and memory-bound jobs should also perform better on Nehalem EX, but not by a wide margin, mind you.
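The per-core arithmetic behind that comparison can be sketched in a few lines of Python. The figures come from the article (Istanbul's 6MB L3 and two memory channels, Magny-Cours' 12MB and four channels, Beckton's 24MB and four channels); the `Chip` class is a hypothetical helper, and splitting a shared L3 evenly across cores is a simplifying assumption, since a shared cache is not actually partitioned that way.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    cores: int
    l3_mb: float
    memory_channels: int

    def l3_per_core(self) -> float:
        # Naive even split; a shared L3 is not really partitioned like this.
        return self.l3_mb / self.cores

    def channels_per_core(self) -> float:
        return self.memory_channels / self.cores


chips = [
    Chip("Istanbul", cores=6, l3_mb=6, memory_channels=2),
    Chip("Magny-Cours", cores=12, l3_mb=12, memory_channels=4),
    Chip("Beckton Nehalem EX", cores=8, l3_mb=24, memory_channels=4),
]

for chip in chips:
    print(f"{chip.name:<20} {chip.l3_per_core():.2f} MB L3/core, "
          f"{chip.channels_per_core():.3f} memory channels/core")
```

On these naive figures Beckton has three times the L3 and 1.5 times the memory channels per core of Magny-Cours. Per-core shares like these ignore clock speed, latency and how well a workload shares the cache, so they need not translate into a proportional gap in real benchmarks.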

If AMD manages to improve the G34 socket's TDP margins and keep Magny-Cours clock speeds not far behind Istanbul and Shanghai, and if Intel lowers the Beckton Nehalem EX core clock to save on TDP, we could see the two meet at similar clock speeds. Also, Magny-Cours will be available in high-end dual-processor (DP) platforms. Intel partners could decide to do a DP Nehalem EX too, and that one would make a lovely workstation, I have to say.

But hold on, Intel will have its 32nm Westmere shrink in the DP race too, sometime in early to mid-2010. Besides the expected Gulftown DP versions, with six cores per die, core improvements and a 50 per cent cache increase to 12MB, memory and QPI improvements are also expected.

One thing I hope to see is a way around one little secret: the extra 3-cycle latency between the cores and the L3 cache in the uncore on current Nehalems. It comes from synchronising the core and uncore portions, which run at different clock speeds, and it remains even if you set them to run at the same speed.

Similar to the old PAT (Performance Acceleration Technology) on Intel chipsets, there could be a mode where, if the core and uncore are set to exactly the same clock, the 3-cycle penalty is reduced or removed, going some way towards reducing the overall L3 latency. Can we have this fix in the Westmere UP and DP flavours?

If these and other improvements make it in, Intel will still have a very clear die-for-die performance advantage over AMD's Istanbul in early 2010, even if clock speeds are conservatively kept at 3.6GHz to 4GHz.

As for late 2010 and beyond, well, the Sandy Bridge versus Bulldozer battle should be an even more interesting one.

