Intel tries to backstop its own roadmaps - Bob Colwell, former Intel chief architect
AMD CELEBRATED its Opteron chip's sixth birthday in a fitting way by accelerating the six-core Istanbul release to June, and by announcing the 12-core total of two Istanbul dies in one Magny-Cours package to come out half a year later.
At the same time, Intel's single die eight-core, 16-thread Beckton Nehalem EX should be out a little before Magny-Cours, followed by native 32nm six-core Westmere DP chips early next year.
How well will they stand up against each other? Well, core for core and clock for clock, Istanbul will still be behind Nehalem in most cases, whether you look at CPU, cache or memory bound benchmarks. Nehalem only seems to be beaten clock for clock by the older Penryn in some tight loops and cryptography jobs, for now.
It looks like, apart from HT Assist in NUMA, Istanbul brings no other improvements over Shanghai save for the two additional cores - even the L3 cache is still at 6 MB, a capacity that may turn out to be a tad too tight for six cores to share.
The Magny-Cours combination in its gigantic G34 socket might prove to be more interesting, though - two six-core dies, each with both of its DDR3 channels exposed, plus two HT3.1 paths from each chip routed out as well.
What happens to the other two HT channels on each CPU chip? Well, being on an MCM substrate, it is easy to either join two of them into one 64-bit wide HT channel, or keep them as two parallel standard HT channels to support multiple transactions simultaneously, something not unexpected from 12 cores talking to each other. At the same time, the few millimetres of proximity and the substrate could allow AMD to clock those two HT3 links way faster - up to twice as fast - than the external HT3 ones. That is, as long as AMD allows different speeds on different HyperTransport links, of course. The result, if done? Very good internal NUMA latency and bandwidth.
Would it beat Beckton Nehalem EX? Not so easy. Unless AMD implements further incremental core and cache improvements in Magny-Cours, which is highly unlikely, Becktons should have both per-core and clock-for-clock performance advantages - unless of course Intel decides to play really conservative with the clock frequencies. Also, Beckton has a humongous 24MB L3 cache and four DDR3 channels for eight cores, compared to 12MB L3 cache and four DDR3 channels for 12 cores in Magny-Cours. So, cache and memory bound jobs should also perform better on Nehalem EX based on current expectations - but not by a wide margin, mind you.
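To put the cache and memory comparison above on a per-core footing, here is a minimal sketch that uses only the core counts, L3 sizes and DDR3 channel counts quoted in this article. The dictionary layout and the `per_core` helper are illustrative names, not anything from Intel or AMD, and the ratios say nothing about clock speed, memory speed or how well each shared L3 actually behaves.

```python
# Figures as quoted in the article (expectations at the time, not measurements).
chips = {
    "Beckton Nehalem EX": {"cores": 8, "l3_mb": 24, "ddr3_channels": 4},
    "Magny-Cours": {"cores": 12, "l3_mb": 12, "ddr3_channels": 4},
}


def per_core(chip):
    """Return (L3 MB per core, DDR3 channels per core) for one chip entry."""
    return chip["l3_mb"] / chip["cores"], chip["ddr3_channels"] / chip["cores"]


for name, chip in chips.items():
    l3_share, channel_share = per_core(chip)
    print(f"{name}: {l3_share:.2f} MB L3 per core, "
          f"{channel_share:.2f} DDR3 channels per core")
```

On these numbers Beckton has 3 MB of L3 and half a DDR3 channel per core, against 1 MB and a third of a channel per core for Magny-Cours.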
If AMD manages to improve the G34 socket TDP margins and keep Magny-Cours speeds not far behind Istanbul and Shanghai, and if Intel reduces the Beckton Nehalem EX core clock to save on TDP, we could see a clock-for-clock matchup here. Also, Magny-Cours will be available in high end dual processor (DP) platforms. Intel partners could decide to do a DP Nehalem EX too - that one would make a lovely workstation, I have to say.
But hold on, Intel will have its 32nm Westmere shrink in the DP race too, sometime in early to mid-2010. Besides the expected six-core-per-die Gulftown DP versions with both core improvements and a 50 per cent cache increase to 12 MB, memory and QPI improvements are also expected.
One thing I hope to see is a way around one little secret - the current 3 cycle extra core to L3 uncore latency seen on Nehalems, caused by syncing the core and uncore portions running at different clock speeds, which applies even if you set them to run at the same speed.
Similar to the old time PAT performance acceleration on Intel chipsets, there could be a mode where, if the core and uncore are set to exactly the same clock, the 3 cycle extra latency is reduced or removed, going some way towards reducing the overall L3 latency there. Can we have this fix for Westmere UP and DP flavours?
If such and other improvements are in, die for die, in early 2010 Intel will still have a very clear advantage over AMD Istanbul performance-wise, even if the clock speeds are conservatively kept at 3.6GHz to 4GHz.
As for late into 2010 and beyond, well the Sandy Bridge versus Bulldozer battle should be an even more interesting one.
Why such delays? Here's Inside Ultee' Duper. Remember the RD880D mention in the comments today? It's a 10.1 part from the R785 integrated chipset, just replacing R780. Why not DX11?
Well, Nvidia has announced a 380 GPU that will be DX11, near the end of the year. That's key. Nvidia made OpenGL (or should it be Openly GL?); anyway, Nvidia has a fundamental handle on NT6. So ATI can't really do much with DX11 till Nvidia shows them how and makes time arrangements to copy the technology. Might say all hardware is awaiting MS DX11, a hardware solution to become available before putting out defective parts to the world and his dog.
It's so E-Z, now I see light, Bulldozer ahead. drashek
Good article.
"If such and other improvements are in, die for die, in early 2010 Intel will still have a very clear advantage over AMD Istanbul performance-wise, even if the clock speeds are conservatively kept at 3.6GHz to 4GHz."
3.6-4 GHz is "conservative"?
AMD talked about not wanting to get into a 'number of cores' race (I seem to recall them comparing it to the clockspeed race).
Here was AMD talking about how it was not smart to just keep adding cores and how a 'glued' MCM approach was a non-optimal solution. Fast forward all of 1-2 years and AMD is talking up 12 cores AND MCM. Pot... Kettle... Black.
I guess when AMD can't make the actual core as efficient, throwing more cores and doing an MCM approach to get even more cores is suddenly no longer a 'kludge'. Next thing you know they'll be adding more cache :)
Maybe on 32nm those speeds will be achievable. But 4GHz for an 8-core part seems a bit too much. We may see 4GHz stock on home user chips, though.
Er, OpenGL was made by Silicon Graphics Inc. in 1992.
The article is about Intel and AMD server/workstation CPUs, not about DirectX, Nvidia or ATI.
As for who out of the two major PC GPU manufacturers will have a GPU compliant with DirectX 11 first, who knows? Sadly for me, the point is a moot one since I own Windows XP. So it’s DirectX 9.0c for me unless Microsoft release 10/10.1/11 for XP.
There would have to be a “Killer App” to push me to upgrade now or in the near future. Wing Commander got me to save up for a PC many moons ago (486DX2 66 with 4MB RAM!)
Valve’s hardware survey (March 2009) shows 28.28% of Steam subscribers have DirectX 10 hardware on Vista. 29.15% have a DirectX 10 GPU on XP. 25.22% of users have a DirectX 9 SM 2b/3 GPU. Nearly 55% of users are still on DirectX 9c.
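A quick check of where the "nearly 55%" comes from, as a minimal sketch. It assumes the figure is the sum of the DirectX 10 GPUs running on XP (which cannot go beyond DirectX 9.0c there) and the DirectX 9 SM 2b/3 GPUs; the variable names are illustrative only.

```python
# Shares quoted above from Valve's March 2009 hardware survey, in per cent.
dx10_gpu_on_vista = 28.28
dx10_gpu_on_xp = 29.15  # DX10-capable hardware held back by XP
dx9_gpu = 25.22         # DirectX 9 SM 2b/3 hardware

# Assumption: "still on DirectX 9c" = DX10 GPUs on XP + DX9-only GPUs.
stuck_on_dx9 = round(dx10_gpu_on_xp + dx9_gpu, 2)
print(stuck_on_dx9)  # 54.37
```

Under that assumption the total is 54.37%, which matches the "nearly 55%" quoted.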
Although you are right, at least AMD has realised that one of its biggest recent mistakes was not gluing two Athlons together to combat Intel's Core 2 Quads.
I hope AMD get their chips' performance up however they can, because the only reason Nehalem is so ridiculously expensive at the moment is AMD's total lack of competitiveness.
Maybe it's too little too late though?
Shad, thanks for the longer-term view. Nvidia must have adopted OpenGL or bought the tech. Anyway, OpenGL by 2002 on K7 was the base for writing Vista. That's the actual link that made OpenGL a basic ingredient. It was an Asus mainboard used to test Vista, as OpenGL allowed deeper improvements and is needed as the fundamental base for additional NT6 O/S. Actually Nvidia is an example of a heavy OpenGL user and today's prime candidate for any breakthroughs in the NT6 GPU field. Ultee' drashek
When Intel pushed its NetBurst architecture, its processor performance was knocked off by AMD's architecture. But AMD has engineers who know the right combination of clock and circuitry. Overly complex chips have high thermal dissipation if the core clock is set too high. Using less complex circuitry benefits the performance per watt ratio, is easier to clock at high frequencies and gives a lot of raw performance.
And remember, AMD roadmaps are always conservative, planned for the worst conditions. If they do things right and Globalfoundries gets its 32 nm in time, it is possible that in Q1 Magny-Cours will use the 32 nm fabrication process ahead of schedule.
I also seem to remember Intel making similarly disparaging comments about on-die memory controllers...
In the world of technology, every company can be accused of having at least one pot-kettle-black moment, and at least it's taken AMD a couple of years to reverse their stance here. Some companies seem to rubbish their competitors' design ideas whilst simultaneously releasing products of their own which look suspiciously similar.
More and more cores per chip, fine, I like. But that means more of a bandwidth bottleneck to the memory, right?
So when will the RAM simply become part of the chip package?
Buy a 12-core whateverahlem with 4GB of DDR5 RAM on-chip.
Or buy a 12-core 2GB chip.
Or a 16-core 8GB chip.
Etc...
Tell me it'll happen one day.
I wonder if we'll see a 10-core from AMD, a combination of a 6-core and a 4-core on one die.
@Pascal: Integrated RAM in large sizes is a long way off. The process for making DRAM is different from the process for CPUs and GPUs, and the two cannot be easily mixed. There are such things as "Embedded DRAM," which is DRAM made in the CPU or GPU process, but it takes up much more area than DRAM made in the native DRAM process, which is perfectly tweaked for making DRAM.
That said, there is promise in the next few years for major advances in "stacked die" technology, where they will stack the DRAM dies directly on top of the CPU or GPU, allowing for much faster, wider, lower latency access to memory. Basically, it will greatly reduce or eliminate the memory bottleneck as we know it. That will be a game changer for lots of apps.