Beans spilled over Intel's Montecito clean shirt

Intel Developer Forum Dog's breakfast, anyone?
Thursday, 9 September 2004, 16:33
MONTECITO IS THE NEXT BIG THING for Intel, and I am not just talking about die size. Cameron McNairy, a Montecito architect, spilled some of the beans about the chip on Tuesday. It is the first dual core Itanium chip, and a large departure from Itania past.

The chip is about exploiting parallelism at all levels: dual cores for the most basic parts, dual threads per core for more fine grained work, and multiple instruction bundles per core for more still. All this, and a whopping 26.5MB of total cache, add up to a potentially monster chip. The die size is estimated by some to be around 580 sq mm, a monster number, but the power appetite is not. It should deliver about four times the performance of a current Itanium, so monster applies here too.

This 1.72 billion transistor chip may seem like it will suck power like water, especially since it uses Intel's 90nm process. Thanks to the new Foxton technology, it actually only consumes 100W, 30W less than its predecessor.

One of the first things up on the menu of improvements is better Instruction Level Parallelism. This comes about in a bunch of ways, starting with an additional shifter and popcount unit. More efficient speculation recovery and a few new instructions make ILP notably better on this chip. Better caches and data flow between them, including more L2 and L3 victim buffers, don't hurt things either. Lastly, the queues are more efficient. It all adds up to big overall gains.

Moving up a step, you have the dual threads and dual cores. Memory accesses between these threads are interleaved to hide latency. The picking of threads is dynamic; some resources are shared exclusively, while others are shared competitively.

To manage this potential mess, the two cores have an arbiter between them. While the cores decide which thread owns them for the given time, the arbiter decides which core has precedence. This means that one core has full access to the bus and the outside world while the other sits and watches. The arbiter picks this in a much more dynamic way than the thread control in the cores themselves.

This allows some of the threads to be picked in a highly complex fashion, and others to be more crudely activated. Upon a thread stall, a Montecito core can make a context switch to another thread with only a 15 cycle penalty. This means a massive reduction in idle time, and as a consequence, you get a faster working CPU.
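To get a feel for why switching on a stall cuts idle time, here is a toy model in Python. It is only a sketch under loud assumptions: the only number taken from the talk is the 15 cycle switch penalty, while the workload shape, the function names and the "switch to the other thread as soon as one stalls" policy are all made up for illustration and are not Montecito's actual thread selection logic.

```python
SWITCH_PENALTY = 15  # cycles per context switch, the figure quoted in the article


def run_alone(segments):
    """Cycles for one thread on a core that simply waits out every stall.

    segments is a list of (compute_cycles, stall_cycles) pairs (hypothetical workload).
    """
    return sum(compute + stall for compute, stall in segments)


def run_switch_on_stall(threads, penalty=SWITCH_PENALTY):
    """Cycles for several threads on one core that switches whenever a thread stalls.

    Toy policy (an assumption): on a stall, switch to the first other thread
    that still has work, paying `penalty` cycles for the switch.
    """
    now = 0
    next_seg = [0] * len(threads)   # index of the next segment for each thread
    ready_at = [0] * len(threads)   # cycle at which each thread's stall ends
    current = 0

    def has_work(t):
        return next_seg[t] < len(threads[t])

    while any(has_work(t) for t in range(len(threads))):
        if not has_work(current):
            # Current thread is finished: move to one that still has work.
            current = next(t for t in range(len(threads)) if has_work(t))
            now += penalty
        # If the chosen thread is still stalled, the core idles until it is ready.
        now = max(now, ready_at[current])
        compute, stall = threads[current][next_seg[current]]
        now += compute
        ready_at[current] = now + stall
        next_seg[current] += 1
        others = [t for t in range(len(threads)) if t != current and has_work(t)]
        if stall > 0 and others:
            current = others[0]
            now += penalty
    # The run is over once the last outstanding stall has completed.
    return max(now, max(ready_at))


if __name__ == "__main__":
    workload = [(100, 200)] * 3  # hypothetical: 100 cycles of work, 200 of memory stall
    back_to_back = 2 * run_alone(workload)
    switched = run_switch_on_stall([workload, workload])
    print(f"two threads back to back: {back_to_back} cycles")
    print(f"switch on stall:          {switched} cycles")
```

In this made-up workload most of each thread's stall time is covered by the other thread's compute, so the switch penalty is small change next to the idle cycles it saves.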

The power reduction is no less impressive than the threading. Foxton, the code name for the power saving technology, is a large array of things with a common goal. It starts by adjusting the voltage and frequency of the chip dynamically and smoothly. Since power used is related to both of those values (P=fCV^2), a 1% frequency drop, with the voltage lowered to match, results in about a 3% power saving. While it sounds good, it does not extrapolate in a straight line: a 34% reduction in clock will not bring you a net gain of 2% power. The linear math works out, physics just doesn't comply.
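The P=fCV^2 arithmetic is easy to check for yourself. The short Python sketch below assumes, purely for illustration, that capacitance stays constant and that voltage is scaled in direct proportion to frequency, which makes power go with the cube of the clock; the function names are hypothetical and real silicon will not follow the curve this neatly.

```python
def relative_power(freq_scale: float, voltage_scale: float) -> float:
    """Dynamic power relative to baseline under P = f * C * V^2, with C held constant."""
    return freq_scale * voltage_scale ** 2


def relative_power_dvfs(freq_scale: float) -> float:
    """Assumption: voltage is lowered in proportion to frequency, so P scales as f^3."""
    return relative_power(freq_scale, freq_scale)


if __name__ == "__main__":
    for cut in (0.01, 0.10, 0.34):
        scale = 1.0 - cut
        saving = 1.0 - relative_power_dvfs(scale)
        print(f"{cut:.0%} clock cut -> {saving:.1%} power saving")
```

Under these assumptions a 1% clock cut saves roughly 3% of the power, but a 34% cut saves roughly 71%, not the 102% you would get by multiplying by three.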

Foxton sets the voltage to the minimum level needed to do the desired computing task. It will allow for a brief overvolting to compensate for increased demand, but it pretty much kills overclocking. If voltage draw or temperature goes too high, it will ratchet things down to get to the desired 100W level. It can also do this locally to parts of the chip, not necessarily globally. Nothing else has this technology, and nothing can match the 2-3 millisecond response times.

Next up is reliability. In the coming months and years, look for Intel to beat the RAS drum hard, and Montecito is no exception here. It has ECC or better protection on all parts of the chip that matter. Things related to protection that were taken for granted no longer are. Also, Pellston technology allows for mapping out of bad cache lines individually.

One last thing that was discussed is the ability to operate two chips in hardware lockstep. This is not new by any means, but combined with Pellston, it makes for some interesting scenarios. If a cache line fails and is mapped out, Pellston breaks lockstep. If lockstep is reestablished, the 'good' chip will map out that cache line also to preserve consistency. It is almost like they actually thought this whole chip through. µ
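The lockstep and Pellston interplay described above can be sketched as a tiny state model in Python. Everything here is hypothetical: the class names, the method names and the idea of tracking mapped-out lines as a set are illustration only, not how the hardware or firmware actually does it.

```python
from dataclasses import dataclass, field


@dataclass
class Chip:
    """Hypothetical stand-in for one chip and its set of mapped-out cache lines."""
    name: str
    disabled_lines: set = field(default_factory=set)

    def map_out(self, line: int) -> None:
        self.disabled_lines.add(line)


@dataclass
class LockstepPair:
    """Two chips running in lockstep (toy model)."""
    a: Chip
    b: Chip
    in_lockstep: bool = True

    def report_line_failure(self, chip: Chip, line: int) -> None:
        # The failing chip maps out the bad line; the caches now differ,
        # so lockstep is broken.
        chip.map_out(line)
        self.in_lockstep = False

    def reestablish(self) -> None:
        # Before resuming, both chips map out every line either has disabled,
        # so the 'good' chip loses the same line and the caches match again.
        merged = self.a.disabled_lines | self.b.disabled_lines
        self.a.disabled_lines = set(merged)
        self.b.disabled_lines = set(merged)
        self.in_lockstep = True


if __name__ == "__main__":
    pair = LockstepPair(Chip("chip0"), Chip("chip1"))
    pair.report_line_failure(pair.a, 42)
    print(pair.in_lockstep, pair.a.disabled_lines, pair.b.disabled_lines)
    pair.reestablish()
    print(pair.in_lockstep, pair.a.disabled_lines, pair.b.disabled_lines)
```

The point of the merge step is simply that two chips can only stay in lockstep if their caches behave identically, so a line lost on one has to be given up on the other.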
