
FSB headaches face HT answers

Taipei IDF Clovertown Part II
Thu Apr 13 2006, 11:56
SO, CLOVERTOWN (and, by extension, Kentsfield and, after that, Tigerton) seems to be working well by now - in fact, it works so well that it could end up arriving at almost the same time as Woodcrest and Conroe, i.e. become to them what Presler was to Cedar Mill: a dual-die version in the same package.

Anyway, Intel folk, including Dan Casaletto, the man in charge of microprocessor development and one of the ex-Alpha leaders - OK, I have to take my hat off to him, in that case - were 'very cautious' on the FSB speed issue for Clovertown. As our Charlie mentioned long ago, the three-load bus situation there would probably prevent it from reaching the 1,333MHz FSB of the 'standard' Woodcrest. Or would it?

Outside one tech session here, a source mentioned that a 1,600MHz FSB is the practical limit for the highly tuned point-to-point FSB on the 2006 Conroe and Woodcrest, even with overclocking, assuming reliable long-term operation. With some tweaking, the whisper says that a 1,333MHz FSB (and not a hertz more) could be done for a three-load situation like the one on Kentsfield or Clovertown, but it may take some time to fix and validate on both the CPU and north bridge sides - maybe requiring that Christmas++ arrival date for these CPUs?

Now, there was chat about throwing away that FSB and using an interconnect like, let's say, HyperTransport 3.0, on the new Intel Core, and what it would bring to the table. Here's my take on the situation.

Intel likes the idea of being able to instantly double the per-socket core density by having single-die and dual-die options in the same socket. With its MCM substrate expertise dating back to the Pentium Pro days, and the good yields and frequency characteristics of the Presler XE (which is a dual-die MCM), it makes sense to play this trump card - on top of the ability to mix 'n' match dies, even from different wafers, on that one MCM, depending on individual die test results, to get maximum-speed, high-end, high-margin eXtreme Edition parts.

However, with the antiquated FSB approach, two dies on that chip have to share the bus, imposing a bandwidth bottleneck: not only is there twice the "data hunger" on that FSB, but the three loads instead of two also slow it down by at least 20%.
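For a rough feel of the squeeze, here is a back-of-the-envelope sketch (my figures, not Intel's, assuming the usual 8-byte-wide, quad-pumped Intel FSB and taking the 20 per cent derating at face value):

```python
# Back-of-the-envelope FSB bandwidth, assuming the usual 8-byte-wide,
# quad-pumped Intel FSB. Figures are illustrative, not vendor numbers.

FSB_WIDTH_BYTES = 8

def fsb_bandwidth_gbs(mt_per_s):
    """Peak FSB bandwidth in GB/s for a given transfer rate in MT/s."""
    return mt_per_s * FSB_WIDTH_BYTES / 1000.0

# Two loads (one die plus north bridge): 1,333 MT/s looks doable.
woodcrest = fsb_bandwidth_gbs(1333)     # ~10.7 GB/s, all for one die

# Three loads (two dies plus north bridge): roughly 20 per cent slower,
# i.e. ~1,066 MT/s, and the two dies share whatever is left.
clovertown = fsb_bandwidth_gbs(1066)    # ~8.5 GB/s in total
per_die = clovertown / 2                # ~4.3 GB/s per die

print(f"Woodcrest, one die per FSB: {woodcrest:.1f} GB/s")
print(f"Clovertown, per die:        {per_die:.1f} GB/s")
```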

How to solve that? Two ways. One is to bring out BOTH FSBs (one per die) to the dual-FSB north bridge, requiring a new package in the process. It is a quick fix, but it needs a new socket!

The other way is to implement a nice interconnect like CSI (who knows if and when) or, here and now, HT 3.0. I made the following assumptions for ease of illustration in this theoretical scenario:

Each chip has four HT 3 channels (could be more, could be just three, depending on the positioning), each of which is a 1.33 GHz 2x16-bit path with 21 GBytes/s of bandwidth. Each chip also has its own integrated memory controller with a dual-channel DDR3-1333 memory path, also good for 21 GBytes/s - in perfect sync with HT.
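Here is how I make those two 21 GB/s figures line up - a minimal sketch of my own arithmetic, assuming the '2x16-bit' HT channel means 32 bits in each direction and that the quoted bandwidth counts both directions; the DDR3 side is the straightforward dual-channel sum:

```python
# My own arithmetic behind the matching ~21 GB/s figures. Assumption: the
# "2x16-bit" HT 3 channel is 32 bits wide in each direction, double-pumped,
# and the quoted figure is the bidirectional aggregate.

def ht_channel_gbs(clock_ghz, bits_per_direction=32):
    """Aggregate bandwidth of one HT channel, both directions, in GB/s."""
    transfers_per_s = clock_ghz * 2                  # DDR: 1.33 GHz -> 2.66 GT/s
    per_direction = transfers_per_s * bits_per_direction / 8
    return per_direction * 2                         # count both directions

def ddr3_gbs(mt_per_s, channels=2, bytes_per_channel=8):
    """Peak bandwidth of a multi-channel DDR3 memory path in GB/s."""
    return mt_per_s * bytes_per_channel * channels / 1000.0

print(f"HT 3 channel at 1.33 GHz: {ht_channel_gbs(1.33):.1f} GB/s")   # ~21.3
print(f"Dual-channel DDR3-1333:   {ddr3_gbs(1333):.1f} GB/s")         # ~21.3
```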

Now, the goal is to have a single socket for both the single-die and dual-die options, with the power envelope left open, while the four 'external' HT channels plus two DDR3-1333 memory channels are present in both cases. No external slow-down, and the same speeds whether there is one die or two. Look at my "cubist" artwork here:

[Diagram 'htstuff': single-die and dual-die HT link arrangements]

Fits well? As you can see, both cases have exactly the same outside memory and I/O situation - in the dual-die case, the first die's HT 1 & 2 and the second die's HT 3 & 4 come out externally, while the first die's HT 3 & 4 link to the second die's HT 1 & 2 respectively. Why a dual-link (42 GB/s total) connection between the two dies on the package?
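To make the wiring concrete, here is a tiny model of the same arrangement (the die and port names are mine, purely illustrative):

```python
# A tiny model of the link wiring in the picture; die/port names are mine.
# "external" means the link goes out to package pins; a tuple means the link
# pairs up with a port on the other die across the MCM substrate.

single_die = {
    ("die1", "HT1"): "external",
    ("die1", "HT2"): "external",
    ("die1", "HT3"): "external",
    ("die1", "HT4"): "external",
}

dual_die = {
    ("die1", "HT1"): "external",
    ("die1", "HT2"): "external",
    ("die1", "HT3"): ("die2", "HT1"),   # inter-die link 1, ~21 GB/s
    ("die1", "HT4"): ("die2", "HT2"),   # inter-die link 2, ~21 GB/s
    ("die2", "HT3"): "external",
    ("die2", "HT4"): "external",
}

for name, wiring in (("single-die", single_die), ("dual-die", dual_die)):
    external = sum(1 for dest in wiring.values() if dest == "external")
    print(f"{name}: {external} external HT channels")   # four in both cases
```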

Well, let's assume each of these is a quad-core die - and assume a "conservative" configuration in which, to keep the pin count to a minimum, only the Die 1 memory controller is active, so Die 2 has to handle its (presumably frequent, given four on-die cores) memory accesses through the HT crossbar in Die 1. In that case, it may be good to have a dedicated HT link for that purpose to minimise the performance penalty, while the other HT link is used for inter-die core communication and cache-coherency operations.
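In rough numbers (my own sketch, reusing the ~21 GB/s figures from above):

```python
# Rough numbers for the "conservative", single-memory-controller case,
# reusing the ~21.3 GB/s figures from above. Purely illustrative.

MEMORY_GBS = 21.3    # Die 1's dual-channel DDR3-1333, shared by both dies
HT_LINK_GBS = 21.3   # one inter-die HT link (aggregate, both directions)

# With a dedicated link carrying only Die 2's memory traffic, the ceiling on
# Die 2's memory accesses is the shared controller itself, not the MCM hop.
die2_ceiling = min(MEMORY_GBS, HT_LINK_GBS)   # ~21.3 GB/s, before sharing

# With a single shared link, coherency traffic between the dies would have
# to fit into the same pipe as Die 2's memory traffic.
print(f"Die 2 memory ceiling with a dedicated link: {die2_ceiling:.1f} GB/s")
```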

In the Opteron world, Iwill last year made a shoebox dual-socket workstation where the second socket had to use the first socket's memory, as its own memory controller was disabled. Iwill put in an option to use two HT links between the sockets instead of one - possibly for exactly the above-mentioned purpose. Of course, it is far easier to implement the dual HT links on the MCM substrate...

Now, if Intel is more pin-generous, and willing to let the packaging have another 160+ "optional" pins, these could be used to activate the Die 2 dual-channel DDR3 memory controller too, providing another "eXtreme Edition" option for Intel and, this time, for chipset and mainboard makers as well. The second memory controller would not only double the total memory bandwidth - important in multi-GPU gamer, 3-D workstation and server situations - but also let the whole setup scale in many-socket, large-system configurations without a memory bottleneck. And, if you plug the same chip into a "single memory path" system, well, the second memory controller would simply be inactive!
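Again in rough numbers (my own back-of-the-envelope, with each active dual-channel DDR3-1333 controller worth about 21.3 GB/s):

```python
# Back-of-envelope totals for the "pin-generous" option: each active
# dual-channel DDR3-1333 controller is worth ~21.3 GB/s. Illustrative only.

PER_CONTROLLER_GBS = 21.3

def total_memory_gbs(sockets, controllers_per_socket):
    """Aggregate peak memory bandwidth across the whole box."""
    return sockets * controllers_per_socket * PER_CONTROLLER_GBS

print(f"{total_memory_gbs(1, 1):.1f} GB/s")  # conservative: one controller active
print(f"{total_memory_gbs(1, 2):.1f} GB/s")  # 'eXtreme Edition' dual-die socket
print(f"{total_memory_gbs(4, 2):.1f} GB/s")  # four such sockets, no shared-FSB choke
```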

In my mind, this presents Intel (and AMD!) with an easy 8-core per-socket path for their 2007-2008 45nm generation. µ

 
