LAST FRIDAY'S story on possible superdesktops using the upcoming Nehalem-EX and Nehalem-EP processors also hinted at their expected overclocking abilities, but what kinds of speeds might we realistically be able to achieve with these monsters?
After all, the 32nm Gulftown Nehalem-EP dualie should start at 3.6GHz for its six (6) cores per die, and in most cases will most probably run well above 4.5GHz when properly cooled on a good mainboard. As for the biggie eight (8) core Nehalem-EX, even there the 2.66GHz top default clock should be arousable to some 3.2GHz under the right conditions. So, these multiprocessing brethren of the otherwise similar Core i7 should share its overclocking margin too.
Sounds easy, right? However, what would we really be overclocking there? Not everything, it seems. Even on the current Core i7, as you know, the default clock you see only applies to the four cores and their L1 and L2 caches. The shared 8MB L3 cache, the memory controller and the QPI interface, all collectively known as the "uncore", have their own clock, which is asynchronous at that - read, more latency in between. This arrangement enables you to have, among other things, better overclocking for the "core" portion, but the latency of and, to a certain extent, the access bandwidth to the uncore are sacrificed a bit. And, before you start bias-ranting, AMD was equally guilty of this with its Barcelona and Shanghai processors and its desktop Phenom CPUs too.
So, your glorious overclocking achievement may show 4.00GHz on screen, but the L3 cache and memory controller inside the CPU, which run at double or more the memory data rate, might only be working at 2.66GHz if you're using DDR3-1333 DRAM. Now, this may be necessary in the case of the humongous Nehalem-EX die, where the 24MB L3 cache, four (4) memory channels and four (4) QPI links obviously can't run at a very high frequency, but either way your bandwidth and latency benchmarks will be affected, depending on both the "core" and the "uncore" clock rates.
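To put numbers on that core/uncore gap, here is a minimal Python sketch. It assumes only the rule of thumb quoted above, that the uncore runs at double (or more) the memory data rate; the function name and the `ratio` parameter are made up for illustration and are not any Intel or BIOS setting.

```python
def uncore_clock_mhz(memory_data_rate_mts: float, ratio: float = 2.0) -> float:
    """Uncore clock implied by the memory data rate.

    Assumption: uncore = ratio x memory data rate, with ratio >= 2,
    following the "double or more" rule of thumb in the article.
    """
    if ratio < 2.0:
        raise ValueError("the article's rule of thumb needs ratio >= 2")
    return memory_data_rate_mts * ratio


core_mhz = 4000.0                       # the overclocked figure shown on screen
uncore_mhz = uncore_clock_mhz(1333.0)   # DDR3-1333 at the minimum 2x ratio

print(f"core   : {core_mhz:.0f} MHz")
print(f"uncore : {uncore_mhz:.0f} MHz")
print(f"uncore runs at {uncore_mhz / core_mhz:.0%} of the core clock")
```

Raising `ratio`, or feeding in a faster memory data rate, is the only way in this toy model to drag the uncore up along with the cores.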

On the desktop Core i7 running at 3.33GHz, the Sandra 2009 latency test may show 4-cycle latency for the L1 cache, compared to 3-cycle latency for the same-sized Core 2 cache, while the small 256KB L2 cache will show 10 cycles, and the big shared 8MB L3 cache block anywhere from 37 to 46 cycles depending on, yes, the "uncore" clock - as you can see on this SiSoft Sandra 2009 shot. Now, the large Core 2 L2 cache of 12MB - two times 6MB, on two dies of course - shows just 16 or 18 cycles latency if staying within each dual-core die on the two-die MCM, depending on whether it is C or E stepping, the latter being more overclockable at the cost of a bit of extra L2 cache latency.
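For a feel of what those cycle counts mean in wall-clock time, a rough Python sketch follows. It assumes the quoted figures are counted in core clock cycles at 3.33GHz, which is an assumption about how the benchmark reports them; `cycles_to_ns` is a made-up helper name.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


CORE_GHZ = 3.33  # Core i7 clock quoted in the article

# Cycle counts quoted in the article for the Core i7 cache hierarchy.
core_i7_latency_cycles = {
    "L1": 4,
    "L2 (256KB)": 10,
    "L3 (8MB, best case)": 37,
    "L3 (8MB, worst case)": 46,
}

for level, cycles in core_i7_latency_cycles.items():
    print(f"{level:22s} {cycles:3d} cycles = {cycles_to_ns(cycles, CORE_GHZ):5.2f} ns")
```

The Core 2 figures are left out on purpose, since the article does not say what clock that chip was running at.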
As reported elsewhere on the web, due to process and design improvements the mainstream version of the upcoming Sandy Bridge 32nm CPU should have somewhat improved latencies for the same cache structure as the current Core i7. The 32K L1 cache will be back to 3 cycles, the 256K L2 cache down to 9 cycles, and the 8MB L3 cache at 25 cycles - not bad for a cache shared between four CPU cores at the same time! This is a Core i5 follow-on; the higher end CPUs will have more cores, larger caches and possibly slightly larger latencies.
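How big are those claimed improvements? A quick Python sketch, using only the cycle counts quoted in this story for the current Core i7 and the rumoured Sandy Bridge figures; `reduction_pct` is a made-up helper, and the result says nothing about real clocks or real silicon.

```python
def reduction_pct(old_cycles: float, new_cycles: float) -> float:
    """Percentage drop in latency cycles going from old_cycles to new_cycles."""
    return (old_cycles - new_cycles) / old_cycles * 100.0


# (current Core i7 cycles, rumoured Sandy Bridge cycles), as quoted in the article.
quoted_cycles = {
    "L1": (4, 3),
    "L2": (10, 9),
    "L3 vs today's best case": (37, 25),
    "L3 vs today's worst case": (46, 25),
}

for level, (now, rumoured) in quoted_cycles.items():
    print(f"{level:26s} {now:2d} -> {rumoured:2d} cycles "
          f"({reduction_pct(now, rumoured):.1f}% fewer)")
```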
In summary, there's more to it than the clock numbers alone. Even within the same product family, subsequent steppings may have different design compromises to achieve the desired goals, some of them not widely known. And, as the CPUs become more complex, not just with differently-clocked async parts but also in various generations of "turbo" auto-overclock settings, one clock frequency number won't be sufficient to describe the speed anyway. How about, say, Core i5 XXX, core 3333 MHz, uncore 2667 MHz, turbo 3600 MHz, for instance? µ
oh goody, more stuff to confuzzule me. doh!
After they dumped from brightsideofnews, Intel placed him in the INQ
In starting system, being at 20 Latency edge isBE HAPPY edge. theres always something to prevent going much lower. What?
Bad installs on worn system is oNE Big reason. You can cut out O/S install in activation first entry, before even runing & make NO difference, Yet when You shove screwdrivewr into mouse port, KaBam, SPARKS, if you even survive, as depth of core sparken will be close to total. Remember these are cheap electronic parts, NOT some Experiment that can be Fixed. Each Tests leaves Dish of Burnt Carbon, little else.
So as hot swapping & snapping in power cables, with FULL Power behind them, ripping out & overloading system needlessly with every port needlessly in active service. Soon Latency grows to manageable range. about 80. Below that its too tight,Keep it that WAY, except if you run that Lovely Script steve picked up from CD Crowd while squating upon someone clueless, JAVA
Script & LOTS of IT.BEST THING FOR LATENCY PROBS.
Still, it isn't hard just to say clueless is OK. You can only control circumstances to limit. If You Will Be Done,& yE sMART AS gOD, Getta 20 Lati. If Thy will be DONE, its over 80, if it puts out enough NOISE to attrack Casper, try 200. You Know You Be Punished. Ye, its usually software blip & new partition Is NEW War. REAL Oveclocking is putting ALL items together, Like Quality Install & with low cost brand, about $1,200 will get yo 3D Vans@18,000,Uber is figuring oUT how! if ye be bucko. Antricrapate' & relax. You know its 955/ddr2/3/main in same range with cheap everything, don't forget TOP power, while takes your snerd, 750 watt $40. it costs NO more to make big watter than small, just look bigger. Like truck parts. BUY sli enable ACRYLIC case for less sparks. Notice watter & water might mean stocking cap in Tropical Jungle Builds.
TAKE WEEK OFF & REBREAK ALL OLD EQUIPMENT PILING UP, sparkase:cHaNGe EveRy BiOs fOr NEw On STALL'd, No Start except FAN.Even that cRaP is mere lat of 400. Hardly tell dif betwwen bad software install & Frankenstein.
vondrashek LATI Master Y2K?
Until they figure out how to get it all at the same speed we are in for a revisited SECC2 ride. Let's call LGA1366 what it actually is: SECC3.
And would you pulease look at all those different sockets that Intel will be releasing. It is liek OMGWTFBQSAUCE 5 different incompatible timebombs.
@vondrashek: Frankenstein? Of course! That's the answer. How else could you blur the Turing test so completely? Being a monster made up of disparate parts explains everything. ;-)
Why is this relevant at all if you are not mentioning the gain (or loss) of speed in instruction crunching?
Madeup example 1: Linear increase
An increase of the cpu 'speed' from 3.2GHz -with 40 cycles of latency to access L2 for example- to 4.0GHz (+25% clk) translates into 50 cycles latency (+25%)
3.2GHz: 40 cycles = 0.0125*10^-6 s (12.5 ns)
4.0GHz: 50 cycles = 0.0125*10^-6 s (12.5 ns)
Result: no gain nor loss
Madeup example 2: Nonlinear increase
An increase of the cpu 'speed' from 3.2GHz -with 40 cycles of latency to access L2 for example- to 4.0GHz (+25% clk) translates into 48 cycles latency (+20%)
3.2GHz: 40 cycles = 0.0125*10^-6 s (12.5 ns)
4.0GHz: 48 cycles = 0.012*10^-6 s (12 ns)
Result: ~ 4% improvement
There's this thing called MIPS that measures the performance of microprocessors including the internal memory they have. If they can chunk more MIPS after overclocking it, they are faster. That simple!
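The two made-up examples above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the comment: the cycle counts and clocks are the commenter's invented numbers, and the helper names are made up here too.

```python
def latency_ns(cycles: float, clock_ghz: float) -> float:
    """Absolute latency in nanoseconds for a cycle count at a given clock."""
    return cycles / clock_ghz


def improvement_pct(before_ns: float, after_ns: float) -> float:
    """Percentage by which the absolute latency shrank (positive = faster)."""
    return (before_ns - after_ns) / before_ns * 100.0


stock = latency_ns(40, 3.2)      # 40 cycles at 3.2GHz
linear = latency_ns(50, 4.0)     # example 1: cycles grow +25% with the clock
nonlinear = latency_ns(48, 4.0)  # example 2: cycles grow only +20%

print(f"stock          : {stock:.2f} ns")
print(f"example 1 (OC) : {linear:.2f} ns, change {improvement_pct(stock, linear):.1f}%")
print(f"example 2 (OC) : {nonlinear:.2f} ns, change {improvement_pct(stock, nonlinear):.1f}%")
```

The point of the comparison: a cycle count that rises in step with the core clock means the absolute latency has not moved at all, so only the gap between the two growth rates shows up as a gain.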
Tell me what you're on, SparkenTime, I haven't been that delusional since I had acid :) You make completely no sense dude, you forget your meds today or? :(