Quad socket Intel Caneland platform benchmarked

First INQpressions Fun and games with Caneland in PC bench runs
Thursday, 13 September 2007, 09:18
THIS WAS A FUN WEEK for workstation and server users, as AMD announced its long-delayed Barcelona (samples of which are slow to appear) and Intel got its quad-socket Caneland platform with "Tigerton" Core 2 CPUs and Clarksboro chipset out of the door - a few weeks before its main course, the Harpertown Penryn 45nm round.

As Charlie mentioned here, the Clarksboro chipset is a complex little item, with four independent FSB1066 paths, one for each CPU, meeting together and, through a 64MB cache snoop buffer - generously sized to support even a hypothetical future 16MB Penryn or future FSB-based device, if any appears - accessing four FB-DIMM 533 channels.

We would have liked Intel to have done at least a quad FSB1333 connection, fully fed by eight FBD 667 channels, as that would bring the total FSB and memory bandwidth on a par even with the new quad-socket Barcelonas (minus the extra inter-CPU HyperTransport links, of course).
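As a rough back-of-the-envelope check of that comparison, the sketch below works out nominal peak bandwidth for the configuration as shipped and for the wished-for one. It assumes an 8-byte-wide data path on each FSB and on each memory channel, and counts FB-DIMM read bandwidth only; the function name is purely illustrative.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path on each FSB and each memory channel


def peak_gb_per_s(mega_transfers: int, links: int) -> float:
    """Nominal aggregate peak in GB/s for `links` buses running at `mega_transfers` MT/s."""
    return mega_transfers * BYTES_PER_TRANSFER * links / 1000


# (FSB speed, FSB count), (FB-DIMM speed, channel count), taken from the text above
configs = {
    "as shipped (4x FSB1066, 4x FBD 533)": ((1066, 4), (533, 4)),
    "wished for (4x FSB1333, 8x FBD 667)": ((1333, 4), (667, 8)),
}

for name, (fsb, mem) in configs.items():
    print(f"{name}: FSB {peak_gb_per_s(*fsb):.1f} GB/s, "
          f"memory {peak_gb_per_s(*mem):.1f} GB/s")
```

Under these assumptions the shipping chipset offers roughly 17 GB/s of memory reads against roughly 34 GB/s of total FSB capacity, whereas the wished-for configuration would balance the two at roughly 43 GB/s each.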

Anyway, the lack of HT links is somewhat compensated for by the fully uniform memory image, where every address has the same access time from any of the CPUs. This still seems to be quite useful in lots of commercial apps, but the reduced total and per-CPU bandwidth might make the platform unsuitable for the usual HPC programs - unless they are cache-bound, of course.

Intel says it did consider those options, but, for a quick rollout and reduced complexity, we got what we got. Once the Penryn-based 45nm CPU refresh arrives in some nine months, we'd like to see a process shrink of Clarksboro too, maybe with quad FSB1600 links and eight FBD 800 channels, plus a few extra PCI-Express lanes. Together with 3.2GHz FSB1600 CPUs, that would make the platform competitive on the quad-socket front even for HPC. Next year, Barcelona platforms will also be refreshed with faster DDR2-800 memory subsystems - and Intel will hold off the Nehalem transition of quad-socket platforms until at least a few quarters after the mid-2008 dual-socket flavour launch.

Recently I had a bit of time to play with Intel's reference Caneland box in its northern Oregon lair. Rather than running the usual TPC or Sungard commercial performance runs, on which we know it is expected to excel, I was curious to see how a typical multi-threaded PC benchmark like, say, Sandra XI, would perform on such a monster box - especially since I don't think anyone else has run it on such a box.

Of particular interest to me were the net memory bandwidth and latency in individual CPU access runs, as such single-path checks are usually not seen (or considered important) in actual commercial server apps. So, even if the results are low, they may only show how bad a gaming PC this machine would be, while it may still be a darn good server.

So, I ran Sandra XI SP4 64-bit on 64-bit Windows 2003 Server... here are the screenshots:

[Screenshots: canelandsandracpu, canelandsandramm]

Oh boy, fantastic peak power - the highest Sandra CPU benchmark results on record! No, even a four-socket Barcelona with current 2GHz chips won't come anywhere close to these numbers. But then, look at the memory bandwidth and latency. The impact of handling four FSB links and a large snoop buffer shows in the latency - a total of 140ns for a 64MB random access range, compared to 118ns on the Greencreek dual-FSB chipset, 71ns on the X38 chipset A0 beta version, and 55ns on a highly-tuned Asus Striker Extreme Nforce 680i.

[Screenshots: canelandsandrabandwidth, canelandsandralatency]

In a quad-socket Opteron, the latency to the local CPU memory would be roughly half this, the latency to the neighbouring CPUs would be in around the same range, and the latency to the farthest CPU in the other corner would be higher than even the Clarksboro FB-DIMM round.

The bandwidth shows most of the penalty from the lower FSB speed and the reduced number of FB-DIMM channels per CPU: the 2.9 GB/s (obviously single-thread) Sandra result is over a third slower than the 4.5 GB/s on Greencreek, and about half the bandwidth of a single-thread Sandra run on an Opteron. Of course, that Opteron has far less cache to buffer its access.
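To put the quoted figures side by side, here is a minimal sketch that computes the relative latency and bandwidth gaps. It uses only the numbers given in the article; the variable names are illustrative.

```python
# Latency (ns, 64MB random access range) and single-thread bandwidth (GB/s)
# exactly as quoted in the article
latency_ns = {
    "Clarksboro": 140,
    "Greencreek": 118,
    "X38 A0 beta": 71,
    "Nforce 680i": 55,
}
bandwidth_gb_s = {"Clarksboro": 2.9, "Greencreek": 4.5}

# How many times longer Clarksboro takes than each of the other chipsets
latency_vs = {
    name: latency_ns["Clarksboro"] / ns
    for name, ns in latency_ns.items()
    if name != "Clarksboro"
}

# Fraction of Greencreek's bandwidth that Clarksboro falls short by
shortfall = 1 - bandwidth_gb_s["Clarksboro"] / bandwidth_gb_s["Greencreek"]

for name, factor in latency_vs.items():
    print(f"Clarksboro latency is {factor:.2f}x that of {name}")
print(f"Bandwidth shortfall vs Greencreek: {shortfall:.1%}")
```

The shortfall works out to about 36 percent, consistent with "over a third", while the Clarksboro latency is roughly 1.2 times Greencreek's and about 2.5 times that of the tuned 680i board.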

So, if you ever want to use this as a home super PC, keep in mind its memory and, yes, I/O limits (not exactly enough lanes for graphics and I/O), but it still has peak CPU performance far above any overclocked gaming rig.

Talking of gaming, with these 16 cores and some 188 GFLOPs of peak double precision FP power, this would still be a fearsome 3-D raytraced gaming monster, even without any fast GPUs this time - the CPUs would be powerful enough to do all the work. And, yes, ray tracing is CPU and cache bound. µ
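For reference, a peak figure of that size can be reproduced with simple arithmetic. The sketch below is an assumption-laden estimate, not the article's own calculation: the 2.93GHz clock is not stated in the article, and the four double precision operations per cycle per core assume one two-wide SSE2 add plus one two-wide SSE2 multiply issued every cycle.

```python
CORES = 16               # from the article: four quad-core sockets
CLOCK_GHZ = 2.93         # assumption: clock speed of the tested CPUs, not stated in the article
DP_FLOPS_PER_CYCLE = 4   # assumption: one 2-wide SSE2 add + one 2-wide SSE2 multiply per cycle


def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak double precision GFLOPs, ignoring memory and scheduling limits."""
    return cores * clock_ghz * flops_per_cycle


peak = peak_gflops(CORES, CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
print(f"Estimated peak: {peak:.1f} GFLOPs")
```

With those assumed inputs the estimate lands at about 187.5 GFLOPs, in line with the "some 188" quoted above.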
