The Inquirer-Home

How Quickpath GPGPUs may access two CPUs at once

Geneseo on steroids
Sunday, 30 September 2007, 12:09

AS WE MENTIONED before, Intel's QuickPath Interconnect, or QPI, should bring along a bagful of news on the board-level architecture front when it arrives in the middle of next year, along with the first Nehalems.

That first round is widely expected to be the dual-socket TylersburgDP platform - just like the Nehalem dual-CPU "newborn" shown at IDF.

In our earlier story, we covered the potential for QPI to be brought out to expansion slots or given special cabling for tightly coupled NUMA multiprocessor rack systems.

Now, with or without that slot, the initial "newborns" are expected to leave some of their QPI resources unused: in a dual-socket TylersburgDP Nehalem platform, there would be two full-width QPI links per CPU.


One QPI full-width link on each CPU is used to communicate between the CPUs directly, while half of the other full-width link would go to the I/O Bridge. The other half of that second QPI link from each CPU would be, well, free.

In the absence of an HTX-like card slot of some kind, couldn't these spare links then be connected to something else? Well, yes - you could have either a pseudo-NUMA bridge to another two-socket node or, say, an FPGA or GPGPU accelerator using those links.

Such a high-bandwidth peripheral could use those two links at the same time, each connected to one of the CPUs (and therefore to that CPU's local multi-channel memory system), and try to stream data to or from the presumably interleaved system main memory at maximum speed through that I/O Bridge - or, even better, it could be a high-end Xilinx FPGA sitting in that "other" socket, connected to two half-width QPI links.

If the system memory is configured as interleaved between the CPUs, that data stream could be sent from the GPGPU or FPGA across both half-width QPI channels and the on-chip memory channels of both CPUs simultaneously, at an equal, memory-like latency for both halves.
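To make the interleaving idea concrete, here is a minimal sketch in Python. It assumes a hypothetical 64-byte interleave granularity and a simple alternating mapping of blocks to the two CPUs; the real platform's interleaving scheme is not described here, so `INTERLEAVE_BYTES`, `home_cpu` and `bytes_per_link` are illustrative names only.

```python
from collections import Counter

# Assumption: memory is interleaved between the two CPUs in fixed-size blocks.
# 64 bytes (one cache line) is a hypothetical granularity chosen for illustration.
INTERLEAVE_BYTES = 64
NUM_CPUS = 2


def home_cpu(address: int) -> int:
    """Return the CPU (0 or 1) whose local memory holds this address."""
    return (address // INTERLEAVE_BYTES) % NUM_CPUS


def bytes_per_link(start: int, length: int) -> Counter:
    """Count how many bytes of a contiguous stream travel over each CPU's link."""
    traffic = Counter()
    address = start
    end = start + length
    while address < end:
        # End of the interleave block that contains the current address.
        block_end = (address // INTERLEAVE_BYTES + 1) * INTERLEAVE_BYTES
        chunk = min(block_end, end) - address
        traffic[home_cpu(address)] += chunk
        address += chunk
    return traffic


if __name__ == "__main__":
    # A 1 MiB stream starting at address 0 splits evenly across both links.
    print(dict(bytes_per_link(0, 1 << 20)))  # {0: 524288, 1: 524288}
```

Under these assumptions a long sequential stream puts half its traffic on each link, which is what would let an accelerator keep both half-width channels busy at once.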

The lucky accelerator would be able to saturate both links combined, as it would have access to 50+ GB/s of total main memory bandwidth (assuming two CPUs, each with dual-channel DDR3-1600 memory), or twice the combined half-channel QPI bandwidth, at a comparatively low latency. All this comes at a low pin count, easy to fit on any mainboard.
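The arithmetic behind the 50+ GB/s figure can be sketched as follows. DDR3-1600 performs 1600 million transfers per second over a 64-bit channel; the per-link QPI number is not taken from any specification but simply derived from the "twice the combined half-channel QPI bandwidth" statement above, so treat it as an implied value rather than a confirmed one. Function names are illustrative.

```python
# DDR3-1600 moves 1600 million transfers per second over a 64-bit (8-byte) channel.
TRANSFERS_PER_SEC_MILLIONS = 1600
BYTES_PER_TRANSFER = 8
CHANNELS_PER_CPU = 2   # the article's dual-channel assumption
NUM_CPUS = 2


def peak_memory_bandwidth_gbs(cpus=NUM_CPUS, channels=CHANNELS_PER_CPU):
    """Theoretical peak main-memory bandwidth in GB/s (10^9 bytes per second)."""
    mb_per_sec = TRANSFERS_PER_SEC_MILLIONS * BYTES_PER_TRANSFER * channels * cpus
    return mb_per_sec / 1000


def implied_half_link_gbs(total_memory_gbs, links=2):
    """Per-link QPI figure implied by the 'twice the combined' claim (not a spec value)."""
    combined_links = total_memory_gbs / 2
    return combined_links / links


if __name__ == "__main__":
    total = peak_memory_bandwidth_gbs()
    print(f"Peak memory bandwidth: {total:.1f} GB/s")                      # 51.2 GB/s
    print(f"Implied per half-link: {implied_half_link_gbs(total):.1f} GB/s")  # 12.8 GB/s
```

These are theoretical peaks; sustained figures on real hardware would be lower.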

Compare that to the current FPGA or custom accelerators sitting on 1GB/s PCI-X133 with about one microsecond latency to the system main memory, or GPGPUs on 8GB/s PCI-E x16, often with slightly higher latency to that same system memory. Obviously, we could do far more now - say, any kind of large-scale GPGPU or accelerator operations across the whole system memory.

The question is interesting because of what quite a few companies, Intel included, will probably be doing in a year or so - custom accelerators of some kind sitting directly on full-width QPI.

What we need in this case is not just channel splitting or bifurcation, but also channel aggregation at the accelerator end, together with synchronisation between two separate half-channels going from one peripheral to two different CPUs. Doesn't this make sense? Watch this space... we could be talking Geneseo on steroids. µ
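One way to picture aggregation and synchronisation at the accelerator end is a toy model in Python. It assumes each chunk carries a sequence number and that each link delivers its own chunks in order; neither detail comes from any QPI documentation, and `split_stream` and `aggregate` are hypothetical helpers.

```python
import heapq


def split_stream(data: bytes, chunk_size: int = 64):
    """Stripe data across two links as (sequence_number, chunk) pairs."""
    links = ([], [])
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        # Even-numbered chunks go to link 0, odd-numbered chunks to link 1.
        links[seq % 2].append((seq, data[offset:offset + chunk_size]))
    return links


def aggregate(link0, link1) -> bytes:
    """Merge two per-link, in-order streams back into one ordered byte stream."""
    merged = heapq.merge(link0, link1, key=lambda item: item[0])
    return b"".join(chunk for _, chunk in merged)


if __name__ == "__main__":
    payload = bytes(range(256)) * 3
    link0, link1 = split_stream(payload)
    assert aggregate(link0, link1) == payload
    print(len(link0), len(link1))  # 6 6
```

The sequence numbers stand in for whatever ordering mechanism real hardware would use; the point is only that the receiver must recombine two independently arriving half-streams into one.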


Comments
AMD offers this NOW..

Why is there not a single word here that AMD offers this NOW (using HyperTransport links), instead of in the second half of 2008? Why is there no word that, for example, Cray Inc. will offer FPGA-accelerated systems this year using AMD's solution? Why are there only comparisons to old approaches, and not to AMD's?

posted by : dess, 30 September 2007
do we really need it?

If people are sticking to XP and Vista is the failure they preach of, then the question that comes to mind is: do we need more bandwidth? Obviously the only real market for this is servers. Gaming unfortunately has lost it, making games for hardware that only a very few people actually have. To make things worse, there are very few games that take advantage of current hardware. We don't need quad core for office, torrents, inquiring, listening to MP3s, and surely not for thrash tube or HD DVD, unless you're going to be watching HD videos in window mode, and that won't make sense for obvious reasons. How many people need an inline, twin-cam, variable-timing, twin-turbo 4-cylinder with limited slip and a stage 3 clutch in a John Deere riding mower?

posted by : missingxtension, 01 October 2007