AS EXPECTED, the PCI Express third-generation specification is getting close to its final shape. During IDF, there were some interesting insights on the tricks applied, and how they may affect the usage in that new high-bandwidth mode.
As we know, the initial PCIe v1 runs at 2.5Gbps per direction per lane - so a x16 PCIe v1 slot, with sixteen such bidirectional lanes, has 2 x 16 x 2.5Gbps = 80Gbps of raw bandwidth. After the 8B/10B data encoding, which puts 10 bits on the wire for every 8 bits of data, the available theoretical bandwidth becomes exactly 8GB/s, or 250MB/s per lane per direction. PCIe v2, at 5Gbps per direction per lane, exactly doubles these numbers.
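For those who like to check the sums, here is a minimal Python sketch of the arithmetic above. The function and constant names are made up for illustration and come from no PCIe tool or library; the only inputs are the line rates and the 8B/10B ratio quoted in the text, with decimal units (1GB = 1000MB) assumed.

```python
# Hypothetical helpers, for illustration only.

def lane_bandwidth_mb_s(line_rate_gbps, payload_bits=8, encoded_bits=10):
    """Usable MB/s per lane per direction after line encoding (8B/10B by default)."""
    usable_gbps = line_rate_gbps * payload_bits / encoded_bits
    return usable_gbps * 1000 / 8  # Gbit/s -> MB/s, decimal units


def slot_bandwidth_gb_s(line_rate_gbps, lanes, directions=2):
    """Usable GB/s for a whole slot, counting both directions by default."""
    return lane_bandwidth_mb_s(line_rate_gbps) * lanes * directions / 1000


V1_RAW_X16_GBPS = 2 * 16 * 2.5             # raw bits on the wire, both directions
V1_PER_LANE = lane_bandwidth_mb_s(2.5)     # PCIe v1
V2_PER_LANE = lane_bandwidth_mb_s(5.0)     # PCIe v2
V1_X16 = slot_bandwidth_gb_s(2.5, 16)
V2_X16 = slot_bandwidth_gb_s(5.0, 16)

print(f"v1 x16 raw:  {V1_RAW_X16_GBPS:.0f} Gbps")
print(f"v1 per lane: {V1_PER_LANE:.0f} MB/s, x16 total: {V1_X16:.0f} GB/s")
print(f"v2 per lane: {V2_PER_LANE:.0f} MB/s, x16 total: {V2_X16:.0f} GB/s")
```

The encoding ratio is left as a parameter so the same helper can be pointed at other encodings.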
Now, PCIe v3 goes ahead not with another doubling (10Gbps), but with just 8Gbps, 60 per cent more raw bandwidth. Yeah, it does help to reuse similar tooling and save costs, but how do you get double the real bandwidth? Simple: an encoding change. Once you use the 'high bandwidth' full-speed PCIe v3 mode, the encoding isn't 8B/10B anymore but 128B/130B, where a two-bit sync header is followed by a 128-bit payload. That cuts the encoding overhead from 20 per cent to about 1.5 per cent, which brings the available theoretical bandwidth up by roughly another 23 per cent, to just under 1GB/s (about 985MB/s) per lane per direction - almost exactly double that of PCIe v2.
For a typical 16-lane graphics PCIe slot in the v3 variety, this would mean a total of nearly 32GB/s counting both directions - enough to feed quad SLI or Crossfire off a single card without bandwidth bottlenecks.
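The v3 numbers can be worked through the same way. The sketch below is only an illustration, with invented names, using the 8Gbps line rate and the 128B/130B ratio from the text; exact fractions are used so that rounding does not hide how close the result is to the round figures of 1GB/s per lane and 32GB/s per x16 slot.

```python
from fractions import Fraction

# Hypothetical helper, for illustration only.
def usable_mb_s(line_rate_gbps, payload_bits, encoded_bits):
    """Usable MB/s per lane per direction, as an exact fraction (decimal units)."""
    return Fraction(line_rate_gbps) * payload_bits / encoded_bits * 1000 / 8


V2_LANE = usable_mb_s(5, 8, 10)       # PCIe v2: 5Gbps with 8B/10B
V3_LANE = usable_mb_s(8, 128, 130)    # PCIe v3: 8Gbps with 128B/130B

# Gain from the encoding change alone, ignoring the line-rate increase.
ENCODING_GAIN = Fraction(128, 130) / Fraction(8, 10)

# A 16-lane slot, counting both directions, in GB/s.
V3_X16_TOTAL_GB_S = V3_LANE * 16 * 2 / 1000

print(f"v2 per lane:   {float(V2_LANE):.1f} MB/s")
print(f"v3 per lane:   {float(V3_LANE):.1f} MB/s")
print(f"v3 / v2:       {float(V3_LANE / V2_LANE):.3f}x")
print(f"encoding gain: {float(ENCODING_GAIN - 1) * 100:.1f} per cent")
print(f"v3 x16 total:  {float(V3_X16_TOTAL_GB_S):.1f} GB/s")
```

Under these assumptions the per-lane figure comes out a shade under 1GB/s and the x16 total a shade under 32GB/s, so "double v2" is a very good approximation rather than an exact identity.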
Note that the new spec also envisions ultrawide 32-lane PCIe slots, twice the width of current graphics PCIe slots. Who would need a total of some 64GB/s of I/O bandwidth? Maybe an inter-system link, huh? Anyway, don't be surprised to see the same width extensions in QPI as well - HyperTransport has had the 2 x 32 mode since its inception.
PCIe v2 has already added virtualisation and device sharing on top of doubled bandwidth. For PCIe v3, we should see further reductions in latency, of up to some 20 per cent on reads, as this often overlooked performance parameter - critical in more than just supercomputing or real-time apps - increases in importance. On latency, PCIe was initially worse than the decade-old PCI-X.
Atomic ops on very small data should help in shared virtual memory HPC use where large packets were pretty much useless, while Transaction Processing Hints (TPH) reduce the system memory access latency for PCIe adapters. These two, according to early benchmarks, really seem to benefit the HPC apps most - look at the diagram showing dgemm in HPC vs spec_japp and tpcc tests.
Finally, something very important for those ubergamers and other 'enthusiasts' getting ready for 300+ Watt graphics cards in their post-Nehalem rigs: the eight-level (up to 32 levels possible) dynamic control of device power and thermal budgets in PCIe v3 should ease the implementation of such monster cards and, more importantly, ensure they don't continually consume all of those 300W (or more) even when they are basically idle.
And what does all of this mean to us mere mortals? Board-design wise, choosing 8Gbps per lane was probably wise, as it avoids complicating things too much. However, except for the graphics slots and occasional high-speed interconnect peripherals, I don't expect too much PCIe v3 use initially. Simply put, for most other add-ons PCIe v2 has plenty of headroom. The apps that benefit from low latency - including, why not, ultra high-end audio and video - might like the new v3 features though, even without needing the spare bandwidth.
Also, the evolutionary approach means no major headaches are expected in the next-gen chipset implementations that support it.
Talking about that - don't expect this stuff in full production for another year. Quite possibly, it will see the light of day in the first Westmere (32nm Nehalem shrink) extreme chipsets, as well as in similar AMD Bulldozer - or whatever they actually have - platform stuff, late next year. µ