CHIP DESIGNER Nvidia has released the Kepler-based Tesla K10 GPGPU accelerator board.
Nvidia's Kepler architecture has already found its way into three consumer products, but the firm has only now slipped it into a Tesla GPGPU accelerator board. The Tesla K10 features two Kepler GPUs and is aimed at speeding up single-precision floating point workloads.
Nvidia's Tesla K10 board has two 1536-core Kepler GK104 GPUs, each providing 2.29 teraflops of single-precision and 0.095 teraflops of double-precision floating point performance. Nvidia has increased the total onboard memory to 8GB, meaning that larger datasets can be held on the board; however, per-GPU memory bandwidth has actually dropped from the previous generation, to 160GB/s.
Although Nvidia didn't disclose clock speeds for the two GK104 GPUs, Sumit Gupta, senior director of Nvidia's Tesla business unit, did say that they are half those of Fermi due to power consumption constraints. Nvidia said that its dual-GPU Tesla K10 board has the same thermal design power as its single-GPU Fermi-based Tesla M2090.
When Gupta was asked how much of the power savings Nvidia claims comes from the process shrink to 28nm and how much from the architecture, he said, "Process gives us something; we went from 40nm to 28nm, but processes typically don't give you that big a leap in power anymore because there is still a lot of leakage power today. A lot of it was architecture redesign, for example by reducing the clocks [speeds]. We almost halved the clocks and it significantly reduced the power. We also improved the efficiency of the architecture."
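A textbook first-order model of chip power, not one Nvidia has published for Kepler, shows why cutting clocks pays off while a process shrink alone gives less. Total power is roughly dynamic switching power plus leakage power, where α is the switching activity, C the switched capacitance, V the supply voltage, f the clock frequency and I_leak the leakage current:

```latex
P_{\text{total}} \approx \underbrace{\alpha C V^{2} f}_{\text{dynamic}} + \underbrace{V I_{\text{leak}}}_{\text{leakage}}
```

Halving f at a fixed voltage halves the dynamic term, and a lower clock often permits a lower V, whose squared contribution compounds the saving. The leakage term does not fall with the clock, which fits Gupta's point that leakage limits what the move to 28nm delivers on its own.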
Gupta added that Kepler-based Tesla cards get closer to their theoretical peak performance, something every high performance computing (HPC) cluster operator is desperately trying to achieve. He said, "In the past Fermi was able to deliver 65 per cent of the performance of its peak. Kepler is going to be able to do 80 to 85 per cent of its performance compared to peak. So we have improved the delivered efficiency as well, so there is a huge amount of architectural impact that is part of the performance per watt improvement."
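The gain from delivering a larger fraction of peak is simple arithmetic: sustained throughput is the theoretical peak multiplied by the fraction actually achieved. The Python sketch below applies Gupta's percentages to the 2.29 teraflops single-precision peak quoted for each K10 GPU; it assumes those percentages apply to single-precision workloads, which Nvidia did not specify.

```python
# Delivered throughput = theoretical peak x fraction of peak actually achieved.
KEPLER_PEAK_SP_TFLOPS = 2.29  # per GK104 GPU, as quoted for the Tesla K10
FERMI_EFFICIENCY = 0.65       # Gupta's figure for Fermi


def delivered(peak_tflops: float, efficiency: float) -> float:
    """Sustained throughput in teraflops for a given fraction of peak."""
    return peak_tflops * efficiency


for kepler_efficiency in (0.80, 0.85):
    gain = kepler_efficiency / FERMI_EFFICIENCY  # improvement in delivered fraction
    print(f"{kepler_efficiency:.0%} of peak: "
          f"{delivered(KEPLER_PEAK_SP_TFLOPS, kepler_efficiency):.2f} TFLOPS per GPU, "
          f"{gain:.2f}x Fermi's delivered fraction")
```

On these figures a K10 GPU would sustain roughly 1.83 to 1.95 teraflops, and the delivered fraction alone improves by about 23 to 31 per cent over Fermi.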
For Nvidia's customers, the biggest boost actually comes from PCI-Express Gen3 support, which doubles the bandwidth between the host and the board to 16GB/s. Since Nvidia's and other GPGPU accelerator boards often stall while waiting for system memory, typically DDR3 at present, to feed them with enough data, Gupta said the company would "gladly accept increases in PCI-Express bandwidth".
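The doubling follows from the PCI-Express specifications: Gen2 signals at 5GT/s per lane with 8b/10b encoding, while Gen3 signals at 8GT/s with the leaner 128b/130b encoding. The sketch below works out per-direction bandwidth for a 16-lane slot and the time to move a hypothetical 4GB buffer, one K10 GPU's share of the 8GB on board. It ignores packet and protocol overheads, so real transfers will be somewhat slower.

```python
# Per-direction PCI-Express bandwidth for a 16-lane (x16) slot, ignoring
# packet and protocol overheads.
LANES = 16


def pcie_x16_gb_per_s(gigatransfers_per_s: float, payload_bits: int, line_bits: int) -> float:
    """Signalling rate x lanes x encoding efficiency, converted from Gbit/s to GB/s."""
    return gigatransfers_per_s * LANES * payload_bits / line_bits / 8


gen2 = pcie_x16_gb_per_s(5.0, 8, 10)     # 8b/10b encoding: 8.0 GB/s
gen3 = pcie_x16_gb_per_s(8.0, 128, 130)  # 128b/130b encoding: about 15.75 GB/s

BUFFER_GB = 4.0  # hypothetical buffer: one K10 GPU's share of the 8GB on board
for name, bw in (("Gen2", gen2), ("Gen3", gen3)):
    print(f"{name} x16: {bw:.2f} GB/s, {BUFFER_GB / bw:.3f} s to move {BUFFER_GB:.0f}GB")
print(f"Gen3 / Gen2: {gen3 / gen2:.2f}x")
```

The "16GB/s" headline figure is the rounded Gen3 number; the small shortfall from an exact doubling comes from the 128b/130b encoding overhead.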
Nvidia also introduced its Hyper-Q technology, which can execute 32 message passing interface (MPI) tasks at once, with Gupta saying that the Tesla K10 board achieves higher utilisation but for shorter periods of time. The problem is feeding the GK104 with enough data to crunch, an area in which traditional CPUs still have the upper hand.
Gupta said, "Fermi and Kepler can address up to 1TB of memory, the challenge is GDDR5 memory do not come in those sizes. So today we can do up to 6GB of memory [per GPU] with the current Fermi and Kepler products. As better memories come along we'll be able to do 12GB and in the future 24GB."
Nvidia has also enabled its GPU Direct technology, which allows GPUs in a cluster to access other GPUs' local memory, giving direct access to vast amounts of memory. However, GPU Direct is hamstrung by interconnect bandwidth, which even with InfiniBand is restricted to 40Gbit/s, or 5GB/s. That figure is nowhere near high enough for a GPGPU accelerator that is already left idle by a 16GB/s PCI-Express bus.
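To put the interconnect gap in numbers, the sketch below converts the quoted 40Gbit/s to bytes and compares it with the 16GB/s PCI-Express Gen3 figure. It additionally assumes the 40Gbit/s is a QDR InfiniBand signalling rate with 8b/10b encoding, leaving about 32Gbit/s of actual data, and uses a hypothetical 4GB remote read.

```python
# Compare the interconnect figure quoted for GPU Direct with the PCIe Gen3 bus.
INFINIBAND_GBIT_S = 40.0  # signalling rate quoted in the article
PCIE_GEN3_GB_S = 16.0     # PCI-Express Gen3 figure quoted in the article
BITS_PER_BYTE = 8

infiniband_gb_s = INFINIBAND_GBIT_S / BITS_PER_BYTE  # 5.0 GB/s
# Assumption: QDR InfiniBand uses 8b/10b encoding, so usable data is 80% of that.
infiniband_payload_gb_s = infiniband_gb_s * 8 / 10   # 4.0 GB/s


def seconds_to_move(gigabytes: float, gb_per_s: float) -> float:
    """Idealised transfer time, ignoring latency and protocol overheads."""
    return gigabytes / gb_per_s


REMOTE_GB = 4.0  # hypothetical amount of another GPU's memory to read
print(f"InfiniBand: {infiniband_gb_s:.1f} GB/s raw, {infiniband_payload_gb_s:.1f} GB/s payload")
print(f"Raw InfiniBand as a fraction of PCIe Gen3: {infiniband_gb_s / PCIE_GEN3_GB_S:.0%}")
print(f"{REMOTE_GB:.0f}GB remote read: "
      f"{seconds_to_move(REMOTE_GB, infiniband_payload_gb_s):.2f} s over InfiniBand "
      f"vs {seconds_to_move(REMOTE_GB, PCIE_GEN3_GB_S):.2f} s over PCIe")
```

Even before encoding overhead, the network link offers under a third of the PCI-Express bandwidth that already leaves the board waiting.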
Although Nvidia's Fermi-based Tesla M2090 was getting a bit long in the tooth, the firm decided it had such a cushion over its rivals that it could stagger its Kepler-based Tesla launch, with the Tesla K10 sporting less than a third of the double-precision floating point performance of the Tesla M2090. Nvidia said its Tesla K20 card, based on the GK110 GPU whose details it kept under wraps, will tip up in the fourth quarter of 2012.
Nvidia should be commended for dragging GPGPU accelerators into the HPC market and eventually forcing Intel to take note with its Many Integrated Core (MIC) accelerators. Intel is expected to announce an update to MIC next month at the International Supercomputing Conference, where Nvidia will be trying to win new business with its Kepler GK104 and its upcoming Kepler GK110 accelerator boards.
Nvidia has since released Tesla K10 clock speeds to The INQUIRER, with the GK104 GPUs running at 745MHz and the GDDR5 memory at 2.5GHz. As Nvidia said, the Tesla K10's GPU clock speeds are significantly lower than those of Fermi-based Tesla boards; the Tesla M2090 runs its GPU at 1.3GHz.
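These clocks are enough to reconstruct the per-GPU figures quoted earlier. The sketch below counts a fused multiply-add as two floating point operations per core per clock. Three inputs are assumptions rather than figures from Nvidia's announcement: GK104 running double precision at 1/24 of its single-precision rate, a 256-bit memory interface, and GDDR5 moving two bits per pin per quoted 2.5GHz clock.

```python
# Reconstruct the quoted per-GPU peaks from core count and clock speed.
CORES_PER_GPU = 1536
GPU_CLOCK_GHZ = 0.745
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as two operations
FP64_RATIO = 1 / 24           # assumption: GK104 double precision runs at 1/24 rate
BUS_WIDTH_BITS = 256          # assumption: GK104's memory interface width
GDDR5_GBIT_PER_PIN = 2 * 2.5  # assumption: two data transfers per 2.5GHz clock

peak_sp_gflops = CORES_PER_GPU * FLOPS_PER_CORE_PER_CLOCK * GPU_CLOCK_GHZ
peak_dp_gflops = peak_sp_gflops * FP64_RATIO
bandwidth_gb_s = BUS_WIDTH_BITS * GDDR5_GBIT_PER_PIN / 8  # bits -> bytes

print(f"SP peak: {peak_sp_gflops / 1000:.2f} TFLOPS")  # article quotes 2.29
print(f"DP peak: {peak_dp_gflops / 1000:.3f} TFLOPS")  # article quotes 0.095
print(f"Memory bandwidth: {bandwidth_gb_s:.0f} GB/s")  # article quotes 160
```

All three results line up with the published figures, which suggests the low clock, rather than any hidden derating, accounts for the per-GPU numbers.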
Given that Nvidia's Tesla K10 boasts more than a three-fold increase in single-precision floating point performance over the Tesla M2090 within the same thermal design power, it highlights a big improvement in architectural efficiency in Kepler over Fermi. µ
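A board-level comparison can be sketched with the same two-operations-per-core-per-clock rule. The M2090 core count of 512 and Fermi's half-rate double precision are assumptions taken from Nvidia's general Fermi Tesla specifications rather than from this article, as is GK104's 1/24 double-precision rate.

```python
# Board-level peak comparison, using 2 flops per core per clock (FMA).
def peak_gflops(cores: int, clock_ghz: float, gpus: int = 1) -> float:
    """Theoretical single-precision peak in gigaflops for a board."""
    return gpus * cores * 2 * clock_ghz


k10_sp = peak_gflops(1536, 0.745, gpus=2)  # two GK104s at 745MHz
m2090_sp = peak_gflops(512, 1.3)           # assumption: M2090 has 512 cores at 1.3GHz

k10_dp = k10_sp / 24     # assumption: GK104 runs FP64 at 1/24 rate
m2090_dp = m2090_sp / 2  # assumption: Fermi Tesla runs FP64 at 1/2 rate

print(f"SP: K10 {k10_sp:.0f} vs M2090 {m2090_sp:.0f} GFLOPS -> {k10_sp / m2090_sp:.2f}x")
print(f"DP: K10 {k10_dp:.0f} vs M2090 {m2090_dp:.0f} GFLOPS -> {k10_dp / m2090_dp:.2f}x")
```

Because both boards share the same thermal design power, the roughly 3.4-fold single-precision ratio is also roughly the gain in peak performance per watt, while the double-precision ratio of about 0.29 squares with the "less than a third" figure cited earlier.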