
AMD says HSA will cut latency bottleneck in GPU processing

Capacity is more important than bandwidth
Fri May 03 2013, 17:15

CHIP DESIGNER AMD said the unified memory architecture in its upcoming Kaveri chip will allow more workloads to be offloaded to the GPU by mitigating the latency bottleneck caused by repeatedly having to fetch data from system memory.

One of the key aspects of AMD's heterogeneous system architecture (HSA) is the ability for developers to deploy code that runs on the CPU's on-die GPU and accesses main memory directly. AMD told The INQUIRER that being able to cut down on the number of memory fetches will make running more workloads on the GPU viable.

Both AMD and Nvidia sell GPGPU accelerator boards that use GDDR5 memory, which offers considerably higher bandwidth than the DDR3 memory used as main memory on most systems. However, AMD director of software Margaret Lewis said that the latency of memory copy operations - the act of taking data from system memory and putting it in memory that is addressable by the GPU - was a far bigger problem than the relative bandwidth difference of using DDR memory to feed the GPU.
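Lewis's point about copy latency versus bandwidth can be illustrated with a rough cost model. The sketch below is not AMD's model: the function names are hypothetical, and the latency and bandwidth figures are assumed values chosen only to show the shape of the trade-off. It treats each copy as a fixed overhead plus a transfer time, then reports how much of the total is fixed overhead at different data sizes.

```python
def copy_time_s(num_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple model: fixed per-copy overhead plus size-dependent transfer time."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


def latency_share(num_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total copy time that is fixed overhead."""
    return latency_s / copy_time_s(num_bytes, latency_s, bandwidth_bytes_per_s)


# Assumed, illustrative figures only (not measured on any real hardware).
LATENCY_S = 10e-6   # 10 microseconds of fixed overhead per copy
BANDWIDTH = 8e9     # 8 GB/s between system memory and GPU memory

for size in (4 * 1024, 1024 * 1024, 256 * 1024 * 1024):
    share = latency_share(size, LATENCY_S, BANDWIDTH)
    print(f"{size:>12} bytes: {share:.1%} of copy time is fixed overhead")
```

Under these assumptions, small transfers are dominated by the fixed overhead, so a faster memory bus does little for them, while large transfers are dominated by bandwidth.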

Lewis said with the unified memory architecture "you can apply that routine more broadly". She elaborated, "Maybe you were using a smaller amount of data you didn't want the overhead of the [data] copy back and forth and now you have a larger footprint that can [allow you to] move more data in.

"There is a range of smaller activities that you never thought of moving to the GPU before because of the expense of the memcopy [memory copy function]."

According to Lewis, HSA doesn't change the fact that Kaveri will still be an accelerated processing unit much like Llano, Trinity or Richland, with a CPU and a GPU. Instead, she said, HSA makes it viable to move workloads that use very large datasets onto the GPU.

She said, "There were things that you maybe thought would fit well on a vector processing node but you never did it because you knew that you would have to do so many memcopies that the latency would kill you. HSA doesn't change what kind of processor it is, but it changes the parameters because of the lack of memcopy and the capacity you get with memory. It just means it [the GPU] comes into play with workloads you didn't think of before."
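The trade-off Lewis describes can be sketched as a break-even check: offloading only pays off if the GPU's compute time plus the cost of every copy is less than the CPU's time. The function name and all timings below are hypothetical values picked for illustration, not figures from AMD.

```python
def offload_pays_off(cpu_time_s, gpu_time_s, copies, copy_cost_s):
    """True if GPU compute time plus total copy overhead beats the CPU time."""
    return gpu_time_s + copies * copy_cost_s < cpu_time_s


# Hypothetical small task: the GPU computes faster, but each copy is costly.
CPU_TIME_S = 50e-6    # assumed time to run the task on the CPU
GPU_TIME_S = 10e-6    # assumed time to run the task on the GPU
COPY_COST_S = 25e-6   # assumed cost of one copy to or from GPU memory

# Discrete-memory case: one copy in and one copy out.
with_copies = offload_pays_off(CPU_TIME_S, GPU_TIME_S, copies=2, copy_cost_s=COPY_COST_S)

# Unified-memory case: the GPU reads main memory, so no copies are needed.
unified = offload_pays_off(CPU_TIME_S, GPU_TIME_S, copies=0, copy_cost_s=COPY_COST_S)

print(f"worth offloading with copies: {with_copies}")
print(f"worth offloading with unified memory: {unified}")
```

With these assumed numbers the same task is not worth offloading when two copies are required, but becomes worthwhile once the copies are removed.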

Lewis isn't the only person to talk up the performance issues caused by limited memory on GPGPU accelerators. Nvidia has talked about the need to pipeline data fetches in order to feed its Tesla boards, and Acceleware, a firm that teaches OpenCL and CUDA programming and is partly funded by Nvidia, has talked about the importance of careful memory management to ensure high performance when deploying on GPGPUs.

AMD's first HSA-compliant chip will be Kaveri, which the firm has pencilled in to appear later this year. AMD is also working on bringing HSA support to the Linux kernel and compilers, including the popular GNU C compiler. µ

 
