But they are not complete! Something is missing...
@Richard
For integrated GPUs, memory bandwidth is the biggest bottleneck, whether you're doing graphics or compute.
My guess is that AMD has tweaked its CPU microcode so that when the CPU sees instructions pointing to memory addresses that are part of the "stolen memory for graphics", it prefetches the data into some small on-die buffer.
They have removed the need for communication between the CPU and GPU to go via RAM. Instead they're using on-die communication, which isn't just 'a little more efficient': it removes what is actually the greatest bottleneck.
Isn't that what CPUs already do: feed the GPU (along with doing some other sundry tasks)?
Ok, so they have made the intercommunication between the two a little more efficient, but ... the main problem is that most general purpose computations don't really scale well to GPU architectures.
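One common way to put a number on the "doesn't scale well" point is Amdahl's law, which is not mentioned in the thread and is used here only as an illustration. The sketch below assumes a workload where some fraction can run in parallel and the rest stays serial; the function name and the 400-unit figure are hypothetical, not taken from any AMD part.

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Upper bound on speedup when only part of the work can be spread over `units`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not sped up at all; the parallel part is divided across the units.
    return 1.0 / (serial_fraction + parallel_fraction / units)


if __name__ == "__main__":
    # 400 is an illustrative count of GPU execution units, not a real spec.
    for p in (0.5, 0.9, 0.99):
        print(f"{p:.2f} parallel, 400 units -> at most {amdahl_speedup(p, 400):.1f}x")
```

Under this model, a job that is only half parallel can never reach a 2x speedup however many GPU units it gets, and a faster CPU-to-GPU link does not change that bound.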