EVER SINCE Nvidia and AMD introduced unified shaders and compute capability into their graphics processors, the idea has been to use that power for non-graphics - preferably even general purpose - computing applications. Both companies have since managed to get applications ranging from Photoshop and PowerPoint to various supercomputing and scientific codes accelerated on their GPUs.
The claimed performance increases vary, depending on the ratio between computing done locally on the GPU and the time spent - slowly - moving data between the GPU and the rest of the system over the high latency PCIe bus. You'll see anything from double-digit per cent gains to more than tenfold increases, depending on the vendor, the application and the degree of marketing skew involved, such as whether a GPU is measured against a single CPU core or a whole multicore CPU.
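To put rough numbers on that ratio, here is a minimal back-of-the-envelope sketch in Python. The function name, its parameters and every figure in it are illustrative assumptions rather than measurements from any vendor, and the model only counts bus throughput, ignoring per-transfer latency and any overlap of copying with computing.

```python
def offload_speedup(cpu_seconds, gpu_kernel_seconds, bytes_moved, bus_bytes_per_second):
    """Estimate the speedup from offloading a job to a GPU.

    Assumed model: total GPU time is kernel time plus the time needed to
    move the data over the bus, with no overlap between the two.
    """
    transfer_seconds = bytes_moved / bus_bytes_per_second
    return cpu_seconds / (gpu_kernel_seconds + transfer_seconds)


# Hypothetical workload: 8 s on one CPU core, 0.4 s of GPU kernel time.
ONE_CORE_SECONDS = 8.0
KERNEL_SECONDS = 0.4
BUS_RATE = 8e9  # assumed effective bus throughput of 8 GB/s

# Little data moved: the kernel time dominates.
light = offload_speedup(ONE_CORE_SECONDS, KERNEL_SECONDS, 0.8e9, BUS_RATE)
# Lots of data moved: the bus time dominates.
heavy = offload_speedup(ONE_CORE_SECONDS, KERNEL_SECONDS, 32e9, BUS_RATE)
# The same heavy job measured against 8 CPU cores, assumed to scale perfectly.
fair = offload_speedup(ONE_CORE_SECONDS / 8, KERNEL_SECONDS, 32e9, BUS_RATE)

print(f"light transfer vs one core: {light:.1f}x")
print(f"heavy transfer vs one core: {heavy:.1f}x")
print(f"heavy transfer vs 8 cores:  {fair:.2f}x")
```

Under these made-up figures the same GPU comes out 16 times faster, less than twice as fast, or slower than the CPU, depending only on how much data crosses the bus and what it is measured against.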
So, especially with GPU makers' increased marketing efforts, you might think there'll be even more of it. Well yes, except there seems to be a fat brick wall facing that prospect. Three key problems block further adoption of GPUs for more applications.
First, there's the processor architecture. GPUs are not only incompatible with the terrible but predominant x86 instruction set, they are unlike general purpose processors of any architecture. GPUs have hundreds of mini-cores with proprietary instruction sets that may change considerably with every hardware generation, as they are not exposed to the outside world but hidden behind drivers and programming interfaces. Complex and rarely fully disclosed cache and memory architectures, coupled with the unpredictable future internals of each vendor's GPU family, do nothing to create a base of well-versed low-level coders like the people who made the x86 instruction set a success.
Second, the programming model gets in the way. It would be ideal if GPUs were co-processors to CPUs like the 80x87 chip was to the 80x86 processor some three decades ago in the early PC days. But they aren't there yet. Sitting in a separate memory space, with a completely different instruction set, means programming 'heterogeneously' via OpenCL or CUDA: offloading the work, copying the data over and copying the results back. It also often means heavily changing the source code to make use of that GPU acceleration. This is where Intel's 'son of Larrabee' Xeon Phi chip design excels, as it is basically a mutated, vectorised version of x86, with a re-compile doing most of the work.
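The copy, offload and copy-back pattern described above can be sketched in plain Python. This is a toy model and not real GPU code: `ToyDevice` and its methods are hypothetical names invented for illustration, standing in for what OpenCL or CUDA do with actual device memory, and the 8 bytes per value is an assumption.

```python
class ToyDevice:
    """Hypothetical stand-in for a GPU with its own memory space."""

    def __init__(self):
        self._memory = {}      # device-side buffers, not visible to host code
        self.bytes_copied = 0  # running total of traffic over the "bus"

    def copy_in(self, name, host_data):
        # Host to device: the data is duplicated, not shared.
        self._memory[name] = list(host_data)
        self.bytes_copied += 8 * len(host_data)  # assume 8-byte values

    def launch(self, kernel, src, dst):
        # Apply the kernel to every element, entirely in device memory.
        self._memory[dst] = [kernel(x) for x in self._memory[src]]

    def copy_out(self, name):
        # Device to host: another full copy back over the bus.
        result = list(self._memory[name])
        self.bytes_copied += 8 * len(result)
        return result


def scale_on_cpu(values):
    # The original host code: a single line.
    return [2.0 * v for v in values]


def scale_on_device(values, device):
    # The same job restructured for offload: copy, launch, copy back.
    device.copy_in("in", values)
    device.launch(lambda v: 2.0 * v, "in", "out")
    return device.copy_out("out")


data = [1.0, 2.0, 3.0, 4.0]
gpu = ToyDevice()
assert scale_on_device(data, gpu) == scale_on_cpu(data)
print(gpu.bytes_copied, "bytes crossed the bus for", len(data), "values")
```

Even in this toy, a one-line loop turns into three explicit steps plus bookkeeping for named buffers, which hints at the kind of source code changes the offload model demands.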