ATI TODAY IS announcing its new Stream transcoding paradigm, and unlike some others, it makes quite a bit of sense. Stream now parcels out workloads between the CPU and the GPU, and does things much more intelligently.
There are some who think that the GPU is the only thing that matters in a computer, mainly because that is all they sell. There is a term for that point of view: wrong. Most workloads tend to have serial and parallel portions, and GPUs are very good at the parallel side, but are horrible at the serial parts. CPUs are the exact opposite.
Use what makes sense
Some problems have a serial portion that takes so much time that the parallel portion is almost free by comparison. Others are so parallel that the time needed for the entire workload just depends on how many cores you throw at them. Most are more balanced, so if you stick all your compute on the GPU, things go much slower than if you parcel the work out sanely.
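To put rough numbers on that argument, here is a minimal sketch of a toy timing model. The relative throughput figures are made-up assumptions for illustration, not measurements of any real CPU or GPU, and the function names are hypothetical. It assumes the serial and parallel portions run one after the other.

```python
# Relative throughput of each device on each kind of work.
# Illustrative assumptions only: the GPU is taken to be 20x faster than
# the CPU on parallel work and 10x slower on serial work.
SPEED = {
    "cpu": {"serial": 1.0, "parallel": 1.0},
    "gpu": {"serial": 0.1, "parallel": 20.0},
}


def run_time(serial_work, parallel_work, serial_dev, parallel_dev):
    """Total time when the serial part runs on serial_dev and the
    parallel part runs on parallel_dev, one after the other."""
    return (serial_work / SPEED[serial_dev]["serial"]
            + parallel_work / SPEED[parallel_dev]["parallel"])


def compare(serial_work, parallel_work):
    """Time for CPU-only, GPU-only, and a split that sends each
    portion to the device that suits it."""
    return {
        "cpu only": run_time(serial_work, parallel_work, "cpu", "cpu"),
        "gpu only": run_time(serial_work, parallel_work, "gpu", "gpu"),
        "split": run_time(serial_work, parallel_work, "cpu", "gpu"),
    }


# Three hypothetical workloads, each with 100 units of total work.
workloads = {
    "serial-heavy": (90, 10),
    "balanced": (50, 50),
    "parallel-heavy": (1, 99),
}

for name, (serial, parallel) in workloads.items():
    times = compare(serial, parallel)
    print(name, {k: round(v, 1) for k, v in times.items()})
```

Under these assumed speeds, the GPU-only run is the slowest option for the balanced and serial-heavy workloads, and the split is the fastest in all three cases. Change the numbers in `SPEED` and the gap moves, but the pattern holds as long as each device is much better at one kind of work.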
Not a UN org chart, just GPU compute
That is what makes the new Stream APIs so interesting. They pull in not only the CPU and GPU, but also the video decoders on the GPU. If there is a step in a problem that needs lots of serial number crunching, it is shunted to the CPU, parallel parts go to the GPU, and video decoding goes to the video decoder if needed. Instead of using one or two parts, the latest Stream can do it all, at once, theoretically well.
For transcoding, this can make a huge difference: one of the major steps, the initial decode and scaling, is essentially free. This allows both of the main chips to do the things that they should be doing without interruption.
As is the norm with all of these releases, ATI put out a bunch of graphs showing Stream transcoding pummeling Nvidia cards with the same software in H.264 and MPEG-2 transcoding. They are comparing some low-end cards to other low-end cards, so we will have to wait and see how well it does when the usual suspects compare things across a wider range of cards.
The new transcoder should work on most 4000 series GPUs. The older version only used the higher-end cards, but now the compatibility matrix goes all the way down to the lowest-end 4350 - almost integrated territory there. The lowest-end cards don't do GPU encode though; they simply don't have the horsepower, and would be slower than a CPU.
You can get the new Stream bits for free in the Catalyst 9.5 'hotfix', available now, but only for the Broken OS™. It works with Cyberlink Espresso, with a lot more to come soon. There will be XP versions as well, but those are a few months off. Basically, all of the usual encoding and HD playback wares will support the new encoder on the next rev. µ
Nice to hear that AMD are tackling the problem logically, unlike the "morons" who say the CPU is dead (when their own SLI scaling depends heavily on CPU horsepower).
I highly doubt the "intelligence" of the task delegator to properly divide tasks between the CPU and GPU in any generic fashion. Any proper transcoding program will use the GPU and CPU for the correct type of tasks anyway, so what is the point?
ATI/AMD do some nice stuff
- and are in the great position of having viable CPU & GPU technology
- unlike both Intel & (cough) nVidia
- will AMD ever stop bleeding money though?
amd are achieving lots without having to bribe their retail associates
intel are losers - practically AND morally
... and let me see if I got this right. So, in usual and expected Nvidia bashing from Charlie, "There are some who think that the GPU is the only thing that matters in a computer, mainly because that is all they sell". But the fact that AMD makes their transcoding proposal public mentioning CPU and GPU has absolutely no relation to the fact that, oh, AMD sells both CPUs and GPUs?
Now, isn't AMD doing EXACTLY the same thing Nvidia is doing? Promoting the products they sell?
Charlie, you should really stop hating Nvidia and try to be more neutral... By now I just read your articles for fun, not for facts, and to try to extract some minimal truth from your opinionated language...
Txabi obviously buys nvidia cards and kisses them gently before sleeping. It is amazing how attached he is to them. They are so cute at this age.
First, How'd Lousy SSD Today do on Random Write In ATI Environment? OCZ just got UP from 50 Kb/s to ?blazing 10 Mb/s. Random READ/Write Is What Parsed IS ALL About. Hurricane of Sorts, With HANG Gliding SSD.
Next: How Many Lanes does IT Play To? First Vista Ultimate 32, Before 64 Even Was Possible, for public, Used 19 of 40 lane potential, Eg 1-16X slot 2 pci & 1- 1X. Period & that took 4 months to HOLD Down, Once OUT, With Updates. finally Ultee' 32 was IN (Stable) by Aug'7.
Today's Cards Are Screamers, Yet Can Your software Work In that environment, especially with SSD, Intel SSD Has Busted 50 Mb/s Random Write Barrier recently on Random R/W Scores. Wow. Say it Again, Wow.
Up To Pentium era. Put ANY of Today's TOP Performers On IT, Well, Clue turns OUT To Be, more Discrete Channels within SSD ItSelf, Beyond controller. ALL Needing to Be Mapped. Might Be Months & months & months till ANY SSD has Guts To Feed Such Monsters. Each Channel has 30 Mb/s Capability, So order of Using ALL Channels ALL time IS Imperative To REACH Even SATA Scores, Let Alone SATA III. Computex Will You SHOW Us?
ITS GREAT STUFF, YET HARDWARE IS OF GREAT DISPARITY, AS STATED, ITS 5995
XFX World or Bust....
Charlie, you are announcing that a "paradigm" is better than actual solutions and software that exist and are in use today (I use badaboom and tmpgenc xpress 4.0 and they work very very well). Vaporware is better than real software? Man, c'mon, just report about the facts, instead of just shouting out your bias.
@txabi
The difference is that AMD makes sense and is right, whereas nVidia is wrong. It isn't just about the GPU as nVidia would have you believe. GPU compute is good for highly parallel data loads, that's it.
isn't transcoding a "highly parallel" process?
if you had nvidia card, you could clearly see that both GPU and CPU are in usage when transcoding with say badaboom and such...
but why bother with actual testing when you can just stay ignorant & rant, right?
This only shows that AMD is readying for OpenCL. They still do this quietly and without much fanfare. Nvidia was the machine behind bringing GPGPU into the limelight and are just as ready when it comes to OpenCL. You do have to make money when you put resources down to bring attention to a technology.
The programmers can use CUDA to implement the GPU as they wish.... TMPGENC Xpress 4.0, for example, uses the GPU to render the filters and the cpu to do the actual encoding. Leaving this up to ATI's *intelligent* API is just dumb.
@ James M:
what is that comment about? yes I do buy Nvidia cards because I consider them superior. From your comment I take that you imply that my liking Nvidia is a stupid, uncool, childish thing. Therefore, should everybody just buy whatever suits your needs best? The fact that people have needs/opinions different from yours does not entitle you to make fun of them. Your rudeness is proved by your verbal assaults towards me and that makes you a worse person than any of the respectful people in this thread of comments.
@ Natfly:
Pay attention to what I wrote. I'm NOT defending Nvidia, but neutrality instead. These days anytime someone makes the slightest non-praising comment about AMD he/she is dismissed as a "fanboy". I'm not saying Nvidia's proprietary strategy is laudable, I think OpenCL should be the focus of all companies right now, but never forget the very real fact that it was thanks to CUDA that Nvidia made GPGPU relevant in the market, and it was thanks to Physx that they interested gamers in ingame realtime physics. Had it been done before? Yes, thanks to Havok, but it is thanks to Nvidia's (stupidly) proprietary strategy that now GPGPU enjoys all this attention. Now I hope they do the only intelligent thing to do: port Physx from CUDA to OpenCL so that everybody can use it. After using both OpenGL and CUDA physics solutions, I think the latter is more realistic, but that's personal preference and doesn't exclude the fact that it should be ported to an open format so that it's productive for the whole market. Now, AMD can obviously push the CPU+GPU strategy because they produce both; Nvidia doesn't have that luxury and so they are trying to focus absolutely on the GPU. I agree that their usual tune of "CPU will die, GPU is all you need" is completely ridiculous because we still need a CPU for functions that are not parallel, but the fact that much of what Nvidia says is bullshit doesn't give everybody the right to criticise even what they are doing right.
Always telling the other guy how to improve his product.
always it's some other guy
it's just a broken lullabye
substitution mass confusion
'raised you right, your ante.
So righteous making fudge
I know shes having a fit,
She doesn't like me a bit
She buangs like William Jensun Huangs
Ati-boy
Larts do the transcord again
when gaming at a high res with everything maxed a cpu aint no thang.
"Pay attention to what I wrote. I'm NOT defending Nvidia, but neutrality instead."
I never assumed or said that you were defending nVidia. Being neutral doesn't mean you have to refrain from critical thinking to find out reasons for making false statements.
Pointing out that nVidia only makes GPUs was, I assume, an attempt to find the reason behind their obviously incorrect statement.
Charlie insulted Microsoft with his "Broken OS" comment, but apparently only nVidia has any defenders on this site. I hope it is better to be feared than loved or Microsoft is in trouble.
microsoft & customers are like parents & children. we love them, we fear them, we depend on them & we sometimes hate them. but when you hate your parents you keep it to yourself!
charlie is adopted.
First time commenting here. But doesn't it all just come down to one thing... perf/price where = to Happy place?
Who cares how each gets to the end result, doesn't it all come down to the cheapest way to 75+fps with 4xAA on Crysis????
cut the crap - DOES IT WORK?! plenty of bs in comments above, but no one has tested?
well I for one was very interested, I have been waiting for what seems an eternity for ati to get its act together and get transcoding working on GPU.
First test - NOT GOOD. DETAILS:
PC Spec: q6600 quad core 2.4 @ 3.4ghz
gfx - ati 4850 1gb @ 720mhz gpu / 1100mhz ram
cyberlinks "MediaShow Espresso"
350mb tv show xvid to mpeg2
RUN1- NO STREAM: using cat 9.4
gpu usage steady at 4pc, cpu usage 100percent across all cores.
TIME TO OUTPUT: 310 seconds
RUN2- USING STREAM OPTION, with of course cat 9.5 hotfix package
gpu usage around 14 percent, often dropping to 0 and occasionally peaking to 20pc. cpu usage just under 50pc
TIME TO OUTPUT: 436 seconds.
Conclusion - WTF?!!! 2 mins 6 seconds, 40 percent SLOWER with stream?!!!
That's the bad news. What is WEIRD is that the output file size from run1 was 1.02GB. Output from run2 (stream) was 1.81GB
So, with stream, it took 40 percent more time, but the output file, with what appears to be the same quality, is 56 percent smaller?!
Is this what is supposed to happen?
Anyone else done tests?!
I will try going the other way, from mpeg2 to avi and see for next run...
update - run 3 using cat 5 hotfix NO stream - as run1 above, but a minute quicker lol
doesn't seem to be able to convert mpeg2 to avi xvid or indeed anything to xvid...so...meh...
i guess it's nice to have much smaller file sizes using the stream processing update, and to free up cpu usage for other stuff while converting, just a shame it isn't faster and that more conversion options aren't available in Espresso.
i will keep an interested eye out on future media products supporting this...
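The timing and file-size figures from the test runs above can be checked with a few lines of arithmetic. This is a small sketch using only the numbers as posted; the variable names are made up for illustration. The post is inconsistent about which run produced which file, so the sketch only compares the smaller file against the larger one without assigning either to a run.

```python
# Figures as posted in the test runs above.
time_no_stream_s = 310   # RUN1, no Stream
time_stream_s = 436      # RUN2, Stream enabled
smaller_file_gb = 1.02
larger_file_gb = 1.81

# How much longer the Stream run took, in absolute and relative terms.
extra_s = time_stream_s - time_no_stream_s
slowdown_pct = 100 * extra_s / time_no_stream_s
minutes, seconds = divmod(extra_s, 60)

# The smaller file as a share of the larger one, and the reduction
# that share implies.
share_pct = 100 * smaller_file_gb / larger_file_gb
reduction_pct = 100 - share_pct

print(f"Stream run took {minutes} min {seconds} s longer "
      f"({slowdown_pct:.1f}% slower)")
print(f"Smaller file is {share_pct:.1f}% of the larger one "
      f"({reduction_pct:.1f}% smaller)")
```

The timing claim holds up: 126 extra seconds is 2 minutes 6 seconds, or roughly 40 percent slower. The "56 percent" figure matches the smaller file's share of the larger one, which works out to a reduction of roughly 44 percent rather than 56.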
http://www.legitreviews.com/article/978/1/
with a fast CPU - NVIDIA CUDA is the way to go.
with a slow CPU - ATI Stream excels
so as usual, low-end systems are better off with ATI. the high-end systems' realm belongs to NVIDIA. amen to that.
Applejack - thanks for the good link there..interesting results..
In Charlie's world:
AMD/ATI = Everything is always Good
nVidia = Everything is always Bad
In the real world, it's a little more complicated, as the link above shows
- sometimes the ATI solution is better, sometimes the nVidia solution is better
- depending on your host-system
Looking at the link above it's obvious that the nVidia solution is using more CPU power after all, so the entire article is flawed............