THERE'S A LOT of fake news going around about the upcoming GPUish chip called the GT300. Let's clear some air on this Larrabee-lite architecture.
First of all, almost everything you have heard about the two upcoming DX11 architectures is wrong. There is a single source making up news, and second-rate sites are parroting it left and right. The R870 news is laughably inaccurate, and the GT300 info is quite curious too. Either ATI figured out a way to break the laws of physics with memory speed and Nvidia managed to almost double its transistor density - do the math on purported numbers, they aren't even in the ballpark - or someone is blatantly making up numbers.
That said, let's get on with what we know, and delve into the architectures a bit. The GT300 is going to lose, badly, in the GPU game, and we will go over why and how.
First, a little background science and math. There are three fabrication processes out there that ATI and Nvidia use, all from TSMC: 65nm, 55nm and 40nm. They are each a 'half step' from the next, and 65nm to 40nm is a full step. If you do the math, the shrink from 65nm to 55nm ((55 * 55) / (65 * 65) ~= 0.72) saves you about 1/4 of the area, that is, 55nm is 0.72 of the area of 65nm for the same transistor count. 55nm shrunk to 40nm gives you 0.53 of the area, and 65nm shrunk to 40nm gives you 0.38 of the area. We will be using these later.
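For anyone who wants to check the shrink numbers, here is a minimal sketch of the same arithmetic. It assumes ideal scaling, where area goes with the square of the feature size; real shrinks rarely hit that, and the function name is ours, not anything from TSMC.

```python
def area_ratio(new_nm: float, old_nm: float) -> float:
    """Ideal-case area of the same design after a shrink, relative to before.

    Assumes area scales with the square of the feature size.
    """
    return (new_nm / old_nm) ** 2


shrinks = {
    "65nm -> 55nm": area_ratio(55, 65),
    "55nm -> 40nm": area_ratio(40, 55),
    "65nm -> 40nm": area_ratio(40, 65),
}

for name, ratio in shrinks.items():
    print(f"{name}: {ratio:.2f} of the original area")
```

Rounded to two places, this gives the 0.72, 0.53 and 0.38 figures quoted above.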
Second is the time it takes to do things. We will use the best case scenarios, with a hot lot from TSMC taking a mere six weeks, and the time from wafers in to boards out of an AIB being 12 weeks. Top it off with test and debug times of two weeks for first silicon and one week for each subsequent spin. To simplify rough calculations, all months will be assumed to have 4 weeks.
Okay, ATI stated that it will have DX11 GPUs on sale when Windows 7 launches, purportedly October 23, 2009. Since this was done in a financial conference call, SEC rules applying, you can be pretty sure ATI is serious about this. Nvidia on the other hand basically dodged the question, hard, in its conference call the other day.
At least you should know why Nvidia picked the farcical date of October 15 for its partners. Why farcical? Let's go over the numbers once again.
According to sources in Satan Clara, GT300 has not taped out yet, as of last week. It is still set for June, which means best case, June 1st. Add six weeks for first silicon, two more for initial debug, and you are at eight weeks, minimum. That means the go or no-go decision might be made as early as August 1st. If everything goes perfectly, and there is no second spin required, you would have to add 90 days to that, meaning November 1st, before you could see any boards.
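The schedule math can be sketched in a few lines, using the best-case numbers given earlier and four-week months. The constant and function names are ours, and the respin rule (each respin costs another hot lot plus one week of debug) is our assumption about how the extra spins stack up, not something Nvidia or TSMC has stated.

```python
WEEKS_PER_MONTH = 4          # the simplification used in this article
HOT_LOT_WEEKS = 6            # best-case TSMC hot lot
FIRST_DEBUG_WEEKS = 2        # test and debug for first silicon
RESPIN_DEBUG_WEEKS = 1       # test and debug for each later spin
WAFERS_TO_BOARDS_WEEKS = 12  # wafers in to boards out (the "90 days")


def weeks_to_go_decision(respins: int = 0) -> int:
    """Weeks from tape-out to the go or no-go decision."""
    first_spin = HOT_LOT_WEEKS + FIRST_DEBUG_WEEKS
    # Assumption: every respin needs a fresh hot lot plus one week of debug.
    return first_spin + respins * (HOT_LOT_WEEKS + RESPIN_DEBUG_WEEKS)


def weeks_to_boards(respins: int = 0) -> int:
    """Weeks from tape-out until boards could ship."""
    return weeks_to_go_decision(respins) + WAFERS_TO_BOARDS_WEEKS


def months_after_tapeout(weeks: int) -> float:
    """Convert weeks to months using the four-week month."""
    return weeks / WEEKS_PER_MONTH


print(months_after_tapeout(weeks_to_go_decision()))  # months to go/no-go
print(months_after_tapeout(weeks_to_boards()))       # months to boards
```

With a June 1 tape-out and no respins, that is two months to the go or no-go decision and five months to boards, which is where August 1 and November 1 come from.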
So, if all the stars align, and everything goes perfectly, Nvidia could hit Q4 of 2009. But that won't happen.
Why not? There is a concept called risk when doing chips, and the GT300 is a high risk part. GT300 is the first chip of a new architecture, or so Nvidia claims. It is also going to be the first GDDR5 part, and moreover, it will be Nvidia's first 'big' chip on the 40nm process.
Nvidia chipmaking of late has been laughably bad. GT200 was slated for November of 2007 and came out in May or so in 2008, two quarters late. We are still waiting for the derivative parts. The shrink, GT206/GT200b is technically a no-brainer, but instead of arriving in August of 2008, it trickled out in January, 2009. The shrink of that to 40nm, the GT212/GT200c was flat out canceled, Nvidia couldn't do it.
The next largest 40nm part, the GT214 also failed, and it was redone as the GT215. The next smallest parts, the GT216 and GT218, very small chips, are hugely delayed, perhaps to finally show up in late June. Nvidia can't make a chip that is one-quarter of the purported size of the GT300 on the TSMC 40nm process. That is, make it at all, period - making it profitably is, well, a humorous concept for now.
GT300 is also the first DX11 part from the green team, and it didn't even have DX10.1 parts. Between the new process, larger size, bleeding-edge memory technology, dysfunctional design teams, new feature sets and fab partners trashed at every opportunity, you could hardly imagine ways to have more risk in a new chip design than Nvidia has with the GT300.
If everything goes perfectly and Nvidia puts out a GT300 with zero bugs, or easy fix minor bugs, then it could be out in November. Given that there is only one GPU that we have heard of that hit this milestone, a derivative part, not a new architecture, it is almost assuredly not going to happen. No OEM is going to bet their Windows 7 launch vehicles on Nvidia's track record. They remember the 9400, GT200, and well, everything else.
If there is only one respin, you are into 2010. If there is a second respin, then you might have a hard time hitting Q1 of 2010. Of late, we can't think of any Nvidia product that hasn't had at least two respins, be they simple optical shrinks or big chips.
Conversely, the ATI R870 is a low risk part. ATI has a functional 40nm part on the market with the RV740/HD4770, and has had GDDR5 on cards since last June. Heck, it basically developed GDDR5. The RV740 - again, a part already on the market - is rumored to be notably larger than either the GT216 or 218, and more or less the same size as the GT215 that Nvidia can't seem to make.
DX11 is a much funnier story. The DX10 feature list was quite long when it was first proposed. ATI dutifully worked with Microsoft to get it implemented, and did so with the HD2900. Nvidia stomped around like a petulant child and refused to support most of those features, and Microsoft stupidly capitulated and removed large tracts of DX10 functionality.
This had several effects, the most notable being that the now castrated DX10 was a pretty sad API, barely moving anything forward. It also meant that ATI spent a lot of silicon area implementing things that would never be used. DX10.1 put some of those back, but not the big ones.
DX11 is basically what DX10 was meant to be with a few minor additions. That means ATI has had a mostly DX11 compliant part since the HD2900. The R870/HD5870 effectively will be the fourth generation DX11 GPU from the red team. Remember the tessellator? Been there, done that since 80nm parts.
This is not to say that it will be easy for either side. TSMC has basically come out and said that its 40nm process is horrid, an assertion backed up by everyone that uses it. That said, both the GT300 and R870 are designed for the process, so they are stuck with it. If yields can't be made economically viable, you will be in a situation of older 55nm parts going head to head for all of 2010. Given Nvidia's total lack of cost competitiveness on that node, it would be more a question of them surviving the year.
That brings us to the main point, what is GT300? If you recall Jen-Hsun's mocking jabs about Laughabee, you might find it ironic that GT300 is basically a Larrabee clone. Sadly though, it doesn't have the process tech, software support, or architecture behind it to make it work, but then again, this isn't the first time that Nvidia's grand prognostications have landed on its head.
The basic structure of GT300 is the same as Larrabee. Nvidia is going to use general purpose 'shaders' to do compute tasks, and the things that any sane company would put into dedicated hardware are going to be done in software. Basically DX11 will be shader code on top of a generic CPU-like structure. Just like Larrabee, but from the look of it, Larrabee got the underlying hardware right.
Before you jump up and down, and before all the Nvidiots start drooling, this is a massive problem for Nvidia. The chip was conceived at a time when Nvidia thought GPU compute was actually going to bring it some money, and it was an exit strategy for the company when GPUs went away.
It didn't happen that way, partially because of buggy hardware, partially because of over-promising and under-delivering, and then came the deathblows from Larrabee and Fusion. Nvidia's grand ambitions were stuffed into the dirt, and rightly so.
Nvidia Investor Relations tells people that between five and ten per cent of the GT200 die area is dedicated to GPU compute tasks. The GT300 goes way farther here, but let's be charitable and call it 10 per cent. This puts Nvidia at a 10 per cent areal disadvantage to ATI on the DX11 front, and that is before you talk about anything else. Out of the gate in second place.
On 55nm, the ATI RV790 basically ties the GT200b in performance, but does it in about 60 per cent of the area, and that means less than 60 per cent of the cost. Please note, we are not taking board costs into account, and if you look at yield too, things get very ugly for Nvidia. Suffice it to say that architecturally, GT200 is a dog, a fat, bloated dog.
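Why does 60 per cent of the area mean less than 60 per cent of the cost? A rough sketch, assuming cost per candidate die is proportional to area and using a simple Poisson yield model (yield = exp(-defect density * area)). That model is a textbook simplification we are bringing in, not anything from either company, and the die area and defect density below are hypothetical placeholders, not real GT200b or TSMC figures.

```python
import math


def relative_cost_per_good_die(
    area_ratio: float,
    big_die_area_cm2: float,
    defects_per_cm2: float,
) -> float:
    """Cost of a good small die relative to a good big die.

    Assumptions: cost per candidate die is proportional to die area
    (wafer-edge losses ignored), and yield follows a Poisson model,
    yield = exp(-defects_per_cm2 * area).
    """
    small_die_area_cm2 = area_ratio * big_die_area_cm2
    yield_big = math.exp(-defects_per_cm2 * big_die_area_cm2)
    yield_small = math.exp(-defects_per_cm2 * small_die_area_cm2)
    # cost per good die is proportional to area / yield
    return area_ratio * yield_big / yield_small


# Hypothetical inputs, for illustration only.
no_defects = relative_cost_per_good_die(0.6, big_die_area_cm2=5.0, defects_per_cm2=0.0)
some_defects = relative_cost_per_good_die(0.6, big_die_area_cm2=5.0, defects_per_cm2=0.5)

print(f"perfect yield: {no_defects:.2f} of the big die's cost")
print(f"with defects:  {some_defects:.2f} of the big die's cost")
```

With perfect yield the smaller die costs exactly 60 per cent as much. Any non-zero defect density hurts the bigger die more, so the ratio only goes down from there.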
Rather than go lean and mean for GT300, possibly with a multi-die strategy like ATI, Nvidia is going for bigger and less areally efficient. They are giving up GPU performance to chase a market that doesn't exist, but was a nice fantasy three years ago. Also, remember that part about ATI's DX10 being the vast majority of the current DX11? ATI is not going to have to bloat its die size to get to DX11, but Nvidia will be forced to, one way or another. Step 1) Collect Underpants. Step 2) ??? Step 3) Profit!
On the shrink from 55nm to 40nm, you about double your transistor count, but due to current leakage, doing so will hit a power wall. Let's assume that both sides can double their transistor counts and stay within their power budgets though, that is the best case for Nvidia.
If AMD doubles its transistor count, it could almost double performance. If it does, Nvidia will have to as well. But, because Nvidia has to add in all the DX11 features, or additional shaders to essentially dedicate to them, its chips' areal efficiency will likely go down. Meanwhile, ATI has those features already in place, and it will shrink its chip sizes to a quarter of what they were in the 2900, or half of what they were in the R770.
Nvidia will gain some area back when it goes to GDDR5. Then the open question will be how wide the memory interface will have to be to support a hugely inefficient GPGPU strategy. That code has to be loaded, stored and flushed, taking bandwidth and memory.
In the end, what you will end up with is an ATI that can double performance if it chooses to double shader count, while Nvidia can double shader count, but it will lose a lot of real world performance if it does.
In the R870, if you compare the time it takes to render 1 million triangles from 250K using the tessellator, it will take a bit longer than running those same 1 million triangles through without the tessellator. Tessellation takes no shader time, so other than latency and bandwidth, there is essentially zero cost. If ATI implemented things right, and remember, this is generation four of the technology, things should be almost transparent.
Contrast that with the GT300 approach. There is no dedicated tessellator, and if you use that DX11 feature, it will take large amounts of shader time, used inefficiently as is the case with general purpose hardware. You will then need the same shaders again to render the triangles. 250K to 1 million triangles on the GT300 should be notably slower than straight 1 million triangles.
The same should hold true for all DX11 features: ATI has dedicated hardware where applicable, while Nvidia has general purpose shaders roped into doing things far less efficiently. When you turn on DX11 features, the GT300 will take a performance nosedive; the R870 won't.
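To make the tessellation argument concrete, here is a toy model of shader time for the 250K to 1 million triangle case. It is only a sketch of the reasoning above: the per-triangle costs are made-up units, latency and bandwidth are ignored, and nothing here is measured from either chip.

```python
def shader_time(
    output_triangles: int,
    render_cost_per_triangle: float,
    tessellation_cost_per_triangle: float,
    dedicated_tessellator: bool,
) -> float:
    """Toy estimate of shader time for one tessellated frame.

    Costs are in arbitrary, hypothetical units. With a dedicated
    tessellator, tessellation is assumed to use no shader time at all.
    """
    render = output_triangles * render_cost_per_triangle
    if dedicated_tessellator:
        tessellation = 0.0
    else:
        # Shaders do the tessellation, then are needed again to render.
        tessellation = output_triangles * tessellation_cost_per_triangle
    return render + tessellation


# Hypothetical per-triangle costs, for illustration only.
dedicated = shader_time(1_000_000, 1.0, 0.5, dedicated_tessellator=True)
in_shaders = shader_time(1_000_000, 1.0, 0.5, dedicated_tessellator=False)

print(f"dedicated tessellator: {dedicated:,.0f} units of shader time")
print(f"tessellation in shaders: {in_shaders:,.0f} units of shader time")
```

Whatever the real per-triangle costs turn out to be, in this model the dedicated path matches the cost of rendering straight 1 million triangles, while the shader path is always higher.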
Worse yet, when the derivatives come out, the proportion of shaders needed to run DX11 will go up for Nvidia, but the dedicated hardware won't change for ATI. It is currently selling parts on the low end of the market that have all the "almost DX11" features, and is doing so profitably. Nvidia will have a situation on its hands in the low end that will make the DX10 performance of the 8600 and 8400 class parts look like drag racers.
In the end, Nvidia architecturally did just about everything wrong with this part. It is chasing a market that doesn't exist, and skewing its parts away from their core purpose, graphics, to fulfill that pipe dream. Meanwhile, ATI will offer you an x86 hybrid Fusion part if that is what you want to do, and Intel will have Larrabee in the same time frame.
GT300 is basically Larrabee done wrong for the wrong reasons. Amusingly though, it misses both of the attempted targets. R870 should pummel it in DX10/DX11 performance, but if you buy a $400-600 GPU for ripping DVDs to your iPod, Nvidia has a card for you. Maybe. Yield problems notwithstanding.
GT300 will be quarters late, and without a miracle, miss back to school, the Windows 7 launch, and Christmas. It won't come close to R870 in graphics performance, and it will cost much more to make. This is not an architecture that will dig Nvidia out of its hole, but instead will dig it deeper. It made a Laughabee. µ