The Inquirer

What Nvidia should do now

Part Three: The cock-up
Tue Sep 02 2008, 10:03

This is the third and final part of a three-part series getting to the nub of Nvidia's graphics chip woes. The series is the result of months of research conducted by diligent INQhack Charlie Demerjian, despite an in-box stuffed full of abuse. Part One can be found here and Part Two is here.

SOURCES CLOSE to Dell say they knew about the problem a year ago, and HP is on record as being aware in November, so there has been about a year to characterise the problem, design a solution and test it. Multiple sources involved with package engineering tell us that this is not nearly enough time to do a proper test regime, much less long-term reliability studies.

This new package and materials set does not appear to have been nearly as carefully vetted as it should have been. It may work but, then again, it may not. If the reports that the power distribution has not been changed are accurate, we may very well be reading about Nvidia Defective Chipsgate II in a couple of years.

How widespread is the problem? We told you about G84 and G86s as well as G92 and G94s. From the materials side, it appears that all non-R and non-F lot numbered parts made on the 65nm and 55nm processes are defective. The flaw is a downright idiotic choice of multiple materials coupled with poor chip design and inadequate testing. It is a case of errors compounding errors. They are all defective.

If this is the case, why aren't we seeing more defective desktop parts? That one is easy... thermal stress. It has two components that lead to a bump fracturing: the amount of stress, that is, the hot-to-cold temperature delta, and the number of times the part is powered up and down, that is, the number of heat cycles. Glass cups in the oven would be the amount of stress; the bent fork would be the number of cycles.

Remember the Nvidia 8-K, where the company announced that "...customer use patterns are contributing factors." By customer usage patterns, they are referring mainly to thermal cycles, but you could also credit them with meaning high temperatures while the GPU is being pushed hard in gaming and the like.

Desktop systems are usually turned on once a day or so. Some people leave them on for weeks at a time, others may turn them on and off a few times in a day. The average desktop probably sees about one heat cycle a day.

Laptops, on the other hand, are woken up and put to sleep many times a day. Take a typical student who wakes up, checks his email, goes to three classes and takes notes, goes to a coffee shop for a bit, goes home, watches a video or two, then goes to sleep: it is not hard to make a case for 10 or more power cycles a day. Every wake-up/sleep or hibernate cycle is a heat cycle, so dozens are not out of the question.

The more cycles you put on it, and the more severe they are, the quicker these defective parts will die. A good way to look at it is to assign each critical bump a lifespan, an amount of stress it can take before it cracks. Let's call this number 100 AU, for Arbitrary Units. If a power-on cycle is worth 4 AU, and a hardcore gaming session with the CPU overclocked to within 1MHz of crashing is worth 15, you can figure out when it should die. Remember, these are hypothetical numbers... the theory is the point.
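To make that arithmetic concrete, here is a minimal Python sketch of the hypothetical stress budget. It assumes stress simply adds up linearly, takes the 100 AU budget, 4 AU per power cycle and 15 AU per gaming session from the paragraph above, and borrows the rough one-cycle-a-day desktop and ten-cycles-a-day laptop figures from the usage patterns described earlier. The function name days_to_failure is made up for illustration, and the absolute numbers mean nothing; only the comparison matters.

```python
import math

# Hypothetical figures from the article, in Arbitrary Units (AU).
BUMP_BUDGET_AU = 100     # stress a critical bump can take before it cracks
POWER_CYCLE_AU = 4       # one power-on (or wake/sleep) heat cycle
GAMING_SESSION_AU = 15   # one hardcore gaming session on an overclocked machine


def days_to_failure(cycles_per_day: int, gaming_sessions_per_day: int = 0,
                    budget_au: int = BUMP_BUDGET_AU) -> int:
    """Day on which accumulated stress first reaches the bump's budget.

    Assumes every day looks the same and stress adds up linearly.
    """
    daily_stress = (cycles_per_day * POWER_CYCLE_AU
                    + gaming_sessions_per_day * GAMING_SESSION_AU)
    return math.ceil(budget_au / daily_stress)


desktop = days_to_failure(cycles_per_day=1)   # roughly one heat cycle a day
laptop = days_to_failure(cycles_per_day=10)   # the sleep/wake-happy student
print(f"desktop: {desktop} days, laptop: {laptop} days")
```

With these made-up numbers the laptop exhausts the same budget in 3 days against 25 for the desktop: same chip, same flaw, very different usage pattern.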

When Dell, HP and others announce a BIOS 'fix', the reason it is so humorous is that all they are doing is lowering the amount of thermal stress on the chips when the fan would not normally be on. When the fan is going full tilt without the 'fix', the new 'updated thermal profiles' won't make a difference. When the fans are normally off or on low, the profiles will essentially lessen the stress from a four to a three. It is just there to allow the laptop to live through the warranty period so the companies don't have to pay for the fix. After that, if the defective chips burn out, it isn't their problem. The 'fix' doesn't fix anything at all.
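The same kind of hypothetical budget shows why the 'updated thermal profiles' only delay the inevitable. This sketch assumes a fixed 100 AU budget and a linear model, and compares 4 AU per cycle (fans normally off or low, old profile) with 3 AU per cycle (new profile); cycles_to_failure is an illustrative name, not anything Dell or HP actually ships.

```python
import math

BUMP_BUDGET_AU = 100  # hypothetical stress a critical bump can take


def cycles_to_failure(stress_per_cycle_au: float,
                      budget_au: float = BUMP_BUDGET_AU) -> int:
    """Heat cycles until accumulated stress reaches the budget (linear model)."""
    return math.ceil(budget_au / stress_per_cycle_au)


before_fix = cycles_to_failure(4)  # old thermal profile, fans off or low
after_fix = cycles_to_failure(3)   # 'updated thermal profile': 4 AU -> 3 AU
print(f"before: {before_fix} cycles, after: {after_fix} cycles")
```

Stretching 25 cycles to 34 buys about a third more life, which may be enough to get past a warranty period, but the stress still accumulates and the bump still cracks. For cycles where the fan was already at full tilt, the per-cycle figure does not change at all, so the real-world gain is smaller still.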

In the end, it comes down to Nvidia screwing up badly on package engineering and testing, then trying as best they can to bury the problem while passing the buck. It appears that every Nvidia 65nm and 55nm part with high-lead bumps and/or low-Tg underfill is defective; it is just a question of how defective they are, and when they will die.

As far as we are able to tell, contrary to Nvidia's vague statements blaming suppliers, there are no materials defects at work here. Every material they used lived up to the claimed specs, and every material they used would have done the job while kept within the advertised parameters. Nvidia's engineering failures put undue stress on the parts, and several failures compounded to make two generations of defective parts. The suppliers and subcontractors did exactly what they were told; Nvidia just told them to do the wrong thing.

When it started talking about this, Nvidia failed crisis management 101, and the coverup shows it doesn't care about consumers, just its bottom line. NV is doing exactly the wrong thing for the wrong reasons, and the lawyers circling with class action paperwork in hand are going to eat them alive.

The last time there was such a huge batch of defective GPUs, the company behind it swore up and down, just like Nvidia, that there was no problem, despite forums filled with evidence to the contrary.

A few weeks later, they turned around, admitted there was a problem and took a $1.1 billion charge, placating customers and fending off lawsuits.

You know that as the Xbox 360 Red Ring of Death.

I wonder why Nvidia can't be that smart? µ


Comments
Huh?

So can it explain why my old mobos with the integrated Nvidia chipset and onboard 6100 graphics failed three times, with Gigabyte replacing the board after each failure? Now I feel guilty for blaming Gigabyte for faulty mobos when the fault should have been on Nvidia's side.

And to think that I planned to buy an Asus F9Sg with the 9300M chipset this week. I think I might have to reconsider my plan and go with the integrated X3100 or X4500 if the price is right...

posted by : discontent user, 15 September 2008
Dear idiots on this comment board.

There is a class action suit against Nvidia


GG

posted by : Uglynerdman, 10 September 2008
This is proof that god exists

This is surely proof that god exists.

Nvidia are truly evil, and this is god paying them back.

How sweeeet....

I'm enjoying every second of this. I might have to pull up a chair and grab some popcorn to see what happens next....

posted by : 99flake, 07 September 2008
returning my HP w/ 8400M GS

My HP 17" laptop started to shut down once in a while, though it still booted up. Then last night it started happening over and over, more than 6 times, and each time I had to wait 5 mins to reboot. I plugged in my USB backup HD, transferred my files and cleaned my account. Returning it today at COSTCO.

posted by : happy4arnel, 05 September 2008
WorkHardening2

I should add, no matter what GPU you want to take shots at, if it has this variety of materials (metal, glass, silicon) all reacting to temperature fluctuations independently, you're going to get this expansion and contraction, or work hardening, and there's nothing that can be done other than cooling the whole thing really well. You could have shown any GPU under the IR and seen the same heat image...

posted by : Boilerhog, 05 September 2008
WORK HARDENING!

Bunch a nubs. The flexing of a metal until it gets harder, to the point of fracture, is called WORK HARDENING, in the metal trades anyway. It will also generate its own heat as this happens, just like the bending fork will get warm as it hardens. It gradually gets harder to flex, finally suffering catastrophic failure. It can happen to a crane's boom, so why not a graphics chip...
None of my generations of Nvidia cards have failed, and I OC the lot. I've also seen many a lappy owned by the younger crowd sitting on the BED from time to time, trying to suck air through the mattress. So I think this may be just normal; you're just noticing it now...

posted by : Boilerhog, 05 September 2008
NVidia Turns 360 to 180

Er, excuse me? Did anybody forget about the Xbox 360 ring of death?

The Nvidia GPUs were overheating and the Xboxes were being shipped back and forth for a fix a while back, plus there's an article about price cuts on Xboxes here right now.

In one laptop I had in the past, the ATI outlasted the actual LCD screen.

I heard of ATI way before Nvidia. I use ATI now, and I feel that I made the right choice long before all this.

Rule of thumb: the secret to my choice was simple. I looked at the outside physical structure of an ATI versus an Nvidia, and the Nvidia immediately looked overcluttered electronically. To reinforce my choice of ATI, I opted for a mid-range card, which for me eased any concern that the ATI would fail prematurely. A piece of the pie is better than no pie at all.

posted by : Phil, 04 September 2008
sounds like theres another gremlin to blame

Sounds to me like the engineering is not really to blame. The problem seems to be more pervasive, and it has to do with the bank: the greed of the industry refreshing the products in an unhealthy cycle. Most other industries make changes to a lineup, but don't make a new product out of it. Like Chevy changing the suspension on an Impala, why release a whole new brand with minor updates? Why not make an 8800 with support for DX10 and a little increase in clock speeds? It's like Apple releasing the AirPort with G support, but holding onto the G drivers to then charge 7 dollars to update. Same thing with the iPhone: why release a premium product with 3G and Bluetooth profiles when you can release it next year at the same price? I am not saying don't release premium versions of your products with more features along with value ones.

posted by : missingxtension, 04 September 2008
rohs? wtf are you talking about aki009?

@aki009

What is your problem with ROHS?

Or to say it differently, have you ever read the series?

Nvidia was using high lead solder bumps. So this has NOTHING to do with ROHS.

And who let the nvidia fanbois out of the basement?

posted by : energyman, 04 September 2008
Material not carefully vetted?

Looks like the McCain presidential campaign tried to help out with the vetting process.
"Well, your video card sucks. But NVIDIA has a lot of experience making them!"

posted by : Hans Meiser, 04 September 2008
point #7

jxf011, so if my father promises to not snort crack, but does.... it's ok, because he told me not to?

Longevity testing.... I believe they can predict a products lifetime by powering it on until it gets to max temp, then turning it off. They do this repeatedly, probably hundreds of times a day. If they manage to get 95% of them to last 2000 cycles before failure, they predict it should last at least 5 years or something. 

Anyone have a laptop they feel like donating... we could script it to power on, run 3dmark06 then shutdown for 5 min or something. We'll see how many cycles it lives through. We could do the same with ATI cards, and maybe a few desktops thrown in for comparison.

posted by : Bounty, 04 September 2008
Chucky gets help

Well Chucky, I just found this for you. So bask in your 15 minutes of glory:
http://www.tradingmarkets.com/.site/news/Stock%20News/1858813/


posted by : R.R. Johnson, 03 September 2008
fanbois

I'll never understand why people are "fans" of companies. If you're an ATI fan, and they make a crap card (as they have done in the past), would you go out and buy said crap card, even though Nvidia's equivalent is 25% better? Same goes for the Nvidia fans if they release a rubbish card (as they, too, have done in the past). Surely all you are doing by "supporting" a manufacturer is limiting your own purchasing options and the gains that can be made from said purchase? I know that if I spend £250 on a graphics card tomorrow I won't be giving a thought to the colour of the stock cooler; I'll be going for the one with the best reviews and benchmarks. Ultimately you guys are the losers, not the card manufacturers.

As for the article, I've seen nothing about this on any other site. When another site has "breaking news!!! nvidia cards dropping like blind OAP's on a tight-rope" I'll take note.

posted by : someone, 03 September 2008
1 Fact For Sure

Is that Nvidia has never cared for the consumer. All these fanboys in here can go on and on and deny it, but if their card just went like that and they weren't getting the customer support they should get, boy, the sky's the limit for complaints.

This reporter is just telling the truth. Science could be brought in for facts on the materials that shouldn't be used, as someone else stated in their comments. Nvidia only cares about one thing: creating a chip, selling it on, getting to No. 1 and then moving on. It's been like that for years.

ATi at the moment is the card to have. I am still using a 6600GT; I was going to upgrade to an 8800GT, but now ATi is too good to pass up.

posted by : Dave C, 03 September 2008
DAAMIT!

How much were you paid by DAAMIT for this article Charlie Dreamerjian?

At least you appeal to other fanATIcs.

posted by : Homebrew, 03 September 2008
Nvidia will never be smart.

When you have your CEO go on a bitchfest about how everyone is picking on you like a whiny little brat, you will never have enough intellect or social graces to make things right. This chip cock up is a perfect example.

Serious cock ups like this are what lead a company out of the marketplace.

Puss off your vendors and your customers in one swift move? Brilliant.

posted by : Viscountalpha, 03 September 2008
Very good article series ...

I have to say that I really enjoyed the articles, it was simple enough for me to understand as I don't have a very thorough understanding, but he managed to water it down so that it was understandable, but more importantly believable.

I guess at this point I feel I have to, although begrudgingly, side a little with the nVidia fanbois.

Personally, I don't think any high-tech company in this situation would act any differently, nor do I think it is hard to imagine this happening to ATI under the right set of circumstances.

As one commenter mentioned, it is really a fault of the business climate of profits before responsibility. And please try to keep your Commie comments to a minimum; they can have their profits, just don't screw us over!

-Nobody

posted by : Nobody, 03 September 2008
Compal, Quanta? Dell and HP are more an issue than Nvidia as a whole

I'd wager to say that the issue that compounds this problem more than Nvidia under-engineering their chips is the ODMs of the world making puss-poor designs available to be sold to OEMs like Dell/HP/Acer/...etc.

Beyond that, Nvidia makes the chips but they do not make the actual graphics cards or MCMs.

They give a set of specs for each chip that specify an operating range they consider safe, and it is the job of the people using their chips in systems to engineer and design their products to work in that range or better.

Nvidia has no control whatsoever over an OEM shoving a GPU into a poorly designed laptop and telling its users to "just run that fan all of the time and you'll be OK till the warranty is over."

It would be nice if everyone didn't low-ball the designs and made quality products from the start, but that is not the reality of any industry I know of in this day and age.

As alluded to in the RRoD reference, when you cut corners in production or design for aesthetics' sake you get shafted, and you either lie and hope the parts last until the warranties are over, or get caught with your pants down and have to shell out hush money, or worse, actually fix the issues at some expense.

Beyond that, a GPU is much harder to engineer than a modern CPU for the costs they are sold for and the life-cycles they work within.

Apple has pulled a similar bullcrap move with its 3G problem on the second iPhone, which they swear works fine, yet many report problems.

They deny problems with overheating laptops and batteries until the warranty period is over and tell the consumer "too bad".

I hope all of their chips are faulty and they all fail very soon, so the problems are addressed by market forces: AMD/ATI knocking them out of business, AMD/ATI getting a more substantial market share at Nvidia's expense, or at the least the loss in shareholder value from lowered share prices forcing Nvidia's hand to shake up the leadership teams, so they crack some whips or get replaced outright.

In much the same way, Intel totally kicking AMD's collective butt on the CPU front for a few years now has forced AMD to strive to make better products or be driven out of the CPU market. Nvidia has dropped the ball, maybe not as severely, but enough to change people's opinions of them and give them more reason to look towards AMD/ATI for a GPU solution.

In the end this will be good for the GPU outlook in a "survival of the fittest" kind of way, and I think it has been handled much better than the whole RRoD debacle was by Microsoft.

And don't think that Nvidia is getting away with this free and clear without disclosing whose fault it is. As any of you who follow stock will see, they have taken a huge hit in share price as of late, and that is really where a company takes notice, not mere "customer sentiment".

Would it be better had they gotten it right from the start? Is it not a big issue? Maybe yes, maybe no. But Nvidia could have done far worse as a whole, and in the end I'd have more of an issue with HP/Dell or whoever sold the faulty PC trying to placate customers with a "BIOS fix" and generally skirt the issue until the customer is stuck without recourse. At the end of the day Nvidia doesn't even fab or assemble the GPUs it designs; it sells them for use by OEMs and ODMs, who in turn sell them to the customer as functional and ultimately have to provide reasonable support for said parts.

posted by : gabe, 03 September 2008
Knowledge is Power, ignorance costs money

I usually don't care what the sticker says as long as it's a quality part that gets the job done. Both are capable architecture makers. 

For laptops I would definitely think twice before buying anything from nVidia at this time, but only if there were an equivalent offering from someone else (S3 and SIS are nearly non-existent), and right now neither AMD nor Intel has anything in the GF8800M/9800M's class. So until something develops I would take an explicit 3-year warranty on a GF9800M over a mobile HD3650.

As humourous as nVidia's reaction to this is, it's equally humourous that ATi/AMD, once the leader in mobile graphics, is in such a poor position as to not really profit much from this other than a very small few percentage points, not the kind of turnaround to put them back on top.

I'd love to see an HD4K laptop part, but at the rate AMD is going I might be more interested in intel's offerings by the time they both arrive. However for near term I definitely would think twice before buying an nVidia part, but that too will pass just like the negativity towards NetBurst-P4s, who cares after they're fixed?

Just hope nV learn to produce quality parts more than learn better PR tactics.

posted by : Knightshader, 03 September 2008
Good job

I love these articles, Chuck. Won't ever change my mind about how I feel about Nvidia or ATI, but I love reading them. You have a way of making ATI fanboys feel secure about their purchase for the first time since the 9700 Pro. And being the ONLY "reporter" that's not ashamed to show his bias against any one side, you sure have a way of pushing everyone else to the nVidia side.

Seriously, I don't think any person in his right mind would take you seriously after trying to argue that Nvidia is cheating on benchmarks using PhysX. http://www.theinquirer.net/gb/inquirer/news/2008/06/23/nvidia-cheats-3dmark-177 See what I did there? It's a link to a source. Whenever you make a claim, it's a good idea to place one of those 'links' to prove that you're not just blowing smoke. Charlie, take note. Come on, man, Nvidia isn't cheating, they're just using their GPUs for more than just graphics processing, much in the same way ATI did when they made their GPGPU tech. Where were the "ATI is cheating on Folding@Home with GPGPU" articles? For that matter, why have you not mentioned how the 128 shaders on Nvidia's last-generation card can hold their own against ATI's current-generation 800-shader 3850s? Where were the articles talking about how great the G92s were and how everyone should stay away from the Radeon 3000s?

I thought journalists were supposed to be unbiased. You don't actually get paid for what you do, do you Charlie? If so, it's a good thing I use ad blockers; I could never see myself supporting a site that gives you a pay check.

posted by : WTF, Chuck?, 03 September 2008
..credibility 101?

As an interested party (I also have a horse in this race, albeit a small one), I have read all 3 (5?) of these articles, and I can say that from what I gather NV has been making money not only by market share but by using el-cheapo parts/techniques. Bravo Charlie, if it is true... though I wonder about the NV hating in your articles. For instance, the above is titled 'What Nvidia should do now', yet I see no mention of your opinion on a possible fix...? I have no interest in NV beyond use of their product; however, if you are to be taken as credible, please, at least, stay on topic...

posted by : interested party, 03 September 2008
Liar Liar

Liar liar plants for hire!

I own 3 laptops containing two of the chips you claim are bad. One of them I use for gaming on a very regular basis and have been for about a year. I have seen no signs of bad chips yet, and my gaming laptop is overclocked!

posted by : Todd, 03 September 2008
Go home Nvidia spam boys

It seems like Nvidia is actually paying people to spam the Inquirer's truthful and well researched articles.

I can hardly believe that people can be as dumb as, for example, ken...

I love the rants against Microsoft and Nvidia; it shows that the Inquirer is independent.

posted by : geo, 03 September 2008
Tip of the RoHS iceberg

This NVidia issue is probably just the tip of the RoHS iceberg. The NVidia decision to switch to a high-lead solder was done for the products to be RoHS compliant (which in itself is counterintuitive, but go figure the mindset of those who wrote the regulations). Many other manufacturers have made the same decision for the same reasons.

But this is all fundamentally a problem inflicted by government regulation. It was well known to those pushing the lead-free solder point of view that there would be problems with the longevity of electronic components and devices, due to problems that non-eutectic solders have (unlike the traditional 63/37 PbSn stuff). This is why medical devices, servers, and other high-reliability devices were excluded from RoHS regulation.

So what did the RoHS lead-free solder regulations buy us? Texas Instruments estimates in a presentation they published some time ago that they reduced lead use by the amount contained in 20 car batteries per year. Similar figures are typical for the industry, and total lead reduction due to lead-free solder is minimal. But the impacts on product reliability are now coming home to roost.

Thanks a lot, euroregulators.

posted by : aki009, 03 September 2008
I'm still not happy about this situation

I could be described as a NVidia fan at times, though I have owned both, but this really doesn't make me view NVidia too favorably. 

Trying to put a lid on this is probably hurting NVidia among some of its potential customers (like me, who won't buy a GTX 260 even after price drops), because of the fear that if there does turn out to be a widespread issue in desktop parts too, there won't be much support.

I have an 8800GT in my box right now. Since this story has broken, every little issue I have with 3D really makes me wonder - thankfully I have very few.

posted by : Artemis, 03 September 2008
Sucks to be a consumer

Normally I've had nothing but good things to say about Nvidia, I've been buying and using the hell out of their graphics cards for almost as long as they've been making them.

Unfortunately, this is a real problem, I had a GeForce 8800GT that died on me completely out of the blue before I read anything about the problems that some of these cards have been having. I came home after a weekend trip about two weeks ago to a computer that wouldn't boot up. The motherboard just spat out the beep error code for 'your graphics card is f*cked up' and I couldn't do a damn thing about it, except to get a new card.

Unfortunately, I bought a new card without realizing that other cards were having the same issues, and I picked up a 9600GT that is on the lists that I've seen of defective cards. So here's hoping I get at least a year out of this one, the last one died a few weeks short of working for one solid year.

I'll likely still buy nvidia's cards, I just wish that they would take exchanges for defective cards when there are problems like these.


posted by : Brian, 03 September 2008
Speaking of Thermal Stress - How about ATI 48XX Thermal Stress?

Given all the talk about thermal stress, why hasn't there been talk about the insane temperatures the 4850 and the 4870 run at? Something like 70-80+ Celsius seems to be the norm for these babies (which is 10-20 degrees hotter than the GTX260 and GTX280).

Last I checked, for every 10 degree increase in operating temperature, the life span of the chip is halved.

ATI may be the next Charlie victim given these temps we are seeing from their blow torches of a GPU.

posted by : G, 03 September 2008
Guess I'm next!

I purchased a new desktop in February and it came with an NVIDIA GeForce 8600 GS graphics card. I'm wondering how long it will last in this desktop, as I do use it multiple times a day, in and out of the sleep cycle, in a non-air-conditioned room with temps in the high 70s. I am not a geek, just a home user, and I was wondering if the Nvidia card could be swapped for an ATI card, or would that require a BIOS flash and be something a home user shouldn't attempt?

posted by : dbm1rxb, 03 September 2008
And...

I notice a LOT of "nvidia fanboy" hate in the comments after this article.

And...

the amount of douchebags in here that would be willing to suck the Ceo of ATi's pen15 is overwhelming.

And pathetic.

Being an ATi fanboy is like being a cultistic follower of the special olympics.

posted by : ostar, 03 September 2008
Oh there will be

The lawsuit is definitely coming. The Xbox settlement took a while; they are waiting to see if they will admit their failure. Besides, it takes time to build an effective case.

posted by : Tim, 02 September 2008
OEM Lesson Learned?

Great series, but what is the takeaway from an OEM perspective? Did they design to the TDP, then have the TDP changed too late for anything but a kludge? Or was the first sign of a problem the influx of RMAs? What should Dell, HP, others have done differently if anything?

posted by : Clay Marley, 02 September 2008
Hmm ATI too?

" the last two Radeon parts that I had died after I got them configured up and in use for some fairly stressful 3D gaming? Damn things flamed out really fast, the replacement one dying just as fast- in a machine with clean power and low temps elsewhere. The irony is that the 8800 in it has lasted longer than the two Radeons combined"

Chuck, are you doing an in-depth investigative report on this one too?

posted by : Focker, 02 September 2008
Awesome

Excellent work, best IT reporting I've seen in ages.

posted by : b, 02 September 2008
good comment jxf011

and good one Charlie!

I hope the blind fanboys will calm down and see the truth.

posted by : energyman, 02 September 2008
Hmmm

Me being a retired instrumentation tech, I have to take Charlie's word for it. I do think that Nvidia as well as ATI will suck the cash out of your account as fast as you will let them. Once an AMD and ATI user before they teamed up, I now have Core 2 and Nvidia, since they had claimed the performance title not too long ago. I don't care what company the parts come from, just how they perform.

I also do not buy the latest, greatest stuff; my dad taught me to wait to see what happens with new stuff. He was right. So I sit with a P35 DFI LT T2R, some old OCZ memory (still get 4-4-4-12 out of four gig at 415MHz, that's 830 for most of you) and my old 8800GTX. Runs Vista 64 great, Charlie. Never had a single problem that wasn't self-inflicted.

So if you can't wait to see if a batch of whatever you want works in the real world, well, you're just a little hasty, aren't you?

And if you don't think that every company, big or little, is just out for the bottom line, I'd say you haven't lived long enough to get really screwed.

I think Charlie is more than likely spot on, but still, remember the old STTNG saying, "Buyer beware."

posted by : sleepy, 02 September 2008
The perils of modern technology

Great article. It also helps to explain to people outside the trade why when Intel or AMD announces a new series of processors that there's often a delay in shipping the units -- there's a lot more to making a reliable product from a working prototype than most people think.

I once worked on a leading edge (wireless) product that took about a year to go from "working" to "capable of being sold to customers".

NVidia may be victims of Powerpointitis (wild-ass guess alert). It's very easy for the business managers to ignore or gloss over potential problems in development and production. Those of us who work in engineering know that it's a constant struggle to keep their expectations realistic: they want to report great things to the board (and shareholders), and it's so easy to make nice-looking slides and glossy brochures....

posted by : Martin, 02 September 2008
a quick question

Are desktop parts affected as well?

What about geforce 9800GTX or 9800GX2 ?

posted by : Lior, 02 September 2008
temps for bumps

Charlie good work. 

I would like to see some kind of writeup for ATI, and how the 4870 takes the temps it does, and maybe how the GTX 280 is now doing?

Someone also commented about upgrading the cooling, affecting the bumps?

This won't happen to any measurable degree, as besides phase cooling etc, you won't get it below ambient, which it will get to when you switch the thing off!!!

Just a few notes to nv fanboys.
Failure rates can be taken with a pinch of salt, as microsoft were claiming less than 3% until the last minute on the xbox,

How can you honestly support a company that sells you a known defective product, then pushes you a short term "fix" that eats your battery just so you have to replace the laptop from your own pocket when the warranty is up? How moronic are you guys going to get!!!

I would however also like to know about any class action suits against them? I haven't heard anything yet.

And as for Dell and HP not taking Nvidia to court for passing the buck: do you seriously think that Dell and HP will be paying full price for the 9400 and 9600M parts for their laptops? No, it will be a case of "you scratch my back, I'll scratch yours".


posted by : craig, 02 September 2008
Nice read

I've owned both ATI and Nvidia cards and go back and forth depending on which is better at the time, so fanboy I'm not.

This article was well researched and well written, and actually felt like reading the work of a professional. Well done on the article.

People complain about what he wrote, how he writes and small one-line quotes without reading the whole article. Read the articles and use your brain to actually comprehend what he is saying. The parts aren't defective per se as individually designed; Nvidia picked the wrong parts for the thermal-cycling stress they are put under. This stress is made worse by the frequent heating/cooling cycles found in laptops. Depending on how you use your desktop, it might be affected as well, but since most desktop users leave their computers running for long periods, it might not affect you during the warranty period. Luckily for me, I have a card with a lifetime replacement :)

posted by : Doogie, 02 September 2008
paid FUD

Charlie, should Nvidia die, you'd lose your job, as there'd be nothing left to write about (I don't see you writing about anything else, except praising ATI every now and then with FUD). Have you thought about that?

posted by : az, 02 September 2008
Nice work, Charlie

I haven't read articles like these from you in a long time. Even biased against Nvidia, you still pulled off some nice articles.

Keep up the good work and write articles of at least this caliber. I promise to read them all.

posted by : Felician Balint, 02 September 2008
Great 3-part series

Just have to say bravo to Charlie and his work. Great insight into the manufacturing of chips.

posted by : Charlie, 02 September 2008
Nvidia Has Always Been Leading Edge.

Whether it is a manufacturing error by the end card makers, a poorly designed chip received, or just plain crummy, Nvidia is the leading-edge manufacturer of GPUs.

Intel talks Larrabee, yet Intel ALWAYS makes crummy cards. No Intel product has ever lasted much beyond its warranty, yet it does get that far. Very clever at predicting failure time and executing it. Look at Charlie's head for a closer example of what Intel can do. Smashed pumpkin?

Nvidia is beyond these chip numbers now, with 2-- and now 3-- coming on board. The best thing to do is find somewhere to relieve themselves of the last of the outgoing stock, maybe Blue Heaven. They need to belittle conversation about the flaws.

Anyway, tomorrow's another day.
drashek

posted by : LOL, 02 September 2008
Let's be objective, people

What I don't really get is why people are still taking sides. This is not an Nvidia fanboys vs Nvidia haters issue, where it is a matter of preference or opinion. Not even the most assiduous fanboy can deny that THERE IS A PROBLEM. THIS IS A FACT. A company like Nvidia would not go out and disclose an issue like this in an SEC filing, of all places, if the issue were not dead serious. The problem is the way Nvidia is handling the issue. They are saying: "yes, we are aware of the problem and we'll fix it. We are so pro-customer that we are willing to spend millions to fix something that was not our fault".
I don't buy it, for several simple FACTS:
The so-called fix, recommended by Dell and HP, is to install a BIOS upgrade that modifies the fan's behavior. Anyone who thinks this will solve the problem is an idiot. It will only make the chips fail later rather than sooner. The second fix is a ONE YEAR warranty extension. WHAT? So I buy a top-of-the-line Dell XPS or a Special Edition Pavilion dv6000 and then start praying, after the two years are up, that it doesn't fail? This is unacceptable, people. And here is the kicker: HP, Dell and Nvidia flat out refuse to tell us, their paying customers, whether the boards they ship back to us STILL HAVE THE SAME DEFECTIVE CHIPS ON THEM. How can Nvidia say "we stand by our customers" and then flat out refuse to inform their customers about this simple fact?
And let me finish with another little-known fact: my company deals with laptop repairs here in South America, and here HP is NOT extending the warranty. In fact, they have the gall to tell us that HP laptops in South America are not using the affected components! Come on... they use the EXACT same motherboards with the EXACT same PART NUMBERS! Why can they get away with this down here? Because here, our legal system does not provide for class-action suits. It's always a single customer vs HP.
So the behavior of these companies can only lead me to believe that they are just trying to make the problem go away, unsuccessfully I hope.
My thanks to Charlie, who in my opinion is certainly an Nvidia hater, but whose articles will only add to the pressure on Nvidia and its partners to do more to solve their customers' problems, as they should have done from day one.

posted by : Augustosamame, 02 September 2008
Re: KennyBoi

Hey Kenny Boi, I can do selective analysis too: you make a lot of assumptions, therefore you must be an ignorant fool!

What makes you think that they don't just (careful now, new word ahead...) *reuse* all the package engineering done for the previous generation? And while you're doing so, why not develop and test the next gen of packages simultaneously?

And I'm sure that 'Chucky' knows the difference between stress and strain: one causes the other... Contrary to what you apparently believe, these are not difficult words! And even if 'Chucky' didn't know the difference, he could still use them interchangeably: you see, it happens that most of this site's readers are equipped with an inference engine called a 'brain'...

DVO

posted by : DamienVessa, 02 September 2008
Multitasking

Of course there's enough time to test the stuff. If proper design and testing takes two years and you release a new product every year, you just need two new products in the pipeline at any one time.

The problem with blaming insufficient heatsinks is that there are just too many laptops that fail. You can't assume that every one of these manufacturers used heatsinks that fell short of Nvidia's specifications.

Desktops, of course, have more room for larger heatsinks, hence less thermal stress overall.

Finally, the article states at the outset that it glosses over a lot of details, and can therefore be excused for not being scientifically rigorous when distinguishing between stress, strain or whatever.

posted by : smurf, 02 September 2008
60°C and 1 power cycle per day

Good series of articles. Charlie will be remembered and commemorated for quite some time, by friend and foe alike.

As the owner of a G84-enabled laptop, I had already figured out (with tools like RivaTuner and AMD processor throttle) that it would be wise to keep the GPU below 60°C, use XP as the OS (instead of the default Vista installation) and never hibernate.

I now aggressively power-save/underclock both the GPU and CPU, as they share the heatpipe cooling, and am convinced these measures will make these chips outlast their 3-5 year lifespan.

Vista Aero really loaded the GPU at all times... It would be interesting to see the OS stats on the fallout.

posted by : Aryan, 02 September 2008
about the "exaggerations"

Have you thought about the possibility that maybe Nvidia is designing more than one generation of their chips at a time?
I don't think they have just one team that does one chip after the other...
So I think it is very possible that it takes a lot more than one year to get a chip from scratch to batch.

posted by : Fraggy, 02 September 2008
Science

If you don't pay attention to the science, that's what happens. You can find plenty of scientific papers on the net saying certain material combinations should not be used. No need to write long comments; just Google it and see what happened, or read the INQ series.

I guess some people think Nvidia green causes less heat. Oh yes, Nvidia green ($) really does cause less heat for some; for others, I guess Nvidia applies G-force to the head. Now I understand where those names came from...

Thanks to the INQ for the research...

posted by : Disclosed, 02 September 2008
Questions, All defective != All fail, replies to comments

These comments cover all 3 parts of Charlie's interesting, educational, somewhat sensational and totally fun-to-read article on Nvidia's engineering issues. Questions, a statement and replies to other commenters follow.

Are the ATI 48x0 series chip/substrate materials high-Pb/high-Pb, high-Pb/eutectic, eutectic/high-Pb or eutectic/eutectic?

What about low-Tg or high-Tg substrates: which type is used for the ATI 48x0 series?

Charlie's thermal discussion makes me think that lowering the average load and no-load temps for a GPU with an aftermarket cooling solution is a double-edged sword. Good: reduced thermal stress on the bumps. Bad: maybe the designers based the bump material and placement on certain chip temps. On the plus side, lower temps keep the underfill in a stronger state, which should always be good.

My personal concern: I want a 4870 with a quieter, better-cooling heatsink/fan combo (like the soon-to-be-released Thermalright T-Rad2 and a 5V 120mm fan), but will this push the GPU temps below the engineers' intended range and screw up the bumps?

Before the replies, a general statement:
"All 65nm and 55nm [parts] are *defective*"
This does not mean all these parts will *fail*. Most will serve out their lives with no failure. But the failure rates are higher, according to Charlie's sources and the HP/Dell web pages, due to this encompassing defect. Note that all hardware and software has defects; it's just that some of them manifest themselves as failures, sometimes lots of failures. And then some companies try to sweep it under the rug... that's what we have here.

OK, now for some responses to other comments harshing on the content of the articles.

1. "Strain, not Stress"
This is the only criticism that is definitely right from a technical point of view. But from a journalism and communication point of view, Charlie's transgression isn't so bad. I read the line with "harder" and "less stress" and knew what he meant, even though he should have written "stronger" and "less deformation." This is no big deal, since the flow of the writing should have made it clear to a non-mechanical engineer what Charlie was saying. And to the outraged commenter who pointed this out: it is a fallacy to say that misunderstanding (your opinion) or miscommunicating (my opinion) stress/strain completely invalidates everything Charlie says on this topic or in general. Also, some of your statements are not completely correct, such as "stress is pent up force" (it's force/area) and "stressing force" (no such thing; bad English and/or poor technical understanding).

2. "... how do we know if Dell or HP were realistic with the heat envelopes of their laptop designs?"
Do we even know that HP and Dell actually *tested* their laptop thermal designs? I know for a fact that major companies we all know and love use testing data from outside manufacturers on at least some of their outsourced components and systems. I'm not saying I know this is true for NV or AMD GPUs, but the big guys do it for some items. The process goes like this: the source manufacturer engineers and tests a component/system. The source manufacturer gives the test data to the [insert household acronym here] company, which reviews, questions and maybe tests/checks some of the data. Designs are done at the big company and maybe some final testing is done. My point is that with some components/systems, the big company depends on at least some (or all) of the testing from the source manufacturer. This means bad/erroneous testing from the (very biased) source manufacturer can work its way all the way into the final big-company product or multi-product system. Either the big company's testing is wrong or they were wrong to trust the source manufacturer. And, importantly, don't think big companies with 2-, 3- and 4-letter names always test perfectly when they do test. There are lots of super smart folks at all these tech companies, but mistakes occur in process and, more frequently, in management when cranking out the latest technologies.

3. "... 100+ desktops all running high performance nVidia GPUs ... Where are the failures?"
Did you read all 3 parts of the article? Failures should be (are?) much more common with high thermal cycling, like with laptops. If your desktops go on and off once a day, they'll be much less likely to fail.

4. "... [Nvidia] was ready to spend $200 to fix a $20 part failure, it strikes me he could have been expressing a sincere change of heart..."
I think the cost to NV is $200 paid to the OEM to fix a $20 part. I don't see $180 of extra sincerity going to the end user. It's just the cost of applying a band-aid (sending the end user another part, possibly defective). How about paying the OEM $200 to fix the part *and* giving the end user a $50-$100 Nvidia coupon? That's taking care of the customer!

5. "List sources at companies ... Or better yet post their emails to you"
This is ridiculous, since the sources of the information are surely risking their jobs and even their careers by contacting a journalist about engineering issues that are costing a multibillion-dollar company hundreds of millions of dollars. Charlie wouldn't be a professional journalist if he handed over sources like some flamers have demanded.

6. "You ppl are such jerks. When a company is going through rough patch..."
Hilarious. The point isn't that NV is having a tough time, it's that NV is covering up. Even Intel and Apple have eventually bitten the bullet, admitted to a big problem with big costs and helped the end user. Though Apple sure fights fessing up until the last minute. :)

7. "...lost all credibility when he declared in a brutal rant that he would never write about or use Vista again ... but kept writing about and using Vista."
A classic mistake from a freshman philosophy course on logic. An analogy: a father tells his son "I'll never smoke, you shouldn't smoke, it's bad for you," and then later the father starts to smoke. The son says "everything you said about smoking is wrong!" Well, of course, smoking is still bad for the son (as Vista is bad for users) even though the father started smoking (or Charlie has used/is using Vista).

8. "[generic technical/personal flame]"
Why bother clicking on links to Charlie's articles if he's so technically incompetent, biased and un-fun to read? Why post flames when many flamers state this isn't Charlie's first outrage? Just skip the article and don't post a comment! It's like replying to an email thread with "I don't have time to reply to this thread": so why reply?!

Anyway, keep up the great job Charlie!! Thanks, Jim. jxf011

posted by : jxf011, 02 September 2008
to ken and other fanboys

No, it's clearly not Nvidia's fault, it's not Nvidia's chip, not Nvidia's engineers' design...
Oh, wait!

posted by : cigonas, 02 September 2008
Re: Ken

"Chucky states that a year is far too short a time to properly engineer and test a change in materials such as the ones that he has been yapping about.

But, Nvidia releases new chips every year and a half to two years"

So you're assuming Nvidia produces Chip A, has a party, then starts on Chip B the next day? lol

posted by : Matt, 02 September 2008
Bush lied.

Yeah, Bush lied and Nvidia didn't. Where can you get so much entertainment so cheap? Sic 'em, Charlie. Go, boy, go!

posted by : Big Dog, 02 September 2008
8800GT and the last-minute change

Remember, oh so long ago, when the INQ wrote about NV contacting a lot of partners about thermal profiles on the 8800GT?

We all put it down to a last-minute change of speed in order to beat ATI on performance. That might very well be true. But at the same time, you have to wonder how much they knew, and when they knew it, when they made this request.

It's an interesting time.

Reminder to all: don't forget that PR spinners comment on these articles too. That's not directed at anyone, just something to keep in the back of your head.

posted by : GZ, 02 September 2008
Would this..

..also explain why the last two Radeon parts I had died after I got them configured and in use for some fairly stressful 3D gaming? The damn things flamed out really fast, the replacement dying just as fast, in a machine with clean power and low temps elsewhere. The irony is that the 8800 in it has lasted longer than the two Radeons combined. I just hope it doesn't cark out, too.

Bleh, whatever. I really don't want to go back to Radeons anyway; I'm not a fan of the Linux "support" at all. Shame it's a two-horse race, really.

posted by : Curious Jeremy, 02 September 2008
Ken is obviously an Nvidia worshipper

His snide, sadistically long and pontificating rants at Charlie only reinforce my belief that this Ken, whoever he thinks he is, is nothing more than a snivelling worshipper of Nvidia, whose only objective is to add more FUD on top of what Nvidia has already spewed forth.

Nvidia has clearly screwed the pooch, and any babbling from a source who won't 'fess up to which dog he's backing in this fight only makes me dismiss his self-anointed "credibility" and his comments.

We KNOW who Charlie is, who the heck are you, Ken?

posted by : Rich Wargo, 02 September 2008
Dear Ken,

take a downer.
Then try to write something as coherent, informed and credible on a technical, economic and social level.
And make it so that everyone's FUD-o-meter doesn't break.


posted by : keese, 02 September 2008
Thank you

I for one read all three parts, enjoyed every one of them, and took on board what Charlie had to say.

What he has done is RESEARCH the topic at hand, pretty in-depth too, and while he may be biased towards Nvidia's death, that's a far better way to be than biased against it, i.e. if you think Nvidia isn't at fault, you are going to miss important pieces of evidence that prove the point.

Don't know if you care, Charlie, but I thoroughly appreciate the time and effort you spent explaining the situation, rather than simply saying the chips are failing.

posted by : Steve, 02 September 2008
GTFO my INQ

Oh christ. Here come the hysterical butthurt fanboys.

> Chucky states that a year is far too short a time to properly engineer and test a change in materials such as the ones that he has been yapping about.
No, "kenny", he didn't state that; "multiple sources involved in package design" did.

Of course they can't do long-term reliability studies on new chips. How can you test something for a year or more before releasing it, if it's going to be obsolete by then anyway?
Instead they just test the manufacturing process in general, but if that's new too then there are limits to the reliability testing that is possible.
And of course a year is too short if you are talking about a proper reliability study. That is, unless you only want your chips to live for a year.

As for insufficient cooling, that isn't the point. These chips have failed due to _cracking_, not thermal degradation of the silicon. As Charlie says, this is caused by a multitude of things, but ultimately fitting a bigger heatsink may make the problem WORSE, as it increases the thermal delta, one of the factors that contributes to this failure.

In-depth investigation of failure rates? These are new chips. They haven't had TIME to fail naturally yet, but they will. And it will be interesting to see how many of them do.

Sheesh, get a life.

posted by : Jim, 02 September 2008
Kenny, do you know what a pipeline is?

NVIDIA engineers may work on each chip for 2 years or more. At any given time there are 3-4 ongoing projects at different stages.

posted by : ken not, 02 September 2008
Re: More of Chucky's exaggerations

Ken,

I'd expect NV to have multiple overlapping engineering projects running at once. So it wouldn't matter that development takes a year and they release a product every 6 months.

posted by : Matt, 02 September 2008
The NV Fanboys cometh...

Is it a big surprise that the first response is from an NV fanboy? Despite clear evidence of improper chip design, changes to chip packaging that aren't cheap to do and couldn't be "coincidence" to anyone who is thinking, and the obvious "pass the buck" tactic of Dell and HP providing a BIOS "fix" that only turns the fans up to lengthen the life of the part so that it fails after the warranty expires, fanboys dismiss it all, and Charlie with it. When this all blows up in NV's face, the fanboys will likely blame Charlie for it, since he's the only one who had the guts to report the truth and tell it like it is. Get over it, fanboys. Your NV parts are junk. You've just had a three-part scientific explanation of why they were improperly designed, why they are failing, and how NV and its partners are attempting to cover it up and pass the buck. Charlie isn't making this stuff up. This isn't NV hate. It's TRUTH, and it's just been explained to you. If you don't get it, take the articles to a friend in the science department of any engineering school, run them by your friend and see if they agree with your fanboy perspective.

It's all coming out into the open, and you will soon see that you are worshiping a company that cares more about its wallet than about you, the customer. You may as well join your Apple Jesus Phone fanboy buddies in a round-robin love-fest of stupidity.

Keep after it Charlie. Ignore the idiots.

posted by : Hammer, 02 September 2008
Ken is a moron?

Well, Mr. Obvious Fanboy, despite your fancy words you seem to be either very young or just dead stupid. If you think Nvidia can engineer and produce a new GPU every 6 months, you're wrong. These things take a lot longer than that, and the only reason you don't perceive it that way is that GPUs are developed by different teams working in parallel. It may take 6 months just to set up a production line, never mind developing a new GPU.

Great work, Charlie, I haven't seen this caliber of investigative reporting for quite a while!

posted by : Erick, 02 September 2008
I get emails from EuroPC offering a lot of refurb laptops with Nvidia chips.

Google "europc" and you will get:
EuroPC - Online store for clearance, surplus and refurbished computer products.

Check out the refurb laptops: lots of low-priced, good-spec laptops with 8400 and 8600 GPUs in them.

I wonder how much stock they have, and why they are knocking 20% or more off their normal prices. Could it be supply outstripping demand?

posted by : interested_party, 02 September 2008
Chucky's rants

Once again, Chucky doesn't have enough hard evidence to back up his claims. RMA rates for NVIDIA desktop GPUs continue to be under 1% (obviously he didn't research this), so his claims about desktop GPUs being defective should be taken with a grain of salt. However, OEMs are taking this matter seriously, with Dell having extended the warranties on notebooks affected by this issue. Chucky sounds so anti-NVIDIA that we will probably never see an anti-ATI article from him.

posted by : R.R. Johnson, 02 September 2008
the question is:

how come there isn't a friggin' class-action lawsuit going on?

posted by : Jean Chevreuil, 02 September 2008