The Inquirer

What Nvidia should do now

Part Three: The cock-up

  • Charlie Demerjian
  • 02 September 2008

This is the third and final part of a series of three articles getting to the nub of Nvidia's graphics chip woes. The series is the result of months of research conducted by diligent INQhack Charlie Demerjian, despite an in-box stuffed full of abuse. Part One can be found here and Part Two is here.

SOURCES CLOSE to Dell say they knew about the problem a year ago, and HP is on record as being aware in November, so there has been about a year to characterise the problem, design a solution and test it. Multiple sources involved with package engineering tell us that this is not nearly enough time to do a proper test regime, much less long-term reliability studies.

This new package and materials set does not appear to have been nearly as carefully vetted as it should have been. It may work but, then again, it may not. If the lack of power distribution changes is accurate, we may very well be reading about Nvidia Defective Chipsgate II in a couple of years.

How widespread is the problem? We told you about G84 and G86s as well as G92 and G94s. From the materials side, it appears that all non-R and non-F lot numbered parts made on the 65nm and 55nm processes are defective. The flaw is a downright idiotic choice of multiple materials coupled with poor chip design and inadequate testing. It is a case of errors compounding errors. They are all defective.

If this is the case, why aren't we seeing more defective desktop parts? That one is easy: thermal stress. Two components lead to a bump fracturing: the amount of stress, that is, the hot-to-cold temperature delta, and the number of heat cycles, that is, how many times the part is powered up and down. Glass cups in the oven would be the amount of stress; the bent fork would be the number of cycles.

Remember the Nvidia 8-K, where they announced that "...customer use patterns are contributing factors"? By customer use patterns, they are referring mainly to thermal cycles, but you could also credit them with meaning high temperatures while the GPU is being pushed hard in gaming and the like.

Desktop systems are usually turned on once a day or so. Some people leave them on for weeks at a time, others may turn them on and off a few times in a day. The average desktop probably has about one heat cycle a day.

Laptops, on the other hand, are woken up and put to sleep many times a day. If you take a typical student who wakes up, checks his email, goes to three classes, takes notes, goes to a coffee shop for a bit, goes home, watches a video or two, then goes to sleep, it is not hard to make a case for 10 or more power cycles a day. Every wake up/sleep or hibernate cycle is a heat cycle, so dozens are not out of the question.

The more cycles you put on these defective parts, and the more severe those cycles are, the quicker the parts will die. A good way to look at it is to assign each critical bump an amount of stress it can take before it cracks. Let's call this number 100AU, for Arbitrary Units. If a power-on cycle is worth 4AU, and a hardcore gaming session with the CPU OC'd to within 1MHz of crashing is worth 15AU, you can figure out when the bump should die. Remember, these are hypothetical numbers... the theory is the point.
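To make the arithmetic concrete, here is a minimal sketch in Python using the hypothetical figures above. It assumes stress simply adds up linearly until the budget is spent, which is the simplification the Arbitrary Units idea implies and not a claim about real solder fatigue. The function name and the daily usage figures are illustrative assumptions.

```python
# Hypothetical numbers from the article: a bump tolerates 100 AU in total,
# a power-on cycle costs 4 AU, a hard gaming session costs 15 AU.
BUMP_BUDGET_AU = 100
POWER_CYCLE_AU = 4
GAMING_SESSION_AU = 15


def days_until_failure(power_cycles_per_day, gaming_sessions_per_day=0,
                       budget_au=BUMP_BUDGET_AU):
    """Days until cumulative stress reaches the budget (linear sum assumed)."""
    daily_au = (power_cycles_per_day * POWER_CYCLE_AU
                + gaming_sessions_per_day * GAMING_SESSION_AU)
    if daily_au <= 0:
        raise ValueError("daily stress must be positive")
    return budget_au / daily_au


# One heat cycle a day for a desktop, ten for a student's laptop.
desktop_days = days_until_failure(power_cycles_per_day=1)
laptop_days = days_until_failure(power_cycles_per_day=10)
print(desktop_days / laptop_days)  # ratio of desktop to laptop lifespan
```

Because the units are arbitrary, the absolute day counts mean nothing; only the ratio matters. With these inputs the laptop uses up its budget ten times faster than the desktop.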

When Dell, HP and others announce a BIOS 'fix', the reason it is so humorous is that all they are doing is lowering the amount of thermal stress on the chips when the fan would not normally be on. When the fan is going full tilt without the 'fix', the new 'updated thermal profiles' won't make a difference. When the fans are normally off or on low, the profiles will essentially lessen the stress from a four to a three. It is just there to allow the laptop to live through the warranty period so the companies don't have to pay for the fix. After that, if the defective chips burn out, it isn't their problem. The 'fix' doesn't fix anything at all.
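The effect of the BIOS 'fix' can be sketched with the same hypothetical units: a budget of 100AU per bump, with a low-fan heat cycle dropping from four to three. The linear budget and the function name are assumptions for illustration only.

```python
BUDGET_AU = 100  # hypothetical stress a bump can absorb before cracking


def cycles_to_failure(au_per_cycle, budget_au=BUDGET_AU):
    """Number of heat cycles a bump survives if each costs au_per_cycle."""
    return budget_au / au_per_cycle


before_fix = cycles_to_failure(4)  # stock thermal profile
after_fix = cycles_to_failure(3)   # 'updated thermal profile'
extension = after_fix / before_fix - 1
print(f"{extension:.0%} more cycles before failure")
```

Under these assumptions the part lasts about a third longer but still fails, which is the point: the profile change moves the failure date without removing the defect.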

In the end, it comes down to Nvidia screwing up badly on package engineering and testing, then trying as best they can to bury the problem while passing the buck. It appears that every Nvidia 65nm and 55nm part with high-lead bumps and/or low-Tg underfill is defective; it is just a question of how defective each one is, and when it will die.

As far as we are able to tell, contrary to Nvidia's vague statements blaming suppliers, there are no materials defects at work here. Every material they used lived up to the claimed specs, and every material they used would have done the job if kept within the advertised parameters. Nvidia's engineering failures put undue stress on the parts, and several failures compounded to make two generations of defective parts. The suppliers and subcontractors did exactly what they were told; Nvidia just told them to do the wrong thing.

When it started talking about this, Nvidia failed crisis management 101, and the coverup shows it doesn't care about consumers, just its bottom line. NV is doing exactly the wrong thing for the wrong reasons, and the lawyers circling with class action paperwork in hand are going to eat them alive.

The last time you had such a huge batch of defective GPUs, the company that did it swore up and down – just like Nvidia – that there was no problem despite forums filled with evidence to the contrary.

A few weeks later, they turned around and admitted there was a problem, and took a $1.1 billion charge, placating customers and fending off lawsuits.

You know that as the Xbox 360 Red Ring of Death.

I wonder why Nvidia can't be that smart? µ

© Incisive Media Investments Limited 2015

© Incisive Business Media (IP) Limited, Published by Incisive Business Media Limited, Haymarket House, 28-29 Haymarket, London SW1Y 4RX, are companies registered in England and Wales with company registration numbers 9177174 & 9178013

Digital publisher of the year 2010, 2013 & 2016
