DESIGNER OF HOT GRAPHICS CHIPS Nvidia will pay a price for its folly with Fermi, as high performance computing (HPC) vendors are starting to look elsewhere for GPUs.
The INQUIRER can reveal that HPC icon Silicon Graphics International (SGI) has been looking at alternatives to Nvidia's Fermi line of graphics processing and floating-point computing GPGPU boards to offer customers in its servers. Presumably the Silicon Valley vendor, which is trying hard to rebuild itself from the ashes of two bankruptcies, wants to offer customers an energy efficient alternative.
SGI's senior director of server product marketing, Bill Mannel, told The INQUIRER that he believes that Nvidia's rival chip design outfit AMD is "catching up very quickly" with its ATI brand of graphics chips. ATI's high end GPGPU cards are only barely nosed out by Nvidia's Fermi based Tesla boards in some applications. Mannel said he expects there to be "even performance capability" between AMD/ATI and Nvidia within the next 18 months.
When asked whether Nvidia's power hungry chip poses problems for the HPC vendor, Mannel said that incorporating Tesla boards in the firm's designs creates an "additional amount of work" and that the firm had to design new processes to test the Green Goblin's latest "hot cards".
The effect of these hot cards is stark, according to Mannel, who says that it leads to a "worse failure profile" in servers, meaning that vendors such as SGI have to spend more on design, manufacturing and maintenance. To accommodate Tesla boards Mannel said that SGI and its customers have had to "scale up the cooling infrastructure" to meet the higher ambient temperature demands in HPC data centres due to the additional heat output of the hardware.
When it comes to HPC and servers, cooling doesn't merely end with venting hot air out of the rack. Cooling the entire computer processing facility is just as vital to avoid a Fermi furnace and contributes significantly to HPC data centre costs. According to Mannel, the heat challenges of the earlier Tesla boards were so immense that SGI gave up on cooling its previous generation of dual socket, dual Tesla designs and decided instead to focus on Fermi based Tesla boards. Perhaps unsurprisingly, Mannel and his team weren't pleased when Nvidia's latest and greatest Tesla cards tipped up.
Mannel said SGI had to forgo Nvidia's "P spec" passive cooling option in favour of the larger "S spec" cooling box. To underscore the formidable heat generation qualities of Fermi, Mannel said that SGI's HPC server cooling configurations vary greatly depending on how the Fermi based Tesla cards "are ganged together".
SGI, like all vendors, evaluates the hardware before selecting components and configurations it believes will be good fits for its products and customers. According to Mannel, when the firm first evaluated GPGPUs, Nvidia came out on top, but now things are looking rather different.
For SGI a number of options exist, but AMD is looking like the favourite to get the nod. AMD has recently bolstered its GPGPU effort by nabbing the engineer who has been labelled the brains behind Nvidia's HPC push, Manju Hegde, and is expected to renew its efforts to compete with Nvidia in the HPC arena.
SGI is unlikely to dump Nvidia completely, though. After all, the firm has spent millions and worked hard to incorporate the ill-conceived Fermi chips. However, it clearly thinks that it needs alternatives so it can attract customers who don't want to rack up huge energy bills to run its servers.
For its part, Nvidia had done exceedingly well in the HPC area with Cuda. However, thanks to Fermi it could lose the benefits of the hard work that Hegde put in before jumping ship. After the delays, the scaling back of performance and increases in power consumption and cooling needs, it's no surprise that SGI is looking elsewhere.
SGI might be just the first in a long line of HPC vendors and customers to grow tired of Nvidia's latest GPU chip design debacle. µ