Are Dell and Intel pushing for looser PCI Express specs?

The strange case of compliance and complaisance
Wed Sep 29 2004, 19:07
FOR THE PAST FEW MONTHS there have been rumours flying around the usual channels about ATI and PCI-Express compliance. Those rumours have been everything from the doomsayers with their 'ATI is fscked' statements to people saying it is nothing to worry about. After weeks of research and phone calls, I think I have found the truth, and it lies in the middle, with a ton of caveats.

First the rumours. ATI has two problems: Common Mode (CM) voltage and jitter. The other thing is that, months after introduction of the parts, they are not on the PCI SIG's compliance list for PCIe parts here. Each of these is a proverbial can of worms in itself, and the FUD is flying thick.

Let's address the FUD first. About the time this article was going to be published, last Friday, there was a flood of attacks on source material from ATI, and denial of the contents of the said documents. You can see some of it here and here.

Timing aside, the stories list a number of documents, only one of which sounds like the ones I based this story on, and that is a document from Gainward. The others seemed to relate to something else.

The one document that I've seen only served to back up my conclusions, not to form them. Additionally, none of the documents, to my knowledge, were prepared by Nvidia, as suggested in the second link.

Finally, the only set of 'hard' evidence that was presented to me, the eye diagrams in the Gainward document, was backed up by an independent source that confirmed these findings. Overall, FUD aside, I feel very confident in the facts and figures I am presenting here, no matter what the different spins from vendors are.

PCI SIG
Over a month ago, I was asked about the listing of ATI cards on the PCI SIG's compliance list. A quick phone call to the PCI SIG asking about ATI cleared everything up: they said, as we reported here, that ATI was indeed compliant, and the only hold-up was paperwork.

A month later, there is still one ATI card listed, and 22 Nvidia ones. During the course of my calls to ATI over the last week, it became clear that the cards were never going to pass based on the plugfests of the past, but there was another plugfest where their cards were performing quite nicely, and based on that, they may well pass. So much for paperwork delays.

It was also said that there were a very small number of ATI engineers at the June plugfest, and part of the reason that only one card was certified was the inability to test as many cards as needed at plugfest #39.

There's a lot of finger pointing going on. The first finger pointed at a CM problem. What this means is that the signal strength of the PCIe transmitter is not as strong as some would like. The weak signal can lead to correctable errors on the low end or, if there are enough of them, it can cause the PCIe bus to fail. The bus going down would most likely result in a machine that does not POST.

One of the documents I saw was about the RV370 chip, more commonly known as the X300. It contained design guides for ATI reference boards. It has two columns labeled 'Design Kits for ASICs without CM issue' and 'Design Kits for ASICs with CM issue'. That pretty definitively shows that there is a problem, and that it is well enough known to have prompted a board-level workaround. There are four kits for 'non-issue' chips, and two for 'issue' chips. The columns further list compatible ASICs and part numbers, some of which have the same part number, but a different bin number. This tells me that there is a test for good/not good chips.

To borrow pictures from the Gainward doc, here is what the eye diagram looks like for an X300 purportedly purchased at retail.

Ati-x300-purchased-at-retail-eye-diagram

For the oscilloscope impaired, a little explanation of the pic is needed. When you see a wave on an oscilloscope, several things need to be considered. First is the wavelength. This is the horizontal size of a wave, measured from peak to peak. The more important measure is the amplitude, or the height of the wave. Both of these properties define the blue bands, how high and wide they are.

The red diamond in the middle is the area that defines compliance. If you can keep your signal from touching the red part, you are compliant. Since wavelength is tunable fairly easily, you rarely see the left and right corners being overrun. The top and bottom are the problem areas.

In this case, they are overrun fairly badly, pretty much ruling out compliance with the SIG. If you can increase your transmission power, you push the waves above and below the relevant keep out areas, and you get compliance. The further out you go, the more robust your solution is.
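For readers who prefer code to scope traces, here is a rough sketch of the keep-out test described above. It treats the red area as a diamond centred on the middle of the eye and flags any sample that lands inside it. The function names, the mask dimensions and the sample points are all made up for illustration; they are not PCI SIG figures and not values from the Gainward document.

```python
def inside_diamond(t, v, half_width, half_height):
    """True if the point (t, v) lies strictly inside a diamond centred at (0, 0)."""
    return abs(t) / half_width + abs(v) / half_height < 1.0


def mask_violations(samples, half_width, half_height):
    """Return every (time, voltage) sample that intrudes into the keep-out diamond."""
    return [(t, v) for t, v in samples if inside_diamond(t, v, half_width, half_height)]


# Hypothetical mask: time in unit intervals from the eye centre, voltage in volts.
HALF_WIDTH = 0.3
HALF_HEIGHT = 0.2

# Hypothetical traces: a weak transmitter sags into the top and bottom of the
# mask, while a stronger one clears it with margin.
weak = [(0.0, 0.15), (0.0, -0.15), (0.35, 0.0)]
strong = [(0.0, 0.40), (0.0, -0.40), (0.35, 0.0)]

print("weak transmitter violations:", mask_violations(weak, HALF_WIDTH, HALF_HEIGHT))
print("strong transmitter violations:", mask_violations(strong, HALF_WIDTH, HALF_HEIGHT))
```

In this toy case only the top and bottom samples of the weak trace are flagged, which is the pattern described above: the side corners are clear, and more transmit amplitude moves the offending points out of the diamond.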

Out of fairness, the Gainward document shows an OEM X300 eye that is much better, but still overrun on the top, which could suggest failing compliance. The same doc has an Nvidia 5750 and a 6600, both passing with wide margins. While I don't know what batches of chips were tested, I would assume that the worst of the chips, the 'retail' one, corresponds to the initial spin of the chip, and the better one is the re-spin of the core that ATI did.

If this speculation is correct, it suggests that ATI won't pass PCIe compliance even with the new core. Once again, these numbers were independently verified, so I am confident of their veracity.

In ATI's defence, however, reality intrudes in its favour. If things were as bad as the raw numbers show, there would be a lot of Dell machines that don't boot all that well, but in fact not only do they boot fine, they work quite well. Compliance does not mean functionality, but the lack of it does complicate things.

The next problem is jitter. Jitter can be a big problem with compliance because, to dumb things down again, it makes the blue line thicker. Here is another chunk of the Gainward doc, with the top being a GeForce 6600 and the bottom being a Radeon X700.

Geforce-6600-eye-diagram

Ati-x700-eye-diagram

If you look at the Nvidia diagram, the blue lines are well out of the red box, and they are comparatively thin. The ATI chart has a thicker line that hits the top and bottom of the red box. If jitter were lower, and the band thinner, ATI probably would not touch the keep-out area, and would therefore be compliant.
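To put the 'thicker line' idea in rough numbers, the sketch below models the upper trace of an eye as a cosine lobe that is allowed to slide left and right by the jitter amount, then checks whether the lowest point the trace can reach ever touches a diamond keep-out area. The cosine shape, the function names and every number here are assumptions picked for illustration, not measurements and not PCI SIG limits.

```python
import math


def inner_edge(t, amplitude, jitter, steps=21):
    """Lowest voltage the upper trace can reach at time t if it may shift by +/- jitter."""
    shifts = [-jitter + 2 * jitter * i / (steps - 1) for i in range(steps)]
    return min(amplitude * math.cos(math.pi * (t - d)) for d in shifts)


def mask_top(t, half_width, half_height):
    """Upper boundary of the diamond keep-out area at time t (0 outside the diamond)."""
    return max(0.0, half_height * (1.0 - abs(t) / half_width))


def touches_mask(amplitude, jitter, half_width=0.3, half_height=0.2):
    """True if the jitter-thickened trace reaches the keep-out area anywhere across it."""
    times = [i / 100 for i in range(-30, 31)]  # time in unit intervals from eye centre
    return any(
        inner_edge(t, amplitude, jitter) <= mask_top(t, half_width, half_height)
        for t in times
    )


# Same hypothetical amplitude, two hypothetical jitter levels.
print("low jitter touches mask:", touches_mask(amplitude=0.25, jitter=0.05))
print("high jitter touches mask:", touches_mask(amplitude=0.25, jitter=0.25))
```

With the amplitude held fixed, only the jitter changes between the two calls, which mirrors the point about the two diagrams: a thin band can clear the box where a thick one at the same signal level does not.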

This is a different chip, and a much newer one than the X300, X600 and X800 lines. This one may not have had a chance to try for PCI SIG compliance, so don't read much into its absence from the compliance list.

According to ATI press releases, it shipped over a million PCIe parts (see http://www.ati.com/companyinfo/press/2004/4771.html), and that was over a month ago. There are a lot of cards out there in the wild now, and if there is a problem, it should have shown up by now.

The reason it hasn't is that ATI is being very good about working with OEMs, board and chipset vendors to make sure its products work with everything else out there. Talking to several board vendors, they confirmed that ATI is indeed working their proverbial asses off getting things to work correctly.

They tell me, as does ATI, that this is a new chip, a new bus, a new chipset, and a new everything. Problems are expected; in fact, if there were none, it would be a story in itself. They also confirmed that the boards, all Intel chipset based as of now, had to have a lot of BIOS work done to fix issues. Once again, I don't have a clue whether it is new bus problems or ATI chip problems, but lots of hair pulling is claimed by everyone.

Work arounds
The intimation is that once "less robust chipsets" come out, that being PR code for non-Intel, there will be problems aplenty. The names commonly listed are Via, SiS, Nvidia and ULi. From past experience, and the fact that mighty Intel needed lots of fixes, this looked like a very valid concern.

I contacted some of the board and chip makers last week, and they all say that there is nothing that can't be worked around with the ATI chips as far as making them work with their chipsets and boards. There may be problems, but they are solvable problems, though each type of board may need an individual fix. With ATI working with them, they don't expect failure in the field.

But are the boards ATI shipped PCIe compliant? They are all called PCIe, but a quick check with the PCI SIG told me that to get the name PCIe on a product, all the company that makes it needs to do is to be a SIG member. If ATI came out with a brand of Yo-Yos, and Nvidia a new type of avocado chocolate gum, they both could be called PCIe Yo-Yos and avocado chocolate gum. If you plug them into your new i915 or i925 based computer, they won't do very much, but they will be PCIe. As they say, membership has its privileges.

Now, if you want it to actually work with the machine, it needs to be PCIe compliant, and there is the rub. From paperwork to not enough engineers to 'soon' to core re-spins, it appears that ATI has a problem with compliance. The problem appears to be the on-die PCIe interface which is more or less common to all the modern ATI cores.

If ATI makes a change, re-spins the chips, and all is well, good for them. ATI worked with chipset vendors to get workarounds in all current boards, and those coming up soon. No problem here, in fact I applaud ATI for going through the effort to do this.

My reservations are about what happens when the next gen chips come out, real soon by many accounts. Will board vendors care in mid-2005 that last year's X800 doesn't work now? Their boards are compliant, so why should they care about things that are yesterday's news? I think this is where the problem is.

Buy a non-compliant board, and it is just that, non-compliant. In a year, all those $500 boards just may not work in your new upgraded Cedar Mill rig, and whose fault will that be? Can you blame the chipset vendor for putting out a 100% spec compliant chip? Can you yell at a vendor for not putting in a BIOS workaround for something that is no longer on sale, and doesn't work like it is supposed to anyway?

If you look at the economics, you can hardly blame ATI for not ripping and replacing everything. There is already a shortage of PCIe chips, and any delay would be very, very bad for the consumer. I already have reports of people not being able to get a card to plug into their new LGA775 boards.

Additionally, re-spinning a core is really expensive: masks can cost hundreds of thousands of dollars each, and you need a bunch. Even if the cores were not about to be superseded by the upcoming fall refresh, it may simply not make sense. Pouring millions of dollars down this hole is borderline insanity.

Back to the rumour mill. One of the persistent ones involves Dell and Intel. Both companies are heavily invested in PCIe technology, and have a lot to lose if there are problems. For both, it is the future, and there is no plan B right now. They are also the ones who may take a large financial hit if there are problems involving stopping shipments or, worse yet, recalls.

What do you do if the chips won't be changed, and there are problems with leaving them as they are? Make it someone else's problem, the old governmental responsibility shuffle. For weeks I have been hearing that both companies are applying enormous pressure on the PCI SIG to change the spec. The cards won't come to the spec, time for the spec to come to the cards.

Jitters, critters
Now, if you think I am mad, let's step sideways a bit. When I was at IDF a few weeks ago, I talked to the PCI SIG (here). All went well, and one thing stuck out in my mind, but for the life of me, at the time, I didn't know why. I asked what the difference between v1.0a and v1.1 was. They said it was cleaning up some loose ends, nothing major, and loosening up the timings and jitter. A little alarm went off in my head, but only a little one. Why would you loosen up a spec that is already in place, and has hundreds of parts that work with it?

If current technology can clear the hurdles set, why lower them? Answer: ATI. If you can't fix the cards, make the chipset vendors work harder so that the next generation won't have problems with the current bad crop of cards. It is ingenious, cunning, crafty, and somewhat evil, in a begrudging admiration sort of way.

So that is the cunning plan. Twist arms through proxies behind the scenes, and brush it all under the rug. If anyone asks, there is no problem, won't be any problems, and let's hope to god they don't read the specs closely. It took over two weeks of digging, but it finally all comes together.

This whole episode leaves a bad taste in my mouth though. Whenever companies refuse to admit there is a problem, it bugs me. If you want people who buy your products not to have problems, inform them. Trumpeting that you have PCIe parts shipping, and then not pointing out that they don't meet the very spec screamed in the headlines of your press release is disingenuous at best.

As far as problems in the industry go though, this is by no means a major one. It may cause a bunch of headaches next year, and board makers are already probably having the proverbial hissy fits, but that is all behind the scenes. In a year, when someone asks you why the X800 they bought last summer won't post in their new board, but works fine in the old one, you know what to ask Intel and Dell. µ
