The Inquirer-Home

AMD Quad FX shows off its true colours

Part 2: The play is the thing
Thu Nov 30 2006, 22:29
AMD Quad FX is finally here, and the easy questions about what it is have been answered. The tricky questions are why you would want one, and what for.

These machines are expensive to say the least, with pimped out models breaking $3,000 without trying very hard. You can get low end FX-70s for $599, and when cheaper mobos come out, a machine for $1,500 is do-able, but that is still pretty expensive. This leaves the Quad FX (QFX) parts squarely in the high end of the enthusiast market.

The other quad core part on the market that competes for the same enthusiast dollars is Kentsfield, the Intel 1S 4C part. It is priced the same as QFX and is quite comparable in overall system to system cost. That would pitch them against each other in the market, right?

Not really. They are aimed at different workloads and, because of their architectures, will have very different strengths and weaknesses. Other than four cores and lots of dollars, they have fairly little in common architecturally. Kentsfield is the gamer box and QFX is the multitasking workstation, but both will easily trounce a Best Buy special with integrated graphics.

Neither of these is a bad box in any way; each is just much better in certain areas than in others. It is a case of shades of goodness. This is dictated by the overarching architecture much more than by any single part in the system.

If you look at the AMD system diagram for the QFX you will see a lot of dual links. There are two CPU sockets, two paths to memory, and two independent paths to the chipsets. Most of the components are the same as in a Kentsfield box: the RAM and GPUs are common, as are all the peripherals. How they are wired up, however, is not.

Qfx-system-diagram

Kentsfield is far more of a traditional architecture, the two dice on a carrier being about the only odd part. Instead of two sockets, it gets to the four core count by having two Conroes under the heat spreader, but they are connected to one FSB, usually at 1333MHz. There is also one memory controller, and as far as I have seen, only the capability to have one north bridge.

This brings up an interesting set of tradeoffs from a system point of view. I don't think anyone would argue that the Core 2^2 Duo Squared chip is slower than the FX-7x core for core. In fact, for single threaded apps it will win the vast majority of the benchmarks. This means for games, most of which are single threaded or barely multithreaded, Kentsfield is your choice.

The IPC, high clock rates and massive caches basically make it a win for Intel if the games are CPU bound. If they are bound by something other than the CPU, like a GPU, IO, or memory, that is where AMD QFX will start to show its strengths. To get to an Intel CPU, all traffic, memory, GPU, IO and semaphores from passing ships, must travel down the same FSB.

The 1333FSB is no slouch, and it is more than enough to feed two CPUs and a high end graphics card. Toss in a second GPU, high end sound, an array of 10K Raptor HDs and things get a little stuffy. Throw in two more cores and run a program that utilizes them fully and you are into, well, contentious territory, pun intended.

Basically, the FSB ends up being a bottleneck, but it is a situationally dependent bottleneck. It may happen, or it may not. In the case of AMD, it has twice the memory bandwidth, twice the bandwidth to the NB(s), and the memory bandwidth is largely independent of the IO traffic. On a heavily loaded machine, AMD can utilize more of the available CPU power.
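To put rough numbers on the shared-bus argument, here is a back-of-the-envelope sketch of theoretical peak bandwidth. Only the 1333 FSB figure comes from the text above; the 64-bit bus width and the dual-channel DDR2-800 memory per QFX socket are assumptions made for illustration, and real sustained throughput will be lower than any of these peaks.

```python
# Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).
# ASSUMPTIONS (not from the article): an 8-byte (64-bit) data path for the
# FSB and for each memory channel, and dual-channel DDR2-800 per QFX socket.

def peak_gb_per_s(mega_transfers_per_s, bytes_per_transfer, channels=1):
    """Peak rate = transfers per second * bytes per transfer * channels."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

# One FSB shared by all four Kentsfield cores, memory and IO traffic.
fsb_shared = peak_gb_per_s(1333, 8)

# Each QFX socket has its own memory controller (assumed DDR2-800 x 2).
qfx_per_socket = peak_gb_per_s(800, 8, channels=2)
qfx_total = 2 * qfx_per_socket

print(f"Shared FSB:        {fsb_shared:.1f} GB/s")
print(f"QFX, one socket:   {qfx_per_socket:.1f} GB/s")
print(f"QFX, both sockets: {qfx_total:.1f} GB/s")
```

The point of the sketch is the shape of the comparison, not the exact figures: on the Intel side everything funnels through one number, while on the AMD side memory bandwidth grows with the socket count and does not compete with IO traffic.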

Intel makes up for this with large caches, and does a very good job of it, but to an extent, it is putting bandaids on the problem rather than fixing it. The fix will be coming with Nehalem in Q4/08, and it is called CSI. Until then, AMD rules in the platform bandwidth arena.

So, AMD wins every time, right? Not really. There is one other thing to consider, and that is latency. Intel has one path, the FSB, to the single memory controller. This gives a pretty predictable time to memory, and latency has a very small variance. The Opterons, and in this case the QFX boards, have two controllers connected by an HT link. If CPUs 1 or 2 need a byte that is located in their local memory, they go get it. If they need memory located on the other socket, they have to go out across HT and incur a latency penalty. To make matters worse, every time a CPU on socket 1 needs to snoop the cache, it needs to snoop the caches on socket 2. This again adds a lot of latency. You can see it illustrated clearly here.
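The local versus remote trade-off can be sketched as a simple weighted average. This is a toy model, not a measurement: the function name and the nanosecond figures in the usage lines are hypothetical, and it ignores the cross-socket cache snoop cost mentioned above, which would push the real numbers higher.

```python
def average_latency(local_ns, remote_penalty_ns, local_fraction):
    """Expected memory latency on a two-socket NUMA box.

    local_ns          -- latency to memory attached to the requesting socket
    remote_penalty_ns -- extra cost of crossing the HT link to the other socket
    local_fraction    -- share of accesses (0.0 to 1.0) that hit local memory
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    remote_ns = local_ns + remote_penalty_ns
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical figures, chosen only to show the trend.
for fraction in (1.0, 0.75, 0.5):
    print(fraction, average_latency(60, 40, fraction))
```

The better the OS is at keeping a thread's data on its own socket, the closer `local_fraction` gets to 1.0 and the smaller the penalty becomes.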

So Intel wins every time, right? Again, not really, because AMD has twice the memory bandwidth to utilize. Remember when I said there were a lot of tricky questions, and a lot of long answers? Well, this is why: any question of who wins what depends on a list of things. CPU power, IO, disks, memory latency, memory bandwidth and sunspots can all add up to speed or crippling failure.

If your application plays to the strengths of Intel, well, Kentsfield will absolutely clobber AMD. Games and older single threaded apps are good examples of this, and they will be the predominant type of software for much of 2007. If you have things that need heavy memory access, FP laden work is a good example, or your games actually utilize multiple cores effectively, well, AMD will trounce Intel. It all boils down to what software you use and how you use it.

Most people compare the QFX machines to Kentsfield as simple gaming boxes, and this is wrong. You can do it, and there is nothing technically incorrect, but that is not what AMD has been promising for this machine. People have been assuming since it was first announced that it would be the killer gaming rig, but that is simply not the case.

I was at the initial coming out party over the summer, and AMD was very clear that QFX was about doing more at the same time on a single box. They use the term megatasking. Basically, the AMD architecture is much better suited to doing many different tasks at once. There is no single bottleneck to force all the data through, so one core can utilize many more parts of the system without interfering with the other cores.

The down side to all of this is that the QFX cores lack the peak single threaded horsepower of a Kentsfield core, and will lag on apps that don't need all that bandwidth. Basically, take your pick of what you are going to run: a single game, or a few instances of an MMO, MP3s in the background, and maybe a game server. That more than anything will determine what you should buy.

A good way to illustrate this platform level bandwidth is with some pretty simple synthetic benchmarks, Sisoft Sandra and Sciencemark. Because of the way XP handles NUMA, basically badly, Vista RC2 was the platform of choice. The rest of the system was an FX-74, 4GB of Corsair Dominator memory and a WD 500GB hard drive. The GPUs were Nvidia 7900s of one flavor or other, but that is totally irrelevant to synthetic CPU and memory benches.

I was looking to compare the QFX not against a Kentsfield, or an older FX, but to itself without the second CPU. How much does adding the second CPU help or hurt the system as a whole? In theory, the second CPU should add to bandwidth and latency, the former being a positive, the latter a negative.

The first set of numbers are from Sisoft Sandra 64 bit edition running under Vista RC2. You can see the red lines below show the synthetic memory bandwidth, the top being Int and the bottom FP. The middle bars are comparison chips that are not comparable between single and dual runs, ignore them.

The top set is with only a single FX-74 in the system, and the bottom has both sockets filled. As you can clearly see, the available bandwidth went from 8209/8235 to 14416/14403, about a 76% increase. That is pretty good scaling all told, and it shows that if you want memory bandwidth, the second FX chip is one heck of a kick in the pants.
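For anyone who wants to check the scaling arithmetic, the Sandra figures quoted above work out as follows. The scores are taken straight from the text, in whatever units Sandra reports; the helper names are just for illustration, and the "share of ideal" line assumes perfect scaling would mean exactly double.

```python
# Sandra memory bandwidth scores quoted in the article (Int / FP).
single_socket = {"int": 8209, "fp": 8235}
dual_socket = {"int": 14416, "fp": 14403}


def percent_gain(before, after):
    """Percentage increase going from `before` to `after`."""
    return (after - before) / before * 100.0


def share_of_ideal(before, after, sockets=2):
    """Fraction of perfect linear scaling achieved (1.0 = exactly doubled)."""
    return after / (before * sockets)


gains = {k: percent_gain(single_socket[k], dual_socket[k]) for k in single_socket}
ideal = {k: share_of_ideal(single_socket[k], dual_socket[k]) for k in single_socket}

for test in single_socket:
    print(f"{test}: +{gains[test]:.1f}%, {ideal[test]:.0%} of ideal 2x scaling")
```

Both tests land at roughly three quarters more bandwidth, which is where the 76% figure comes from when you round the Int result.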

Sandra-bandwidth-single

Sandra-bandwidth-dual

That kick is quickly followed by you falling on the floor with a thud. This thud is the latency hit as illustrated by Sciencemark. Due to the Sciencemark site being down during testing (Update: Use the German site and download from a European mirror), I had to use an older 32 bit version, but it still shows off the latency penalty clearly.

The top numbers are the single socket FX-74, the bottom are with the second socket occupied. In the best case, the latency is unchanged, but four of the five tests show a penalty ranging from 3 clock cycles to a high of 46. This is almost a 25% speed hit, with lower numbers being better of course.

Sciencemark2-32-single-latency

Sciencemark2-32-dual-latency

What it all comes down to is that the added socket comes with a lot of benefits, the extra cores, the added bandwidth, and some problems, mainly latency. Depending on your app, it will either be totally unnoticeable, a huge penalty, or somewhere in between. Most will be in the in between, and I would take an educated guess that it will be a pretty big overall benefit.

Where it won't help much is single threaded games. For this, Core 3-1 Trio-1 will almost always win on sheer horsepower. Going back to where we started though, this is exactly what AMD said it would be. You have a machine that does more things better at the same time.

The way I look at it, AMD is in a slightly worse position right now when you line up QFX against Kentsfield. Most games are only single threaded and the prevalent OS, XP, is pretty bad at NUMA optimizations and passing things between cores. If you have a static workload that won't change for the life of the box, get a Kentsfield.

If you plan on running Vista, using the computer as a day to day workstation with an MMO running when the boss^h^h^h^hwife isn't watching, then QFX starts to look better and better. The more you do, the more it shines. Right now, I have 19 windows open and 10 things running in the systray with two high rez monitors. None are extremely intensive apps, but a few cause the system to chug now and again.

With my workload, QFX is probably the better choice, but seeing as this machine is an Athlon 64/3000+ on a Socket 754 machine with 1GB of RAM (specifically, it is a bright yellow Monarch Hornet that I like a lot), almost anything would be a huge step up. It comes down to relative performance, and levels of strength, not any specific weakness.

Looking out however, the picture tilts more in AMD's favour. Games are becoming more multi-threaded, it may take months or years, but the next generation engines will make vastly better use of all four cores. Vista is going to come out, it sort of already is, and it will use more cores more efficiently. CPU power is rising, and the more you can do, the more your infrastructure has to support the added load.

This means that workloads are going to go from peaky ones that favor Intel to more even and balanced ones that favor AMD. The average workload is going from the Intel worldview where it is now to the AMD one. If you also take into account the mid-2007 upgrade path to 2 quad cores, things look even better for QFX.

How fast these changes happen will determine the winner. Some things are set, Vista being the big one there; others, like game performance, are a little more nebulous. No matter how it ends up, the trend is pretty clear; the only variable is speed. µ

 
