Now, things look a little better bandwidth-wise, with DDR3 routinely providing an effective transfer rate of around half the CPU clock frequency - a 3 GHz Core 2 Extreme may be fed by DDR3-1500 memory, for instance.
However, despite the bandwidth improvements, it is memory latency, in particular the time it takes to read or write the first word of a transfer, that has advanced at a snail's pace all this time. And it can hurt performance, despite all the caching, streaming and other "optimisations".
So, to make the best of it, you should know at least the basics of the latency settings in a typical PC BIOS and how they affect performance, if only in memory benchmarks.
Memory latency itself - the delay imposed every time a CPU or another requestor reads or writes something in memory - is so long that it is usually counted not in CPU clocks but in the much longer memory bus clocks. Only some benchmarks, like Sandra, will show you the exact latency penalty in CPU clock cycles, where the numbers can easily run into the high tens.
However, that is actually the combined latency of the memory controller (whether in the chipset or in the CPU) and the memory itself. All else being equal, it is usually somewhat lower on an AMD Athlon64 than on an Intel CPU, and lower on, say, an Nforce 680i than on a P35 chipset. Either way, a difference of a few memory bus clocks equals many more CPU clocks, so there is a performance impact.
Since every memory chip is organised as a set of banks, each a matrix of rows and columns, an access has to pick the right bank, the right row and the right column, and there you are: at the right cell, with the data you wanted.
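To put numbers on the claim that a few memory bus clocks equal many more CPU clocks, here is a small Python sketch that converts a delay given in memory bus clocks into nanoseconds and CPU core cycles. It assumes DDR signalling, with two transfers per bus clock, so DDR2-667 runs a bus clock of roughly 333 MHz. It ignores controller overhead entirely. DDR2-667 and 3.33 GHz are just example figures, and the function names are illustrative only.

```python
def mem_clocks_to_ns(clocks: float, transfer_rate_mts: float) -> float:
    """Convert a delay in memory bus clocks to nanoseconds.

    Assumes DDR signalling: data moves on both clock edges, so the bus
    clock in MHz is half the rated transfer rate (DDR2-667 -> ~333 MHz).
    """
    bus_clock_mhz = transfer_rate_mts / 2
    return clocks * 1000.0 / bus_clock_mhz


def ns_to_cpu_cycles(ns: float, cpu_ghz: float) -> float:
    """Express a delay in nanoseconds as CPU core clock cycles."""
    return ns * cpu_ghz


# One extra memory clock at DDR2-667, as seen by a 3.33 GHz CPU core.
extra_ns = mem_clocks_to_ns(1, 667)
print(f"1 memory clock = {extra_ns:.1f} ns = {ns_to_cpu_cycles(extra_ns, 3.33):.0f} CPU cycles")
# prints: 1 memory clock = 3.0 ns = 10 CPU cycles
```

So under these assumptions, each additional memory clock of latency costs the CPU about ten of its own cycles.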
Memory module vendors state the four most critical parameters governing row and column access on each DIMM as four numbers, in the order tCL-tRCD-tRP-tRAS. Typical examples are 3-3-3-5 for a fast DDR2 module or 9-9-9-20 for a slow DDR3 one. The meaning of each number is as follows:
tCL = CAS (Column Address Strobe) Latency
CAS Latency (CL) is the most important parameter of all. It is the number of clock cycles between the moment the memory controller asks the memory for a particular column in the currently open row and the moment the data from that column is finally read out.
tRCD = RAS to CAS Delay, or delay between row and column access
The number of clocks that must pass between a Row Address Strobe (RAS) and a CAS in the memory bank. The RAS opens a row, and the CAS selects a column within that row. In other words, tRCD is the delay between opening the row that holds the wanted data and being able to read or write a particular column in it.
tRP = RAS Precharge
A.k.a. row precharge time. This is the number of clocks needed to close the currently open row before the next row in the same bank can be opened.
tRAS = Active to Precharge delay
Of the "big four" parameters, this one has the least impact on performance. It is the minimum number of clock cycles a row must stay open, counted from its activation to the precharge command that closes it.
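How these four numbers combine can be sketched with a deliberately simplified, single-bank Python model. A read to a row that is already open pays only tCL. A read to an idle bank pays tRCD + tCL. A read to a different row must first precharge, so it pays tRP + tRCD + tCL. The model ignores the memory controller, refresh, burst length, multiple banks and tRAS, and the class and function names are hypothetical illustrations, not any tool's API.

```python
from typing import NamedTuple, Optional


class Timings(NamedTuple):
    """Primary DRAM timings in memory clocks, in label order tCL-tRCD-tRP-tRAS."""
    tCL: int   # CAS latency: column read command -> first data
    tRCD: int  # RAS to CAS delay: row activate -> column command
    tRP: int   # RAS precharge: close a row -> next activate
    tRAS: int  # active to precharge: minimum time a row stays open


def parse_timings(label: str) -> Timings:
    """Parse a module label such as '3-3-3-5' into its four timings."""
    parts = [int(p) for p in label.strip().split("-")]
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {label!r}")
    return Timings(*parts)


def read_latency(t: Timings, open_row: Optional[int], wanted_row: int) -> int:
    """Memory clocks until the first data word, in a simplified one-bank model.

    tRAS is not modelled: in practice it only delays a precharge when the
    open row was activated less than tRAS clocks earlier.
    """
    if open_row == wanted_row:      # row hit: the row is already open
        return t.tCL
    if open_row is None:            # bank idle: activate, then read
        return t.tRCD + t.tCL
    return t.tRP + t.tRCD + t.tCL   # row conflict: precharge, activate, read


for label in ("3-3-3-5", "4-3-3-5", "3-3-4-5"):
    t = parse_timings(label)
    # [row hit, idle bank, row conflict]
    print(label, [read_latency(t, 7, 7), read_latency(t, None, 7), read_latency(t, 2, 7)])
# 3-3-3-5 [3, 6, 9]
# 4-3-3-5 [4, 7, 10]
# 3-3-4-5 [3, 6, 10]
```

In this model, raising tCL adds a clock to every kind of access, while raising tRP only penalises row conflicts. That may be one reason CAS latency is usually singled out as the most important parameter.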
OK, enough of the textbook stuff now - what's the measurable impact of all this? For this quick latency test, I put together a simple configuration. It was based on an Intel QX6850 quad-core CPU running at 3.33 GHz on FSB1333, mounted on the reference Asus Striker Extreme Nforce mainboard. The board held a pair of OCZ 1GB FlexXLC DDR2-1150 modules, high-end monsters with built-in connections for optional Xtreme Liquid Convection water cooling. These didn't get hot even when pushed to 2.4 volts, but in this case they only ran at a modest 2 volts, in sync with the FSB at DDR2-667.
So, I kept the FSB and memory frequency fixed but changed the individual latency parameters, starting from the tightest setting, 3-3-3-5, and working up to 4-4-4-6. I ran both Sandra XI SP4a and the Passmark 6.1 memory test, under Windows XP 32-bit SP2. I also ran 4-4-4-6 with the slow 2T command rate, as well as the same 4-4-4-6 latencies at the higher-bandwidth DDR2-889 setting.
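The clock relationships in this setup can be checked with a little Python arithmetic. The sketch assumes that FSB1333 means a base clock of about 333.33 MHz carrying four transfers per clock. It also assumes that a DDR2 rating is twice the memory bus clock. It computes the CPU multiplier for 3.33 GHz and the FSB:DRAM clock ratio for DDR2-667 and DDR2-889. The function name is hypothetical, and the ratio is plain arithmetic. The test description doesn't say how the board's BIOS labelled or linked the DDR2-889 setting.

```python
from fractions import Fraction

# Assumption: FSB1333 is quad-pumped, i.e. four transfers per clock
# on a base clock of 1000/3 (~333.33) MHz.
BASE_CLOCK_MHZ = Fraction(1000, 3)


def fsb_dram_ratio(ddr2_rating: int) -> str:
    """FSB base clock : DDR2 bus clock, as a small whole-number ratio.

    DDR2 makes two transfers per bus clock, so DDR2-667 runs a ~333 MHz bus.
    Ratings are rounded (667 really means 666.67 MT/s), hence the snapping
    to the nearest fraction with a denominator of at most 8.
    """
    dram_clock_mhz = Fraction(ddr2_rating, 2)
    ratio = (BASE_CLOCK_MHZ / dram_clock_mhz).limit_denominator(8)
    return f"{ratio.numerator}:{ratio.denominator}"


print("CPU multiplier for 3.33 GHz:", round(3333 / BASE_CLOCK_MHZ))
for rating in (667, 889):
    print(f"DDR2-{rating}: FSB:DRAM = {fsb_dram_ratio(rating)}")
# CPU multiplier for 3.33 GHz: 10
# DDR2-667: FSB:DRAM = 1:1
# DDR2-889: FSB:DRAM = 3:4
```

The 1:1 result is what "in sync" means here: the memory bus clock matches the FSB base clock.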
Here is the table:
Don't forget the impact of the "command rate" setting, which determines whether commands go out on every clock (1T) or on every other clock (2T). Any recent DDR2 DIMM should handle 1T timing up to at least DDR2-800 when just two DIMMs are installed - with four of them, it gets a bit more tricky.
As you can see, the impact varies greatly depending on which parameter is changed. There is very little difference between 3-3-3-5 and 3-3-4-5 or 3-3-3-6, but a big drop from 3-3-3-5 to 4-3-3-5, where only the CAS latency goes up. Then comes a further massive drop with the slow 2T command rate, followed by some speedup again with 4-4-4-6, this time at DDR2-889.
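A bit of arithmetic suggests why 4-4-4-6 at DDR2-889 claws some speed back. The Python sketch below computes the CAS latency in nanoseconds and the theoretical peak bandwidth for three of the tested settings. It assumes the pair of modules runs in dual-channel mode with a 64-bit (8-byte) bus per channel. These are peak figures derived from the ratings, not measured results, and the function names are illustrative.

```python
def cas_ns(cl: int, transfer_rate_mts: float) -> float:
    """CAS latency in ns; the DDR bus clock is half the transfer rate."""
    return cl * 2000.0 / transfer_rate_mts


def peak_gbs(transfer_rate_mts: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes dual-channel operation with a 64-bit (8-byte) bus per channel.
    """
    return transfer_rate_mts * 1e6 * bus_bytes * channels / 1e9


settings = [("DDR2-667 3-3-3-5", 667, 3),
            ("DDR2-667 4-4-4-6", 667, 4),
            ("DDR2-889 4-4-4-6", 889, 4)]
for name, rate, cl in settings:
    print(f"{name}: CL = {cas_ns(cl, rate):.1f} ns, peak = {peak_gbs(rate):.1f} GB/s")
# DDR2-667 3-3-3-5: CL = 9.0 ns, peak = 10.7 GB/s
# DDR2-667 4-4-4-6: CL = 12.0 ns, peak = 10.7 GB/s
# DDR2-889 4-4-4-6: CL = 9.0 ns, peak = 14.2 GB/s
```

On paper, then, 4-4-4-6 at DDR2-889 recovers roughly the same absolute CAS latency as 3-3-3-5 at DDR2-667 and adds about a third more peak bandwidth.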
Also, as you can see in Passmark, some benchmarks are latency-sensitive, others are bandwidth-sensitive, and a few just don't care. The point is this: if you are willing to fiddle with your RAM latency settings, you could extract more performance without going to obnoxious memory clocks and the killer voltages that come with them. Bear in mind that there are another five or six secondary timing parameters to mess with. As for those other settings, that's a topic for another story. µ