Original Link: https://www.anandtech.com/show/1731
Athlon 64 Revision E: Unofficial DDR500 Support
by Anand Lal Shimpi on July 11, 2005 12:05 AM EST. Posted in CPUs
AMD’s 90nm Athlon 64s have been almost everything that the enthusiast community has wanted them to be. Being little more than a die shrink, the 90nm chips are cooler, can run faster, and are cheaper to make than their 130nm counterparts. But the improvements didn’t stop with the move to 90nm. More recently, AMD has released their Revision E 90nm Athlon 64 cores, which featured a number of small improvements.
One of the biggest improvements to Rev E on paper was the added support for SSE3 instructions, originally introduced on Intel’s 90nm Prescott based Pentium 4. When the Rev E cores first arrived on the scene, we took a look at the performance improvements offered by SSE3 support and came up empty-handed.
There were a number of other improvements made to the Rev E core, including an updated memory controller - boasting support for mismatched DIMM sizes per channel, improved memory access performance for integrated graphics cores and a few other performance tweaks that AMD hasn’t gone into much detail about.
One such barely mentioned improvement was support for a handful of new memory dividers. With an on-die memory controller, AMD has to be particularly careful about adopting new memory technologies, as the wrong choice could leave them with a bunch of CPUs that are basically un-sellable. Over a year ago, AMD had been talking about bringing support for faster than DDR400 speeds to the Athlon 64 - assuming JEDEC would ratify the specifications. AMD waited until the very latest possible moment to decide on whether DDR2 or a faster DDR1 memory controller would be in their future, which is why it took them until just a few months ago to really start talking about DDR2 support. Potentially as a backup plan, the Rev E chips include unofficial support for memory faster than DDR400, without overclocking the Hyper Transport bus.
AMD obviously didn’t speak much about support for these higher speed DRAM options, mainly because they are not official specs, and thus, AMD doesn’t officially support them. But, the fact of the matter is that many folks have faster-than-DDR400 memory, and the new Rev E CPUs can now take advantage of that.
The New Memory Speeds
There are a total of four new memory dividers unofficially supported by the Rev E Athlon 64s, but not all of them can be used by everyone. In order to understand why, you have to understand a bit about how memory speed is calculated by the Athlon 64's on-die memory controller.
In a Pentium 4 system, the memory controller is located on the chipset, and derives its clock from the FSB frequency of the CPU through the use of a FSB:DRAM clock ratio. For example, with a 1:1 clock ratio, a 200MHz FSB clock would result in a 200MHz DRAM clock.
The Athlon 64 is a bit different, since it does not have a conventional FSB. So, instead of the memory clock being determined by a ratio of the FSB clock, it is determined by a few factors.
The basic equation is this:
DRAM Clock = CPU Clock / (ceil(CPU Clock Multiplier/Memory Divider))
Most of the elements of the equation are pretty obvious; the DRAM clock is the resulting memory frequency. Note that this is your non-DDR memory frequency. For example, if the DRAM Clock is 200MHz, we're talking about DDR400; if it is 166MHz, then we're talking about DDR333.
The CPU Clock is the final CPU clock of your processor, which is made up of two components: the Hyper Transport clock and your currently selected clock multiplier. The HT clock is 200MHz by default, but can obviously be overclocked. The CPU Clock Multiplier is set at the factory, but lower multipliers are unlocked for Athlon 64s, while all multipliers are unlocked for FX processors.
The Memory Divider is a ratio supported by the CPU's memory controller, and it is this set of ratios that has been expanded in the Rev E memory controller.
Finally, there's this "ceil()" function. The ceil() function is a pretty basic mathematical function that returns the smallest integer value greater than or equal to its argument (the number passed to the function in the parentheses). For example, ceil(5.5) = 6, and ceil(10.1) = 11. Pretty simple, right?
So, you plug in all of the variables of that equation, and solve, and you get your final DRAM clock.
You'll notice one very important thing about this equation: the DRAM clock is dependent on the Athlon 64's clock speed, which means that in order to achieve the same memory speed on all processors with differing clock speeds, the memory divider is going to have to, well, vary.
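As a rough illustration, the equation above can be sketched in a few lines of Python. The function name and parameters are our own, and the x14 multiplier for the FX-57 is simply 2.8GHz divided by the default 200MHz HT clock; exact fractions are used so that ceil() is never thrown off by floating point rounding.

```python
from fractions import Fraction
from math import ceil

def dram_clock_mhz(ht_clock_mhz, cpu_multiplier, divider):
    """DRAM Clock = CPU Clock / ceil(CPU Clock Multiplier / Memory Divider)."""
    cpu_clock_mhz = ht_clock_mhz * cpu_multiplier
    # Fraction keeps the division exact before rounding up to a whole number.
    effective_divisor = ceil(Fraction(cpu_multiplier) / Fraction(divider))
    return cpu_clock_mhz / effective_divisor

# Athlon 64 FX-57: 200MHz HT clock x 14 = 2.8GHz, using the 5/4 divider.
clock = dram_clock_mhz(200, 14, Fraction(5, 4))
print(f"DRAM clock: {clock:.1f}MHz")
```

Here 14 / (5/4) = 11.2, which ceil() rounds up to 12, so the 2.8GHz CPU clock is divided by 12 rather than by 11.2.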
Prior to the Rev E CPUs, the Athlon 64's memory controller supported enough dividers to allow for DDR400 to be supported at all clock speeds; from 1.8GHz all the way up to the present-peak of 2.8GHz on the Athlon 64 FX-57. The Rev E CPUs support those same dividers, but add the following:
13/12, 7/6, 5/4 and 4/3
If you plug these ratios into the equation above, you can come up with a list of the new memory speeds unofficially supported by Rev E CPUs:
CPU | Clock Speed | 13/12 Divider | 7/6 Divider | 5/4 Divider | 4/3 Divider |
AMD Athlon 64 FX-57 | 2.8GHz | 215MHz | 233MHz | 233MHz | 255MHz |
AMD Athlon 64 3800+ | 2.4GHz | 200MHz | 218MHz | 240MHz | 266MHz |
AMD Athlon 64 3500+ | 2.2GHz | 200MHz | 220MHz | 244MHz | 244MHz |
AMD Athlon 64 3200+ | 2.0GHz | 200MHz | 222MHz | 250MHz | 250MHz |
AMD Athlon 64 X2 4800/4600+ | 2.4GHz | 200MHz | 218MHz | 240MHz | 266MHz |
AMD Athlon 64 X2 4200/4400+ | 2.2GHz | 200MHz | 220MHz | 244MHz | 244MHz |
The above table is comprised of all of the Socket-939 Venice (90nm Rev E) cores currently on the market, as well as the Athlon 64 X2 processors, which also support the new dividers.
You'll notice that not all of the dividers are useful: some result in the same old 200MHz DDR400 memory clocks, while others offer duplicate speeds (e.g. the 7/6 and 5/4 dividers with the FX-57).
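To see which dividers are redundant for a given chip, a short sketch like the following can reproduce the table and label each entry. It assumes the default 200MHz HT clock and the stock multipliers implied by the clock speeds above (x14, x12, x11 and x10); the function names and the three labels are our own.

```python
from fractions import Fraction
from math import ceil

HT_CLOCK_MHZ = 200  # default HT clock; an overclocked system would differ
NEW_DIVIDERS = [Fraction(13, 12), Fraction(7, 6), Fraction(5, 4), Fraction(4, 3)]

def dram_clock_mhz(multiplier, divider):
    """Apply the article's equation for one multiplier/divider pair."""
    return HT_CLOCK_MHZ * multiplier / ceil(Fraction(multiplier) / divider)

def classify_dividers(multiplier):
    """Label each new divider as 'DDR400', 'duplicate' or 'new speed'."""
    seen = set()
    labels = {}
    for divider in NEW_DIVIDERS:
        effective = ceil(Fraction(multiplier) / divider)
        if effective == multiplier:
            labels[str(divider)] = "DDR400"      # no change from stock 200MHz
        elif effective in seen:
            labels[str(divider)] = "duplicate"   # same speed as an earlier divider
        else:
            labels[str(divider)] = "new speed"
        seen.add(effective)
    return labels

for multiplier in (14, 12, 11, 10):
    speeds = [f"{dram_clock_mhz(multiplier, d):.0f}MHz" for d in NEW_DIVIDERS]
    print(f"x{multiplier}: {speeds} {classify_dividers(multiplier)}")
```

Because only the rounded-up divisor matters, two dividers that round up to the same whole number always give the same memory clock.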
But at the same time, a number of them produce some very interesting, and potentially useful memory configurations without ever overclocking your CPU or the Hyper Transport bus. For example, at 233MHz, the Athlon 64 FX-57 can now run with unofficial DDR466 memory. And at 250MHz, the Athlon 64 3200+ can use DDR500 memory.
At DDR466, you get roughly 17% more peak memory bandwidth than a standard dual channel DDR400 configuration with an Athlon 64 (a 233MHz memory clock versus 200MHz). At DDR500, you get a full 25% increase in memory bandwidth.
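These percentages are peak theoretical figures, and they follow directly from the memory clock. A minimal sketch, assuming two transfers per clock (DDR) across the 128-bit dual channel Socket-939 bus; the helper name is ours:

```python
BUS_WIDTH_BYTES = 128 // 8  # 128-bit wide dual channel memory bus

def peak_bandwidth_gb_s(dram_clock_mhz):
    """Peak theoretical bandwidth: two transfers per clock across the full bus."""
    transfers_per_second = dram_clock_mhz * 1e6 * 2
    return transfers_per_second * BUS_WIDTH_BYTES / 1e9

baseline = peak_bandwidth_gb_s(200)  # DDR400
for label, clock_mhz in (("DDR466", 2800 / 12), ("DDR500", 250)):
    bandwidth = peak_bandwidth_gb_s(clock_mhz)
    gain = (bandwidth / baseline - 1) * 100
    print(f"{label}: {bandwidth:.2f} GB/s, {gain:.1f}% over DDR400")
```

Real applications rarely see the full theoretical gain, which is why measured results matter more than these numbers.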
Since the move to Socket-939, which gave it a full 128-bit wide memory bus and more bandwidth than these CPUs could use, the Athlon 64 hasn't really been memory bandwidth bound.
With the move to dual core however, the effective memory bandwidth that each core gets is significantly reduced, as they both have to share the same 128-bit wide memory interface normally dedicated to a single processor. So in theory, the new dual core X2 line of processors could be a good candidate for these new memory dividers.
The other situation where higher clocked memory is important is with higher clock speed CPUs. The faster that your CPU clock gets, the quicker it can process data and thus, the faster that it needs information and the more memory bandwidth that it needs.
The lower clocked CPUs are less likely to see any real performance difference, with DDR400 being more than sufficient for their needs.
Enabling Support for the new Dividers
BIOS support for the new dividers must be enabled by your motherboard manufacturer. Apparently, AMD does not hand out the implementation details by default, but will share them with motherboard manufacturers that ask.
For our tests, we used DFI’s LANParty UT nForce4 Ultra-D, whose 704_22V6 BIOS dated 7/06/2005 supports the new dividers.
Not all of the configurations are supported by the LANParty UT nForce4 Ultra-D. Below is a table of what is supported by the board:
Memory Dividers Supported by the DFI nF4 Ultra-D
CPU | Clock Speed | 13/12 Divider | 7/6 Divider | 5/4 Divider | 4/3 Divider |
AMD Athlon 64 FX-57 | 2.8GHz | 215MHz | - | 233MHz | - |
AMD Athlon 64 3800+ | 2.4GHz | - | 218MHz | 240MHz | - |
AMD Athlon 64 3500+ | 2.2GHz | - | 220MHz | 244MHz | - |
AMD Athlon 64 3200+ | 2.0GHz | - | 222MHz | 250MHz | - |
AMD Athlon 64 X2 4800/4600+ | 2.4GHz | - | 218MHz | 240MHz | - |
AMD Athlon 64 X2 4200/4400+ | 2.2GHz | - | 220MHz | 244MHz | - |
The dividers appear in the BIOS based on your selected clock speed, and appear as both an estimated DDR frequency as well as the actual memory divider that you are selecting. Because of the variation in actual DDR frequency, the estimated frequency is often wrong. For example, the FX-57 using the 5/4 divider results in a DDR466 speed, while the BIOS incorrectly refers to the 5/4 divider as enabling DDR500.
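One plausible explanation for the mismatch, and it is only our assumption rather than anything DFI or AMD has documented, is that the BIOS label is simply the 200MHz HT clock multiplied by the divider, ignoring the ceil() step. The sketch below compares that assumed label with the speed given by the equation; all names are ours, and ratings are truncated to match the naming used in this article.

```python
from fractions import Fraction
from math import ceil

HT_CLOCK_MHZ = 200  # default HT clock

def nominal_ddr_rating(divider):
    """ASSUMPTION: the BIOS label is just 2 x 200MHz x divider."""
    return float(2 * HT_CLOCK_MHZ * divider)

def actual_ddr_rating(multiplier, divider):
    """DDR rating (2 x DRAM clock) from the article's equation."""
    return 2 * HT_CLOCK_MHZ * multiplier / ceil(Fraction(multiplier) / divider)

divider = Fraction(5, 4)
for name, multiplier in (("FX-57", 14), ("3200+", 10)):
    # int() truncates, so 466.67 is shown as DDR466.
    print(f"{name}: labeled DDR{int(nominal_ddr_rating(divider))}, "
          f"actual DDR{int(actual_ddr_rating(multiplier, divider))}")
```

Under that assumption, the label is only right when the multiplier divides evenly by the divider, as it does for the x10 Athlon 64 3200+ with the 5/4 divider.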
Note that not all software utilities support the new dividers, so applications like CPU-Z will not correctly report your memory frequency when using these dividers.
There are other motherboards with BIOS support for these new dividers, such as the ABIT Fatal1ty AN8-SLI with its latest BIOS. You can expect most enthusiast motherboard manufacturers to follow suit, if they haven’t already.
The Test
As we’ve already mentioned, we used the DFI LANParty UT nForce4 Ultra-D motherboard, with the 704 BIOS installed to enable support for the new memory dividers.
Our memory of choice was the OCZ PC3500 Gold Edition GX, which can run at up to DDR500 at 2-2-2-5 timings at 3.3V. The beauty of this memory is that we can run at the same memory timings from DDR400 all the way up to DDR500, which is exactly what we wanted for this review.
We chose three CPUs to investigate the impacts of these new memory dividers: the Athlon 64 X2 4800+ (2.4GHz/1MB L2), the Athlon 64 X2 4200+ (2.2GHz/512KB L2) and the Athlon 64 FX-57 (2.8GHz/1MB L2).
We picked the X2 4800+ to see if the fastest dual core CPU can use the extra memory bandwidth. We chose the X2 4200+ to see if a reduction in L2 cache made the extra memory bandwidth more useful. And finally, we used the FX-57 to see if the highest stock clocked Athlon 64 processor could put the extra memory bandwidth to use.
Our usual CPU test suite was reduced significantly in order to weed out applications that would definitely not show any performance improvement. If you don’t see a particular test here that we’ve used in the past, it’s most likely because it showed an even smaller improvement than what we’ve seen here. This wasn’t done to make the new memory dividers look better, but rather to make the testing more manageable; once you see the results, you’ll understand why just focusing on this small sample is more than enough to get a good idea of how the performance will be impacted as a whole.
We used the latest nForce 6.53 and ForceWare 77.72 drivers for our test bed, and paired it with the newly released GeForce 7800 GTX.
High Speed Dual Core + New Memory Dividers
We’ll start off with the Athlon 64 X2 4800+.
Featuring a 2.4GHz core clock, the 4800+ doesn’t necessarily meet our high clock speed requirement for needing a faster memory bus. Each core also features a 1MB L2 cache, which reduces its dependency on a higher speed memory bus. However, we are dealing with a dual core CPU here - which means that situations where both cores are being used are more likely to increase the chip’s memory bandwidth needs. Because of this, we’ll focus on improvements in multithreaded or multitasking environments, as well as looking at single threaded performance to measure the impact of the faster memory clocks.
According to our table of DDR frequencies supported by the DFI board, the 2.4GHz 4800+ gives us two options above DDR400 - namely, 218MHz and 240MHz, or an unofficial DDR436 and DDR480, respectively.
Theoretical Memory Bandwidth Comparison
Just to make sure that these new dividers were actually doing something, we used the final 32-bit version of ScienceMark 2.0 to confirm that there were tangible increases in memory bandwidth:
Memory Speed | ScienceMark 2.0 Memory Bandwidth (MB/s) | % Improvement over DDR400 |
DDR400 | 5378.08 | N/A |
DDR436 | 5495.33 | 2% |
DDR480 | 5851.52 | 9% |
With DDR436 offering only a 2% increase in measured memory bandwidth over DDR400, our only hopes for a performance increase are with the much higher bandwidth settings - i.e. DDR480.
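For context, the measured gains can be set against the increase in memory clock itself. The short sketch below uses the ScienceMark figures from the table and the memory clocks given by the divider equation for a 2.4GHz CPU; the variable and function names are ours.

```python
# ScienceMark 2.0 results from the table above (MB/s).
measured_mb_s = {"DDR400": 5378.08, "DDR436": 5495.33, "DDR480": 5851.52}
# Memory clocks from the divider equation at 2.4GHz (2400/12, 2400/11, 2400/10).
dram_clock_mhz = {"DDR400": 200.0, "DDR436": 2400 / 11, "DDR480": 240.0}

def percent_gain(value, baseline):
    """Percentage improvement of value over baseline."""
    return (value / baseline - 1) * 100

for speed in ("DDR436", "DDR480"):
    measured = percent_gain(measured_mb_s[speed], measured_mb_s["DDR400"])
    nominal = percent_gain(dram_clock_mhz[speed], dram_clock_mhz["DDR400"])
    print(f"{speed}: measured +{measured:.1f}%, memory clock +{nominal:.1f}%")
```

In both cases the measured bandwidth rises by well under half of the clock increase, so the real-world ceiling for these dividers is lower than the DDR ratings suggest.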
Multimedia Content Creation Winstone 2004
Business applications barely made any use of the dual channel memory bus of Socket-939 CPUs, so we had no expectations to see any sort of performance boost from these new DDR speeds in tests like Business Winstone. Thus, we turn to Multimedia Content Creation Winstone 2004, whose Lightwave test is multithreaded and does take advantage of the X2’s dual core setup:
Memory Speed | MMCC Winstone 2004 | % Improvement over DDR400 |
DDR400 | 41.9 | N/A |
DDR436 | 42.3 | 1% |
DDR480 | 42.7 | 2% |
The biggest performance difference that we see here is 2%, which is less than the 3% variation that we can see between test runs in this particular benchmark.
3D Rendering
3D rendering is another area where we see good use of dual core processors, but these tests also showed us a 0 - 1% increase in performance when comparing DDR480 to DDR400:
Memory Speed | 3dsmax 6 - SPECapc Rendering Composite | % Improvement over DDR400 |
DDR400 | 2.78 | N/A |
DDR436 | 2.8 | 1% |
DDR480 | 2.8 | 1% |
Memory Speed | Cinebench 2003 | % Improvement over DDR400 |
DDR400 | 636 | N/A |
DDR436 | 639 | 0% |
DDR480 | 641 | 1% |
Even SPECviewperf 8 barely showed any performance increase (from 0 - 2%), and that suite of applications tends to be quite dependent on memory performance.
Video Encoding
DivX and Windows Media encoding tests have always been very memory bandwidth sensitive. Let’s take a look at the impact of the new memory dividers there:
Memory Speed | DivX 6 + AutoGK | % Improvement over DDR400 |
DDR400 | 50.6 | N/A |
DDR436 | 51.3 | 1% |
DDR480 | 53.2 | 5% |
With a 5% improvement in performance, DivX 6 gives us the first indication of any truly tangible performance increases due to the higher DDR speeds unofficially supported by the new chips.
Memory Speed | Windows Media Encoder 9 (fps) | % Improvement over DDR400 |
DDR400 | 4.22 | N/A |
DDR436 | 4.24 | 0% |
DDR480 | 4.28 | 1% |
The same success isn’t seen in our WME test, with increases of 0% and 1% at DDR436 and DDR480, respectively.
Gaming
Doom 3 is also a very good measure of the impact of memory bandwidth, as are most other 3D games:
Memory Speed | Doom 3 (1024 x 768 fps) | % Improvement over DDR400 |
DDR400 | 121.9 | N/A |
DDR436 | 124.3 | 2% |
DDR480 | 127.2 | 4% |
Finally we see another situation where faster memory has a positive impact on performance. Here, DDR480 gives the X2 a 4% increase in frame rate at 1024 x 768. However, cranking the resolution up to 1600 x 1200 cuts that improvement down to 1%. The usefulness of the 1024 x 768 numbers is in simulating situations where you are less GPU bound.
Overall, we’d say that there’s not that big of an improvement from using DDR480 with the Athlon 64 X2 4800+. The biggest performance boosts will occur in video encoding and games where you are not GPU bound, and even then, you should expect an increase in the 3% - 5% range.
Low End Dual Core + New Memory Dividers
With the X2 4800+, we saw that DDR436 offered basically no performance improvement, while DDR480 was a little more useful, bringing us anywhere between 1% and 5% of a performance improvement in our selection of tests. Given that the X2 4200+ has half the cache, its dependence on a faster memory bus should go up. But counteracting that relationship is the fact that the 4200+ runs 200MHz slower than the 4800+.
According to our table of DDR frequencies supported by the DFI board, the 2.2GHz 4200+ gives us two options above DDR400 - namely, 220MHz and 244MHz, or an unofficial DDR440 and DDR488, respectively.
Multimedia Content Creation Winstone 2004
We start off with MCC Winstone 2004 again:
Memory Speed | MMCC Winstone 2004 | % Improvement over DDR400 |
DDR400 | 38.9 | N/A |
DDR440 | 39.1 | 1% |
DDR488 | 39.4 | 1% |
3D Rendering
3D rendering is another area where we see good use of dual core processors, but these tests also showed us a 0 - 1% increase in performance when comparing DDR488 to DDR400:
Memory Speed | 3dsmax 6 - SPECapc Rendering Composite | % Improvement over DDR400 |
DDR400 | 2.54 | N/A |
DDR440 | 2.54 | 0% |
DDR488 | 2.54 | 0% |
Memory Speed | Cinebench 2003 | % Improvement over DDR400 |
DDR400 | 584 | N/A |
DDR440 | 586 | 0% |
DDR488 | 587 | 1% |
Despite the decrease in cache size, the faster memory bus didn’t do anything more for the X2 4200+.
Video Encoding
Memory Speed | DivX 6 + AutoGK | % Improvement over DDR400 |
DDR400 | 47.3 | N/A |
DDR440 | 47.9 | 1% |
DDR488 | 48.5 | 3% |
Here, DDR488 only gives us a 3% bump in performance. Nothing to write home about, but if your memory can support it, you might as well enable it.
Memory Speed | Windows Media Encoder 9 (fps) | % Improvement over DDR400 |
DDR400 | 3.88 | N/A |
DDR440 | 3.91 | 1% |
DDR488 | 3.92 | 1% |
Gaming
Memory Speed | Doom 3 (1024 x 768 fps) | % Improvement over DDR400 |
DDR400 | 108 | N/A |
DDR440 | 111.2 | 3% |
DDR488 | 113.6 | 5% |
In Doom 3, the X2 4200+ gets slightly more of a performance boost than the 4800+ (5% versus 4%). But despite our theories, across the rest of our tests the X2 4200+ doesn’t really get any more of a performance boost than the 4800+ did.
Just for curiosity's sake, we performed a DVDShrink encode while running the Doom 3 test, to see how two relatively memory bandwidth intensive tasks running simultaneously changed the picture, if at all. Note that alone, DVDShrink saw no performance boost due to DDR488 over DDR400:
Memory Speed | Doom 3 (1024 x 768 fps) w/ DVDShrink running | % Improvement over DDR400 |
DDR400 | 102.5 | N/A |
DDR488 | 109.2 | 6.5% |
The performance boost in Doom 3 in this scenario went up another 1.5 percentage points, to 6.5% for DDR488 over DDR400. It wasn’t a huge jump, but once you start getting into those heavy usage scenarios, then the faster memory speeds make a lot of sense for the dual core Athlon 64 X2s.
Other lighter multitasking scenarios offered no real difference in performance for the X2.
Memory Speed | DVD Shrink (Time in Mins) w/ Firefox & iTunes Running | % Improvement over DDR400 |
DDR400 | 9.9 | N/A |
DDR488 | 9.9 | 0% |
Single Core + New Memory Dividers
Next, we move on to the single core Athlon 64 FX-57 running at 2.8GHz. The FX-57’s high clock speed should increase its dependency on a faster memory bus, but by how much?
The DFI board only offers support for 215MHz DDR430 and 233MHz DDR466 at 2.8GHz; given the lackluster improvement that we’ve seen from ~DDR433, we decided to only focus on DDR466 performance.
Multimedia Content Creation Winstone 2004
Memory Speed | MMCC Winstone 2004 | % Improvement over DDR400 |
DDR400 | 39.8 | N/A |
DDR466 | 39.8 | 0% |
3D Rendering
Memory Speed | 3dsmax 6 - SPECapc Rendering Composite | % Improvement over DDR400 |
DDR400 | 1.79 | N/A |
DDR466 | 1.8 | 1% |
Memory Speed | Cinebench 2003 | % Improvement over DDR400 |
DDR400 | 395 | N/A |
DDR466 | 396 | 0% |
Video Encoding
Memory Speed | DivX 6 + AutoGK | % Improvement over DDR400 |
DDR400 | 40.6 | N/A |
DDR466 | 41.7 | 3% |
Memory Speed | Windows Media Encoder 9 (fps) | % Improvement over DDR400 |
DDR400 | 2.61 | N/A |
DDR466 | 2.63 | 1% |
Gaming
Memory Speed | Doom 3 (1024 x 768 fps) | % Improvement over DDR400 |
DDR400 | 137.5 | N/A |
DDR466 | 141.9 | 3% |
Despite the high clock speed, the FX-57 doesn’t benefit any more from DDR466 than the dual core chips did from DDR480/488.
Final Words
Based on the tests that we’ve seen here today, AMD’s reluctance to move to higher bandwidth DDR2 offerings makes a lot more sense. The plain fact of the matter is that at the current clock speeds at which the Athlon 64 and X2 line are running, most desktop applications see virtually no benefit from higher bandwidth memory. It is possible that server usage models may show a greater performance boost, but it is highly unlikely for a mission critical server to be equipped with anything that isn’t an officially supported standard - especially memory.
While some have been critical of AMD’s unwillingness to embrace DDR2 when Intel did, it would appear that the quest for more bandwidth simply wasn’t in AMD’s best interests. These Athlon 64 and X2 cores that we have here today are far better suited for use with low latency and lower priced DDR400 than anything that offers higher bandwidth.
Down the road, as CPU speeds and the sheer number of cores go up, higher bandwidth memories will definitely make much more sense. But for now, for the majority of the population, these new memory dividers won’t do much for you.
The performance improvements themselves are small, but if you are trying to squeeze every last ounce of performance out of your system, then these new memory dividers offer you one more avenue to do so. If you have memory that can run at higher than DDR400 speeds without any increase in latency, then by all means, explore the new dividers; just don’t expect them to change your life.
The one exception to the rule seems to be heavy multitasking scenarios. As we saw from our simple DVDShrink + Doom 3 test, when you run two very memory bandwidth dependent applications on a dual core processor at the same time, the benefits of these faster memory speeds really start to show. We measured a 6.5% increase in performance in the aforementioned test, but next to no performance improvement in other lighter multitasking scenarios. As we continue to develop our multitasking benchmark suites, we will now start looking at how added memory bandwidth, made possible through these new dividers, changes the performance picture.