Insights into DDR5 Sub-timings and Latencies
by Dr. Ian Cutress on October 6, 2020 11:00 AM EST
Today we posted a news article about SK Hynix’s new DDR5 memory modules for customers – 64 GB registered modules running at DDR5-4800, aimed at the preview systems that the big hyperscalers start playing with 12-18 months before anyone else gets access to them. It is interesting to note that SK Hynix did not publish any sub-timing information about these modules, and as we look through the announcements made by the major memory manufacturers, one common theme has been a lack of detail about sub-timings. Today we can present information across the full range of DDR5 specifications.
When discussing memory, there are a few metrics to consider:
- Type, e.g. DDR4, DDR5
- Capacity
- Bandwidth
- Latency
- Power consumption / voltage
- Price
When building a platform, a number of these factors come into play – a system that runs oil and gas simulations might require terabytes of memory regardless of power, while for smaller installations price might be the major concern. For specialist applications, persistent memory might be the focus, or a combination of bandwidth and latency will be key to driving performance.
In order for all the companies that build memory and systems to work together, a set of standards is developed by a consortium of all interested parties – this is called JEDEC. JEDEC creates the standards to ensure support across all compliant systems.
Users who are familiar with JEDEC specifications will note that consumer-grade memory is often specified faster than what JEDEC lists – processors that can support faster memory can be paired with modules qualified beyond the JEDEC standards. This is why we see memory kits all the way up to DDR4-5000 on the market today that only work with a few select systems.
Read AnandTech’s Corsair DDR4-5000 Vengeance LPX Review: Super-Binned, Super Exclusive
For DDR4, JEDEC supports standards ranging from DDR4-1600 up to DDR4-3200. From the data rate, a peak transfer rate can be calculated (12.8 GB/s per channel for DDR4-1600, 25.6 GB/s per channel for DDR4-3200), however the latency requires additional information. The typical sub-timings offered with memory are:
- CAS: Column Address Strobe: the time between sending a column address and the response
- tRCD: Row to Column Delay: clock cycles to load a column when new row is opened
- tRP: Row Precharge Time: clock cycles to load data when wrong row is open
- tRAS: Row Active Time: minimum time between row active and precharge
These are typically reported as CAS-tRCD-tRP, with tRAS sometimes added on. This means that in JEDEC’s DDR4 specification, the base DDR4-3200 standard allows for a 24-24-24 set of sub-timings. For latency calculations, we need both the data rate (3200 MT/s) and the CAS latency (24 clocks) to convert the CAS value into nanoseconds, the real-world latency (in this case, 15 nanoseconds).
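The clocks-to-nanoseconds conversion above can be sketched in a few lines. This is a minimal illustration (the function name is my own); the key point is that DDR transfers data on both clock edges, so one clock period is 2000 / data rate nanoseconds:

```python
def cas_latency_ns(data_rate_mts: float, cas_clocks: int) -> float:
    """Convert a CAS latency in clocks to real-world nanoseconds.

    DDR transfers data on both edges of the clock, so the I/O clock
    runs at half the data rate; one clock period is 2000 / MT/s ns.
    """
    return cas_clocks * 2000.0 / data_rate_mts

# JEDEC base DDR4-3200 at CL24, as in the text:
print(cas_latency_ns(3200, 24))  # 15.0 ns
```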
The combination of data rate and CAS Latency has been used to compare single access latency numbers for memory over the years. Moving from the early iterations of DRAM, both data access rates and single access latencies have improved. However recently, due to physical limitations, while data rate has been increasing, access latency has been roughly consistent.
[Table: Memory and Bandwidth, up to DDR4 (*not all of these are JEDEC standards)]
Pivoting to DDR5, JEDEC has enabled standards ranging from DDR5-3200 to DDR5-6400. It also has placeholders up to DDR5-8000, however the specifics of those standards are still a work in progress. At the end of DDR3, and through DDR4, JEDEC introduced additional sub-timing specifications for each data rate – an ‘A’ fast standard, a ‘B’ common standard, and a ‘C’ looser standard, with the looser standard technically more applicable to higher-capacity modules. It means that each data rate can cover a wide range of performance based on the quality of the silicon used.
Starting with the lowest data rate, the DDR5-3200A standard supports 22-22-22 sub-timings. At a theoretical peak of 25.6 GB/s bandwidth per channel, this equates to a single access latency of 13.75 nanoseconds.
If we look at SK Hynix’s announcement of DDR5-4800, this could be DDR5-4800B which supports 40-40-40 sub-timings, for a theoretical peak bandwidth of 38.4 GB/s per channel and a single access latency of 16.67 nanoseconds.
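The peak bandwidth figures quoted above follow directly from the data rate: each transfer on a 64-bit channel moves 8 bytes. A minimal sketch (the function name is my own, and the 64-bit channel width is an assumption matching the per-channel figures in the text):

```python
def peak_bandwidth_gbs(data_rate_mts: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth per channel in GB/s.

    Each transfer moves bus_width_bits / 8 bytes, and MT/s counts
    millions of transfers per second, so GB/s = MT/s * bytes / 1000.
    """
    return data_rate_mts * (bus_width_bits / 8) / 1000.0

print(peak_bandwidth_gbs(4800))  # 38.4 GB/s for DDR5-4800
print(peak_bandwidth_gbs(3200))  # 25.6 GB/s for DDR4/DDR5-3200
```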
Here is the full list, from DDR5-3200 to DDR5-6400, including some of the extra standards not yet finalized.
[Table: DDR5 JEDEC Specifications]
You may remember our report in May 2018, where Cadence and Micron showed off some DDR5-4400 memory in a test platform. We were able to determine from the photographs provided that the system was running at a CAS latency of 42 clocks. Since then, the JEDEC standard in that speed bracket has come down to 32-40 clocks, indicating the evolution of the platform.
The table above is a bit cumbersome, so here's the same table showing only the fastest 'A' specification for each data rate. These likely apply to any installation with the equivalent of one module per channel.
[Table: JEDEC DDR5-A Specifications]
In terms of single access latency, we are ultimately not going to be any faster than we were by the end of the DDR3 era. DDR3-1866 at CL13 was already at 13.93 nanoseconds. This means that despite the increasing CAS latency values in clocks (going to CL46 at DDR5-6400), the actual single access latency is still roughly the same in real world time units.
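Running the same clocks-to-nanoseconds conversion across the bins mentioned above illustrates how flat real-world latency has stayed (the data rate / CL pairings are taken from the figures in the text):

```python
# (name, data rate in MT/s, CAS latency in clocks) for bins cited above
bins = [
    ("DDR3-1866 CL13", 1866, 13),
    ("DDR4-3200 CL24", 3200, 24),
    ("DDR5-4800 CL40", 4800, 40),
    ("DDR5-6400 CL46", 6400, 46),
]

for name, rate, cl in bins:
    ns = cl * 2000.0 / rate  # one clock period = 2000 / MT/s ns (DDR)
    print(f"{name}: {ns:.2f} ns")  # e.g. DDR3-1866 CL13 -> 13.93 ns
```

Despite CAS values more than tripling in clocks from DDR3 to DDR5, every result lands in the same ~14-17 ns window.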
It is interesting to note that the DDR5 specification has provision in the hardware registers for CAS latencies from CL22 up to CL66. This might be interpreted to mean that even with a sufficiently binned DDR5 memory module, or with overclocking, CL22 might be the lowest possible for the hardware. We know that DDR5 now moves the voltage regulation for the memory onto the module, so that will be an additional area for memory manufacturers to differentiate themselves, especially when targeting the enthusiast market.
For users looking for an insight into how DRAM actually works, I would direct you to our 2010 article entitled 'Everything You Always Wanted To Know About Memory (But Were Afraid To Ask)'. It's a great technical article that I still refer back to, and I still scratch my head over!
Source: JEDEC DDR5 Specification
- DDR5 Memory Specification Released: Setting the Stage for DDR5-6400 And Beyond
- SK Hynix: We're Planning for DDR5-8400 at 1.1 Volts
- Cadence DDR5 Update: Launching at 4800 MT/s, Over 12 DDR5 SoCs in Development
- Samsung to Produce DDR5 in 2021 (with EUV)
- Here's Some DDR5-4800: Hands-On First Look at Next Gen DRAM
- CES 2020: Micron Begins to Sample DDR5 RDIMMs with Server Partners
- SK Hynix Details DDR5-6400
- Keysight Reveals DDR5 Testing & Validation System
- SK Hynix Develops First 16 Gb DDR5-5200 Memory Chip, Demos DDR5 RDIMM
- Cadence & Micron DDR5 Update: 16 Gb Chips on Track for 2019
- Cadence and Micron Demo DDR5-4400 IMC and Memory, Due in 2019
Spunjji - Wednesday, October 7, 2020 - Intel do have a long-standing habit of getting consumers to foot some of the early adoption tax for new memory technology. Tiger Lake already has support for LPDDR5, which is rumoured to be hitting products some time early next year - it does at least make sense there due to the potential power of its iGPU (driver issues notwithstanding).
TheWereCat - Thursday, October 8, 2020 - Ah yes.
Z97 will support Broadwell and Z490 is PCIe4 ready.
qlum - Tuesday, October 6, 2020 - Looking at these specifications makes me wish there was a higher power variant of the DDR spec. I feel like the continuous push for lower power makes these latencies as high as they are. While I do understand laptops and servers having more benefits from lower power, the desktop space generally would not care too much about it and would mostly benefit from lower latency. The difference in latencies between XMP profiles and stock on modern memory is really high. Most memory and memory controllers can handle quite a bit higher voltages than the spec calls for. With these voltages latencies could be reduced and it would be easier to reach higher speeds.
Sadly the desktop segment is not the largest so probably not the biggest consideration at least not anymore.
DiHydro - Tuesday, October 6, 2020 - I have to point out that 12 nanoseconds at the speed of light is 3.598 meters (11.8 ft). This seems like a lot when the entire memory trace of a motherboard might be 0.3 m or about a foot. But when you account for clock instability and manufacturing tolerance, we are getting closer and closer to the constraints set by physics. I wouldn't be surprised if that lower floor of ~14ns we see in the specs has more to do with the limitations of the connectors, traces, and motherboard materials than it does with what the chips and memory controllers could do if they were directly soldered together.
DanNeely - Tuesday, October 6, 2020 - Signals in a PCB only travel a bit over half the speed of light. That drops your theoretical number down to about 6.4 ft.
DiHydro - Tuesday, October 6, 2020 - Thanks for the info, that makes me more impressed with the latencies we see on the top end of the enthusiast RAM market. I would love to hear from a memory system engineer to see what they think about being able to go lower latency over higher bandwidth.
Tomatotech - Tuesday, October 6, 2020 - Many modern system RAM chips are indeed soldered in.
Apple was probably the first to do it on high-end laptops, but other companies followed suit. Nowadays the vast majority of fast systems have soldered in RAM chips - iPads, iPhones, Samsung Galaxies, Microsoft Surface Pro, Macbook Air, Macbook Pro, Google Pixels, many high end Chromebooks, the list goes on.
It doesn't seem to have led to any reduction in latency that I know of.
Gigaplex - Tuesday, October 6, 2020 - Apparently the latency of HBM2 is roughly comparable to conventional RAM.
limitedaccess - Tuesday, October 6, 2020 - DIY desktop kits do use higher voltage (1.35V) compared to the JEDEC spec (1.2V), which enables the higher speeds.
ats - Tuesday, October 6, 2020 - The issue isn't really power but cost. They've made/make variants of DRAM with much lower latencies, but the raw cost per GB is almost an order of magnitude higher than with commodity DRAM. For most of the market, ~10ns array latency hits the sweet spot on cost vs performance.