Memory Frequency Scaling on Intel's Skull Canyon NUC - An Investigation
by Ganesh T S on August 29, 2016 8:00 AM EST
Overclocking has generally been the domain of enthusiasts with desktop rigs. However, increasing the CPU frequency beyond the official specification is not the only way to extract more performance from a computing system. Memory-bound workloads can benefit from memory hierarchies with increased bandwidth and/or lower latencies.
The memory controller in the Intel U-series processors (on which most, if not all, high performance SFF systems are based) is rated for operation only according to the standard JEDEC guidelines (1600 MHz for DDR3 and 2133 MHz for DDR4). However, the Skull Canyon NUC (NUC6i7KYK) can support DDR4 SODIMMs operating at speeds above 2133 MHz. In this article, we explore the effects of varying DDR4 frequencies and latencies on the NUC6i7KYK.
Background
Intel's Skylake platform brought DDR4 DRAM into the mainstream market. DDR4 brings a host of improvements over DDR3, particularly in terms of operating at lower voltage and higher frequencies. Another important advantage is the maximum capacity per DIMM (moving from the usual 8GB in DDR3 to 16GB in DDR4). On standard non-overclocked systems, the DDR4 memory can operate at up to 2133 MHz. DDR4 DIMMs operating as high as 3733 MHz are available for desktop systems with full-sized memory slots. On the SODIMM side, we have seen various vendors introduce kits operating at more than 2133 MHz. However, there are very few systems utilizing DDR4 SODIMMs that also support memory overclocking (a term I will use in this article to indicate DDR4 operation at more than 2133 MHz).
SODIMMs are used in notebooks and SFF PCs. The memory in most notebooks is rarely upgraded, and many high-performance notebooks are now opting to go with soldered DRAM instead of SODIMM slots. That leaves SFF PCs as the only viable option to explore the benefits of this new wave of DDR4 SODIMMs. Skylake-U is not an effective option because its memory controller is rated to operate at only 2133 MHz. However, the Skull Canyon NUC (NUC6i7KYK), which combines a Skylake-H CPU (Core i7-6770HQ) with the H170 chipset, is a completely different platform. The BIOS includes XMP support, resulting in the appropriate SODIMM kits operating at speeds higher than 2133 MHz automatically.
One of the advantages of DDR4 SODIMMs is that we can get up to 16GB on a single SODIMM. On systems like the NUC6i7KYK with two SODIMM slots, we can have up to 32GB of DRAM. To that end, for today's article we have procured 2x16GB DDR4 SODIMM kits rated for operation at more than 2133 MHz from Corsair, Crucial, G.Skill, Kingston and Patriot Memory to study the impact of memory overclocking on the performance of Skull Canyon.
The Core i7-6770HQ Memory Controller and Hierarchy
Our Skull Canyon NUC (NUC6i7KYK) review analyzed the platform and BIOS in great detail. The block diagram shows two channels of DDR4 memory operating at 1.2V for a 128b memory interface. When both memory slots are occupied, the two memory channels operate in dual-channel symmetric mode (also known as interleaved mode) and provide the best performance for real-world applications. Addresses are ping-ponged between the channels after each cache line boundary (64 bytes). In the case that only one slot is occupied, the operation is in single-channel asymmetric mode. The memory controller does support ECC RAM (since it is used in the Xeon E3-1500 v5 lineup also), but the feature is disabled in the Core i7-6770HQ.
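As a rough illustration of the dual-channel arrangement described above, the following Python sketch maps an address to a channel by alternating on 64-byte cache line boundaries, and computes the theoretical peak bandwidth of a 128-bit interface. The function names are hypothetical, and the plain modulo mapping is an assumption made for illustration: the real address-to-channel mapping in the memory controller may be more involved. The "MHz" speed grades are treated here as transfer rates in MT/s, and the rates in the usage loop are illustrative values rather than the tested kits.

```python
CACHE_LINE_BYTES = 64   # interleave granularity mentioned in the text
NUM_CHANNELS = 2        # two DDR4 channels
BUS_WIDTH_BITS = 128    # combined width of both 64-bit channels


def channel_for_address(phys_addr: int) -> int:
    """Return the channel index (0 or 1) under simple cache-line interleaving.

    Assumption: consecutive 64-byte lines alternate between the two channels.
    """
    return (phys_addr // CACHE_LINE_BYTES) % NUM_CHANNELS


def peak_bandwidth_gb_per_s(data_rate_mts: float,
                            bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) at a data rate in MT/s."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    for addr in (0, 63, 64, 128, 192):
        print(f"address {addr:4d} -> channel {channel_for_address(addr)}")
    # Illustrative speed grades only.
    for rate in (2133, 2400, 2800):
        print(f"DDR4-{rate}: {peak_bandwidth_gb_per_s(rate):.1f} GB/s peak")
```

These are upper bounds from the interface width alone; measured bandwidth (for example, from AIDA64 or the Intel Memory Latency Checker) will be lower.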
Intel recommends that the SODIMMs used in the NUC6i7KYK support the Serial Presence Detect (SPD) data structure. This allows the BIOS to read the SPD data and program the chipset to accurately configure memory settings for optimum performance. If non-SPD memory is installed, the BIOS will attempt to correctly configure the memory settings, but performance and reliability may be impacted or the SO-DIMMs may not function under the determined frequency.
The other interesting aspect of the Core i7-6770HQ that bears relevance to the performance of the NUC6i7KYK is the internal memory hierarchy. The processor is one of the Skylake-H members to come with a 128MB On-Package Cache (eDRAM). The processor also features a 6MB L3 (LLC) cache, which is shared among all 4 CPU cores. Each core also has a dedicated 256KB L2 cache (making for a total of 1MB L2 in the Core i7-6770HQ). Skylake's core architecture (32KB of I-cache and 32KB of D-cache) is well-known and has been analyzed before.
The LLC in the Core i7-6770HQ is only 6MB (1.5MB/core), while other members of the Skylake-H family with Iris Pro Graphics (same 128MB eDRAM configuration) have 2MB of LLC per core (total of 8MB). The most important aspect here is that the eDRAM is available not only to the GPU, but also to the other clients of the memory controller.
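To make the hierarchy above concrete, here is a minimal Python sketch that reports the smallest level whose capacity covers a given working set, using the capacities quoted in the text. The function name and the capacity-only model are assumptions for illustration: it ignores associativity, sharing between cores and the GPU, and replacement policy, so it is only a first-order guide to when accesses would have to reach the DDR4 SODIMMs.

```python
KB = 1024
MB = 1024 * KB

# Capacities quoted in the text for the Core i7-6770HQ.
# L1D and L2 are per core; L3 and the eDRAM are shared.
HIERARCHY = [
    ("L1D", 32 * KB),
    ("L2", 256 * KB),
    ("L3", 6 * MB),
    ("eDRAM", 128 * MB),
]


def smallest_level_that_fits(working_set_bytes: int) -> str:
    """Return the first level whose capacity covers the working set, else 'DRAM'."""
    for name, capacity in HIERARCHY:
        if working_set_bytes <= capacity:
            return name
    return "DRAM"


if __name__ == "__main__":
    for size in (16 * KB, 200 * KB, 4 * MB, 64 * MB, 1024 * MB):
        print(f"{size // KB:>8d} KB working set -> {smallest_level_that_fits(size)}")
```

Under this simplified model, only working sets larger than 128MB are forced out to main memory, which is where faster SODIMMs could make a difference.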
The BIOS of the NUC6i7KYK supports the Extreme Memory Profile (XMP), an Intel-developed JEDEC SPD extension for memory kits to indicate support for high-performance timings that are beyond the JEDEC standard. This allows 'overclocked' memory kits to be plug-and-play. The BIOS can read the extra SPD information at boot time and automatically set the memory timings to the overclocked configuration.
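Because XMP profiles change both the frequency and the timings, it can help to convert a CAS latency from clock cycles into nanoseconds before comparing kits. The sketch below uses the usual relation for DDR memory (two transfers per clock, so the clock period in ns is 2000 divided by the data rate in MT/s). The function name is hypothetical, and the kit settings in the usage loop are made-up examples, not the timings of the SODIMMs evaluated in this article.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds for a DDR speed grade.

    DDR transfers data twice per clock, so the clock period in ns
    is 2000 / data rate (MT/s).
    """
    clock_period_ns = 2000.0 / data_rate_mts
    return cas_cycles * clock_period_ns


if __name__ == "__main__":
    # Hypothetical (data rate, CL) pairs for illustration only.
    for rate, cl in ((2133, 15), (2400, 16), (2800, 18)):
        print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

A kit with a higher frequency but looser timings can end up with a similar absolute latency, which is one reason frequency alone is a poor predictor of performance.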
Evaluating Memory Frequency Scaling on the NUC6i7KYK
The rest of this review deals with the quantitative measurement of the effectiveness of different types of DRAM in the Skull Canyon NUC. In order to do this, we processed various benchmarks while keeping everything other than the DRAM SODIMMs constant. Each configuration was booted to BIOS multiple times to ensure that the SPD information was properly parsed and the optimal frequency / timing parameters chosen. Once the OS was booted, we also checked with multiple hardware monitoring tools that the parameters indicated by the BIOS for the DRAM SODIMMs were indeed what the OS was also seeing.
Intel NUC6i7KYK (Skull Canyon) Specifications | |
Processor | Intel Core i7-6770HQ Skylake-H, 4C/8T, 2.6 GHz (Turbo to 3.5 GHz), 14nm, 6MB L3, 45W TDP |
Memory | Various |
Graphics | Intel Iris Pro Graphics 580 |
Disk Drive(s) | Samsung SSD 950 PRO (512 GB; M.2 Type 2280 PCIe 3.0 x4 NVMe; 40nm; MLC V-NAND) |
Networking | Intel Dual Band Wireless-AC 8260 (2x2 802.11ac - 866 Mbps); Intel Ethernet Connection I219-LM GbE Adapter |
Audio | 3.5mm Headphone Jack; Capable of 5.1/7.1 digital output with HD audio bitstreaming (HDMI) |
Miscellaneous I/O Ports | 4x USB 3.0; 1x Thunderbolt 3 / USB 3.1 Gen 2; 1x SDXC |
Operating System | Retail unit is barebones, but we installed Windows 10 Pro x64 |
Pricing (As configured) | $1027 |
Full Specifications | Intel Skull Canyon NUC6i7KYK Specifications |
In the next section, we will first take a look at the specifications of the five SODIMM kits that were evaluated in the NUC6i7KYK, along with the AIDA64 Memory Bench for each. Following this, we present the relevant benchmarks from Intel's Memory Latency Checker tool to determine the raw performance of the DRAM in the system. This is followed by our standard test suite for mini-PCs with a gaming focus - SYSmark 2014, Futuremark benchmarks and some select gaming titles. Prior to our concluding remarks, we take a look at a few miscellaneous aspects such as power consumption and pricing.
31 Comments
alacard - Monday, August 29, 2016 - link
I always find these kinds of articles funny, especially coming from Anandtech. When you test an SSD you test multi-tasking performance (your destroyer benchmarks), but you don't bother to do so with memory, even tho like an SSD, multi-tasking performance is the only metric that actually matters.
Just like RAM, take 50 different SSDs and run application startup and game loading tests on them and you will get almost exactly the same results across the board, and THIS IS WHY YOU HAVE A MULTI-TASK BENCHMARK, because without seeing how the SSD can handle varied workloads the results are MEANINGLESS because at a baseline of loading single applications, SSDs are practically all the same.
It works the same with RAM. A user typically spends more time multi-tasking than running one thing, but you don't even bother testing multi-tasking performance on faster memory. What the hell is going on here? How many more of these useless articles are you guys going to keep churning out before you start actually investigating the true differences between RAM speeds and latency with meaningful benchmarks that will actually show the difference?
ganeshts - Monday, August 29, 2016 - link
Previous memory scaling reviews that have been linked above by Ian show that various SINGLE application workloads can benefit immensely from memory frequency scaling. Our intention here was to show that this is NOT the case with the Skull Canyon NUC. The numbers also point to the effectiveness of the eDRAM as a cache for all the components of the processor, and not just the GPU. In that, I believe the review has provided a definitive answer to comments like these : http://www.anandtech.com/comments/10343/the-intel-... ; Many people expected to get better gaming bench numbers with higher frequency memory in the Skull Canyon NUC, and I hope this article was able to resolve their doubts and helped them in choosing the right memory for their system.
Second, when it comes to multi-tasking - higher capacity memory will ensure that applications will not swap out and will be readily available for resumption. In our evaluation, all SODIMMs are 32GB in capacity, and that is not a factor. In addition, DRAM is not like an SSD where we have a controller trying to manage wear levelling and other similar tasks.
Multi-tasking, when it comes to DRAM, is not a set of 'parallel accesses' that can benefit directly from faster memory. Any performance benefit that is obtained is when pressure on the caches causes evictions and the new data needs to be fetched in. I would imagine a proper large-sized real-life workload can cause a similar 'access trace' to the main memory (a full-length PCMark 8 workload would probably be the same as 7-Zip and mplayer active at the same time). In the Skull Canyon NUC, we also have the 128MB eDRAM to be rendered 'ineffective' - i.e., the applications need to even thrash that memory if they have to show better performance with the faster memories.
For what it is worth, the Intel Memory Latency Checker tool has 'multi-tasking' tests in the sense that accesses are simulated from all cores simultaneously. We do have the numbers for those, but, since we believe they are not reflective of the type of workloads for the Skull Canyon NUC, we chose not to publish them. I can upload and link those numbers later tonight.
PetarNL - Monday, August 29, 2016 - link
I suspect that the reason why you hadn't seen much benefit with higher DRAM bandwidths is the TDP limit on the iGPU. The situation might be different with the 65W model of the Iris Pro 580.
The Skull Canyon Iris Pro 580 manages only a 10-15% boost over the iGPU in i7 5775c/r, despite having 50% more EUs and a generational advantage. I would recommend that you redo this test once you get your hands on an i7 6785r based product.
Flying Aardvark - Wednesday, August 31, 2016 - link
Yes and thanks for following up on that! Literally no one else has. I'm surprised you guys pay that close of attention to the comments. It's a shame that Intel didn't put a little more TLC into Skull Canyon's R&D phase, to ensure every ounce of performance could be pulled out of this chip. But limited to a mere 45watts for the CPU/IGP combined, I suppose this was a likely outcome.
There's just so many possible bottlenecks with a tiny system with a low heat/power requirement. Intel may have tightened the noose around this one just a bit much. A few design tweaks and it could really soar. Looking forward to the Kabylake or Cannonlake update.
Senti - Monday, August 29, 2016 - link
Just a note about how much memory progressed, including the worth of "premium" kits.
The result of the same Intel Memory Latency Checker on my quite ancient i7-930 with overclocked to 1686MHz, no-name, no radiators and even mixed model (one set of chips made by Samsung, the other Hyundai) DDR3 memory:
Latency: 43.8
1:1 Reads-Writes BW: 29805.4.
Yes, it's triple-channel, but it doesn't help latency at all and even BW difference isn't great from what I remember testing it in dual-channel mode.
evilspoons - Monday, August 29, 2016 - link
At a cursory glance of the benchmarks (without doing statistical analysis on them, I mean) I'd say they tie so often it's irrelevant except occasionally the Patriot 2800 kit falls behind more than any of the other kits do. On the final page, I noticed it has the worst as-tested tRFC, tied-for-worst tRAS and tCL, and middle of the road everything else. Nothing to see here, move along!
mr. president - Tuesday, August 30, 2016 - link
Any word on the performance cliff going from 1280x1024 to 1680x1050? Is that eDRAM in action or just different detail settings?
1680x1050 is only around 30% more pixels. It's strange to see such non-linear scaling.
ganeshts - Tuesday, August 30, 2016 - link
They have different detail settings - usually, the higher the resolution, the higher the detail level.
Similar trends have been observed in other gaming PCs also.
FlyingAarvark - Wednesday, August 31, 2016 - link
I'd have to disagree that the 128MB L4 is the reason the RAM doesn't matter. It's (45W) TDP starved foremost, once that's cleared up the RAM will come into play.
While definitely a less than ideal setup being so power starved, I'm convinced to buy a Skull Canyon. Then wait for the 10nm update- things are getting interesting in nukeland.
Just a great little machine. Especially now that the last number of years I've backed off FPS/graphically intensive gaming. You can only play those so many decades when you started with Wolfenstein 3D. For League of Legends and probably some Hearthstone this hits the spot.
My nuke will be getting the cheapest RAM option that Crucial sells. :) I really hope Intel invests into these heavily, I'm convinced they're the future of PCs and I'd like to see AMD get into the NUC scene.
Dansolo - Friday, September 2, 2016 - link
The CPU benchmark for Photoscan stage2 seems a little iffy... 2800MHz RAM doing a fair bit better than the 3067 that has better timings? I don't believe that - gotta be something wrong with the test.