Intel 3rd Gen Xeon Scalable (Ice Lake SP) Review: Generationally Big, Competitively Small
by Andrei Frumusanu on April 6, 2021 11:00 AM EST
Section by Ian Cutress
The launch of Intel’s Ice Lake Xeon Scalable processors has been waiting in the wings for a number of years. The delays to Intel’s 10nm manufacturing process have caused a number of setbacks for all of Intel’s proposed 10nm product lines, especially the high performance Xeon family: trying to craft 660 mm² of silicon on any process is difficult at the best of times. But Intel now has 10nm in a place where it is economically viable to start retailing large Xeon processors, and the official launch today of Intel’s 3rd Generation Xeon Scalable is on the back of more than 200,000 units shipped to major customers to date. The new flagship, the Xeon Platinum 8380, has 40 cores, offers PCIe 4.0, and takes advantage of the IPC gain in Intel’s Sunny Cove processor core. We’re testing it against the best in the market.
Intel’s 3rd Generation Xeon Scalable: 10nm Goes Enterprise
Today Intel is launching the full stack of processors under the 3rd Generation Xeon Scalable Ice Lake branding, built upon its 10nm process. These processors, up to 40 cores per socket, are designed solely for single socket and dual socket systems, competing in a market with other x86 and Arm options available. With this new generation, Intel’s offering is aimed to be two-fold: first, the generational uplift compared to 2nd Gen, but also the narrative around selling a solution rather than simply selling a processor.
Intel’s messaging with its new Ice Lake Xeon Scalable (ICX or ICL-SP) steers away from simple single core or multicore performance. Instead, the argument is that its unique feature set, such as AVX-512, DLBoost, cryptography acceleration, and security, combined with appropriate software optimizations or paired with specialist Intel products such as Optane DC Persistent Memory, Agilex FPGAs/SmartNICs, or 800-series Ethernet, offers better performance and better metrics for those actually buying the systems. This angle, Intel believes, puts it in a better position than its competitors that only offer a limited subset of these features, or lack the infrastructure to unite these products under a single easy-to-use brand.
A Wafer of 40-core Ice Lake Xeon 10nm Processors
Nonetheless, the launch of a new generation of products and an expanded portfolio warrants putting the product under test for its raw base performance claims. This generation of Xeon Scalable, Intel’s first on 10nm, uses the newer Sunny Cove core architecture. Benefits of this core, as explained by Intel, start with an extra 20% raw performance increase, enabled through a much wider core with an improved front end and more execution resources. Outside of the core, memory bandwidth is improved both by increasing memory channels from six to eight and by new memory prefetch techniques and optimizations that increase bandwidth up to 100% with another +25% efficiency. The mesh interconnect between the cores also uses updated algorithms to feed IO to and from the cores, and Intel is promoting better power management through independent power management agents inside each IP block.
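As a rough sanity check on what the extra channels alone can deliver, theoretical peak DRAM bandwidth per socket is channels × transfer rate × 8 bytes per transfer (one 64-bit DDR4 channel). The sketch below uses the DDR4-3200 and DDR4-2933 speeds listed in the spec comparison later in this article; the function and variable names are our own, and measured bandwidth will sit below these ceilings.

```python
def peak_dram_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth per socket in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Channel counts and speeds as listed in this article's spec comparison
third_gen = peak_dram_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
second_gen = peak_dram_bandwidth_gbs(channels=6, mega_transfers_per_s=2933)

# Uplift from channel count and speed alone, before any prefetcher changes
uplift_pct = (third_gen / second_gen - 1) * 100

print(f"3rd Gen: {third_gen:.1f} GB/s")
print(f"2nd Gen: {second_gen:.1f} GB/s")
print(f"Uplift:  {uplift_pct:+.0f}%")
```

On paper this is roughly 205 GB/s against 141 GB/s, about 45% more, so any larger gain has to come from how well the controller and prefetchers use that ceiling.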
On top of this, Intel is layering on accelerative features, stating that over the raw performance, software optimized for these accelerators will see a better-than-generational uplift. This starts with the basic core layout, especially as it pertains to SIMD instructions such as SSSE, AVX, AVX2, and AVX-512: Intel is enabling better cryptography support across its ISA, enabling AES, SHA, GFNI, and other instructions to run simultaneously across all vector instruction sets. AVX-512 has improved frequencies during more complex bit operations for ICX with smarter mapping between instructions and power draw, offering an extra 10% frequency for all 256-bit instructions. On top of this are Intel’s Speed Select Technologies, such as Performance Profile, Base Frequency improvements, Turbo Frequency improvements, and Core Power assistance, which ensure peak per-core performance or quality of service in a heavily utilized system, depending on customer requirements. Other new features include Software Guard Extensions, enabling enclave sizes up to 512 GB per socket with select models.
Ice Lake’s Sunny Cove Core: Part 2
The Sunny Cove core has actually already been in the market. Intel has made a consumer variant of the core and a server variant of the core. Ice Lake Xeon has the server variant, with bigger caches and slightly different optimization points, but it’s the consumer variant that we have seen and tested in laptop form. Sunny Cove is part of Intel’s Ice Lake notebook processor portfolio, whose performance we reviewed back on August 1st 2019, which was 614 days ago. That length of time between enabling a core for notebooks and enabling the same core (with upgrades for servers) in the enterprise is almost unheard of, but indicative of Intel’s troubles in manufacturing.
Nonetheless, in our notebook testing of the Ice Lake core, we saw a raw +17-18% performance over the previous generation, however this was at the expense of 15-20% in frequency. Where the product truly excelled was in memory limited scenarios, where a new memory controller provided better-than-generational uplift. When it comes to this generation of Xeon Scalable processors with the new core, as you see in the review, in non-accelerated workloads we get very much a similar story. That being said, consumer hardware is very often TDP limited, especially laptops! With the new Ice Lake Xeon platform, Intel is boosting the peak TDP from 205 W to 270 W, which also gives additional performance advantages.
The Headline Act: Intel’s Xeon Platinum 8380
The head prefect of Intel’s new processor lineup is the Platinum 8380 - a full fat 40 core behemoth. If we put it side by side with the previous generation processors, there are some key specifications to note.
Intel Xeon Comparison: 3rd Gen vs 2nd Gen (Peak vs Peak)

| 3rd Gen (Xeon Platinum 8380) | Specification | 2nd Gen (Xeon Platinum 8280) |
|---|---|---|
| 40 / 80 | Cores / Threads | 28 / 56 |
| 2900 / 3400 / 3000 | Base / ST / MT Freq (MHz) | 2700 / 4000 / 3300 |
| 50 MB + 60 MB | L2 + L3 Cache | 28 MB + 38.5 MB |
| 270 W | TDP | 205 W |
| PCIe 4.0 x64 | PCIe | PCIe 3.0 x48 |
| 8 x DDR4-3200 | DRAM Support | 6 x DDR4-2933 |
| 4 TB | DRAM Capacity | 1 TB |
| 4 TB Optane + 2 TB DRAM | Optane Capacity | 1 TB DDR4-2666 + 1.5 TB |
| 512 GB | SGX Enclave | None |
| 1P, 2P | Socket Support | 1P, 2P, 4P, 8P |
| 3 x 11.2 GT/s | UPI Links | 3 x 10.4 GT/s |

Note (2nd Gen column): the 6258R, a 2P variant, is only $3950.
Between these processors, the new flagship has a number of positives:
- +43% more cores (40 vs 28),
- about 65% more cache (110 MB vs 66.5 MB of L2 + L3),
- +33% more PCIe lanes (64 vs 48),
- 2x the PCIe bandwidth (PCIe 4.0 vs PCIe 3.0)
- 4x the memory support (4 TB vs 1 TB)
- SGX Enclave support
- roughly 8% higher socket-to-socket bandwidth (11.2 GT/s vs 10.4 GT/s per UPI link)
- Support for DDR4-3200 Optane DCPMM 200-series
- Price is down 20%... or up 100% if you compare to 6258R
Though we should perhaps highlight some of the negatives:
- TDP is up +32% (270 W vs 205 W)
- ST Frequency is down (3400 MHz vs 4000 MHz)
- MT Frequency is down (3000 MHz vs 3300 MHz)
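For readers who want to check the percentages in the two lists, a minimal sketch is below. The values are copied from the comparison table above; the `pct_change` helper and the dictionary layout are our own naming, not anything from Intel.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change going from the old value to the new value."""
    return (new - old) / old * 100.0


# (3rd Gen, 2nd Gen) pairs copied from the comparison table
specs = {
    "Cores": (40, 28),
    "L2 + L3 cache (MB)": (50 + 60, 28 + 38.5),
    "PCIe lanes": (64, 48),
    "TDP (W)": (270, 205),
    "UPI link speed (GT/s)": (11.2, 10.4),
    "ST frequency (MHz)": (3400, 4000),
    "MT frequency (MHz)": (3000, 3300),
}

for name, (new, old) in specs.items():
    print(f"{name:>22}: {pct_change(new, old):+.1f}%")
```

Run as written, this gives about +42.9% for cores, +65.4% for combined cache, +31.7% for TDP and +7.7% for UPI link speed, with single-thread and all-core frequencies down 15.0% and 9.1% respectively.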
If we combine the specification sheet cores and all-core (MT) frequency and divide by TDP, Ice Lake actually lands at about the same efficiency as the previous generation (roughly 0.44 vs 0.45 core-GHz per watt). Modern high-performance processors often operate well outside the peak efficiency window; however, Ice Lake’s lower frequency would usually suggest that it is having to operate closer to the peak efficiency point than previous generations in order to stay within a suitable socket TDP. This is similar to what we saw in the laptop space.
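The back-of-the-envelope comparison can be sketched as follows. This is a crude proxy of our own construction: it multiplies cores by all-core frequency and divides by TDP, so it ignores IPC differences and treats TDP as if it were measured power draw, which it is not.

```python
def core_ghz_per_watt(cores: int, mt_freq_mhz: int, tdp_w: int) -> float:
    """Spec-sheet throughput proxy: cores x all-core GHz, per watt of TDP."""
    return cores * (mt_freq_mhz / 1000) / tdp_w


# Cores, MT frequency and TDP taken from the spec comparison in this article
third_gen = core_ghz_per_watt(cores=40, mt_freq_mhz=3000, tdp_w=270)
second_gen = core_ghz_per_watt(cores=28, mt_freq_mhz=3300, tdp_w=205)

print(f"3rd Gen: {third_gen:.3f} core-GHz/W")
print(f"2nd Gen: {second_gen:.3f} core-GHz/W")
print(f"Ratio:   {third_gen / second_gen:.3f}")
```

The two figures come out within about 1.5% of each other, so on the spec sheet alone any efficiency gain has to come from the per-clock improvements of the new core rather than from frequency or core count.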
Features across all Ice Lake Xeon Scalable processors
We’ll dive into the different processors over on the next page, however it is worth noting some of the key features that will apply to all of Intel’s new ICL-SP family. Across the ~40 new processors, including all the media focused parts, the network focused processors, and all the individual optimizations used, all of the processors will have the following:
- All Ice Lake Xeons will support eight channels of DDR4-3200 at 2DPC (new info)
- All Ice Lake Xeons will support 4 TB of DRAM per socket
- All Ice Lake Xeons will support SGX Enclaves (size will vary)
- All Ice Lake Xeons will support 64x PCIe 4.0 lanes
- All Ice Lake Xeons will support 2x FMA
- Platinum/Gold Xeons will support 3x UPI links at 11.2 GT/s, Silver is 2x links at 10.4 GT/s
- Platinum/Gold Xeons will support 200-series Optane DC Persistent Memory
In the past, Intel has often productized some of these features and sold the more capable versions at a higher cost. This segmentation is often borne from a lack of competition in the market. This time around however, Intel has seen fit to unify some of its segmentation for consistency. The key one in my mind is memory support: at the start of the Xeon Scalable family, Intel started to charge extra for high-capacity memory models. But in light of the competition now offering 4 TB/socket at no extra cost, it would appear that Intel has decided to unify the stack with one memory support option.
Intel 3rd Generation Xeon Scalable: New Socket, New Motherboards
Ice Lake Xeons, now with eight memory channels rather than six, will require a new socket and new motherboards. Ice Lake comes with 4189 pins, and requires an LGA4189-4 ‘Whitley’ motherboard. This is different to the LGA4189-5 ‘Cedar Island’ in use for Cooper Lake, and the two are not interoperable, however they do share a power profile.
This actually brings us onto a point about Intel’s portfolio. Technically 10nm Ice Lake is not the only member of the 3rd Gen Xeon Scalable family – Intel has seen fit to bundle both 14nm Cooper Lake and 10nm Ice Lake under the same heading. Intel is separating the two by stating that Cooper Lake is focused at several specific high volume customers looking to deploy quad-socket and eight-socket systems with specific AI workloads. By comparison, Ice Lake is for the mass market, and limited to two socket systems.
Ice Lake and Cooper Lake both have the ‘3’ in the processor name indicating third generation. Users can tell which ones are Cooper Lake because they end in either H or HL – Ice Lake processors (as we’ll see on the next page) never have H or HL. Most Cooper Lake processors are Platinum models anyway, with a few Xeon Gold. As we go through this review, we’ll focus solely on Ice Lake, given that this is the platform Intel is selling to the mainstream.
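The naming rule can be sketched as a small helper. Note the assumptions: we treat the second digit of the four-digit model number as the generation digit (as in 8380 versus the previous-generation 8280), and we only accept a bare four-digit number followed by an optional letter suffix. The function name `classify_xeon` is hypothetical and this is not an official decoder.

```python
import re


def classify_xeon(model: str) -> str:
    """Classify a Xeon Scalable model number using the H / HL suffix rule."""
    match = re.fullmatch(r"(\d{4})([A-Z]*)", model.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized model number: {model!r}")
    digits, suffix = match.groups()
    # Assumption: the second digit encodes the generation (8380 -> 3rd Gen)
    if digits[1] != "3":
        return "not 3rd Gen"
    # Cooper Lake parts end in H or HL; Ice Lake parts never do
    return "Cooper Lake" if suffix in ("H", "HL") else "Ice Lake"


for name in ("8380", "8380HL", "6330", "8280"):
    print(name, "->", classify_xeon(name))
```

With these inputs, the 8380 and 6330 are classified as Ice Lake, the 8380HL as Cooper Lake, and the 8280 as not 3rd Gen.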
In the lead up to this launch today, Intel provided us with a 2U system featuring two of the top models of Ice Lake Xeon: we have dual 40 core Xeon Platinum 8380s! At the same time, we have also spent time with a dual Xeon Gold 6330 system from Supermicro, which has two 28-core processors, and acts as a good comparison to the previous generation Xeon Platinum 8280.
Our review today will cover the processor stack, our benchmarks, power analysis, memory analysis, and some initial conclusions.
TomWomack - Wednesday, April 7, 2021
Is it known whether there will be an IceLake-X this time round? The list of single-Xeon motherboard launches suggests possibly not; it would obviously be appealing to have a 24-core HEDT without paying the Xeon premium.
EthiaW - Wednesday, April 7, 2021
Boeings and Airbuses are never actually sold at their nominal prices, they cost far less, a non-disclosed number, for big buyers after gruesome haggling, sometimes less than half the “catalogue” price.
I think this is exactly what's intel doing now: set the catalogue price high to avoid losing face, and give huge discount to avoid losing market share.
duploxxx - Wednesday, April 7, 2021
well easy conclusion.
EPYC 75F3 is the clear winner SKU and the must have for most of the workloads.
This is based on price - performance - cores and its related 3rd party sw licensing...
I wonder when Intel will be able to convince VMware to move from a 32core licensing schema to a 40core :)
They used to get all the dev favor when PAT was still in the house, I had several senior engineers in escalation calls stating that the hypervisor was optimised for Intel ...guess what even under optimised looking for a VM farm in 2020-2021-....you are way better off with an AMD build.
WaltC - Wednesday, April 7, 2021
If you can't beat the competition, then what? Ian seems to be impressed that Intel was finally able to launch a Xeon that's a little faster than its previous Xeon, but not fast enough to justify the price tag in relation to what AMD has been offering for a while. So here we are congratulating Intel on burning through wads more cash to produce yet-another-non-competitive result. It really seems as if Intel *requires* AMD to set its goals and to tell it where it needs to go--and that is sad. It all began with x86-64 and SDRAM from AMD beating out Itanium and RDRAM years ago. And when you look at what Intel has done since it's just not all that impressive. Well, at least we can dispense with the notion that "Intel's 10nm is TSMC's 7nm" as that clearly is not the case.
JayNor - Wednesday, April 7, 2021
What about the networking applications of this new chip? Dan Rodriguez's presentation showed gains of 1.4x to 1.8x for various networking benchmarks. Intel's entry into 5G infrastructure, NFV, vRAN, ORAN, hybrid cloud is growing faster than they originally predicted. They are able to bundle Optane, SmartNICs, FPGAs, eASIC chips, XeonD, P5900 family Atom chips... I don't believe they have a competitor that can provide that level of solution.
Bagheera - Thursday, April 8, 2021
Patr!ck Patr!ck Partr!ck?
evilpaul666 - Saturday, April 10, 2021
It only works in front of a mirror. Donning a hoodie helps, too.
Oxford Guy - Wednesday, April 7, 2021
There is some faulty logic at work in many of the comments, with claims like it's cheating to use a more optimized compiler.
It's not cheating unless:
• the compiler produces code that's so much more unstable/buggy that it's quite a bit more untrustworthy than the less-optimized compiler
• you don't make it clear to readers that the compiler may make the architecture look more performant simply because the other architectures may not have had compiler optimizations on the same level
• you use the same compiler for different architectures when using a different compiler for one or more other architectures will produce more optimized code for those architectures as well
• the compiler sabotages the competition, via things like 'genuine Intel'
Fact is that if a CPU can accomplish a certain amount of work in a certain amount of time, using a certain amount of watts under a certain level of cooling — that is the part's actual performance capability.
If that means writing machine code directly (not even assembly) to get to that performance level, so what? That's an entirely different matter, which is how practical/economical/profitable/effortful it is to get enough code to measure all of the different aspects of the part's maximum performance capability. The only time one can really cite that as a deal-breaker is if one has hard data to demonstrate that by the time the hand-tuned/optimized code is written changes to the architecture (and/or support chips/hardware) will obsolete the advantage — making the effort utterly fruitless, beyond intellectual curiosity concerning the part's ability. For instance, if one knows that Intel, for instance, is going to integrate new instructions (very soon) that will make various types of hand-tuned assembly obsolete in short order, it can be argued that it's not worth the effort to write the code. People made this argument with some of AMD's Bulldozer/Piledriver instructions, on the basis that enough industry adoption wasn't going to happen. But, frankly... if you're going to make claims about the part's performance, you really should do what you can to find out what it is.
Oxford Guy - Wednesday, April 7, 2021
One can, though, of course... include a disclaimer that 'it seems clear enough that, regardless of how much hand-tuned code is done, the CPU isn't going to deliver enough to beat the competition, if the competition's code is similarly hand-tuned' - if that's the case. Even if a certain task is tuned to run twice as fast, is it going to be twice as fast as tuned code for the competition's stuff? Is its performance per watt deficit going to be erased? Will its pricing no longer be a drag on its perceived competitiveness?
For example, one could have wrung every last drop of performance out of Bulldozer but it wasn't going to beat Sandy Bridge E — a chip with the same number of transistors. Piledriver could beat at least the desktop version of Sandy in certain workloads when clocked well outside of the optimal (for the node's performance per watt) range but that's where it's very helpful to have tests at the same clock. It was discovered, for instance, that the Fury X and Vega had basically identical performance at the same clock. Since desktop Sandy could easily clock at the same 4.0 GHz Piledriver initially shipped with it could be tested at that rate, too.
Ideally, CPU makers would release benchmarks that demonstrate every facet of their chip's maximum performance. The concern about those being best-case and synthetic is less of a problem in that scenario because all aspects of the chip's performance would be tested and published. That makes cherry-picking impossible.
mode_13h - Thursday, April 8, 2021
The faulty logic I see is that you seem to believe it's the review's job to showcase the product in the best possible light. No, that's Intel's job, and you can find plenty of that material at intel.com, if that's what you want.
Articles like this should focus on representing the performance of the CPUs as the bulk of readers are likely to experience it. So, even if using some vendor-supplied compiler with trick settings might not fit your definition of "cheating", that doesn't mean it's a service to the readers.
I think it could be appropriate to do that sort of thing, in articles that specifically analyze some narrow aspect of a CPU, for instance to determine the hardware's true capabilities or if it was just over-hyped. But, not in these sort of overall reviews.