Intel Unveils Lunar Lake Architecture: New P and E cores, Xe2-LPG Graphics, New NPU 4 Brings More AI Performance
by Gavin Bonshor on June 3, 2024 11:00 PM EST
Intel this morning is lifting the lid on some of the finer architectural and technical details about its upcoming Lunar Lake SoC – the chip that will be the next generation of Core Ultra mobile processors. Once again holding one of their increasingly regular Tech Tour events for media and analysts, Intel this time set up shop in Taipei just before the beginning of Computex 2024. During the Tech Tour, Intel disclosed numerous facets of Lunar Lake, including their new P-Core design codenamed Lion Cove and a new wave of E-cores that are a bit more like Meteor Lake's pioneering Low Power Island E-Cores. Also disclosed was the Intel NPU 4, which Intel claims delivers up to 48 TOPS, surpassing Microsoft's Copilot+ requirements for the new age of AI PCs.
Intel's Lunar Lake represents a strategic evolution in their mobile SoC lineup, building on last year's Meteor Lake launch with a focus on improving power efficiency and optimizing performance across the board. Lunar Lake dynamically allocates tasks to efficiency cores (E-cores) or performance cores (P-cores) based on workload demands, using scheduling mechanisms designed to ensure optimal power usage and performance. Once again, Intel Thread Director, along with Windows 11, plays a pivotal role in this process, guiding the OS scheduler to make real-time adjustments that balance efficiency with computational power depending on the intensity of the workload.
Intel CPU Architecture Generations
| Alder/Raptor Lake | Meteor Lake | Lunar Lake | Arrow Lake | Panther Lake |
P-Core Architecture | Golden Cove / Raptor Cove | Redwood Cove | Lion Cove | Lion Cove | Cougar Cove? |
E-Core Architecture | Gracemont | Crestmont | Skymont | Crestmont? | Darkmont? |
GPU Architecture | Xe-LP | Xe-LPG | Xe2 | Xe2? | ? |
NPU Architecture | N/A | NPU 3720 | NPU 4 | ? | ? |
Active Tiles | 1 (Monolithic) | 4 | 2 | 4? | ? |
Manufacturing Processes | Intel 7 | Intel 4 + TSMC N6 + TSMC N5 | TSMC N3B + TSMC N6 | Intel 20A + More | Intel 18A |
Segment | Mobile + Desktop | Mobile | LP Mobile | HP Mobile + Desktop | Mobile? |
Release Date (OEM) | Q4'2021 | Q4'2023 | Q3'2024 | Q4'2024 | 2025 |
Lunar Lake: Designed By Intel, Built By TSMC (& Assembled By Intel)
While there are many aspects of Lunar Lake to dive into, perhaps it's best we start with what's sure to be the most eye-catching: who's building it.
Intel's Lunar Lake tiles are not being fabbed using any of their own foundry facilities – a sharp departure from historical precedent, and even from the recent Meteor Lake, where the compute tile was made using the Intel 4 process. Instead, both tiles of the disaggregated Lunar Lake are being fabbed over at TSMC, using a mix of TSMC's N3B and N6 processes. In 2021 Intel set about freeing their chip design groups to use the best foundry they could – be it internal or external – and there's no place that's more apparent than here.
Overall, Lunar Lake represents their second generation of disaggregated SoC architecture for the mobile market, replacing the Meteor Lake architecture in the lower-end space. At this time, Intel has disclosed that it uses a 4P+4E (8 core) design, with hyper-threading/SMT disabled, so the total thread count supported by the processor is simply the number of CPU cores, e.g., 4P+4E/8T.
Lunar Lake pairs the work of Intel's architectural design team with TSMC's manufacturing process nodes to bring the latest Lion Cove P-cores to the platform, which boost Intel's IPC as you would expect from a new generation. At the same time, Intel also introduces the Skymont E-cores, which replace the Low Power Island Crestmont E-cores of Meteor Lake. Notably, however, these E-cores don't connect to the ring bus like the P-cores, which makes them a sort of hybrid LP E-core, combining the efficiency gains of the more advanced TSMC N3B node with double-digit gains in IPC over the previous Crestmont cores.
The entire compute tile, including the P and E-cores, is built on TSMC's N3B node, while the SoC tile is made using the TSMC N6 node.
At a higher level, Intel is once again using their Foveros packaging technology here. Both the compute and SoC (now the "Platform Controller") tiles sit on top of a base tile, which provides high-speed/low-power routing between the tiles, and further connectivity to the rest of the chip and beyond.
In another first for a mainstream Intel Core product, the Lunar Lake SoC platform also includes up to 32 GB of LPDDR5X memory on the chip package itself. This is arranged as a pair of 64-bit memory chips, offering a total 128-bit memory interface. As with other vendors using on-package memory, this change means that users can't just upgrade DRAM at-will, and the memory configurations for Lunar Lake will ultimately be determined by what SKUs Intel opts to ship.
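As a rough illustration of the memory arithmetic above, the sketch below adds up the bus width of the two 64-bit memory chips and derives a theoretical peak bandwidth from it. The data rate used here is an assumed example value, not a figure from Intel's disclosure in this article, and the function names are hypothetical.

```python
def total_bus_width_bits(chips: int, bits_per_chip: int) -> int:
    """Total memory interface width when each chip has its own channel."""
    return chips * bits_per_chip


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    bytes per transfer = bus width / 8
    transfers per second = data rate in MT/s * 10^6
    """
    return bus_width_bits / 8 * data_rate_mt_s * 1e6 / 1e9


# From the article: a pair of 64-bit memory chips.
width = total_bus_width_bits(chips=2, bits_per_chip=64)

# ASSUMPTION: example LPDDR5X data rate, chosen for illustration only.
ASSUMED_DATA_RATE_MT_S = 8533

bandwidth = peak_bandwidth_gb_s(width, ASSUMED_DATA_RATE_MT_S)
print(f"{width}-bit interface, ~{bandwidth:.1f} GB/s peak at the assumed data rate")
```

Real-world throughput will sit below this theoretical ceiling, and the shipping data rate depends on the SKUs Intel ultimately offers.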
With Lunar Lake, Intel is also strongly focusing on AI, as the architecture integrates a new NPU called NPU 4. This NPU is rated for up to 48 TOPS of INT8 performance, thus making it Microsoft Copilot+ AI PC ready. This is the bar all of the PC SoC vendors are aiming for, including AMD and Qualcomm.
Intel's integrated GPU will also be a contributing player here. While not the highly efficient machine that the dedicated NPU is, the Arc Xe2-LPG brings dozens of additional T(FL)OPS of performance with it, and some additional flexibility an NPU doesn't come with. This is why you'll also see Intel rating the performance of these chips in terms of total platform TOPS – in this case, 120 TOPS.
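To make the platform TOPS figure concrete, the short sketch below works out how much of the 120 TOPS total comes from outside the NPU, using only the two numbers quoted in this article. The 40 TOPS Copilot+ threshold is Microsoft's published NPU minimum rather than a figure from this article, and the variable and function names are hypothetical.

```python
NPU_TOPS = 48        # NPU 4 rating quoted in the article (INT8)
PLATFORM_TOPS = 120  # total platform TOPS quoted in the article

# ASSUMPTION: Microsoft's published Copilot+ NPU minimum, not stated in this article.
COPILOT_PLUS_MIN_NPU_TOPS = 40


def non_npu_tops(platform_tops: int, npu_tops: int) -> int:
    """TOPS contributed by everything other than the NPU (GPU plus CPU)."""
    return platform_tops - npu_tops


remainder = non_npu_tops(PLATFORM_TOPS, NPU_TOPS)
npu_share = NPU_TOPS / PLATFORM_TOPS
meets_copilot_plus = NPU_TOPS >= COPILOT_PLUS_MIN_NPU_TOPS

print(f"GPU + CPU contribution: {remainder} TOPS")
print(f"NPU share of platform total: {npu_share:.0%}")
print(f"NPU meets Copilot+ minimum: {meets_copilot_plus}")
```

The article does not break the remainder down between the GPU and the CPU cores, so the sketch leaves it as a single combined figure.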
Intel's collaboration with Microsoft further enhances workload management through Intel Thread Director, optimized for applications such as the Copilot assistant. The timing of this disclosure sets the stage for a Q3 2024 launch, which positions Lunar Lake for the holiday 2024 market.
Intel Lunar Lake: Updating Intel Thread Director & Power Management Improvements
To say that energy efficiency is a key goal for Lunar Lake would be an understatement. For as much as Intel is riding high in the mobile PC CPU market (AMD's share there is still but a fraction), the company has been feeling the pressure over the last few years from customer-turned-rival Apple, whose M-series Apple Silicon has been setting the bar for power efficiency. And now with Qualcomm attempting to do the same thing for the Windows ecosystem with their forthcoming Snapdragon X chips, Intel is preparing to make their own power play.
Intel's Thread Director and power management updates for Lunar Lake show significant improvements compared to Meteor Lake. The Thread Director uses a heterogeneous scheduling policy, initially assigning tasks to a single E-core and expanding to other E-cores or P-cores as and when needed. OS containment zones are designed to limit tasks to specific cores, which improves power efficiency while still delivering the performance the workload needs from the right core. Integration with power management systems and an array of four Power Management Controllers (PMCs) further allows the chip, in concert with Windows 11, to make context-aware adjustments, aiming for optimal performance with minimal power usage and wastage.
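The "start on one E-core, then expand" policy and the idea of containment zones can be pictured with a toy first-fit model. This is only a sketch under stated assumptions and not Intel's actual algorithm: the `Core` class, the capacity values, and the `place` function are all hypothetical, and the real Thread Director supplies hints to the OS scheduler rather than placing threads itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Core:
    name: str
    kind: str        # "E" or "P"
    capacity: float  # ASSUMPTION: abstract work units, not a real metric
    load: float = 0.0

    def headroom(self) -> float:
        return self.capacity - self.load


def build_cores(p_count: int = 4, e_count: int = 4) -> List[Core]:
    """Build a 4P+4E layout; E-cores come first so they are tried first."""
    e_cores = [Core(f"E{i}", "E", capacity=1.0) for i in range(e_count)]
    p_cores = [Core(f"P{i}", "P", capacity=2.0) for i in range(p_count)]
    return e_cores + p_cores


def place(demand: float, cores: List[Core],
          zone: Optional[Set[str]] = None) -> Optional[str]:
    """First-fit placement in preference order.

    `zone` models a containment zone: when given, only cores whose kind
    is in the set may be used. Returns the chosen core name, or None if
    nothing inside the zone has enough headroom.
    """
    for core in cores:
        if zone is not None and core.kind not in zone:
            continue
        if core.headroom() >= demand:
            core.load += demand
            return core.name
    return None


# Light tasks pack onto E0 first, spill to E1, and only a heavy task
# that no E-core can hold moves up to a P-core.
cores = build_cores()
placements = [place(d, cores) for d in (0.4, 0.4, 0.4, 1.5)]
print(placements)  # ['E0', 'E0', 'E1', 'P0']

# With an E-only containment zone, the heavy task is not promoted.
contained = place(1.5, build_cores(), zone={"E"})
print(contained)  # None
```

A real scheduler would also migrate running threads and react to telemetry over time; the model above only captures the ordering preference (E-cores before P-cores) and the effect of restricting a task to a zone.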
Lunar Lake's scheduling strategy is aimed at power-sensitive applications. One example Intel gave is video conferencing, where tasks are kept within the efficiency core cluster, using the E-cores to maintain performance while reducing power consumption by up to 35%, according to Intel's provided data. These improvements are achieved through collaboration with OS developers such as Microsoft, with the goal of striking the best balance between power consumption and performance.
Focusing on the power management system for Lunar Lake, Intel's SoC power management operates in efficiency, balanced, and performance modes, each designed to adapt to the demands of the workload at the time of operation. This multi-layered approach allows the Lunar Lake SoC to operate efficiently. Much like Intel Thread Director, the PMCs can balance power usage with performance needs.
Intel further plans to enhance the Thread Director by increasing scenario granularity, implementing AI-based scheduling hints, and enabling cross-IP scheduling within Windows 11. These enhancements amount to workload management designed to boost overall power efficiency and deliver performance across various applications when needed, without wasting power budget by running lighter tasks on the higher-power P-cores.
Over the next few pages, we'll explore the new P and E cores and Intel's update to their integrated Arc Xe (Xe2-LPG) graphics.
91 Comments
The Hardcard - Wednesday, June 5, 2024 - link
There will be Lion Cove with hyperthreading. It is designed such that it can be physically left out or included in depending on the value to each product. It was left out of Lunar Lake as the primary goal here is performance per watt and battery life superiority over Apple and Qualcomm.
Server Lion Cove will absolutely have hyperthreading. Rumors are Arrow Lake will have it as well.
TMDDX - Wednesday, June 5, 2024 - link
Is on chip "AI" the new connected standby for NSA spying?
ballsystemlord - Wednesday, June 5, 2024 - link
Shhhhh, you're not supposed to say that. It's classified. ;)
sharath.naik - Wednesday, June 5, 2024 - link
So would this have on package memory, what is the size of memory? how many P cores how many E cores? So many questions no answers. Is this like a paper launch?
sharath.naik - Wednesday, June 5, 2024 - link
Never mind I was wrong. 4E+4P and up to 32 GB RAM. I wish they had option for 64GB, but 32GB is a good number
stephenbrooks - Wednesday, June 5, 2024 - link
The wider Lion Cove core looks pretty impressive, I'll be interested to see how it does in desktops.
name99 - Wednesday, June 5, 2024 - link
"In total, this puts 240KB of cache within 9 cycles' latency of the CPU cores"
Does it? If they do things the usual Intel way the L1 is inclusive of the L0...
Other options are possible, of course, but were they implemented?
mode_13h - Thursday, June 6, 2024 - link
I wonder if the tag RAM for the L0, L1D, and L2 are all separate? It would be interesting if they grouped it all together in a tree-structured lookup and put that as close as possible to the core's load/store unit. The actual data memory of the caches could be the only part that's physically separate.
Bruzzone - Wednesday, June 5, 2024 - link
It's worth the wait to Lunar and Arrow?
Or take advantage of the Intel and AMD current generation clearance sales?
Intel is flooding the channel with Raptor desktop and mobile in the last eight weeks apparently to sustain a Core supply bridge into Lunar and Arrow. Intel is also sucking the financial capital out of the channel in an effort to block or slow the procurement of anything other than Intel.
In parallel fighting it out for surplus control, AMD is also engaged sucking financial capital out of the channel by flooding the channel specifically with Raphael desktop.
Where Meteor Lake and AMD Phoenix, Hawks and Granite Ridge continue as intermediate 'AI' technologies into Strix mobile and Arrow desktop. Not that I care about AI functionality currently.
14th desktop channel available + 98% in the prior eight weeks
13th desktop + 24.6%
12th desktop + 33.4%
Intel desktop all up;
14th desktop available today = 24.9%
13th desktop = 37% that is 48.4% more than 14th
12th desktop = 37.9% equivalent with 13th
Specific Intel mobile;
Intel Meteor Lake mobile channel available gains + 216%. Within Meteor Lake Core SKUs are 10.3%. Among total, H performance mobile = 43.9% and U low power mobile = 56%. Meteor Lake associated are 11% of all Raptor Lake 13th mobile.
14th mobile H + 16% in week and 30% of all Meteor and 36% of all 13th Raptor mobile H.
13th mobile itself gains + 5.1%
13th H specifically gains + 8.6%
13th P clears down < 3.2%
13th U gains + 4.8%
12th Alder mobile all up + 13.2% in the prior eight weeks
12th H specifically = flat
12th P clears down < 3.2%
12th U clears down < 2.6%
I will have AMD desktop and mobile supply, trade-in and sales trend up later today at my SA comment line. Here are some immediate observations;
5900XT and 5800XT on AMD so said pricing is sufficient to push Vermeer channel holdings down in price at so said $359 and $249 now pulled by AMD in the moment. The channel might not have been happy with that regulating price move on how much R5K there is to clear from the channel. R5K channel available is up + 68% since March 9 when R5K was 68% of all R7K and today 98% of R7K available.
R7K desktop since March 9 channel supply volume available + 18%. R9K will minimally dribble out allowing R7K and R5K to clear? R9K might have to be priced up on specific SKUs to accomplish the same dribbling out objective allowing AMD back generation to clear?
Notably 3600 gains in the channel + 94% in the prior month.
3600X came back to secondary resale + 35%.
3700X is up + 15.8% that's all trade-in.
AMD might have to adjust R9K desktop top SKU and R5K desktop regulating SKUs not to interfere with the channel's ability to liquidate especially Vermeer from channel inventory holdings plus R7K SKUs that will follow in a first in first out channel sales system.
In summary, there is plenty of Intel and AMD product in the channel. The PC market remains in a downward deflationary price spiral until at least q1 2025 aimed to clear existing inventories for channel financial reclaim to buy next generation.
Subsequently there's this inventory bridge to traverse to Intel and AMD next generation products and through the summer into q4 it's never been a better time to buy a PC. I don't think desktop and mobile prices will be as low as they are heading into year end and for a long time following.
For Intel at least flooding the channel with product indicates Intel is buying time.
mb
BushLin - Wednesday, June 5, 2024 - link
Thanks for the uncited nonsense Mike, we were all on tenterhooks.