Arm Announces Neoverse V2 and E2: The Next Generation of Arm Server CPU Cores
by Ryan Smith on September 15, 2022 8:00 AM EST
Posted in: CPUs, Arm, Servers, Neoverse, ARMv9, CMN-700, Neoverse V2, Neoverse E2
Just under four years ago, Arm announced their Neoverse family of infrastructure CPU designs. Deciding to double down on the server and edge computing markets by designing Arm CPU cores specifically for those markets – and not just recycling the consumer-focused Cortex-A designs – Arm set about tackling the infrastructure market in a far more aggressive manner. Those efforts, in turn, have increasingly paid off for Arm and its partners, who, thanks to products like Amazon’s Graviton and Ampere’s Altra CPUs, have at long last been able to take a meaningful piece of the server CPU market.
But as Arm CPUs finally achieve the market penetration that eluded them in the previous decade, Arm needs to make sure it isn’t resting on its laurels. Of the company’s three lines of Neoverse core designs – the efficient E, flexible N, and high-performance V – only the N line has reached its second generation so far, aptly dubbed the N2. Now, the company is preparing to update the rest of the Neoverse lineup with the next generation of V and E cores as well, announcing today the Neoverse V2 and Neoverse E2 cores. Both of these designs are slated to bring the Armv9 architecture to HPC and other server customers, along with significant performance improvements.
Arm Neoverse V2: Armv9 Graces High-Performance Computing
Leading the charge for Arm’s new CPU core IP is the company’s second-generation V-series design, the Neoverse V2. The complete V2 platform, codenamed Demeter, marks Arm’s first iteration on their high-performance V-series cores, as well as the transition of this core lineup from the Armv8.4 ISA to Armv9. And while this is only Arm’s second go at a dedicated high-performance core for servers, make no mistake: Arm aims to be ambitious. The company is claiming that Neoverse V2 CPUs will offer the highest single-threaded integer performance available in the market, eclipsing next-generation designs from both AMD and Intel.
While this week’s announcement from Arm is not a full-on deep-dive of the new architecture – and, more annoyingly, the company is not talking about specific PPA metrics – Arm is offering a high-level look at some of the changes and features that will be coming with the V2 platform. To be sure, the V2 IP is already finished and shipping to customers today (most notably NVIDIA), but Arm is playing coy to some degree with what they’re saying about V2 before the first chips based on the IP ship in 2023.
First and foremost, the bump to Armv9 brings with it the full suite of features that come with the latest Arm architecture. That includes the security improvements that are a cornerstone feature of the architecture (and especially handy for cloud shared environments) along with Arm’s newer SVE2 vector extensions.
On the latter, Arm is making an interesting change here by reconfiguring the width of their vector engines; whereas V1 implemented SVE(1) using two 256-bit SIMD pipelines, V2 moves to four 128-bit SIMD pipelines. The net result is that the cumulative SIMD width of the V2 is no wider than that of the V1 (512 bits in both cases), but the execution flow has changed to process a larger number of smaller vectors in parallel. This change makes the SIMD pipeline width identical to Arm’s Cortex parts (which are all 128-bit, the minimum size for SVE2), but it does mean that Arm is no longer taking full advantage of the scalable part of SVE by using larger SIMDs. I expect we’ll find out why Arm is taking this route once they do a full V2 deep dive, as I’m curious whether this is purely an efficiency play or something more akin to homogenizing designs across the Arm ecosystem.
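To make the arithmetic behind that comparison concrete, here is a minimal Python sketch. It assumes only the pipe counts and widths quoted above; the helper names are made up for illustration, and the results are peak per-cycle figures that ignore issue restrictions, latencies, and clock speeds, none of which Arm has disclosed.

```python
# Illustrative only: the pipe counts and widths are the figures quoted in the
# article. The helper names are hypothetical and not part of any Arm tooling.

def aggregate_simd_bits(pipes: int, width_bits: int) -> int:
    """Total vector bits that can be processed per cycle at peak."""
    return pipes * width_bits


def peak_lanes(total_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in the aggregate width."""
    return total_bits // element_bits


v1_total = aggregate_simd_bits(pipes=2, width_bits=256)  # Neoverse V1
v2_total = aggregate_simd_bits(pipes=4, width_bits=128)  # Neoverse V2

print(f"V1: {v1_total} bits/cycle, V2: {v2_total} bits/cycle")
for element_bits in (8, 16, 32, 64):
    print(
        f"{element_bits:>2}-bit elements: "
        f"V1 {peak_lanes(v1_total, element_bits)} lanes, "
        f"V2 {peak_lanes(v2_total, element_bits)} lanes"
    )
```

Both configurations come out to the same aggregate width; what differs is the shape of the work, with V2 able to issue four independent 128-bit operations where V1 issued two 256-bit ones.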
Past that, it’s worth noting that while Arm’s presentation slides list bfloat16 and int8 matmul as features, these are not new features. Still, Arm is promising that V2’s SIMD processing will provide microarchitectural efficiency improvements over the V1.
More broadly, V2 will also be introducing larger L2 cache sizes. The V2 design supports up to 2MB of private L2 cache per core, double the maximum size of V1. V2 will also be introducing further improvements to Arm’s integer processing performance, though the company isn’t going into further detail at this point. From an architectural standpoint, the V1 borrowed a fair bit from the Cortex-X1 CPU design, and it wouldn’t be too surprising if that was once again the case for the V2, borrowing from the X2. In that case, consumer chips like the Snapdragon 8 Gen 1 and Dimensity 9000 should provide a loose reference on what to expect.
For the Demeter platform Arm will be reusing their CMN-700 mesh fabric, which was first introduced for the V1 generation. CMN-700 is still a modern mesh design with support for up to 144 nodes in a 12x12 configuration, and is suitable for interfacing with DDR5 memory as well as PCIe 5/CXL 2 for I/O. As a result, strictly speaking the V2 isn’t bringing anything new at the fabric level – even the 512MB of SLC could be done with a V1 + CMN-700 setup – but this does mean that the CMN-700 mesh and its features are now the baseline moving forward with V2.
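As a rough illustration of what a 12x12 mesh implies, the sketch below counts the nodes and the worst-case hop distance between two opposite corners. It assumes a plain 2D grid with simple dimension-ordered (X then Y) routing, which is a textbook simplification and not a description of how CMN-700 actually routes traffic; the function names are hypothetical.

```python
# Illustrative only: a generic 2D mesh model. The 12x12 dimensions come from
# the article; the routing model is an assumed textbook simplification.

def mesh_nodes(cols: int, rows: int) -> int:
    """Number of crosspoints in a cols x rows mesh."""
    return cols * rows


def hop_distance(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Hops between two nodes when traveling along X first, then Y."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


COLS, ROWS = 12, 12
total_nodes = mesh_nodes(COLS, ROWS)
worst_case_hops = hop_distance((0, 0), (COLS - 1, ROWS - 1))

print(f"{COLS}x{ROWS} mesh: {total_nodes} nodes")
print(f"Corner-to-corner distance: {worst_case_hops} hops")
```

Under those assumptions the maximum configuration has 144 nodes, and a request crossing the full mesh travels 22 hops, which hints at why node placement matters in large designs.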
The Neoverse V2 core, in turn, is going to be the cornerstone of the upcoming generation of high-performance Arm server CPUs. The de facto flagship here will be NVIDIA’s Grace CPU, which will be one of the first (if not the first) V2 designs to ship in 2023. NVIDIA had previously announced that Grace would be based on a Neoverse design, so this week’s announcement from Arm finally confirms the long-held suspicion that Grace would be based on the next-generation Neoverse V core.
NVIDIA, for its part, has their fall GTC event scheduled to take place in just a few days. So it’s likely we’ll hear a bit more about Grace and its Neoverse V2 underpinnings as NVIDIA seeks to promote the chip ahead of its release next year.
Neoverse E2: Cortex-A510 For Use With N2
Alongside the Neoverse V2 announcement, Arm is also using this week’s briefing to announce the Neoverse E2 platform. Unlike the V2 reveal, this is a much smaller scale announcement, and Arm is only offering a handful of technical details. Ultimately, E2’s day in the sun will be coming a bit later on.
That said, the E2 platform is being delivered to partners with an eye towards interoperability with the existing N2 platform. For this, Arm has taken the Cortex-A510, its little/high-efficiency Cortex CPU core, and paired it with the CMN-700 mesh. This is intended to give server operators/vendors further flexibility by providing an alternative CPU core to the N2, while still offering the modern I/O and memory features of Arm’s mesh. Underscoring this, the E2 system backplane is even compatible with the N2 backplane.
Neoverse Next: Poseidon, N-Next, and E-Next
Finally, Arm’s announcement this week provides a glimpse at the company’s future roadmap for all three Neoverse platforms, where, unsurprisingly, Arm is working on updated versions of each of the platforms.
Notably, all three platforms call for adding PCIe 6 support as well as CXL 3.0 support. This would come from the next iteration of Arm’s CMN mesh network, which, as is already the case today, is shared between all three platforms.
Meanwhile, it’s interesting to see the Poseidon name once again pop up in Arm’s roadmaps. Going back to Arm’s very first Neoverse roadmap, Poseidon was the name attached to Arm’s 5nm/2021 platform, a spot since taken by N2 and V1/V2 in various forms. With V2 not landing in hardware until 2023, Poseidon/V3 is still years off, but there’s likely some significance to Arm keeping the codename (such as a new microarchitecture).
But first out of the gate will be the N-Next platform – the presumable Neoverse N3. With the Neoverse N platform a generation ahead of the rest (N2 was first announced in 2020), it’ll be the next platform due for a refresh. N3 is due to be available to partners in 2023, with Arm broadly touting generational performance and efficiency improvements.
39 Comments
mode_13h - Saturday, September 17, 2022
A multi-$Trillion company is big enough to enter any market it really wants. Their revenues dwarf even Intel's. There's virtually nothing they couldn't get into, if they thought it were critical to their core business strategy.
mode_13h - Saturday, September 17, 2022
I think there's a compelling reason for them to care about cloud, and that's for easy portability between phone <-> laptop/desktop <-> cloud. It'd be a big win if they could support a "compile once, run anywhere" model, even if "anywhere" were still restricted to just Apple platforms.
Also, cloud computing is a massive market and Apple needs to keep growing. At some point, they can't afford to ignore a market that big.
MintBoy - Sunday, September 18, 2022
They will, once the computing horizon gets to the point where connectivity is fast enough that we pivot back to thin clients, which is inevitable IMHO. There'll always be a niche for powerful end-user wielded hardware, but Apple won't turn down the opportunity to 'lease' computing power to the general public for a monthly recurring fee.
PeachNCream - Thursday, September 15, 2022
As others have already implied, ARM CPUs are competitive - thus the design has landed in a few noteworthy products. Not only do they compete in performance metrics, they do well in density and perf/watt metrics.
Unlike Joe Average PC User such as you, for-profit companies tend not to build brand-loyalties that limit their options as they aim to maximize returns on their investments. If they do things like that, they risk getting left behind by more nimble, flexible competitors.
DougMcC - Thursday, September 15, 2022
No one ever got fired for buying <strikethrough>IBM</strikethrough> <strikethrough>Dell</strikethrough> <strikethrough>Lenovo</strikethrough> Apple. Irrational brand loyalty among for-profit companies is very much a thing.
mode_13h - Saturday, September 17, 2022
Don't forget Microsoft!
> Irrational brand loyalty among for-profit companies is very much a thing.
True, but it's powered by a different dynamic than consumers, as you imply. Among business customers, risk-aversion tends to be a powerful force.
Also, if you blaze a new path by going with a new market entrant, instead of the conventional option, it's a lot more work to justify your decision. So, another big factor is simply laziness.
ballsystemlord - Thursday, September 15, 2022
In comparing them to Apple, you first have to realize that (until the M1 was released?) all benchmarks were limited to what Apple allowed you to install on their iPhones.
As in, if Nvidia could choose what benchmarks got run on AMD GPUs, would you trust the results?
The Hardcard - Thursday, September 15, 2022
That is inaccurate. If you have a developer account, you can run whatever you want to put in the work of compiling. The SPEC suite that Andrei ran is not in the App Store.
ballsystemlord - Thursday, September 15, 2022
Actually, I did not know that.
But then, as was also pointed out in the reviews (and I should have mentioned it before), the code can, and sometimes will, take different paths depending on the device and its capabilities. For example, the settings games use are programmed by the developers and adjusted on-the-fly; in at least some cases, you can't set a baseline for benchmarking via the options menu.
ballsystemlord - Thursday, September 15, 2022
But more importantly, although they lose massively in many things, they do have very competitive performance in others. (See ARM server CPU reviews on this site to see what I mean.)
Now, however mortifying their performance is in the benchmarks they lose, and lose badly, the big companies purchase thousands of units at a time. So, if they need to run one specific workload, ARM could be the core to buy for that workload.
Sort of like having an ASIC accelerator instead of a CPU.