Nowadays many cloud service providers design their own silicon, but Amazon Web Services (AWS) got there ahead of its rivals, and its Annapurna Labs subsidiary now develops processors that compete credibly with those from AMD and Intel. This week AWS introduced its Graviton4 SoC, a 96-core Arm-based chip that promises to challenge established CPU designers and offer unprecedented performance to AWS clients.

"By focusing our chip designs on real workloads that matter to customers, we are able to deliver the most advanced cloud infrastructure to them," said David Brown, vice president of Compute and Networking at AWS. "Graviton4 marks the fourth generation we have delivered in just five years, and is the most powerful and energy efficient chip we have ever built for a broad range of workloads."

The AWS Graviton4 processor packs 96 cores that offer on average 30% higher compute performance than Graviton3, and Amazon says it is 40% faster in database applications and 45% faster in Java applications. Because Amazon has revealed few details about Graviton4, it is hard to attribute these performance gains to any particular characteristic of the CPU.

Yet NextPlatform believes that the processor uses Arm Neoverse V2 cores, which offer higher instructions per clock (IPC) than the V1 cores used in previous-generation AWS processors. Furthermore, the new CPU is expected to be fabricated on one of TSMC's N4 process technologies (4nm-class), which offer higher clock-speed potential than TSMC's N5 nodes.

"AWS Graviton4 instances are the fastest EC2 instances we have ever tested, and they are delivering outstanding performance across our most competitive and latency sensitive workloads," said Roman Visintine, lead cloud engineer at Epic. "We look forward to using Graviton4 to improve player experience and expand what is possible within Fortnite.”

In addition, the new processor features a revamped memory subsystem with 537.6 GB/s of peak bandwidth, 75% more than the previous-generation AWS CPU. Higher memory bandwidth improves CPU performance in memory-intensive applications such as databases.

Meanwhile, such a major memory bandwidth improvement suggests that the new processor employs more memory channels than Graviton3, though AWS has not formally confirmed this.
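As a sanity check, the 75% uplift lines up with a 12-channel DDR5-5600 memory subsystem, up from eight channels of DDR5-4800 on Graviton3. To be clear, these channel counts and transfer rates are unconfirmed assumptions, not AWS specifications; a back-of-the-envelope calculation in Python:

    # Back-of-the-envelope check. Assumes Graviton4 uses 12 channels of
    # DDR5-5600 and Graviton3 uses 8 channels of DDR5-4800 -- neither figure
    # is officially confirmed by AWS.
    BYTES_PER_TRANSFER = 8  # each DDR5 channel is 64 bits wide

    def peak_bandwidth_gbs(channels: int, transfer_rate_mtps: int) -> float:
        """Peak theoretical bandwidth in GB/s for a DDR memory subsystem."""
        return channels * transfer_rate_mtps * BYTES_PER_TRANSFER / 1000

    graviton3 = peak_bandwidth_gbs(8, 4800)    # 307.2 GB/s
    graviton4 = peak_bandwidth_gbs(12, 5600)   # 537.6 GB/s
    print(f"Uplift: {graviton4 / graviton3 - 1:.0%}")  # 75%, matching AWS's claim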

Graviton4 will debut in memory-optimized Amazon EC2 R8g instances, which are particularly useful for boosting performance in high-end databases and analytics. Furthermore, these R8g instances provide up to three times more vCPUs and memory than Graviton3-based R7g instances, enabling higher throughput for data processing, better scalability, faster results, and reduced costs. To harden EC2 instances, Amazon also encrypts all high-speed physical hardware interfaces of the Graviton4 CPU.

Graviton4-based R8g instances are currently in preview and will become widely available in the coming months.
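Once R8g exits preview, launching one should look like launching any other EC2 instance type. A minimal boto3 sketch, assuming the instances are available in your account and region; the AMI ID and instance size here are placeholders:

    # Minimal sketch: launch a Graviton4-based R8g instance with boto3.
    # Assumes R8g availability in your account/region; the AMI ID below is a
    # placeholder and must point to an arm64 (aarch64) image.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder arm64 AMI
        InstanceType="r8g.4xlarge",       # memory-optimized Graviton4 instance
        MinCount=1,
        MaxCount=1,
    )
    print("Launched:", response["Instances"][0]["InstanceId"])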

Sources: AWS, NextPlatform

Comments

  • mode_13h - Monday, December 4, 2023 - link

    I've never heard of Neoverse V2 supporting SMT. You'd think ARM would've mentioned that, when they announced it.

    > among the first ARM-based server CPUs to use SMT was/is actually
    > the one from Huawei/HiSilicon.

    There were other ARM cores with SMT. The Cortex-A65 was used in a couple self-driving SoCs and supports 2-way SMT.

    In terms of server CPUs, the Neoverse E1 supposedly has it. Cavium/Marvell's ThunderX2 & ThunderX3 have 4-way SMT.
  • 29a - Monday, December 4, 2023 - link

    "I don't know why the writer is acting like the uarch and memory are some kind of mystery worthy of speculation."

    Because he's terrible, AI could write much better articles.
  • Wadiest - Wednesday, November 29, 2023 - link

    Neat, that's nearly 70% of the memory bandwidth of a 2021 Mac Studio (M1 Ultra).

    I know it's a facetious comparison, but I find it curious either way you look at it: whether server CPU makers just can't seem to beat Apple's barely-more-than-a-laptop CPU, or Apple so massively over-designed its M-series processors in this regard.
  • bubblyboo - Wednesday, November 29, 2023 - link

    Because M# processor memory bandwidth is shared between the CPU and GPU, whereas here it's only the CPU bandwidth.
  • mode_13h - Thursday, November 30, 2023 - link

    Yup. Comparing CPU + GPU vs. CPU-only. GPUs are notoriously bandwidth-hungry.
  • name99 - Friday, December 1, 2023 - link

    Yes, and no.
    Of course it is true that GPUs are bandwidth-hungry. But it's also true that data centers are notoriously underprovisioned with bandwidth. Dick Sites (one of the Google performance engineers) has frequently complained about this.

    I suspect it costs money to provide bandwidth (not to mention designing your machine differently, eg without using standard motherboards and DIMMs), and that this is part of what you are getting when you pay for Apple (in spite of the loud voices who claim that there are no advantages to on-SoC RAM...)
  • mode_13h - Saturday, December 2, 2023 - link

    The GPU comment was made to explain why Apple's client SoC has so much bandwidth. The reason server CPUs can't easily do the same is their memory scalability requirements.

    Of course, there are exceptions. Intel's Xeon Max has HBM, which can be used as a "cache" to avoid compromising on scalability. Nvidia's Grace uses on-package LPDDR5X, I guess with a memory scaling strategy of simply adding more nodes. Maybe, in the future, they plan on pools of additional memory being accessed over CXL. Long-term, we seem to be headed for memory tiers, where one form or another of on-package memory comprises the fast tier.

    Anyway, if you're curious how a server CPU would perform with 1 TB/s of bandwidth:

    https://www.phoronix.com/review/xeon-max-ubuntu-23...
  • lemurbutton - Wednesday, November 29, 2023 - link

    Why are you comparing a server CPU to a consumer SoC?
  • erinadreno - Wednesday, November 29, 2023 - link

    M-series chips use LPDDR memory, which is aimed at high bandwidth at the cost of latency. IIRC the latency of M1 is ~100ns, while Zen 3 chips were capable of ~60ns. And you need to do a fetch and a decode before any of that bandwidth is usable for numerical calculation.

    I'd say LPDDR is more like GDDR than conventional DDR.
  • mode_13h - Thursday, November 30, 2023 - link

    LPDDR is about low-power. You can hit the same bandwidth numbers using DDR5, but at considerably higher power.
