A little less than 2 years ago, we investigated the first Arm server SoC that had a chance to compete with midrange Xeon E5s: the Cavium ThunderX. The SoC showed promise; however, its low single-threaded performance and some power management issues relegated the 48-core SoC to niche markets such as CDN and web caching. In the end, Cavium's first server SoC was not a real threat to Intel's Xeon.

But Cavium did not give up, and rightfully so: the server market is more attractive than ever. Intel's datacenter group is good for about 20 billion USD (!) in revenue per year. Even better, profit margins are in the 50% range. When it comes to profits and cash flow, the server market far outpaces any other hardware market. So following the launch of the ThunderX, Cavium promised to bring out a second iteration: better power management, better single-thread performance and even more cores (54).

The trick, of course, is actually getting to a point where you can take on the well-oiled machine that is Intel. Arm, Calxeda, Broadcom, AppliedMicro and many others have made bold promises over the past 5 years that never materialized, so there is a great deal of skepticism, and rightfully so, towards new Arm server SoCs.

However, the new creation of underdog Cavium deserves the benefit of the doubt. Much has changed (much more than the name alone lets on), as Cavium has bought the "Vulcan" design from Avago. Vulcan is a rather ambitious CPU design that was originally developed by Broadcom's Arm server SoC team, and as a result it has a very different heritage than the original ThunderX. At the same time, Cavium was able to apply what it learned from the ThunderX, introducing some microarchitectural improvements to the Vulcan design to improve its performance and power efficiency.

As a result, ThunderX2 uses a much more "brainiac" core than the previous generation. While the ThunderX core had a very short pipeline and could hardly sustain 2 instructions per clock, the Vulcan core was designed to fetch 8 and execute up to 4 instructions per clock. It gets better: 4 threads can be active simultaneously (SMT4), helping keep the wide back-end busy most of the time. 32 of those cores, at clock speeds of up to 2.5 GHz, find a home in the new ThunderX2 SoC.
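The figures in the paragraph above lend themselves to some quick arithmetic. The Python sketch below is our own illustration, not anything from Cavium: it multiplies out 32 cores × SMT4 to get 128 hardware threads, and 32 cores × 4 instructions × 2.5 GHz to get a theoretical ceiling of 320 billion instructions per second that real workloads will not come close to. The `smt_core_ipc` function is a deliberately crude, hypothetical model of why SMT4 pays off on a wide core; the per-thread IPC of 1.25 is an assumed value, not a measurement, and real SMT resource contention is far more complicated.

```python
# Back-of-the-envelope ThunderX2 figures, using only the numbers quoted in the text:
# 32 cores, SMT4, up to 4 instructions executed per clock, up to 2.5 GHz.
CORES = 32
THREADS_PER_CORE = 4   # SMT4
EXEC_WIDTH = 4         # instructions executed per clock, per core (peak)
MAX_CLOCK_GHZ = 2.5


def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Hardware threads exposed to the operating system."""
    return cores * threads_per_core


def peak_instructions_per_second(cores: int, width: int, clock_ghz: float) -> float:
    """Theoretical ceiling: every core executes `width` instructions every cycle."""
    return cores * width * clock_ghz * 1e9


def smt_core_ipc(active_threads: int, per_thread_ipc: float, width: int) -> float:
    """HYPOTHETICAL toy model: each thread alone sustains `per_thread_ipc`,
    and extra SMT threads add throughput until the shared back-end width caps it."""
    return min(width, active_threads * per_thread_ipc)


print(hardware_threads(CORES, THREADS_PER_CORE))  # 128
print(peak_instructions_per_second(CORES, EXEC_WIDTH, MAX_CLOCK_GHZ) / 1e9)  # 320.0 (billion/s)

ASSUMED_PER_THREAD_IPC = 1.25  # illustrative value only, not a measurement
for n in range(1, THREADS_PER_CORE + 1):
    print(f"{n} thread(s): ~{smt_core_ipc(n, ASSUMED_PER_THREAD_IPC, EXEC_WIDTH)} IPC per core")
```

With this assumed per-thread IPC, a single thread leaves most of the four execution slots idle, three threads nearly fill them and the fourth hits the cap; that is the intuition behind pairing a wide core with SMT4.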

With up to 128 threads running and no fewer than eight DDR4 memory controllers, this CPU should be able to perform well across a wide range of server workloads. In other words, while the first ThunderX was relegated to niche roles, the ThunderX2 is the first Arm server CPU that has a chance to break the server market open.
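Eight DDR4 channels also translate into a simple peak-bandwidth estimate. The sketch below assumes DDR4-2666 on every channel (the paragraph above does not state a memory speed) and the standard 64-bit data bus per channel. The result is a theoretical peak; sustained bandwidth in a benchmark such as STREAM will be noticeably lower. Dividing by the 128 hardware threads only gives a rough sense of how much bandwidth each thread could get if all of them streamed memory at once.

```python
# Theoretical peak DRAM bandwidth of an 8-channel DDR4 memory subsystem.
# ASSUMPTION: DDR4-2666 on every channel; the article text only says "eight DDR4 controllers".
CHANNELS = 8
TRANSFER_RATE_MT_S = 2666  # assumed DDR4-2666, in mega-transfers per second
BUS_WIDTH_BYTES = 8        # 64-bit data bus per channel (ECC bits not counted)
HW_THREADS = 128           # 32 cores x SMT4


def peak_bandwidth_gb_s(channels: int, mt_s: int, bus_bytes: int) -> float:
    """MT/s x bytes per transfer = MB/s per channel; divide by 1000 for decimal GB/s."""
    return channels * mt_s * bus_bytes / 1000


total = peak_bandwidth_gb_s(CHANNELS, TRANSFER_RATE_MT_S, BUS_WIDTH_BYTES)
per_thread = total / HW_THREADS  # only meaningful if every thread streams memory at once
print(f"peak: {total:.1f} GB/s, per hardware thread: {per_thread:.2f} GB/s")
```

Under these assumptions the script reports about 170.6 GB/s in total, or roughly 1.33 GB/s per hardware thread.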


98 Comments


  • JohanAnandtech - Thursday, May 24, 2018

    I have been troubleshooting a Java problem for the last 3 weeks now: for some reason our specific EPYC test system has some serious performance issues after we upgraded to kernel 4.13. This might be a hardware/firmware... issue. I don't know. I just know that the current tests are not accurate.
  • npz - Thursday, May 24, 2018

    Large pages should be used whenever possible on Intel. You do waste some more memory, but it's worth it for most workloads, as can be seen in your Intel Java benchmark. We've tested it for I/O devices, and enabling large pages for the buffers that drivers use for DMA shows a big difference for some high-throughput devices.
  • junky77 - Thursday, May 24, 2018

    What? A 2.5 GHz ARM core is around 60-70% of a 3.8 GHz Skylake core?? So at 3.8 GHz, the ARM core would probably be at least as fast?
  • Wilco1 - Thursday, May 24, 2018

    Probably around 90%, since performance doesn't scale linearly with frequency. Note these are throughput parts, so they won't clock that high. However, a 7nm version might well reach 3GHz.
  • AJ_NEWMAN - Thursday, May 24, 2018

    If Cavium's tweaked 16nm part hits 3GHz, it would not be unreasonable to aim for 4GHz with a 7nm part.

    With 2.3 times as many transistors available, it will be interesting to see what else they beef up.

    Higher IPC? 64 cores? 16 memory controllers? CCIX? Or perhaps they will compete with Fujitsu and add some supercomputer-centric hardware?

    AJ
  • meta.x.gdb - Thursday, May 31, 2018

    Wonder why the VASP code limped along on ThunderX2 while OpenFOAM saw such gains. I'm pretty familiar with both codes. VASP is mostly doing density functional theory, which is FFT-heavy...
  • Meteor2 - Tuesday, June 26, 2018

    All I want to say (all I can say) is that Anandtech has some of the best writers and commenters in this field. Fantastic article, and fantastic discussion.
  • paldU - Saturday, July 7, 2018

    A typo on page 2: "it terms of performance per dollar" should be "in terms of performance per dollar".
