Knowing your market is fundamental to product planning, marketing, and distribution. There’s no point creating a product with no market, or having something amazing and offering it to the wrong sort of customers. When AMD started offering high-core-count Threadripper processors, the one market that took as many as it could get was the graphics design business – visual effects companies and those focused on rendering loved the core count, the memory support, all the PCIe lanes, and the price. But if there’s one thing more performance brings, it’s the desire for even more performance. Enter Threadripper Pro.

computational graphics goes brrrrrrr

There are a number of industries where, looking in from the outside, an enthusiast might assume that using a CPU is old fashioned, and ask why the industry hasn’t moved fully to GPU accelerators. One of the big ones is machine learning – despite the push to dedicated machine learning hardware and lots of big businesses doing ML on GPUs, most machine learning today is still done on CPUs. The same is still true with graphics and visual effects.

The reason behind this typically comes down to the software packages in use, and the programmers in charge.

Developing software for CPUs is easy, because that is what most people are trained on. Optimization packages for CPUs are well established, and even for upcoming specialist instructions, these can be developed in simulated environments. A CPU is designed to handle almost anything thrown at it, even super bad code.

By contrast, GPU compute is harder. It isn’t as difficult as it used to be, as there is a wide array of libraries that enable GPU compilation without having to know too much about how to program for a GPU. The difficulty lies in architecting the workload to take advantage of what a GPU has to offer. A GPU is a massive engine that performs the same operation on hundreds of parallel threads at the same time – it also has a very small cache and accesses to GPU memory take a long time, so that latency is hidden by having even more threads in flight at once. If the compute part of the software isn’t amenable to that sort of workload, such as being structurally more linear, then spending 6 months redeveloping for a GPU is wasted effort. And even if the math works out better on a GPU, rebuilding a 20-year-old codebase (or older) for GPUs is still a substantial undertaking for a group of experts.
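The latency-hiding point can be put into rough numbers with Little's law: the work that must be in flight equals latency multiplied by issue rate. The sketch below is illustrative only. The function name and the latency and issue-rate figures are assumptions chosen as round numbers, not measurements of any particular GPU or CPU.

```python
def in_flight_needed(latency_cycles: int, requests_per_cycle: float) -> float:
    """Little's law: memory requests that must be in flight to hide latency."""
    return latency_cycles * requests_per_cycle


# Hypothetical figures, used only to show how the requirement scales.
long_latency = in_flight_needed(latency_cycles=400, requests_per_cycle=1.0)
short_latency = in_flight_needed(latency_cycles=40, requests_per_cycle=1.0)

print(f"Long-latency memory:  {long_latency:.0f} requests in flight")
print(f"Short-latency memory: {short_latency:.0f} requests in flight")
```

If the workload cannot supply that many independent threads, for example because it is structurally linear, the hardware sits idle waiting on memory and the theoretical throughput is never reached.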

GPU compute has come on in leaps and bounds since I did it in the late 2000s. But the fact remains that there are still a number of industries that rely on a mix of CPU and GPU throughput. These include machine learning, oil and gas, financial, medical, and the one we’re focusing on today, visual effects.

A visual effects design and rendering workload is a complex mix of dedicated software platforms and plugins. Software like Cinema4D, Blender, Maya, and others rely on the GPU to showcase a partially rendered scene for these artists to work on in real time, also relying on strong single core performance, but the bulk of compute for the final render will depend on what plugins are being used for that particular product. Some plugins are GPU accelerated, such as Blender Cycles, and the move to more GPU-accelerated workloads is taking its time – ray tracing accelerated design is an area that is getting a lot of GPU attention, for example.

There are always questions as to which method produces the best image – there’s no point using a GPU to accelerate the rendering time if it adds additional noise or reduces the quality. A film studio is more than likely to prioritize a slow higher-quality render on CPUs over a fast noisy one on GPUs, or alternatively, render a lower resolution image and then upscale with trained AI. Based on our conversations with OEMs that supply the industry, we've been told that a number of studios will outright say that rendering their workflow on a CPU is the only way they do it. The other angle is memory, as the right CPU can have 256 GB to 4 TB of DRAM available, whereas the best GPUs can only supply 80 GB (and those are the super expensive ones).

The point I’m making here is that VFX studios still prefer CPU compute, and the more the better. When AMD launched its new Zen-based processors, particularly the 32 and 64 core count models, these were immediately earmarked as potential replacements for the Xeons being used in these VFX studios. AMD’s parts prioritized FP compute, a key element in VFX design, and having double the cores per socket was also a winner, combined with the large amount of cache per core. This latter part meant that even though the first high-core count parts had a non-uniform memory architecture, it wasn’t as much of an issue as with some other compute processes.

A number of VFX companies, as far as we understand, focused on AMD’s Threadripper platform over the corresponding EPYC. When both of these parts first arrived on the market, it was very easy for VFX studios to invest in under-the-desk workstations built on Threadripper, while EPYC was more for server rack installations and not so much for workstations. Roll around to Threadripper 3000 and EPYC 7002, and now there are 64 cores, 64 PCIe 4.0 lanes, and lots of choice. VFX studios still went for Threadripper, mostly because it offered a higher 280 W power limit in something that could easily be sourced from system integrators like Armari that specialize in high-compute under-desk systems. They also asked AMD for more.

AMD has now rolled out its Threadripper Pro platform, addressing some of these requirements. While VFX is always core compute focused, TR Pro now gives double the PCIe lanes, double the memory bandwidth, support for up to 2 TB of memory, and Pro-level admin support. These PCIe lanes could be used for local storage (always important in VFX), the extra memory for large RAMDisks, and the admin support through DASH helps keep the company systems managed together appropriately. AMD’s Memory Guard is also in its Pro line of parts, which is designed to enable full memory encryption.

Beyond VFX, AMD has cited world leadership compute with TR Pro for product engineering with Creo, 3D visualization with KeyShot, model design in architecture with Autodesk Revit, and data science, such as oil and gas dataset analysis, where the datasets are growing into the hundreds of GB and require substantial compute support.

Threadripper Pro vs Workstation EPYC (WEPYC)

Looking at the benefits that these new processors provide, it’s clear to see that these are more Workstation-style EPYC parts than ‘enhanced’ Threadrippers. Here’s a breakdown:

AMD Zen 2 High-End Comparison

| AnandTech | Threadripper | Threadripper Pro | Enterprise EPYC |
|---|---|---|---|
| Cores | 32-64 | 12-64 | 8-64 |
| 1P Flagship | TR 3990X | TR Pro 3995WX | EPYC 7702P |
| MSRP | $3990 | $5490 | $4425 |
| TDP | 280 W | 280 W | 200 W |
| Base Freq | 2900 MHz | 2700 MHz | 2000 MHz |
| Turbo Freq | 4300 MHz | 4200 MHz | 3350 MHz |
| Socket | sTRX40 | sTRX4: WRX80 | SP3 |
| L3 Cache | 256 MB | 256 MB | 256 MB |
| DRAM | 4 x DDR4-3200 | 8 x DDR4-3200 | 8 x DDR4-3200 |
| DRAM Capacity | 256 GB | 2 TB, ECC | 4 TB, ECC |
| PCIe | 4.0 x56 + chipset | 4.0 x120 + chipset | 4.0 x128 |
| Pro Features | No | Yes | Yes |
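As a rough check on the DRAM row, theoretical peak bandwidth can be worked out from the channel count and the transfer rate. The sketch below assumes the standard 64-bit (8-byte) DDR4 channel width; the function and variable names are illustrative, and the results are theoretical peaks rather than measured figures.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel is 64 bits wide


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


# Channel counts taken from the DRAM row of the comparison table.
configs = {
    "Threadripper (4 x DDR4-3200)": 4,
    "Threadripper Pro / EPYC (8 x DDR4-3200)": 8,
}

for name, channels in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, 3200):.1f} GB/s")
```

Going from four channels to eight doubles the theoretical peak, from 102.4 GB/s to 204.8 GB/s. How much of that a given workload can actually use is a separate question.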

To get these new parts starting from EPYC, all AMD had to do was raise the TDP to 280 W, and cut the DRAM support. If we start from a Threadripper base, there are 3-4 substantial changes. So why is this called Threadripper Pro, and not Workstation EPYC?

We come back to the VFX studios again. Having already bought in to the Threadripper branding and way of thinking, keeping these parts as Threadripper helps smooth that transition – this vertical had kind of already said they preferred Threadripper over EPYC, from what we are told, and so keeping the naming consistent means that there is no real re-education to do.

The other element is that the EPYC processor line is somewhat fractured: there are standard versions, high performance H models, high frequency F models, and then a series of custom designs under B, V, and others for specific customers. By keeping this new line as Threadripper Pro, it keeps it all under one umbrella.

Threadripper Pro Offerings: 12 core to 64 core

AMD announced these processors in the middle of last year, with the Lenovo Thinkstation P620 as the launch platform. From my experience, the Thinkstation line is very well designed, and we’re testing our 3995WX in a P620 today.

AMD Ryzen Threadripper Pro

| AnandTech | Cores / Threads | Base Freq (MHz) | Turbo Freq (MHz) | Chiplets | L3 Cache | TDP | Price (SEP) |
|---|---|---|---|---|---|---|---|
| 3995WX | 64 / 128 | 2700 | 4200 | 8 + 1 | 256 MB | 280 W | $5490 |
| 3975WX | 32 / 64 | 3500 | 4200 | 4 + 1 | 128 MB | 280 W | $2750 |
| 3955WX | 16 / 32 | 3900 | 4300 | 2 + 1 | 64 MB | 280 W | $1150 |
| 3945WX | 12 / 24 | 4000 | 4300 | 2 + 1 | 64 MB | 280 W | * |

*Unsure if this is a special OEM model

When TR Pro was announced with Lenovo, we weren’t sure if any other OEM would have access to Threadripper Pro. When we asked OEMs earlier in that year about it, before we even knew if TR Pro was a real thing, they stated that AMD hadn’t even marked the platform on their roadmap, which we reported at the time. We have since learned that Lenovo had a 6-month exclusive, and information was only supplied to other vendors (ASUS, GIGABYTE, Supermicro) after it had been announced.

To that end, AMD has since announced that Threadripper Pro is coming to retail, both for other OEMs to design systems and for end-users to build their own. Despite using the same LGA4094 socket as the other Threadripper and EPYC processors, TR Pro will be locked down to WRX80 motherboards. We currently know of three: the Supermicro and GIGABYTE models, plus the ASUS Pro WS WRX80E-SAGE SE Wi-Fi, which we have had in house for a short hands-on, although we weren’t able to test it.

Of the four processors listed above, the top three are going on sale. It’s worth noting that only the 64-core comes with 256 MB of L3 cache, while the 32-core comes with 128 MB of L3. AMD has ensured that these designs only use as many chiplets as is absolutely necessary, keeping L3 cache per core consistent as well as the 8 cores per chiplet (the EPYC product line varies this a bit).

The fourth processor, the 12-core, would appear to be an OEM-only specific processor for prebuilt systems.
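The consistency claim can be checked with a little arithmetic on the table above. The sketch below reads the "N + 1" chiplet notation as N compute chiplets plus one I/O die, and divides the core count and L3 size accordingly; the dictionary and function names are illustrative only.

```python
# (cores, compute chiplets, L3 cache in MB), copied from the table above.
parts = {
    "3995WX": (64, 8, 256),
    "3975WX": (32, 4, 128),
    "3955WX": (16, 2, 64),
    "3945WX": (12, 2, 64),
}


def per_chiplet_and_core(cores: int, chiplets: int, l3_mb: int) -> tuple[float, float]:
    """Return (cores per compute chiplet, MB of L3 per core)."""
    return cores / chiplets, l3_mb / cores


for name, spec in parts.items():
    cores_per_chiplet, l3_per_core = per_chiplet_and_core(*spec)
    print(f"{name}: {cores_per_chiplet:.0f} cores/chiplet, {l3_per_core:.2f} MB L3/core")
```

The three retail parts all come out at 8 cores per chiplet and 4 MB of L3 per core. The 12-core is the exception on both counts, at 6 cores per chiplet and roughly 5.33 MB of L3 per core.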

Threadripper Pro versus The World

These Threadripper Pro offerings are designed to compete against two segments. The first is AMD itself, showing anyone using a high-end professional system built on first generation Zen hardware that there is a lot of performance to be had. The second is Intel workstation customers, either using a single socket Xeon W (which tops out at 28 cores), or a dual socket Xeon system that costs more or uses a lot more power, just because it is dual socket, and also has a non-uniform memory architecture.

We have almost all these in this test (we don't have the 7702P, but we do have the 7742), and realistically these are the only processors that should be considered if the 3995WX is an option for you:

3995WX Comparison Offerings

| AnandTech | Core | SEP | 1P / 2P | TDP | Base Freq (MHz) | Peak Freq (MHz) | DDR | PCIe | DDR Cap |
|---|---|---|---|---|---|---|---|---|---|
| TR Pro 3995WX | 64C | $5490 | 1P | 280W | 2700 | 4200 | 8x3200 | 128x 4.0 | 2 TB |
| TR 3990X | 64C | $3990 | 1P | 280W | 2900 | 4300 | 4x3200 | 64x 4.0 | ¼ TB |
| EPYC 7702P | 64C | $4425 | 1P | 200W | 2000 | 3350 | 8x3200 | 128x 4.0 | 4 TB |
| EPYC 7742 | 64C | $6950 | 2P | 225W | 2250 | 3400 | 8x3200 | 128x 4.0 | 4 TB |
| Xeon 6258R | 28C | $3950 | 2P | 205W | 2700 | 4000 | 6x2933 | 48x 3.0 | 1 TB |
| Xeon W-3175X | 28C | $2999 | 1P | 255W | 3100 | 4300 | 6x2933 | 48x 3.0 | ½ TB |

Intel tops out at 28 cores, and there is no getting around that. Technically Intel has the AP processor line that goes up to 56 cores, however these are for specialist systems and we haven’t had one physically sent to us for testing. Those are also $20k+ per CPU, and are two CPUs bolted together in one package.

The AMD comparison points are the best Threadripper option and the best available EPYC processor, albeit the 2P version. The best comparison here would be the 7702P, the single socket variant and much more price competitive. However, we haven’t got this in for testing; instead we have AMD's EPYC 7742, which is the dual socket version with slightly higher performance.

Test Setup

| Platform | CPU | Motherboard | BIOS | Cooler | Memory |
|---|---|---|---|---|---|
| AMD TR Pro | TR Pro 3995WX | Lenovo Thinkstation P620 | S07K T0EA | Lenovo Custom | Kingston 8x16 GB DDR4-3200 ECC |
| AMD TR | TR 3990X | MSI Creator TRX40 | 1.50 | Thermaltake 280mm AIO | Corsair 4x8 GB DDR4-3200 |
| AMD EPYC | EPYC 7742 | Supermicro H11DSI | 2.1 | Noctua NH-U14S TR4-SP3 | SK Hynix 16x32 GB DDR4-3200 ECC |
| Intel Xeon | Xeon Gold 6258R | ASUS ROG Dominus Extreme | 0601 | Asetek 690LX-PN | SK Hynix 6x32 GB DDR4-2933 ECC |
| Intel Xeon | Xeon W-3175X | | | | DDR4-2666 ECC |

GPU: Sapphire RX 460 2GB (CPU Tests)
PSU: Various (inc. Corsair AX860i)
SSD: Crucial MX500 2TB

Silverstone SST-FHP141-VF 173 CFM fans also used. Nice and loud.

We must thank the following companies for kindly providing hardware for our multiple test beds. Some of this hardware is not in this test bed specifically, but is used in other testing.

Hardware Providers for CPU and Motherboard Reviews

| Provider | Hardware |
|---|---|
| Sapphire | RX 460 Nitro |
| NVIDIA | RTX 2080 Ti |
| Crucial | SSDs |
| Corsair | PSUs |
| Kingston | DDR4 RDIMM |
| ADATA | DDR4 |
| Silverstone | Coolers |
| Noctua | Coolers |

Users interested in the details of our current CPU benchmark suite can refer to our #CPUOverload article which covers the topics of benchmark automation as well as what our suite runs and why. We also benchmark much more data than is shown in a typical review, all of which you can see in our benchmark database. We call it ‘Bench’, and there’s also a link on the top of the website in case you need it for processor comparison in the future.


118 Comments


  • YB1064 - Tuesday, February 9, 2021 - link

    You are kidding, right? Intel has become the poor man's AMD in terms of performance.
  • kgardas - Wednesday, February 10, 2021 - link

    From general computing point of view yes, but from specific point no. Look at 3d particle movement! 3175x with less than half cores, at least $1k cheaper is able to provide more than 2x perf of the best AMD. So if you have something hand optimized for avx512, then old, outdated intel is still able to kicks amd ass and quite with a style.
  • Spunjji - Wednesday, February 10, 2021 - link

    @kgardas - Sure, but not many people can just throw their code at one of only a handful of programmers in the world with that level of knowledge and get optimised code back. That particle movement test isn't an industry-standard thing - it's Ian's personal project, hand-tuned by an ex-Intel engineer. Actual tests using AVX512 aren't quite so impressive because they only ever use it for a fraction of their code.
  • Fulljack - Thursday, February 11, 2021 - link

    not to mention that any processor that run in avx512 will have it's clockspeed tanked. unless your program maximize the use of avx512, the net progress will result slower application than using avx/2 or none at all.
  • sirky004 - Tuesday, February 9, 2021 - link

    what's you deal with AVX 512?
    Usual workload with that in mind is better to offload in GPU.
    There's a reason why Linus Torvald hate that "power virus"
  • kgardas - Wednesday, February 10, 2021 - link

    Usually if you write the code, it's more easier to add few avx512 intrinsic calls then to rewrite the software for GPU offload. But yes, GPU will be faster *if* the perf is not killed by PCIe latency. E.g. you need to interact with data on CPU and perform just few calcs on GPU so moving data cpu -> gpu -> cpu -> loop over, will kill perf.
  • kgardas - Wednesday, February 10, 2021 - link

    AFAIK, Linus hates that avx512 is not available everywhere in x86 world. But this will be the same case with upcoming AMX, so there is nothing intel may do about it. Not sure if AMD will need to pay some money for avx512/amx license or not...
  • Qasar - Wednesday, February 10, 2021 - link

    sorry kgardas but linus HATES avx512:
    https://www.extremetech.com/computing/312673-linus...
    https://www.phoronix.com/scan.php?page=news_item&a...
    "I hope AVX512 dies a painful death, and that Intel starts fixing real problems instead of trying to create magic instructions to then create benchmarks that they can look good on… "
    where you got that he likes it. and chances are, unless intel makes amx available with no issues, amx maybe the same niche as avx 512 is.
  • kgardas - Wednesday, February 10, 2021 - link

    Yes, I know he hates the stuff, but not sure about the right reason. In fact I think AVX512 is best AVX so far ever. I've read some of his rants and it was more about avx512 is not everywhere like avx2 etc. etc. Also Linus was very vocal about his departure from Intel workstation to AMD and since AMD does not provide avx512 yet it may well be just pure engineering laziness -- don't bother me with this stuff, it does not run here. :-)
  • Qasar - Wednesday, February 10, 2021 - link

    i dont think it has to be do with laziness, it has to do with the overall hit you get in performance when you use it, not to mention the power usage, and the die space it needs. from what i have seen, it still seems to be a niche, and over all not worth it. it looks like amd could add avx512 to zen at some point, but maybe, amd has decided it isnt worth it ?
