Over the past few years, we’ve seen NVIDIA shift their mobile strategy dramatically. With Tegra 2 and 3, we saw multiple design wins in both the smartphone and tablet space. These SoCs often had dedicated GPUs that were quite unlike what we saw in NVIDIA’s desktop and laptop GPUs, with a reduced feature set and a unique architecture. However, with Tegra K1 we saw a distinct shift in NVIDIA’s SoC strategy, as the Tegra K1 was the first mobile SoC to achieve parity in GPU architecture with desktop GPUs. In the case of the Tegra K1, this meant a single Kepler SMX, which made for truly incredible GPU performance. In the time since, however, we’ve seen companies like Apple release new SoCs such as the A8X, which managed to bring largely similar performance with less power.

NVIDIA of course is never content to stay idle for too long and let themselves be surpassed, which has led to the company developing their Tegra SoCs on a roughly yearly cadence. In NVIDIA’s development timeline, 2014 brought us the Tegra K1-32, the company’s first SoC to integrate a modern high-performance GPU architecture (Kepler), and later on the TK1-64, a K1 with NVIDIA’s own in-house developed ARMv8 Denver CPU in place of the 32-bit ARM Cortex A15 CPU.


NVIDIA's GTC 2014 Tegra Roadmap

2014 also brought a revision to NVIDIA’s public Tegra roadmap: for 2015 NVIDIA would bump the previously announced Parker SoC and release a new SoC, Erista, in its place. This schedule change ended up being quite a surprise because NVIDIA had up until this time released their roadmaps roughly 2 years out, whereas Erista was put on the map less than a year before it was to be announced. More unusually, NVIDIA offered up minimal details of Erista; it would be powered by a Maxwell GPU, with no additional information on the CPU or manufacturing process. Parker on the other hand (NVIDIA’s planned Denver + Maxwell + 16nm FinFET part) fell off the schedule, with NVIDIA not officially canceling it but also saying little else about it.

Now in 2015 and with the launch of the Tegra X1, we can finally begin putting the picture together. Erista as it turns out is something of a rapid release product for NVIDIA; what had been plans to produce a 16nm FinFET part in 2015 became plans to produce a 20nm part, with Erista to be that part. To pull together Erista, NVIDIA would go for a quick time-to-market approach in SoC design, pairing up a Maxwell GPU with ARM Cortex A57 & A53 CPUs, to be produced on TSMC’s 20nm SoC process.

CPU

We’ll kick off our look at Erista then with a look at Erista’s CPUs, where one of the bigger surprises in Erista is the CPU configuration. While Tegra K1 had a version with NVIDIA’s custom Denver core for the CPU, NVIDIA has elected to use ARM’s Cortex A57 and A53 in the Tegra X1. The A57 CPU cluster has 2MB of L2 cache shared across the four cores, with 48KB/32KB L1s (I$+D$) per core. The A53 cluster has 512KB of L2 cache shared by all four cores and 32KB/32KB L1s (I$+D$) per core. NVIDIA representatives stated that this was done for time to market reasons.
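To make the cache figures above easier to compare, here is a minimal sketch that tallies the cache in each cluster from the numbers quoted in this section. The class and field names are illustrative only and are not taken from any NVIDIA or ARM documentation, and the sketch assumes 1MB = 1024KB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuCluster:
    """Hypothetical container for the cache figures quoted in the article."""
    name: str
    cores: int
    l1_icache_kb: int  # per core
    l1_dcache_kb: int  # per core
    l2_kb: int         # shared by every core in the cluster

    def total_cache_kb(self) -> int:
        # Sum the private L1s of every core, then add the shared L2 once.
        per_core_l1 = self.l1_icache_kb + self.l1_dcache_kb
        return self.cores * per_core_l1 + self.l2_kb


# Figures from the text: 2MB L2 + 48KB/32KB L1s, and 512KB L2 + 32KB/32KB L1s.
A57 = CpuCluster("Cortex A57", cores=4, l1_icache_kb=48, l1_dcache_kb=32, l2_kb=2048)
A53 = CpuCluster("Cortex A53", cores=4, l1_icache_kb=32, l1_dcache_kb=32, l2_kb=512)

for cluster in (A57, A53):
    print(f"{cluster.name}: {cluster.total_cache_kb()} KB of L1 + L2 cache")
```

Under these assumptions the A57 cluster carries roughly three times the cache of the A53 cluster, which fits its role as the high-performance half of the pairing.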

However, rather than a somewhat standard big.LITTLE configuration as one might expect, NVIDIA continues to use their own unique system. This includes a custom interconnect rather than ARM’s CCI-400, and cluster migration rather than global task scheduling, the latter of which would expose all eight cores to userspace applications. It’s important to note that NVIDIA’s solution is cache coherent, so this system won't suffer from the power/performance penalties that one might expect given experience with previous SoCs that use cluster migration.
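The difference between the two scheduling models can be illustrated with a toy sketch. This is not NVIDIA's or ARM's implementation; the function names and the load threshold are hypothetical, and the only thing it models is which cores software can see at a given moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    cores: int


BIG = Cluster("A57", 4)
LITTLE = Cluster("A53", 4)


def cores_with_cluster_migration(load: float, threshold: float = 0.5) -> list[str]:
    """Only one cluster is active at a time; the whole workload moves together.

    The 0.5 threshold is an arbitrary placeholder, not a real governor setting.
    """
    active = BIG if load > threshold else LITTLE
    return [f"{active.name}-{i}" for i in range(active.cores)]


def cores_with_global_task_scheduling() -> list[str]:
    """Every core in both clusters is visible and schedulable at once."""
    return [f"{c.name}-{i}" for c in (BIG, LITTLE) for i in range(c.cores)]


print(cores_with_cluster_migration(load=0.2))  # the four A53 cores
print(cores_with_cluster_migration(load=0.9))  # the four A57 cores
print(len(cores_with_global_task_scheduling()))  # all eight cores
```

In the cluster migration model the operating system never sees more than four cores, which is why the cost of moving state between clusters, and therefore cache coherency, matters so much.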

Although cluster migration is usually a detriment to power and performance (especially without a cache coherent interconnect), NVIDIA claims that Tegra X1 significantly outperforms Samsung System LSI’s Exynos 5433 in performance per watt with 1.4x more performance at the same amount of power or half the power for the same performance on the CPU. It's likely that this difference comes from optimizing the implemented logic and differences between Samsung and TSMC 20nm processes rather than differences in thread management. NVIDIA continues to use System EDP (Electrical Design Point) management to control throttling and turbo rather than ARM's IPA (Intelligent Power Allocation) drivers, and in general it seems that while the CPU architecture is ARM's reference design it's quite clear that everything else is distinctly NVIDIA in design.

All of this of course raises the question of where Denver is, and what is to happen to it. The use of the ARM Cortex A57 and A53, as NVIDIA tells it, was a time-to-market decision: NVIDIA could bring an off-the-shelf Cortex-based SoC to the market sooner than they could another Denver SoC. On the surface there is at least some evidence to agree with this, as NVIDIA would be able to buy a 20nm design from ARM versus spending the time porting Denver over from TSMC’s 28nm process to their 20nm process. At the same time, however, integrating an ARM CPU core into an SoC is not an easy task (logic LEGO this is not), so it’s hard to say whether this actually saved NVIDIA much time.

In any case, much like the still in the dark Parker design, NVIDIA is holding their cards close to their chest and hinting that Denver will still be in future generation products. If Parker is still part of NVIDIA’s plans – and they are without a doubt working on some kind of 16nm FF Tegra SoC – then we may still see Denver as part of the original Parker plan. Otherwise at this point while there’s no reason not to believe NVIDIA about the fate of future Denver SoCs, it’s anyone’s guess when we might see it again.

Uncore

Outside of the CPU and GPU, NVIDIA has also dramatically improved the rest of Tegra X1 in comparison with Tegra K1. We see a move from 64-bit wide LPDDR3 to 64-bit wide LPDDR4 on the memory interface, which improves peak memory bandwidth from 14.9 GB/s to 25.6 GB/s and improves power efficiency by around 40%. In addition, the maximum internal display resolution moves from 3200x2000 at 60 Hz to 3840x2160 at 60 Hz with support for VESA’s display stream compression. For external displays, we see a significant improvement with support for HDMI 2.0 and HDCP 2.2, which means that 4K60 is supported in contrast with the Tegra K1, which only supported 4K30.
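The bandwidth figures above follow from bus width multiplied by transfer rate. The sketch below reproduces them under the assumption that Tegra K1 runs LPDDR3 at 1866 MT/s and Tegra X1 runs LPDDR4 at 3200 MT/s; those data rates are not stated in this article and are inferred from the quoted bandwidths, and the function name is ours.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Assumed data rates (not from the article): LPDDR3-1866 and LPDDR4-3200.
tegra_k1 = peak_bandwidth_gbs(bus_width_bits=64, transfer_rate_mts=1866)
tegra_x1 = peak_bandwidth_gbs(bus_width_bits=64, transfer_rate_mts=3200)

print(f"Tegra K1: {tegra_k1:.1f} GB/s")
print(f"Tegra X1: {tegra_x1:.1f} GB/s")
print(f"Increase: {tegra_x1 / tegra_k1:.2f}x")
```

Since the bus stays 64 bits wide, the entire gain comes from the higher transfer rate. These are theoretical peaks, and sustained bandwidth in real workloads will be lower.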

The ISP is largely similar to the one we see in the Tegra K1 when it comes to feature set, but JPEG encode and decode rate is now five times as fast, going from 120 MP/s to 600 MP/s. For video encode and decode, we see support for 4K60 H.265 and VP9, with support for 10 bit color on decode on H.265. In addition to support for new codecs in the hardware blocks, we see that H.264 and VP8 now support 4K60, a notable improvement over the 4K30 that was the maximum for Tegra K1. We also see that the storage controller now supports eMMC 5.1 for faster storage performance.
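For a rough sense of scale, the sketch below works out the raw pixel rates implied by 4K30 and 4K60, along with the JPEG throughput ratio quoted above. It is plain arithmetic on the figures in the text and says nothing about how the hardware blocks are actually built; the helper name is hypothetical.

```python
def pixel_rate_mps(width: int, height: int, fps: int) -> float:
    """Raw pixel rate in megapixels per second for a given video mode."""
    return width * height * fps / 1e6


uhd_30 = pixel_rate_mps(3840, 2160, 30)  # Tegra K1 maximum
uhd_60 = pixel_rate_mps(3840, 2160, 60)  # Tegra X1 maximum

# JPEG encode/decode throughput quoted in the article, in MP/s.
jpeg_speedup = 600 / 120

print(f"4K30: {uhd_30:.1f} MP/s")
print(f"4K60: {uhd_60:.1f} MP/s")
print(f"JPEG speedup: {jpeg_speedup:.0f}x")
```

Moving from 4K30 to 4K60 doubles the pixel rate the video blocks must sustain, before accounting for any differences in codec complexity between H.264, H.265, VP8 and VP9.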


194 Comments


  • Jumangi - Monday, January 5, 2015 - link

    Apple would never use Nvidia at the power consumption levels it brings. The power is pointless to them if it can't be put into a smartphone level device. Nvidia still doesn't get why nobody in the OEM market wants their tech for a phone.
  • Yojimbo - Monday, January 5, 2015 - link

    But the NVIDIA SOCs are on a less advanced process node, so how can you know that? You seem to be missing the whole point. The point is not what Apple wants or doesn't want. The point is to compare NVIDIA's GPU architecture to the PowerVR series 6XT GPU. You cannot directly compare the merits of the underlying architecture by comparing performance and power efficiency when the implementations are using different sized transistors. And the question is not the level of performance and power efficiency Apple was looking for for their A8. The question is simply peak performance per watt for each architecture.
  • OreoCookie - Tuesday, January 6, 2015 - link

    @Yojimbo
    The Shield was released with the Cortex A15-based Tegra K1, not the Denver-based K1. The former is not competitive with regards to CPU performance, the latter plays in the same league. AFAIK the first Denver-based K1 product was the Nexus 9. Does anyone know of any tablets which use the Denver-based K1?
  • lucam - Wednesday, January 7, 2015 - link

    Apple sell products that have an year life cycle, don't sell chips and therefore they don't need to do any marketing in advance as NV does punctually at any CES.
  • TheJian - Monday, January 5, 2015 - link

    It's going finfet 16nm later this year (parker). As noted here it's NOT in this chip due to time to market and probably not as much gained by shrinking that to 20nm vs. going straight to 16nm finfet anyway. Even Qcom went off the shelf for S810 again for time to market.

    Not sure how you get that Denver is a disappointment. It just came out...LOL. It's a drop in replacement for anyone using K1 32bit (pin compatible), so I'm guessing we'll see many more devices pop up quicker than the first rev, but even then it will have a short life due to X1 and what is coming H2 with Denver yet again (or an improved version).

    What do you mean K1 is in ONE device? You're kidding right? Jeez, just go to amazon punch Nvidia K1 into the search. Acer, HP, NV shield, Lenovo, Jetson, Nexus9, Xiaomi (mipad not sold on amazon but you get the point)...The first 4 socs were just to get us to desktop gpu. The real competition is just starting.

    Building the cpu wasn't just for mobile either. You can now go after desktops/higher end notebooks etc with NO WINTEL crap in them and all the regular PC trimmings (high psu, huge fan/heatsink, hd's, ssd's etc etc, discrete gpu if desired, 16-32GB of ram etc). All of this timed perfectly with 64bit OS getting polished up for MUCH more complicated apps etc. The same thing that happened to low-end notebooks with chromebooks, will now happen with low end PC's at worst and surely more later as apps advance on android etc and Socs move further up the food chain in power and start running desktop models at 4ghz with fan/heatsinks (with a choice of discrete gpu when desired). With no Wintel Fee (copy of windows + Intel cpu pricing), they will be great for getting poor people into great gaming systems that do most of what they'd want otherwise (internet, email, docs, media consumption). I hope they move here ASAP, as AMD is no longer competition for Intel CPU wise.

    Bring on the ARM full PC like box! Denver was originally supposed to be x86 anyway LOL. Clearly they want in on Intel/AMD cpu territory and why not at CPU vs. SOC pricing? NV could sell an amped up SOC at 4ghz for $110/$150 vs. Intel's top end i5/i7's ($229/339). A very powerful machine for $200 less cash but roughly ~perf (when taking out the Windows fee also, probably save $200 roughly). Most people in this group won't miss the windows apps (many won't even know what windows is, grew up on a phone/tablet etc). Developing nations will love these as apps like Adobe Suite (fully featured) etc get moved making these cheap boxes powerful content creators and potent gamers (duh, NV gpu in them). If they catch on in places like USA also, Wintel has an even bigger headache and will need to drop pricing to compete with ARM and all it's ecosystem brings. Good times ahead in the next few years for consumers everywhere. This box could potentially run android, linux, steamos, chrome in a quadboot giving massive software options etc at a great price for the hardware. Software for 64bit on Arm will just keep growing yearly (games and adv apps).
  • pSupaNova - Tuesday, January 6, 2015 - link

    Agree totally with your post, NVdia did try to put good mobile chips in netbooks with the ION & ION2 and Intel blocked them.

    Good to see that they have stuck at the job and now are in the position to starting eating Intels lunch.
  • darkich - Monday, January 5, 2015 - link

    That's just not true.

    The K1 has shipped in three high end Android Tablets - Nvidia shield, Xiaomi MiPad, and Nexus 9.

    Now, how many tablets got a Snapdragon 805?
    Exynos 5433?

    Tegra K1 market performance is simply the result of the fact that high end tablet market is taken up by Apple, and that it doesn't compete in mod range and low end.
  • darkich - Monday, January 5, 2015 - link

    *mid range
  • GC2:CS - Monday, January 5, 2015 - link

    It's the result of too high power consumption, that OEM's prefer to keep low.

    That's why tegra K1 is used by just foolish chinese manufacturers (like tegra 4 in a phone) like xiaomi, google in a desperate need for a non Apple high end 64-bit chip (to showcase how much it's 64-bit) and nvidia themselves.
  • Yojimbo - Monday, January 5, 2015 - link

    I think you're right that the K1 is geared more towards performance than other SOCs. The K1 does show good performance/watt, but it does so with higher performance, using more watts. And you're right that most OEMs have preferred a lower power usage. But it doesn't mean that the K1 is a poor SOC. NVIDIA is trying to work towards increasing the functionality of the platform by allowing it to be a gaming platform. That is their market strategy. It is probably partially their strategy because those are the tools they have available to them; that is their bread-and-butter. But presumably they also think mobile devices can really be made into a viable gaming platform. Thinking about it in the abstract it seems to be obvious... Mobile devices should at some point become gaming platforms. NVIDIA is trying to make this happen now.
