Over the past few years, we’ve seen NVIDIA shift their mobile strategy dramatically. With Tegra 2 and 3, we saw multiple design wins in the smartphone space, along with the tablet space. These SoCs often had dedicated GPUs that were quite unlike what we saw in NVIDIA’s desktop and laptop GPUs, with a reduced feature set and unique architecture. However, with Tegra K1 we saw a distinct shift in NVIDIA’s SoC strategy, as the Tegra K1 was the first mobile SoC to achieve parity in GPU architecture with desktop GPUs. In the case of the Tegra K1, this meant a single Kepler SMX, which made for truly incredible GPU performance. However, in the time since, we’ve seen companies like Apple release new SoCs such as the A8X, which managed to bring largely similar performance with less power.

NVIDIA of course is never content to stay idle for too long and let themselves be surpassed, which has led to the company developing their Tegra SoCs on a roughly yearly cadence. In NVIDIA’s development timeline, 2014 brought us the Tegra K1-32, the company’s first SoC to integrate a modern high-performance GPU architecture (Kepler), and later on the TK1-64, a K1 with NVIDIA’s own in-house developed ARMv8 Denver CPU in place of the 32-bit ARM A15 CPU.


NVIDIA's GTC 2014 Tegra Roadmap

2014 also brought a revision to NVIDIA’s public Tegra roadmap: for 2015 NVIDIA would bump the previously announced Parker SoC and release a new SoC in its place, Erista. This schedule change ended up being quite a surprise because NVIDIA had up until this time released their roadmaps roughly 2 years out, whereas Erista was put on the map less than a year before it was to be announced. More unusual, NVIDIA offered up minimal details of Erista; it would be powered by a Maxwell GPU, with no additional information on the CPU or manufacturing process. Parker on the other hand – NVIDIA’s planned Denver + Maxwell + 16nm FinFET part – fell off the schedule, with NVIDIA not officially canceling it but also saying little else about it.

Now in 2015 and with the launch of the Tegra X1, we can finally begin putting the picture together. Erista as it turns out is something of a rapid release product for NVIDIA; what had been plans to produce a 16nm FF part in 2015 became plans to produce a 20nm part, with Erista to be that part. To pull together Erista, NVIDIA would go for a quick time-to-market approach in SoC design, pairing up a Maxwell GPU with ARM Cortex A57 & A53 CPUs, to be produced on TSMC’s 20nm SoC process.

CPU

We’ll kick off our look at Erista then with a look at Erista’s CPUs, where one of the bigger surprises in Erista is the CPU configuration. While Tegra K1 had a version with NVIDIA’s custom Denver core for the CPU, NVIDIA has elected to use ARM’s Cortex A57 and A53 in the Tegra X1. The A57 CPU cluster has 2MB of L2 cache shared across the four cores, with 48KB/32KB L1s (I$+D$) per core. The A53 cluster has 512KB of L2 cache shared by all four cores and 32KB/32KB L1s (I$+D$) per core. NVIDIA representatives stated that this was done for time to market reasons.

However, rather than a somewhat standard big.LITTLE configuration as one might expect, NVIDIA continues to use their own unique system. This includes a custom interconnect rather than ARM’s CCI-400, and cluster migration rather than global task scheduling, which would expose all eight cores to userspace applications. It’s important to note that NVIDIA’s solution is cache coherent, so this system won't suffer from the power/performance penalties that one might expect given experience with previous SoCs that use cluster migration.

Although cluster migration is usually a detriment to power and performance (especially without a cache coherent interconnect), NVIDIA claims that Tegra X1 significantly outperforms Samsung System LSI’s Exynos 5433 in CPU performance per watt, with 1.4x more performance at the same amount of power or half the power for the same performance. It's likely that this difference comes from optimizing the implemented logic and differences between Samsung and TSMC 20nm processes rather than differences in thread management. NVIDIA continues to use System EDP (Electrical Design Point) management to control throttling and turbo rather than ARM's IPA (Intelligent Power Allocation) drivers, and in general, while the CPU architecture is ARM's reference design, it's quite clear that everything else is distinctly NVIDIA in design.

All of this of course raises the question of where’s Denver, and what is to happen to it? The use of the ARM Cortex A57 and A53, as NVIDIA tells it, was a time-to-market decision: NVIDIA could bring an off-the-shelf Cortex-based SoC to the market sooner than they could another Denver SoC. On the surface there is at least some evidence to agree with this, as NVIDIA would be able to buy a 20nm design from ARM versus spending the time porting Denver over from TSMC’s 28nm process to their 20nm process. At the same time however integrating an ARM CPU core into an SoC is not an easy task – logic LEGO this is not – so it’s hard to say whether this actually saved NVIDIA much time.

In any case, much like the still in the dark Parker design, NVIDIA is holding their cards close to their chest and hinting that Denver will still be in future generation products. If Parker is still part of NVIDIA’s plans – and they are without a doubt working on some kind of 16nm FF Tegra SoC – then we may still see Denver as part of the original Parker plan. Otherwise at this point while there’s no reason not to believe NVIDIA about the fate of future Denver SoCs, it’s anyone’s guess when we might see it again.

Uncore

Outside of the CPU and GPU, NVIDIA has also dramatically improved the rest of Tegra X1 in comparison with Tegra K1. We see a move from 64-bit wide LPDDR3 to 64-bit wide LPDDR4 on the memory interface, which improves peak memory bandwidth from 14.9 GB/s to 25.6 GB/s and improves power efficiency by around 40%. In addition, the maximum internal display resolution moves from 3200x2000 at 60 Hz to 3840x2160 at 60 Hz with support for VESA’s display stream compression. For external displays, we see a significant improvement with support for HDMI 2.0 and HDCP 2.2, which means that 4K60 is supported in contrast with the Tegra K1, which only supported 4K30.
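As a rough sanity check on the bandwidth figures above, peak memory bandwidth is simply bus width multiplied by transfer rate. The sketch below assumes data rates of 1866 MT/s for LPDDR3 and 3200 MT/s for LPDDR4; these rates are not stated in the text and are assumptions chosen because they reproduce the quoted numbers. The function name is illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) for a given bus width and data rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Assumed data rates (not given in the article)
LPDDR3_MT_S = 1866
LPDDR4_MT_S = 3200

k1_bw = peak_bandwidth_gb_s(64, LPDDR3_MT_S)  # Tegra K1: 64-bit LPDDR3
x1_bw = peak_bandwidth_gb_s(64, LPDDR4_MT_S)  # Tegra X1: 64-bit LPDDR4

print(f"Tegra K1: {k1_bw:.1f} GB/s")
print(f"Tegra X1: {x1_bw:.1f} GB/s")
print(f"Ratio:    {x1_bw / k1_bw:.2f}x")
```

Since the bus width is unchanged at 64 bits, under these assumptions the entire bandwidth gain comes from the higher transfer rate of LPDDR4.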

The ISP is largely similar to the one we see in the Tegra K1 when it comes to feature set, but JPEG encode and decode rate is now five times as fast, going from 120 MP/s to 600 MP/s. For video encode and decode, we see support for 4K60 H.265 and VP9, with support for 10-bit color on H.265 decode. In addition to support for new codecs in the hardware blocks, we see that H.264 and VP8 now support 4K60, a notable improvement over the 4K30 that was the maximum for Tegra K1. We also see that the storage controller now supports eMMC 5.1 for faster storage performance.


  • juicytuna - Monday, January 5, 2015

    Well said. Apple's advantage is parallel development and time to market. Their GPU architecture is not that much *better* than their competitors. In fact I'd say that Nvidia has had a significant advantage when it comes to feature set and performance per watt on a given process node since the K1.
  • GC2:CS - Monday, January 5, 2015

    Maybe an advantage in feature set, but performance per watt?

    So if you want to compare, then for example the Xiaomi MiPad consumes around 7,9W when running the GFXBench battery life test, and that is with performance throttled down to around 30,4 fps on screen. A very similar tablet, the iPad mini with Retina display and its A7 processor (actually a 28nm part!), consumes just 4,3W, and that is running at 22,9 fps the whole time.

    So I am asking where is that "class leading" efficiency and "significant advantage when it comes to performance per watt" that nvidia is claiming to achieve, because I actually don't see anything like that.
  • Yojimbo - Monday, January 5, 2015

    Looking at the gfxbench website, under "long-term performance" I see 21.4 fps listed for the iPad Mini Retina and 30.4 fps listed for the Mi Pad, maybe this is what you are talking about. That is a roughly 40% advantage in performance for the Mi Pad. I can't find anything about throttling or the number of watts being drawn during this test. What I do see is another category listed immediately below that says "battery lifetime" where the iPad Mini Retina is listed at 303 minutes and the Mi Pad is listed at 193 minutes. The iPad Mini Retina has a 23.8 watt-hour battery and the Mi Pad has a 24.7 watt-hour battery. So this seems to imply that the iPad Mini Retina is drawing about 4.7 watts and the Mi Pad is drawing about 7.7 watts, and it comes out to the Mi Pad using about 63% more power. 40% more performance for 63% more power is a much closer race than the numbers you quoted (yours come out to about a 33% increase in performance and an 84% increase in power consumption, which is very different), and one must remember the circumstances of the comparison. Firstly, it is a comparison at different performance levels (this part is fair, since juicytuna claimed that NVIDIA has had a performance per watt advantage); secondly, it is a long-term performance comparison for a particular testing methodology; and lastly and most importantly, it is a whole-system comparison, not just comparing the GPU power consumption or even the SOC power consumption.
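    The arithmetic in the comment above can be reproduced with a short sketch. It assumes, as the commenter does, that the full rated battery capacity is drained over the listed battery lifetime, so the result is an average whole-device power draw rather than an SoC measurement. The helper names are illustrative only.

```python
def average_power_w(battery_wh: float, runtime_min: float) -> float:
    """Average whole-device power in watts, assuming the full rated capacity is drained."""
    return battery_wh / (runtime_min / 60.0)


def percent_more(value: float, baseline: float) -> float:
    """How much larger `value` is than `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0


# Figures quoted in the comment: battery capacity (Wh), lifetime (min), long-term fps
ipad_w = average_power_w(23.8, 303)
mipad_w = average_power_w(24.7, 193)
perf_gain = percent_more(30.4, 21.4)
power_gain = percent_more(mipad_w, ipad_w)

print(f"iPad Mini Retina: {ipad_w:.1f} W, Mi Pad: {mipad_w:.1f} W")
print(f"Mi Pad: {perf_gain:.0f}% more performance for {power_gain:.0f}% more power")

# Figures from the earlier comment (30,4 vs 22,9 fps and 7,9 vs 4,3 W)
print(f"Earlier figures: {percent_more(30.4, 22.9):.0f}% more performance "
      f"for {percent_more(7.9, 4.3):.0f}% more power")
```

    Because the inputs are whole-device numbers, the output says nothing directly about the GPU or SoC alone, which is the caveat the comment closes on.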
  • GC2:CS - Monday, January 5, 2015

    Yeah exactly, when you have two similar platforms with different chips, I think it's safe to say that Tegra pulls significantly more than A7, because those ~3 additional watts (I don't know where you got your numbers, I know Xiaomi got 25,46Wh, and that iPad lasts 330 minutes, A7 iPads also push out T-Rex at around 23 fps since the iOS 8 update) have to go somewhere. What I am trying to say is: imagine how low powered the A7 is if the entire iPad mini at half brightness consumes 4,7W, and how huge those 3W that more or less come from the SoC actually are.
    You will increase the power draw of the entire tablet by over a half, just to get 40% more performance out of your SoC. The Tegra K1 in the MiPad has a 5W TDP, or more than the entire iPad mini! Yet it can't deliver performance that's competitive enough at that power.
    It's like you are a 140 lb man who can lift 100 pounds, but you train a lot until you put on 70 pounds of muscle (pump more power into the SoC) to weigh 210 or more, and you can still only lift like 140 pounds. What a disappointment!

    What I see is a massive increase in power consumption, with not-so-massive gains in performance, which is not typical of efficient architectures like nvidia is claiming Tegra K1 is.
    That's why I think nvidia just kind of failed to deliver on their promise of "revolution" in mobile graphics.
  • Yojimbo - Monday, January 5, 2015

    I got my benchmark and battery life numbers from the gfxbench.com website as I said in my reply. I got the iPad's battery capacity from the Apple website. I got the Mi Pad's battery capacity from a review page that I can't find again right now, but looking from other places it may have been wrong. WCCFtech lists 25.46 W-h like you did. I don't know where you got YOUR numbers. You cannot say they are "two similar platforms" and conclude that the comparison is a fair comparison of the underlying SOCs. Yes the screen resolutions are the same, but just imagine that Apple managed to squeeze an extra .5 watts from the display, memory, and all other parts of the system than the "foolish chinesse manufacteurs (sic)" were able to do. Adding this hypothetical .5 watts back would put the iPad Mini Retina at 5.2 watts, and the Mi Pad would then be operating at 40% more performance for 48% (or 52%, using the larger battery size you gave for the MiPad) more power usage. Since power usage does not scale linearly with performance this could potentially be considered an excellent trade-off.

    Your analogy, btw, is terrible. The Mi Pad does not have the same performance as does the bulked-up man in your analogy, it has a whole 40% more. Your use of inexact words to exaggerate is also annoying: "I see massive increases in power consumption, with not-so massive gains in performance" and "You increase the power draw by over half just to get 40% more performance". You increase the power by 60% to get 40% more performance. That has all the information. But the important point is that it is not an SOC-only measurement and so the numbers are very non-conclusive from an analytical standpoint.
  • GC2:CS - Tuesday, January 6, 2015

    What I see from those numbers is the fact that Tegra is nowhere near 50% more efficient than A7 like nvidia is claiming.

    When the GFXBench battery life test runs, the display and the SoC are the two major power draws, so I thought it reasonable to treat the other power-using parts as negligible.
    So the entire iPad mini pulls 4,9W (I don't know why I should add another 0,5 W if it doesn't pull that much) and miPad pulls 7,9W. Those are your numbers which actually favor nvidia a bit.

    To show you that there is no way around that fact I will lower the consumption of miPad by a watt just to favor nvidia even more.

    Now that we have 4,9 and 6,9W for the two tablets, I will subtract around 1,5W for the display power, which should be more or less the same for both tablets.

    So we get 3,4 and 5,4W for everything but the display, and most of this will be the SoC power. And we get that the Tegra K1 uses more or less 50% more power than A7 for 40% more performance in a scenario that favors nvidia so much it's extremely unfair.

    And even if we take this absurd scenario and scale back the power consumption of Tegra K1 quadratically: 1,5*(1,4)^(-2), we still get that even at A7 level of performance K1 will consume over 75% of the power of A7 for the same performance.
    That is a number that is way, way, way off in favor of nvidia and yet it still doesn't come close to the "50% more efficient" claim, which would require the K1 to consume just 2/3 the power for the same performance.

    So please tell me how can you assume that increasing the power draw of the ENTIRE tablet by 60%, just to get 40% more GPU performance out of your SoC, which is a SINGLE part, just a subset of total tablet power draw, can be interpreted as nvidia's SoC being more efficient. Because however I spin it, I am not seeing 3x performance and 50% more efficiency from K1 tablets compared to A7 tablets. I see that K1 tablets throttle to nowhere near 3x faster than A7 iPads and they run down their battery significantly faster. And if the same is true for the Tegra X1, I don't know why anybody should be excited about those chips.
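    The scaling step in the comment above can be written out as a small sketch. It takes the commenter's own inputs (1.5x the power for 1.4x the performance) and the commenter's assumption that power scales with the square of performance; the quadratic exponent is that assumption, not a measured property of either chip, and the function name is illustrative.

```python
def power_at_baseline_perf(rel_power: float, rel_perf: float, exponent: float = 2.0) -> float:
    """Relative power after scaling performance back to 1.0x,
    assuming power is proportional to performance ** exponent."""
    return rel_power * rel_perf ** (-exponent)


# Commenter's inputs: K1 at 1.5x the power of the A7 for 1.4x the performance
k1_power_at_a7_perf = power_at_baseline_perf(1.5, 1.4)

# "50% more efficient" means 1.5x performance per watt,
# i.e. the same performance at 1/1.5 of the power
required_power = 1 / 1.5

print(f"K1 power at A7 performance: {k1_power_at_a7_perf:.3f}x")
print(f"Needed for the 50% claim:   {required_power:.3f}x")
print("Claim met under these assumptions:", k1_power_at_a7_perf <= required_power)
```

    The conclusion depends entirely on the assumed inputs and exponent; a steeper power curve or different whole-device estimates would move the result.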
  • Yojimbo - Tuesday, January 6, 2015

    You don't think it's possible to save power in any other component of the system than the SOC? I think that's a convenient and silly claim. You can't operate under the assumption that the rest of the two very different systems draw the exact same amount of power and so all power difference comes from the SOC. Obviously if you want to compare SOC power draw you look at SOC power draw. Anything else is prone to great error. You can do lots of very exact and careful calculations and you will probably be completely inaccurate.
  • juicytuna - Monday, January 5, 2015

    That's comparing whole SOC power consumption. There's no doubt Cyclone is a much more efficient architecture than A15/A7. Do we know how much this test stresses the CPU? Can it run entirely on the A7s or is it lighting up all 4 A15s? Not enough data.

    Furthermore, the performance/watt curve on these chips is nonlinear, so if the K1 was downclocked to match the performance of the iPad I've no doubt its results would look much more favourable. I suspect that is why they compare the X1 to the A8X at same FPS rather than at the same power consumption.
  • Jumangi - Monday, January 5, 2015

    No it should be done on the actual real world products people can buy. That's the only thing that should matter ever.
  • Yojimbo - Monday, January 5, 2015

    Not if one wants to compare architectures, no. There is no reason why in an alternate universe Apple doesn't use NVIDIA's GPU instead of IMG's. In this alternate universe, NVIDIA's K1 GPU would then benefit from Apple's advantages the same way the Series 6XT GPU benefits in Apple's A8X, and then the supposed point that GC2:CS is trying to make, that the K1 is inherently inferior, would, I think, not hold up.
