Introducing PowerTune Technology With Boost

Since the 7970GE’s hardware is identical to the 7970, let’s jump straight into AMD’s software stack.

With the 7970GE AMD is introducing their own form of GPU Boost, which they are calling PowerTune Technology With Boost (PT Boost). PT Boost is a combination of BIOS and Catalyst driver changes that allows AMD to overdrive the GPU when conditions permit, all without any hardware changes.

In practice PT Boost is very similar to NVIDIA’s GPU Boost. Both technologies are based around the concept of a base clock (or engine clock in AMD’s terminology) with a set voltage, and then one or more boost bins with an associated voltage that the GPU can move to as power/thermal conditions permit. In essence PT Boost allows the 7970GE to overvolt and overclock itself to a limited degree.
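
To make the bin concept concrete, below is a minimal sketch of how a base clock and a small set of boost bins might be arbitrated against a power budget each control cycle. The clocks, voltages, power model, and 250W cap are illustrative assumptions, not AMD's actual firmware logic (real PowerTune can also throttle below the base clock, which this sketch ignores).

```python
# A minimal sketch of base-clock-plus-boost-bin arbitration (hypothetical
# values, not AMD's firmware). Each bin pairs a clock with a voltage; the GPU
# runs the highest bin whose estimated power fits under the PowerTune cap.

BINS = [  # (engine clock in MHz, voltage in V) -- illustrative numbers only
    (1000, 1.175),  # base engine clock
    (1025, 1.200),  # intermediate boost bin
    (1050, 1.218),  # top boost bin (the advertised PT Boost clock)
]

POWER_CAP_W = 250.0  # assumed PowerTune limit for the board


def estimated_power(clock_mhz: float, voltage: float, activity: float) -> float:
    """Toy dynamic-power model: P ~ activity * C * V^2 * f, with C folded into a constant."""
    return activity * 1.6 * (voltage ** 2) * (clock_mhz / 10.0)


def select_bin(activity: float) -> tuple[int, float]:
    """Pick the highest boost bin that stays under the power cap for the current load."""
    chosen = BINS[0]  # this sketch never drops below the base clock
    for clock, volts in BINS[1:]:
        if estimated_power(clock, volts, activity) <= POWER_CAP_W:
            chosen = (clock, volts)
    return chosen


print(select_bin(activity=0.9))  # light enough load: top boost bin (1050MHz)
print(select_bin(activity=1.2))  # heavy load: falls back toward the base clock
```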

With that said there are some differences in implementation. First and foremost, AMD isn't pushing the 7970GE nearly as far with PT Boost as NVIDIA has pushed the GTX 680 with GPU Boost. The 7970GE's boost clock is 1050MHz, a mere 50MHz over the base clock, while the GTX 680 can boost upwards of 100MHz over its base clock. So long as both companies continue down this path I expect boost clocks to move higher and become more important with successive generations, just as we've seen with Intel's CPU turbo boost. For the time being, however, GPU turboing is going to be far shallower than what we've seen on the CPU.

At the same time however, while AMD isn’t pushing the 7970GE as hard as the GTX 680 they are being much more straightforward in what they guarantee – or as AMD likes to put it they’re being fully deterministic. Every 7970GE can hit 1050MHz and every 7970GE tops out at 1050MHz. This is as opposed to NVIDIA’s GPU Boost, where every card can hit at least the boost clock but there will be some variation in the top clock. No 7970GE will perform significantly better or worse than another on account of clockspeed, although chip-to-chip quality variation means that we should expect to see some trivial performance variation because of power consumption.

On that note it was interesting to see that, thanks to their previous work with PowerTune, AMD has far more granularity than NVIDIA when it comes to clockspeeds. GK104's bins are 13MHz apart; we don't have an accurate measure for AMD's cards because there are so many bins between 1000MHz and 1050MHz that we can't accurately count them. Nor, for that matter, does the 7970GE stick with any one bin for very long, as again thanks to PowerTune AMD can switch their clocks and voltages in a few milliseconds, as opposed to the roughly 100ms it takes NVIDIA to do the same thing. To be frank, in a desktop environment it's not clear whether this is going to make a practical difference (we're talking about moving less than 5% in the blink of an eye), but if this technology ever makes it to mobile, a fast switching time would be essential to minimizing power consumption.

Such fast switching of course is a consequence of what AMD has done with their internal counters for PowerTune. As a reminder, for PowerTune AMD estimates their power consumption via internal counters that monitor GPU usage and calculate power consumption based on those factors, whereas NVIDIA simply monitors the power going into the GPU. The former is technically an estimation (albeit a precise one), while the latter is accurate but fairly slow, which is why AMD can switch clocks so much faster than NVIDIA can.
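
As a rough illustration of the two approaches, the sketch below contrasts a counter-based power estimate with a direct sensor read. The per-block weights and voltage scaling are our own assumptions for illustration, not either vendor's actual model.

```python
# Illustrative contrast between estimating power from activity counters
# (PowerTune-style) and measuring it at the board (GPU Boost-style). The
# per-block weights and voltage scaling are assumptions, not real models.

NOMINAL_VOLTAGE = 1.175
BLOCK_WEIGHTS_W = {"shader": 120.0, "memory": 45.0, "rops": 25.0}  # assumed full-load watts


def counter_based_estimate(utilization: dict[str, float], voltage: float) -> float:
    """Weight per-block activity counters into a power figure. This can be
    recomputed every control cycle, which is what enables millisecond-scale
    clock/voltage switching."""
    dynamic_w = sum(BLOCK_WEIGHTS_W[blk] * util for blk, util in utilization.items())
    return dynamic_w * (voltage / NOMINAL_VOLTAGE) ** 2  # dynamic power scales roughly with V^2


def measured_power(read_sensor) -> float:
    """Read the board's power circuitry directly. Accurate, but each reading
    reflects a comparatively long averaging window, so the control loop is slower."""
    return read_sensor()


print(counter_based_estimate({"shader": 0.8, "memory": 0.6, "rops": 0.5}, voltage=1.175))
print(measured_power(lambda: 141.2))  # stand-in for a real sensor read
```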

For the 7970GE AMD is further refining their PowerTune algorithms, both to account for PT Boost's voltage changes and to improve the accuracy of the algorithm itself. The big change here is that on top of their load-based algorithm AMD is adding temperature into the equation, via what they're calling Digital Temperature Estimation (DTE). Like the existing PowerTune implementation, DTE is based on internal counters rather than an external sensor (i.e. a thermal diode): AMD combines those counter readings with their knowledge of the cooling system to quickly estimate the GPU's temperature, much as they already estimate power consumption. In essence they estimate the power going into the chip and the heat coming out of it, and calculate the temperature from there.

The end result of this is that by estimating the temperature AMD can now estimate the leakage of the chip (remember, leakage is a function of temperature), which allows them to more accurately estimate total power consumption. For previous products AMD has simply assumed the worst case scenario for leakage, which kept real power below AMD’s PowerTune limits but effectively overestimated power consumption. With DTE and the ability to calculate leakage AMD now has a better power estimate and can push their clocks just a bit higher as they can now tap into the headroom that their earlier overestimation left. This alone allows AMD to increase their PT Boost performance by 3-4%, relative to what it would be without DTE.
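
To show how that works out numerically, here's a toy comparison of the old worst-case leakage assumption against a temperature-aware estimate. Every constant below is an assumed value chosen only to illustrate how a lower estimated junction temperature frees up headroom under a fixed power cap.

```python
import math

# Toy comparison of the old worst-case leakage assumption against a
# temperature-aware (DTE-style) estimate. All constants are assumptions.

T_WORST_CASE_C = 95.0       # assumed worst-case junction temperature
LEAKAGE_AT_WORST_W = 40.0   # assumed leakage at that temperature


def leakage_power(temp_c: float) -> float:
    """Leakage rises roughly exponentially with temperature; scaled so the
    assumed worst-case temperature yields the assumed worst-case leakage."""
    return LEAKAGE_AT_WORST_W * math.exp((temp_c - T_WORST_CASE_C) / 25.0)


dynamic_w = 180.0         # dynamic power reported by the activity counters
estimated_temp_c = 78.0   # DTE-style junction temperature estimate

old_total_w = dynamic_w + LEAKAGE_AT_WORST_W            # always assume worst-case leakage
new_total_w = dynamic_w + leakage_power(estimated_temp_c)

# The difference is budget that can now be spent on higher boost bins instead.
print(f"headroom recovered: {old_total_w - new_total_w:.1f} W")
```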

AMD actually has a longer explanation on how DTE works, and rather than describing it in detail we’ll simply reprint it.

DTE works as a deterministic model of temperature in a worst case environment, as to give us a better estimate of how much current the ASIC is leaking at any point in time. As a first order approximation, ASIC power is roughly a function of: dynamic_power(voltage, frequency) + static_power(temperature, voltage, leakage).

Traditional PowerTune implementations assume that the ASIC is running at a worst case junction temperature, and as such always overestimates the power contribution of leaked current. In reality, even at a worst case ambient temp (45C inlet to the fansink), the GPU will not be working at a worst case junction temperature. By using an estimation engine to better calculate the junction temp, we can reduce this overestimation in a deterministic manner, and hence allow the PowerTune architecture to deliver more of the budget towards dynamic power (i.e. frequency) which results in higher performance. As an end result, DTE is responsible for about 3-4% performance uplift vs the HD7970 GHz Edition with DTE disabled.

The DTE mechanism itself is an iterative differential model which works in the following manner. Starting from a set of initial conditions, the DTE algorithm calculates dTemp_ti/dt based on the inferred power consumption over a previous timeslice (is a function of voltage, workload/capacitance, freq, temp, leakage, etc), and the thermal capacitance of the fansink (function of fansink and T_delta). Simply put, we estimate the heat into the chip and the heat out of the chip at any given moment. Based on this differential relation, it’s easy to work back from your initial conditions and estimate Temp_ti, which is the temperature at any timeslice. A lot of work goes into tuning the parameters around thermal capacitance and power flux, but in the end, you have an algorithmic way to always provide benefit over the previous worst-case assumption, but also to guarantee that it will be representative of the entire population of parts in the market.

We could have easily done this through diode measurements, and used real temperature instead of digital temperature estimates…. But that would not be deterministic. Our current method with DTE guarantees that two parts off the shelf will perform the same way, and we enable end users to take advantage of their extra OC headroom on their parts through tools like Overdrive.
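
The iterative model described above maps naturally onto a simple forward integration: estimate heat into the chip, estimate heat out through the fansink, and step the temperature forward each timeslice. The sketch below is our own illustration of that idea with assumed thermal constants, not AMD's tuned parameters.

```python
# One possible reading of the iterative DTE model: each timeslice, infer heat
# into the chip from the power model, heat out through the fansink from the
# temperature delta, and integrate. All constants are assumed for illustration.

AMBIENT_C = 45.0              # worst-case inlet temperature quoted above
THERMAL_CAP_J_PER_C = 60.0    # assumed thermal capacitance of die + fansink
THERMAL_RES_C_PER_W = 0.18    # assumed thermal resistance to the inlet air
DT_S = 0.005                  # timeslice length (a few milliseconds)


def step_temperature(temp_c: float, power_in_w: float) -> float:
    """One timeslice: dT/dt = (heat_in - heat_out) / thermal_capacitance."""
    heat_out_w = (temp_c - AMBIENT_C) / THERMAL_RES_C_PER_W
    return temp_c + DT_S * (power_in_w - heat_out_w) / THERMAL_CAP_J_PER_C


temp_c = AMBIENT_C  # initial condition: chip at inlet temperature
for _ in range(int(60 / DT_S)):  # simulate 60 seconds at a steady 220W load
    temp_c = step_temperature(temp_c, power_in_w=220.0)

print(f"estimated junction temperature: {temp_c:.1f} C")  # settles near 45 + 220*0.18, about 84.6C
```

The estimated temperature then feeds back into the leakage term of the power estimate, closing the loop between power and temperature.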

By tapping into this headroom, however, AMD has also increased their real power consumption at lower temperatures and leakages, which is why, despite the identical PowerTune limits, the 7970GE will consume more power than the 7970. We'll get into the numbers in our benchmarking section, but it's something to keep in mind for the time being.

Finally, on the end-user monitoring front we have some good news and some bad news. The bad news is that for the time being it's not going to be possible to accurately monitor the real clockspeed of the 7970GE, either through AMD's control panel or through 3rd party utilities such as GPU-Z. As it stands AMD is only exposing the base P-states but not the intermediate P-states, which goes back to the launch of the 7970 and is why we have never been able to tell if PowerTune throttling is active (unlike on the 6900 series). So for the time being we have no idea what the actual (or even average) clockspeed of the 7970GE is. All we can see is whether it's at its boost P-state – displayed as a 1050MHz core clock – or whether it has dropped to its base clock P-state, at which point the 7970GE will report 1000MHz.

The good news is that internally, of course, AMD can take much finer readings (something they made sure to show us at AFDS), and that they'll finally be exposing these finer readings to 3rd party applications. Unfortunately they haven't given us an expected date, but soon enough their API will be able to report the real clockspeed of the GPU, allowing users to see the full effects of both PT Boost and PT Throttle. It's a long overdue change and we're glad to see AMD is finally going to expose this data.

Comments
  • piroroadkill - Friday, June 22, 2012 - link

    While the noise is bad - the manufacturers are going to spew out non-reference, quiet designs in moments, so I don't think it's an issue.
  • silverblue - Friday, June 22, 2012 - link

    Tom's added a custom cooler (Gelid Icy Vision-A) to theirs which reduced noise and heat noticeably (about 6 degrees C and 7-8 dB). Still, it would be cheaper to get the vanilla 7970, add the same cooling solution, and clock to the same levels; that way, you'd end up with a GHz Edition clocked card which is cooler and quieter for about the same price as the real thing, albeit lacking the new boost feature.
  • ZoZo - Friday, June 22, 2012 - link

    Would it be possible to drop the 1920x1200 resolution for tests? 16:10 is dead; 1080p has been the standard for high definition on PC monitors for at least 4 years now, so it's more than time to catch up with reality... Sorry for the rant, I'm probably nitpicking anyway...
  • Reikon - Friday, June 22, 2012 - link

    Uh, no. 16:10 at 1920x1200 is still the standard for high quality IPS 24" monitors, which is a fairly typical choice for enthusiasts.
  • paraffin - Saturday, June 23, 2012 - link

    I haven't been seeing many 16:10 monitors around these days. Besides, since AT even tests iGPU performance at ANYTHING BUT 1080p, your "enthusiast choice" argument is invalid. 16:10 is simply a l33t factor in a market dominated by 16:9. I'll take my cheap 27" 1080p TN's spaciousness and HD content nativeness over your pricey 24" 1200p IPS' "quality" any day.
  • CeriseCogburn - Saturday, June 23, 2012 - link

    I went over this already with the amd fanboys.
    For literally YEARS they have had harpy fits on five and ten dollar card pricing differences, declaring amd the price perf queen.

    Then I pointed out nVidia wins in 1920x1080 by 17+% and only by 10+% in 1920x1200 - so all of a sudden they ALL had 1920x1200 monitors, they were not rare, and they have hundreds of extra dollars of cash to blow on it, and have done so, at no extra cost to themselves and everyone else (who also has those), who of course also chooses such monitors because they all love them the mostest...

    Then I gave them egg counts, might as well call it 100 to 1 on availability if we are to keep to their own hyperactive price perf harpying, and the lowest available higher rez was $50 more, which COST NOTHING because it helps amd, of course....

    I pointed out Anand pointed out in the then prior article it's an ~11% pixel difference, so they were told to calculate the frame rate difference... (that keeps amd up there in scores and winning a few they wouldn't otherwise).

    Dude, MKultra, Svengali, Jim Wand, and mass media, could not, combined, do a better job brainwashing the amd fan boy.

    Here's the link, since I know a thousand red-winged harpies are ready to descend en masse and caw loudly in protest...

    http://translate.google.pl/translate?hl=pl&sl=...

    1920x1080: " GeForce GTX680 is on average 17.61% more efficient than the Radeon 7970.
    Here, the performance difference in favor of the GTX680 are even greater"

    So they ALL have a 1920x1200, and they are easily available, the most common, cheap, and they look great, and most of them have like 2 or 3 of those, and it was no expense, or if it was, they are happy to pay it for the red harpy from hades card.
  • silverblue - Monday, June 25, 2012 - link

    Your comparison article is more than a bit flawed. The PCLab results, in particular, have been massively updated since that article. Looks like they've edited the original article, which is a bit odd. Still, AMD goes from losing badly in a few cases to not losing so badly after all, as the results on this article go to show. They don't displace the 680 as the best gaming card of the moment, but it certainly narrows the gap (even if the GHz Edition didn't exist).

    Also, without a clear idea of specs and settings, how can you just grab results for a given resolution from four or five different sites for each card, add them up and proclaim a winner? I could run a comparison between a 680 and 7970 in a given title with the former using FXAA and the latter using 8xMSAA, doesn't mean it's a good comparison. I could run Crysis 2 without any AA and AF at all at a given resolution on one card and then put every bell and whistle on for the other - without the playing field being even, it's simply invalid. Take each review at its own merits because at least then you can be sure of the test environment.

    As for 1200p monitors... sure, they're more expensive, but it doesn't mean people don't have them. You're just bitter because you got the wrong end of the stick by saying nobody owned 1200p monitors then got slapped down by a bunch of 1200p monitor owners. Regardless, if you're upset that NVIDIA suddenly loses performance as you ramp up the vertical resolution, how is that AMD's fault? Did it also occur to you that people with money to blow on $500 graphics cards might actually own good monitors as well? I bet there are some people here with 680s who are rocking on 1200p monitors - are you going to rag (or shall I say "rage"?) on them, too?

    If you play on a 1080p panel then that's your prerogative, but considering the power of the 670/680/7970, I'd consider that a waste.
  • FMinus - Friday, June 22, 2012 - link

    Simply put; No!

    1080p is the second worst thing that happened to the computer market in the recent years. The first worst thing being phasing out 4:3 monitors.
  • Tegeril - Friday, June 22, 2012 - link

    Yeah seriously, keep your 16:9, bad color reproduction away from these benchmarks.
  • kyuu - Friday, June 22, 2012 - link

    16:10 snobs are seriously getting out-of-touch when they start claiming that their aspect ratio gives better color reproduction. There are plenty of high-quality 1080p IPS monitors on the market -- I'm using one.

    That being said, it's not really important whether it's benchmarked at x1080 or x1200. There is a negligible difference in the number of pixels being drawn (one of the reasons I roll my eyes at 16:10 snobs). If you're using a 1080p monitor, just add anywhere from 0.5 to 2 FPS to the average FPS results from x1200.

    Disclaimer: I have nothing *against* 16:10. All other things being equal, I'd choose 16:10 over 16:9. However, with 16:9 monitors being so much cheaper, I can't justify paying a huge premium for a measly 120 lines of vertical resolution. If you're willing to pay for it, great, but kindly don't pretend that doing so somehow makes you superior.
