Tonga’s Microarchitecture - What We’re Calling GCN 1.2

As we alluded to in our introduction, Tonga brings with it the next revision of AMD’s GCN architecture. This is the second such revision to the architecture, the last revision (GCN 1.1) being rolled out in March of 2013 with the launch of the Bonaire-based Radeon HD 7790. In the case of Bonaire AMD chose to keep the details of GCN 1.1 close to them, only finally going in-depth for the launch of the high-end Hawaii GPU later in the year. The launch of GCN 1.2 on the other hand is going to see AMD meeting enthusiasts halfway: we aren’t getting Hawaii-level details on the architectural changes, but we are getting an itemized list of the new features (or at least features AMD is willing to talk about) along with a short description of what each feature does. Consequently Tonga may be a lateral product from a performance standpoint, but it is going to be very important to AMD’s future.

But before we begin, we do want to quickly remind everyone that the GCN 1.2 name, like GCN 1.1 before it, is unofficial. AMD does not publicly name these microarchitectures outside of development, preferring instead to treat the entire Radeon 200 series as relatively homogeneous and calling out feature differences where it makes sense. In lieu of an official name and based on the iterative nature of these enhancements, we’re going to use GCN 1.2 to summarize the feature set.


AMD's 2012 APU Feature Roadmap. AKA: A Brief Guide To GCN

To kick things off we’ll pull out this old chestnut one last time: AMD’s HSA feature roadmap from their 2012 financial analysts’ day. Given HSA’s tight dependence on GPUs, this roadmap has offered a useful high-level overview of some of the features each successive generation of AMD GPU architectures will bring with it, and with the launch of the GCN 1.2 architecture we have finally reached what we believe is the last step in AMD’s roadmap: System Integration.

It’s no surprise then that one of the first things we find on AMD’s list of features for the GCN 1.2 instruction set is “improved compute task scheduling”. One of AMD’s major goals for their post-Kaveri APU was to improve the performance of HSA by various forms of overhead reduction, including faster context switching (something GPUs have always been poor at) and even GPU pre-emption. All of this would fit under the umbrella of “improved compute task scheduling” in AMD’s roadmap, though to be clear, AMD meeting us halfway on the architecture side means that they aren’t getting this detailed this soon.

Meanwhile GCN 1.2’s other instruction set improvements are quite interesting. The description of 16-bit FP and Integer operations is actually very descriptive, and includes a very important keyword: low power. Briefly, PC GPUs have been centered around 32-bit mathematical operations for some years now, since desktop technology and transistor density eliminated the need for 16-bit/24-bit partial precision operations. All things considered, 32-bit operations are preferred from a quality standpoint as they are accurate enough for many compute tasks and virtually all graphics tasks, which is why PC GPUs were limited to (or at least optimized for) partial precision operations for only a relatively short period of time.

However 16-bit operations are still alive and well on the SoC (mobile) side. SoC GPUs are in many ways a 5-10 year old echo of PC GPUs in features and performance, while in other ways they’re outright unique. SoC GPUs are extremely sensitive to power consumption in a way that PCs have never been, so while SoC GPUs can use 32-bit operations, they will in some circumstances favor 16-bit operations for power efficiency purposes. Despite the accuracy limitations of a lower precision, if a developer knows they don’t need the greater accuracy, then falling back to 16-bit means saving power and, depending on the architecture, also improving performance if multiple 16-bit operations can be scheduled alongside each other.
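To put the accuracy trade-off in concrete terms, here is a minimal sketch that rounds a few values to IEEE 754 half precision (FP16) and single precision (FP32) using only Python's standard `struct` module. The helper names (`to_fp16`, `to_fp32`, `packed_fp16_pair`) are our own for illustration and have nothing to do with AMD's instruction set; the last helper simply shows that two FP16 values fit in the storage of one FP32 value, which is the property an architecture could exploit to process 16-bit operations in pairs.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def packed_fp16_pair(a: float, b: float) -> bytes:
    """Pack two FP16 values; together they occupy 4 bytes, like one FP32."""
    return struct.pack("<2e", a, b)


if __name__ == "__main__":
    # Compare how much of each value survives at FP16 versus FP32.
    for value in (0.1, 1.0, 2049.0):
        print(value, to_fp16(value), to_fp32(value))
    # Storage footprint: a pair of FP16 values versus a single FP32 value.
    print(len(packed_fp16_pair(0.5, 0.25)), len(struct.pack("<f", 0.5)))
```

FP16 carries an 11-bit significand, so integers above 2048 are no longer all representable and fractional values such as 0.1 pick up a visibly larger rounding error than they do at FP32. Whether that matters depends entirely on the workload, which is why the choice is left to the developer.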


Imagination's PowerVR Series 6XT: An Example of An SoC GPU With FP16 Hardware

To that end, the fact that AMD is taking the time to focus on 16-bit operations within the GCN instruction set is interesting, but not unexpected. If AMD were to develop SoC-class processors and wanted to use their own GPUs, then natively supporting 16-bit operations would be a logical addition to the instruction set for such a product. The power savings would be helpful for getting GCN into even smaller form factors, and with so many other GPUs supporting special 16-bit execution modes it would help to make GCN competitive with those other products.

Finally, data parallel instructions are the feature we have the least knowledge about. SIMDs can already be described as data parallel (it’s 1 instruction operating on multiple data elements in parallel), but obviously AMD intends to go past that. Our best guess would be that AMD has both a means and a need to have 2 SIMD lanes operate on the same piece of data, though why they would want to do this and what the benefits may be are not clear at this time.
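Since AMD has not described these instructions, the following is purely a speculative model of what letting SIMD lanes work on each other's data could look like, not a description of anything GCN 1.2 actually implements. It simulates a 64-wide wavefront as a Python list; `read_from_lane` and `wavefront_sum` are hypothetical names of our own. The point of the sketch is that if a lane can read a neighbouring lane's register directly, a wavefront-wide sum takes log2(64) = 6 exchange steps with no round trip through shared memory.

```python
WAVEFRONT_SIZE = 64  # GCN executes work in 64-wide wavefronts


def read_from_lane(values, offset):
    """Hypothetical cross-lane read: lane i gets the value held by lane (i + offset) mod n."""
    n = len(values)
    return [values[(i + offset) % n] for i in range(n)]


def wavefront_sum(values):
    """Sum across all lanes using log2(n) cross-lane exchange steps.

    After the final step every lane holds the total.
    """
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("lane count must be a power of two")
    acc = list(values)
    step = n // 2
    while step >= 1:
        # Each lane adds the partial sum held by the lane `step` positions away.
        neighbour = read_from_lane(acc, step)
        acc = [a + b for a, b in zip(acc, neighbour)]
        step //= 2
    return acc


if __name__ == "__main__":
    lanes = list(range(WAVEFRONT_SIZE))
    totals = wavefront_sum(lanes)
    print(totals[0], totals[-1])
```

This is only one possible reading of "data parallel instructions"; the real feature may serve a different purpose entirely.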


86 Comments


  • Alexvrb - Tuesday, September 16, 2014 - link

    "if other GCN 1.1 parts like Hawaii are any indication, it's much more likely the 280 maintains its boost clocks compared to the 285 (due to low TDP limits)"

    This is what you said. This is where I disagreed with you. The 285 maintains boost just as well as the 280. Further, GCN 1.1 Bonaire and even Hawaii reach and hold boost at stock TDP. The 290 series were not cooled sufficiently using reference coolers, but without any changes to TDP settings (I repeat, stock TDP) they boost fine as long as you cool them. GCN 1.1 boosts fine, end of story.

    As far as Tonga goes, there's almost no progress in performance terms. In terms of power it depends on the OEM and I've seen good and bad. The only additions that really are interesting are the increased tessellation performance (though not terribly important at the moment) and finally getting TrueAudio into a mid-range part (it should be across the board by next gen I would hope - PS4 and XB1 have the same Tensilica DSPs).

    I would hope they do substantially better with their future releases, or at least release a competent reference design that shows off power efficiency better than some of these third party designs.
  • chizow - Wednesday, September 17, 2014 - link

    Yes, and my comment was correct, it will ALWAYS be "more likely" the 280 maintains its boost over other GCN 1.x parts because we know the track record of GCN 1.0 cards and their conservative Boost compared to post-PowerTune GCN1.x and later parts as a result of the black eye caused by Hawaii. There will always be a doubt due to AMD's less-than-honest approach to Boost with Hawaii, plain and simple.

    I also (correctly) qualified my statement by saying the low stated TDP of the 285 would be a hindrance to exceeding those rated specs and/or the performance of the 280, and we also see that is the case that in order to exceed those speed limits, AMD traded performance for efficiency to the point the 285's power consumption is actually closer to the 250W rated 280.

    In any case, in another day or two, this unremarkable part is going to become irrelevant with GM104 Maxwell, no need to further waste any thoughts on it.
  • etherlore - Thursday, September 11, 2014 - link

    Speculating here. The data parallel instructions could be a way to share data between SIMD lanes. I could see this being similar in functionality to what threadgroup local store allows, but without explicit usage of the local store.

    It's possible this is an extension to, or makes new use of, the 32 LDS integer units in GCN. (section 2.3.2 in the Southern Islands instruction set docs)
  • vred - Thursday, September 11, 2014 - link

    And... DP rate at last. Sucks to have it at 1/16 but at least now it's confirmed. First review where I see this data published.
  • chizow - Thursday, September 11, 2014 - link

    It has to be artificially imposed, as AMD has already announced FirePro cards based on the Tonga ASIC that do not suffer from this castrated DP rate. AMD as usual taking a page from Nvidia's playbook, so now all the AMD fans poo-poo'ing Nvidia's sound business decisions can give AMD equal treatment. Somehow I doubt that will happen though!
  • Samus - Thursday, September 11, 2014 - link

    If this is AMD's Radeon refresh, if the 750Ti tells us anything, they are screwed when Maxwell hits the streets next month.
  • Atari2600 - Thursday, September 11, 2014 - link

    The one thing missed in all this - APUs.

    As we all know, APUs are bandwidth starved. A 30-40% increase in memory subsystem efficiency will do very nicely for removing a major bottleneck.

    That is before the move to stacked chips or eDRAM.
  • limitedaccess - Thursday, September 11, 2014 - link

    @Ryan

    Regarding the compression (delta color compression) changes for Tonga, does this have any effect on the actual size of data stored in VRAM?

    For instance if you take a 2GB Pitcairn card and a 2GB Tonga card showing the identical scene in a game, will they both have identical (monitored) VRAM usage? Assuming of course the scenario here is neither is actually hitting the 2GB VRAM limit.

    I'm wondering if it is possible to test whether or not this is the case if unconfirmed.
  • Ryan Smith - Sunday, September 14, 2014 - link

    VRAM usage will differ. Anything color compressed will take up less space (at whatever ratio the color compression algorithm allows). Of course this doesn't account for caching and programs generally taking up as much VRAM as they can, so it doesn't necessarily follow that overall VRAM usage will be lower on Tonga than Pitcairn. But it is something that can at least be tested.
  • abundantcores - Thursday, September 11, 2014 - link

    I see Anand still doesn't understand the purpose of Mantle; if they did they wouldn't be using the most powerful CPU they could find. I would explain it to them, but I think it's already been explained to them a thousand times and they still don't grasp it.

    Anand are a joke, they have no understanding of anything.
