Our own Ryan Smith pointed me at an excellent thread on Beyond3D where forum member yuri ran across a reference to additional memory controllers in AMD's recently released Kaveri APU. AMD's latest BIOS and Kernel Developer's Guide (BKDG 3.00) for Kaveri includes a reference to four DRAM controllers (DCT0 - 3) with only two in use (DCT0 and DCT3). The same guide also references a Gddr5Mode option for each DRAM controller.

Let me be very clear here: there's no chance that the recently launched Kaveri will be capable of GDDR5 or 4 x 64-bit memory operation (Socket-FM2+ pin-out alone would be an obvious limitation), but it's very possible that there were plans for one (or both) of those things in an AMD APU. Memory bandwidth can be a huge limit to scaling processor graphics performance, especially since the GPU has to share its limited bandwidth to main memory with a handful of CPU cores. Intel's workaround with Haswell was to pair it with 128MB of on-package eDRAM. AMD has typically shied away from more exotic solutions, leaving the launched Kaveri looking pretty normal on the memory bandwidth front.

In our Kaveri review, we asked whether any of you would be interested in a big Kaveri option with 12 - 20 CUs (768 - 1280 SPs) enabled, basically a high-end variant of the Xbox One or PS4 SoC. AMD would need a substantial increase in memory bandwidth to make such a thing feasible, but based on AMD's own docs it looks like that may not be too difficult to get.

There were rumors a while back of Kaveri using GDDR5 on a stick but it looks like nothing ever came of that. The options for a higher end Kaveri APU would have to be:

1) 256-bit wide DDR3 interface with standard DIMM slots, or
2) 256-bit wide GDDR5 interface with memory soldered down on the motherboard

I do wonder if AMD would consider pairing the first option with some high-speed memory on-die (similar to the Xbox One SoC).
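
To put rough numbers on those two options, here is a back-of-the-envelope sketch of peak theoretical bandwidth (bus width in bytes multiplied by transfer rate). The data rates are my own assumptions for illustration and do not come from the BKDG: DDR3-2133 for the DIMM case and 5.5 GT/s for GDDR5. The helper name `peak_bandwidth_gbs` is hypothetical as well.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000


# Assumed data rates, chosen only for illustration (not from AMD's docs).
DDR3_MTS = 2133   # DDR3-2133
GDDR5_MTS = 5500  # 5.5 GT/s GDDR5

configs = {
    "128-bit DDR3 (2 x 64-bit, as launched)": (128, DDR3_MTS),
    "256-bit DDR3 (4 x 64-bit)": (256, DDR3_MTS),
    "256-bit GDDR5": (256, GDDR5_MTS),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

Under those assumptions, going from 128-bit to 256-bit DDR3 doubles peak bandwidth from roughly 34 GB/s to roughly 68 GB/s, while a 256-bit GDDR5 interface would land around 176 GB/s. These are peak figures only; sustained bandwidth would be lower, and the GPU would still be sharing it with the CPU cores.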

All of this is an interesting academic exercise though, which brings me back to our original question from the Kaveri review. If you had the building blocks AMD has (Steamroller cores and GCN CUs) and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?

I know I'd be interested in a 2-module Steamroller core + 20 CUs with a 256-bit wide DDR3 interface, assuming AMD could stick some high-bandwidth memory on-die as well. More or less a high-end version of the Xbox One SoC. Such a thing would interest me but I'm not sure if anyone would buy it. Leave your thoughts in the comments below; I'm sure some important folks will get to read them :)

  • R3MF - Friday, January 17, 2014

    "If you had the building blocks AMD has (Steamroller cores and GCN CUs) and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?"

    What I want from the Kaveri successor is two different products using a unified socket:

    1. FM3 with 3x 64-bit DDR4-3200 (192-bit)
    2. 2x Excavator modules + 1024 shaders
    3. 4x Excavator modules + 512 shaders
  • moozoo - Friday, January 17, 2014

    The current fp64 ratio of the GPU cores is 1:16. When you work out the total DP GFLOPS between the GPU and CPU, it is around 105 GFLOPS. An i7 is 224 DP GFLOPS, CPU only, i.e. there is no point programming a double-precision app in HSA; it's far easier to just program in AVX2 only. However, if the fp64 ratio were 1:4, the total would be 243 GFLOPS, which makes HSA a much more attractive option, especially as later APUs will have more cores.
  • MrSpadge - Friday, January 17, 2014

    That's what I'm thinking as well: server chips with DP = 1/4 SP and 4 memory channels. They want to do server APUs anyway. Then make this platform the new high end desktop and vary a bit: 4 modules + moderate GPU etc.
  • simonpschmitt - Friday, January 17, 2014

    As something of an end user I believe the problem with AMD is this:

    You only need strong graphics when you want to play games.
    If you want to play games you buy a dedicated graphics card.
    If you have a dedicated graphics card you don't need strong integrated graphics.

    So in my opinion AMD must either:

    Improve their non-gaming CPU performance (perhaps even with GPGPU/HSA) so they can compete with Intel's offerings on performance, energy efficiency, price, and so on, and be attractive to non-gaming customers.

    Or improve their integrated graphics to a level where even a mid-range gaming enthusiast doesn't need a dedicated graphics card.

    As an example, my own situation is that I do my gaming on an ultrabook with GeForce 730M graphics, and I don't want to build and maintain a dedicated gaming PC. However, I do have a mini-ITX server with six HDDs as a file dump/personal cloud/backup service/HTPC running on an Intel Celeron.
    I'd love to drop an Xbox One class APU in there and do something like Steam Big Picture for higher-level gaming.
  • simonpschmitt - Friday, January 17, 2014

    I actually forgot the reference to the article: if they go the second route and improve integrated graphics to serious levels, I believe memory bandwidth is one of the biggest issues.
    Addressing that with lots of DDR3 RAM/bandwidth (i.e. 4 channels) or with GDDR5 RAM seems more elegant to me than just dropping another cache level on the die, which is what Intel does.
    Ideally they would use GDDR5 sticks, but I don't believe it is good to introduce another standard.
    How could DDR4 play into this? Would it be a viable alternative to GDDR5 in this scenario?
  • yarcod - Friday, January 17, 2014

    I would definitely be interested in buying such a thing. Like many others have said: building your own Xbone or PS4 in a PC? Hell yeah! I would specifically use it for a Steam machine (although I am still not a 100% fan of SteamOS) and also as a replacement for my NAS/home backup system. Perfect combo.

    According to nordichardware (http://www.nordichardware.se/CPU/Styrkrets/amd-lan... - it is in Swedish, ask Google if you want to read it) there is testing being done on implementing R9 280X graphics in an APU next year. This is a rumor, and should of course be treated as such. But if they manage to solve the memory problem by using 4 memory controllers then perhaps it is viable. I wonder how they are going to keep the thermals down. How would DDR4 fit into this? Would it be possible for them to use that instead of GDDR5 and still have about the same performance?
  • Galatian - Friday, January 17, 2014

    I sure would be interested in it! As it stands, to me Kaveri tries two things at once (being a good desktop CPU and a good GPU) but fails at both. Anand stated in his launch article that Kaveri fits in a rather tiny niche, that is Steambox/HTPC gaming. I agree, but for that to really take off the GPU needs to be at least as powerful as the XBone and PS4. Currently it isn't, so I might as well just get either console. On the desktop CPU side, Steamroller (the 3rd Bulldozer "revision", mind you) still lacks power. If I built a new office computer I'd choose a Celeron/Pentium/Core i3. Their OpenCL performance is not as good as AMD's, but it's still there, which is something that often gets lost when looking at AMD marketing slides. They make it seem like they are the only ones with OpenCL capability. Not to mention Broadwell is almost right around the corner and we'll have to see how much Intel can improve the GPU part. Also there is Quick Sync. I use it on all my Blu-rays. Even a lowly Core i3 can use that and effectively boost Blu-ray encoding compared to AMD APUs. As for hardcore gamers, they are better served with a Core i5 or AMD FX CPU, and given the prices of the new Kaveri chip it's actually advisable to do so.
    In my eyes there is simply no selling point for the AMD APU on the desktop. Mobile will probably be fantastic as the A8-7600 45W review showed, but from a desktop/HTPC perspective? Meh...I'll pass. More cores and GDDR5 or quad-channel would have sold me.
  • Da W - Friday, January 17, 2014

    But I'd be more interested in a Jaguar Windows 8 gaming tablet that can stream games à la Shield.
  • gonchuki - Friday, January 17, 2014

    Given the current building blocks I would like to see 3-4 Steamroller modules (6-8 cores) with 4 CUs (keeping the GPU for asymmetric coprocessing with Mantle, as announced).

    Given the chance to improve Steamroller for a new stepping I would seriously review those caches (they have been getting worse ever since Brisbane) and memory controllers. AMD is now basically accessing the L2 cache at the same latency that Intel accesses main memory, it's ridiculous.

    Also, L3 cache seems to be sorely needed in the equation. AMD was a pioneer on this, and they got a big win by implementing it. Now they are neglecting it and suffering the consequences. The current architecture is bandwidth and latency starved; if they fix this then every other improvement can show its true potential.
  • Kevin G - Friday, January 17, 2014

    There is of course the chance that the Kaveri die supports both DDR3 and GDDR5. Of course the FM2+ implementation would only support dual channel DDR3 to maintain backwards compatibility. This does not preclude the possibility of a GDDR5 version that'd be soldered onto a motherboard alongside GDDR5 memory. Ditto for the chances of a quad channel version.

    One thing worth noting is that with GDDR5 there would be a memory capacity limitation. 16 GB of GDDR5 is the current maximum on a 256-bit wide bus using multiple expensive high-capacity GDDR5 memory chips (a similar DDR3 setup can reach 64 GB using unbuffered memory).

    The other thing about a GDDR5 option is that it is not suitable for ultra mobile devices. GDDR5 memory consumes more power than vanilla DDR3 and far more than LP-DDR3. This means that a GDDR5 version would only have a handful of niche roles: HTPC, HPC server clusters, and embedded applications that need lots of memory bandwidth.

    I would love to see a quad-channel GDDR5 option though. Such a setup would provide 4 to 6 times as much memory bandwidth as the current DDR3 implementation. It'd effectively remove the memory bandwidth bottleneck.