AMD Kaveri Docs Reference Quad-Channel Memory Interface, GDDR5 Option
by Anand Lal Shimpi on January 16, 2014 10:51 PM EST
Our own Ryan Smith pointed me at an excellent thread on Beyond3D where forum member yuri ran across a reference to additional memory controllers in AMD's recently released Kaveri APU. AMD's latest BIOS and Kernel Developer's Guide (BKDG 3.00) for Kaveri includes a reference to four DRAM controllers (DCT0 - 3) with only two in use (DCT0 and DCT3). The same guide also references a Gddr5Mode option for each DRAM controller.
Let me be very clear here: there's no chance that the recently launched Kaveri will be capable of GDDR5 or 4 x 64-bit memory operation (Socket-FM2+ pin-out alone would be an obvious limitation), but it's very possible that there were plans for one (or both) of those things in an AMD APU. Memory bandwidth can be a huge limit to scaling processor graphics performance, especially since the GPU has to share its limited bandwidth to main memory with a handful of CPU cores. Intel's workaround with Haswell was to pair it with 128MB of on-package eDRAM. AMD has typically shied away from more exotic solutions, leaving the launched Kaveri looking pretty normal on the memory bandwidth front.
In our Kaveri review, we asked whether any of you would be interested in a big Kaveri option with 12 - 20 CUs (768 - 1280 SPs) enabled, basically a high-end variant of the Xbox One or PS4 SoC. AMD would need a substantial increase in memory bandwidth to make such a thing feasible, but based on AMD's own docs it looks like that may not be too difficult to get.
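For reference, the SP counts quoted above follow from GCN's 64 stream processors per compute unit. A quick sketch of that arithmetic (the function name is purely illustrative, and the 8 CU entry is the top launched Kaveri part included for comparison):

```python
# GCN packs 64 stream processors (SPs) into each compute unit (CU):
# 4 SIMD units x 16 lanes.
SPS_PER_CU = 64


def stream_processors(compute_units: int) -> int:
    """Return the SP count for a GCN GPU with the given number of CUs."""
    return compute_units * SPS_PER_CU


# 8 CUs is the top launched Kaveri configuration; 12 and 20 are the
# hypothetical "big Kaveri" endpoints discussed above.
for cus in (8, 12, 20):
    print(f"{cus} CUs -> {stream_processors(cus)} SPs")
```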
There were rumors a while back of Kaveri using GDDR5 on a stick but it looks like nothing ever came of that. The options for a higher end Kaveri APU would have to be:
1) 256-bit wide DDR3 interface with standard DIMM slots, or
2) 256-bit wide GDDR5 interface with memory soldered down on the motherboard
I do wonder if AMD would consider the first option and toss some high-speed memory on-die (similar to the Xbox One SoC).
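To put rough numbers on the two options above, here's a back-of-the-envelope sketch of theoretical peak bandwidth (bus width in bytes times transfer rate). The data rates are my assumptions for illustration: DDR3-2133 for the DDR3 cases, and 5.5 Gbps per pin for GDDR5. None of these figures come from the BKDG, and real-world throughput will be lower than peak.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes).

    bus_width_bits: total interface width (e.g. 128 for 2 x 64-bit channels)
    mega_transfers_per_sec: data rate in MT/s (e.g. 2133 for DDR3-2133)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec / 1000


# Assumed configurations (illustrative only, not taken from AMD's docs).
configs = {
    "128-bit DDR3-2133 (dual channel, as launched)": (128, 2133),
    "256-bit DDR3-2133 (option 1)": (256, 2133),
    "256-bit GDDR5 @ 5.5 Gbps (option 2)": (256, 5500),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

Under those assumptions, simply doubling the DDR3 interface width doubles peak bandwidth, while GDDR5 at the same width goes well beyond that, at the cost of soldered-down memory.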
All of this is an interesting academic exercise though, which brings me back to our original question from the Kaveri review. If you had the building blocks AMD has (Steamroller cores and GCN CUs) and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?
I know I'd be interested in a 2-module Steamroller core + 20 CUs with a 256-bit wide DDR3 interface, assuming AMD could stick some high-bandwidth memory on-die as well. More or less a high-end version of the Xbox One SoC. Such a thing would interest me but I'm not sure if anyone would buy it. Leave your thoughts in the comments below, I'm sure some important folks will get to read them :)
UtilityMax - Thursday, January 30, 2014
I am not sure where you have seen the "paper" claiming 2-5 times the framerates vs. a traditional GPU. I heard the claim of up to 40% better performance with the Mantle driver. However, first, a 40% better frame rate would simply allow it to catch up with the discrete GPU that Kaveri's APU is based on, and second, we haven't seen Mantle-enabled games to confirm this claim yet.
Fusion_GER - Wednesday, January 22, 2014
I think AMD will test it internally for workstations; the bandwidth should mainly/only be needed under GPU loads.
For the folks that are interested: AMD told COMPUTERBASE.DE, "Dual-Ranked DDR3 Modules are optimal on Kaveri in a 2-Slot-DDR3-Setup!" They tested on their own bench table with single-ranked modules first after the NDA, because they didn't get a rig from AMD to test. AMD itself shipped only rigs with dual-ranked memory.
Please look at the numbers.
translated with google: http://translate.google.de/translate?sl=de&tl=...
Fusion_GER - Wednesday, January 22, 2014
And heterogeneous loads, of course.
Tikcus9666 - Friday, January 24, 2014
4-module Steamroller core + 20 CUs with a 256-bit wide DDR3 interface, with high-bandwidth memory on die (or without, if it was capable of at least 1080p gaming at med-high settings).
More importantly, the price would have to be right.
It would have to come in cheaper than an i5 build with a 12 - 20 CU GPU.
abufrejoval - Monday, January 27, 2014
Kaveri definitely needs a scale-up solution to support proper 2k and 4k gaming without losing the benefit of HSA. Otherwise the HSA ecosystem simply won't develop.
For now that scale-up solution might simply have to mean Kaveri modules, which include RAM, pretty much in a dGPU form factor. That could be GDDR5 or "ultra clocked and specially binned" DDR3 depending on the number of CUs on the chip. Beyond 10 CUs there might be diminishing returns without GDDR5, and 16 CUs with a somewhat lower CPU clock might still fit into 150 watts per "blade".
Now I'd just want to be able to put 1-4 of these blades on a back-plane or "motherboard" which is little more than a place to connect the SATA, USB and power cables, to create something that scales to the resolution I need at that specific point in the house.
Aries1470 - Wednesday, January 29, 2014
Ok, just hoping that someone has reached this far ;-) as for me, I reached up to page 7.. and might return back later.
Having had a personal interest for many years in SFF (Small Form Factor) boards for embedded systems (yes, your mobile / cell phone is a computer too), I would recommend the following options:
gDDR5 Controller on die, 256 bit wide, with 2 - 4 SKU's available from the M/B manufacturer, for on board memory of 4GB, 8GB or 16GB. Reason behind this is that a lot of the embedded systems are not even in general circulation to the average Joe, since they end up for industrial purposes, Data Centres or Rendering farms.
Also increase the 'power more' like to (under the old scheme) 4, 6 or 8 CPUs with a much better FPU (Floating Point Unit, since it is not as good as Intel's) and have the GPU (GCN) power as a minimum of 512, but preferably an R9, maybe Hawaii but only half with 1280/80/32 with 512 bus width, or a Curaçao with a 256-bit bus width, both with a somewhat lower core clock rate, so it can stay within a reasonable TDP. It can be a slightly larger size, since it will not be user replaceable as a part, as is common, and is soldered directly on board. The board would need to be either 6 or 8 layers depending on the amount of memory, the bus width used, and the supporting circuitry.
As for the socketed version, my approach would be similar to a SoC, which could have an R9 with 768:48:16 for example, and since the memory will be on chip and to lower complexity, it could be done with a 256-bit wide interface, and the lower end with 128-bit wide.
On Chip Ram to Have 1GB or 2GB directly on the chip of gDDR5, or more like 4GB or 8GB if going to be a combined memory for CPU & GPU access.
Alternatively, AMD should push a gDDR5 socket standard, so M/B manufacturers can add sockets on the M/B or have a 'lower end' version with DDR3/4 so more memory can be added by an end user.
Just my 2¢ worth.
Finally - Saturday, April 12, 2014
If you had the building blocks and the potential for a wider memory interface, would you try building a high-end APU for the desktop? If so, what would you build and why?
I'd love to see a 6-core Excavator APU with about the same number of CUs as Kaveri (but GCN 2.0) and a quad-channel interface; I'd combine that with a best-bang-for-the-buck (upper) midrange dedicated card.
According to Golem.de, the memory bandwidth should jump from 34.1 to 68.2 GB/sec - just by enabling quad channel on DDR3 modules that are already available today.
I really wonder why AMD didn't enable it on Kaveri. Doesn't make much difference if I buy 2x8GB or 4x4GB of DDR3-2133...