In something of a surprise move, NVIDIA took to the stage at GTC today to announce a new roadmap for their GPU families. The announcement brings a significant restructuring that will see GPUs and features moved around, and a new GPU architecture, Pascal, introduced in the middle.

We’ll get to Pascal in a second, but to put it into context let’s first discuss NVIDIA’s restructuring. At GTC 2013 NVIDIA announced their future Volta architecture. Volta, which had no scheduled date at the time, would be the GPU after Maxwell. Volta’s marquee feature would be on-package DRAM, utilizing Through Silicon Vias (TSVs) to die-stack memory and place it on the same package as the GPU. Meanwhile in that roadmap NVIDIA also gave Maxwell a date and a marquee feature: 2014, and Unified Virtual Memory.


NVIDIA's Old Volta Roadmap


NVIDIA's New Pascal Roadmap

As of today that roadmap has more or less been thrown out. No products have been removed, but what Maxwell is and what Volta is have changed, as has the pacing. Maxwell for its part has “lost” its unified virtual memory feature. This feature is now slated for the chip after Maxwell, and in the meantime the closest Maxwell will get is the software-based unified memory feature being rolled out in CUDA 6. Furthermore, NVIDIA has not offered any further details on second-generation Maxwell (the higher-performing Maxwell chips) and how those might be integrated into professional products.
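To give a sense of what that software-based stopgap looks like in practice, here is a minimal sketch using the cudaMallocManaged API that CUDA 6 introduces; the kernel name and sizes are illustrative, and it is the CUDA runtime, rather than the GPU hardware, that migrates the data behind the scenes:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel that scales an array in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;

    // A single allocation visible to both CPU and GPU; the runtime migrates
    // the data on demand rather than requiring explicit cudaMemcpy calls.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // touched on the host

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // touched on the device
    cudaDeviceSynchronize();                         // required before the host reads again

    printf("data[0] = %f\n", data[0]);               // prints 2.000000
    cudaFree(data);
    return 0;
}
```

The convenience is the same single pointer that the hardware feature promises; the difference is that here the migration is handled in software, with the performance caveats that entails.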

As far as NVIDIA is concerned, Maxwell’s marquee feature is now DirectX 12 support (though even the extent of this isn’t perfectly clear), and with the shipment of the GeForce GTX 750 series, Maxwell is shipping in 2014 as scheduled. We’re still expecting second-generation Maxwell products, but at this juncture it does not look like we should be expecting any additional functionality beyond what Big Kepler + 1st Gen Maxwell can achieve.

Meanwhile Volta has been pushed back and stripped of its marquee feature. Its on-package DRAM has been promoted to the GPU before Volta, and while Volta still exists, publicly it is a blank slate. We do not know anything else about Volta beyond the fact that it will come after the 2016 GPU.

Which brings us to Pascal, the 2016 GPU. Pascal is NVIDIA’s latest GPU architecture and is being introduced in between Maxwell and Volta. In the process it has absorbed old Maxwell’s unified virtual memory support and old Volta’s on-package DRAM, integrating those feature additions into a single new product.

With today’s announcement comes a small degree of additional detail on NVIDIA’s on-package memory plans. The bulk of what we wrote for Volta last year remains true: NVIDIA uses on-package stacked DRAM, enabled by the use of TSVs. What’s new is that NVIDIA has confirmed they will be using JEDEC’s High Bandwidth Memory (HBM) standard, and the test vehicle Pascal card we have seen uses on-package memory exclusively, so there isn’t a split memory design. Though we’d also point out that unlike the old Volta announcement, NVIDIA isn’t listing any solid bandwidth goals like the 1TB/sec figure we had last time. From what NVIDIA has said, this likely comes down to a cost issue: how much memory bandwidth are customers willing to pay for, given the cutting-edge nature of this technology?

Meanwhile NVIDIA hasn’t said anything else directly about the unified memory plans that Pascal has inherited from old Maxwell. However, after we get to the final pillar of Pascal, how that will fit in should make more sense.

Coming to the final pillar then, we have a brand new feature being introduced for Pascal: NVLink. NVLink, in a nutshell, is NVIDIA’s effort to supplant PCI-Express with a faster interconnect bus. From the perspective of NVIDIA, who is looking at what it would take to allow compute workloads to better scale across multiple GPUs, the 16GB/sec made available by a PCI-Express 3.0 x16 slot is hardly adequate, especially when compared to the 250GB/sec+ of memory bandwidth available within a single card. PCIe 4.0 will in turn eventually bring higher bandwidth still, but even that is not enough. As such NVIDIA is pursuing their own bus to achieve the kind of bandwidth they desire.

The end result is a bus that looks a whole heck of a lot like PCIe, and is even programmed like PCIe, but operates with tighter requirements and a true point-to-point design. NVLink uses differential signaling (like PCIe), with the smallest unit of connectivity being a “block.” A block contains 8 lanes, each rated for 20Gbps, for a combined bandwidth of 20GB/sec. In terms of transfers per second this puts NVLink at roughly 20 gigatransfers/second, as compared to the 8GT/sec of PCIe 3.0, an indication of just how high a frequency this bus is planned to run at.
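The arithmetic behind those figures is straightforward; the short sketch below reproduces it from the announced numbers, treating the 20GB/sec per-block figure as a raw signaling rate (NVIDIA has not disclosed NVLink’s encoding overhead) and applying PCIe 3.0’s 128b/130b encoding for the comparison:

```cuda
#include <cstdio>

int main() {
    // NVLink, as announced: 8 lanes per block, 20Gbps per lane.
    const double nvlinkLanes     = 8.0;
    const double nvlinkGbpsLane  = 20.0;
    const double nvlinkBlockGBps = nvlinkLanes * nvlinkGbpsLane / 8.0;  // 20 GB/sec per direction (raw)

    // PCIe 3.0 x16 for comparison: 8GT/sec per lane with 128b/130b encoding.
    const double pcieLanes   = 16.0;
    const double pcieGTsLane = 8.0;
    const double pcieX16GBps = pcieLanes * pcieGTsLane * (128.0 / 130.0) / 8.0;  // ~15.75 GB/sec per direction

    printf("One NVLink block: %.2f GB/sec per direction\n", nvlinkBlockGBps);
    printf("PCIe 3.0 x16    : %.2f GB/sec per direction\n", pcieX16GBps);
    return 0;
}
```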

Multiple blocks in turn can be teamed together to provide additional bandwidth between two devices, or those blocks can be used to connect to additional devices, with the number of blocks depending on the SKU. The actual bus is purely point-to-point – no root complex has been discussed – so we’d be looking at processors directly wired to each other instead of going through a discrete PCIe switch or the root complex built into a CPU. This makes NVLink very similar to AMD’s HyperTransport, or Intel’s QuickPath Interconnect (QPI). This includes the NUMA aspects of not necessarily having every processor connected to every other processor.
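To make that topology point concrete, here is a small sketch of a purely hypothetical four-GPU layout, where some pairs are wired together with one or two blocks and others have no direct link at all and would have to route through a neighbor (or fall back to PCIe); the 20GB/sec per-block figure comes from the announcement, while the block counts and layout are illustrative:

```cuda
#include <cstdio>

// Hypothetical topology: entry [i][j] is the number of NVLink blocks wired
// directly between GPU i and GPU j. A zero means no direct link -- the
// NUMA-like consequence of a point-to-point bus with a limited block budget.
const int kBlocks[4][4] = {
    {0, 2, 1, 0},
    {2, 0, 0, 1},
    {1, 0, 0, 2},
    {0, 1, 2, 0},
};

const double kBlockGBps = 20.0;  // per-block bandwidth from the announcement

int main() {
    for (int i = 0; i < 4; ++i) {
        for (int j = i + 1; j < 4; ++j) {
            if (kBlocks[i][j] > 0)
                printf("GPU%d <-> GPU%d: %d block(s), %.0f GB/sec direct\n",
                       i, j, kBlocks[i][j], kBlocks[i][j] * kBlockGBps);
            else
                printf("GPU%d <-> GPU%d: no direct link; traffic takes an extra hop\n", i, j);
        }
    }
    return 0;
}
```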

But the rabbit hole goes deeper. To pull off the kind of transfer rates NVIDIA wants to accomplish, the traditional PCI/PCIe style edge connector is no good; if nothing else, the trace lengths that can be supported by such a fast bus are too short. So NVLink will be ditching the slot in favor of what NVIDIA is labeling a mezzanine connector, the type of connector typically used to sandwich multiple PCBs together (think GTX 295). We haven’t seen the connector yet, but it goes without saying that this requires a major change in motherboard designs for the boards that will support NVLink. The upside, however, is that with this change and the use of a true point-to-point bus, what NVIDIA is proposing is for all practical purposes a socketed GPU, just with the memory and power delivery circuitry on the GPU module instead of on the motherboard.

NVIDIA’s Pascal test vehicle is one such example of what a card would look like. We cannot see the connector itself, but the basic idea is that the card will lie down on the motherboard parallel to it (instead of perpendicular, like PCIe slots), with each Pascal card connected to the board through the NVLink mezzanine connector. Besides reducing trace lengths, this has the added benefit of allowing such GPUs to be cooled with CPU-style cooling methods (we’re talking about servers here, not desktops) in a space-efficient manner. How many NVLink mezzanine connectors are available would of course depend on how many the motherboard design calls for, which in turn will depend on how much space is available.


Molex's NeoScale: An example of a modern, high-bandwidth mezzanine connector

One final benefit NVIDIA is touting is that the new connector and bus will improve both energy efficiency and energy delivery. When it comes to energy efficiency NVIDIA is telling us that per byte, NVLink will be more efficient than PCIe – this being a legitimate concern when scaling up to many GPUs. At the same time the connector will be designed to provide far more than the 75W PCIe is spec’d for today, allowing the GPU to be directly powered via the connector, as opposed to requiring external PCIe power cables that clutter up designs.

With all of that said, while NVIDIA has grand plans for NVLink, it’s also clear that PCIe isn’t going to be completely replaced anytime soon on a large scale. NVIDIA will still support PCIe – in fact the blocks can talk either PCIe or NVLink – and even in NVLink setups there are certain command and control communiqués that must be sent over PCIe rather than NVLink. In other words, PCIe will still be supported across NVIDIA's product lines, with NVLink existing as a high-performance alternative for the appropriate products. The best case scenario for NVLink right now is that it takes hold in servers, while workstations and consumers continue to use PCIe as they do today.

Meanwhile, though NVLink won’t even be shipping until Pascal in 2016, NVIDIA already has some future plans in store for the technology. Along with a GPU-to-GPU link, NVIDIA’s plans include a more ambitious CPU-to-GPU link, in large part to achieve the same data transfer and synchronization goals as with inter-GPU communication. As part of the OpenPOWER consortium, NVLink is being made available to POWER CPU designs, though no specific CPU has been announced. The door is also left open for NVIDIA to build an ARM CPU implementing NVLink (Denver perhaps?), but again, no such product is being announced today. If it did come to fruition, though, it would be similar in concept to AMD’s abandoned “Torrenza” plans to utilize HyperTransport to connect CPUs with other processors (e.g. GPUs).

Finally, NVIDIA has already worked out some feature goals for what they want to do with NVLink 2.0, which would come on the GPU after Pascal (which by NV’s other statements should be Volta). NVLink 2.0 would introduce cache coherency to the interface and processors on it, which would allow for further performance improvements and the ability to more readily execute programs in a heterogeneous manner, as cache coherency is a precursor to tightly shared memory.

Wrapping things up, with an attached date for Pascal and numerous features now billed for that product, NVIDIA looks to have set the wheels in motion for developing the GPU they’d like to have in 2016. The roadmap alteration we’ve seen today is unexpected to say the least, but Pascal is on much more solid footing than old Volta was in 2013. In the meantime we’re still waiting to see what Maxwell will bring to NVIDIA’s professional products, and it looks like we’ll be waiting a bit longer to get the answer to that question.

Comments

  • BillyONeal - Wednesday, March 26, 2014 - link

    @dk: This is probably designed for more "custom" designs like GRID or HTPC applications. Most consumer applications don't have many many GPUs installed in a single system. (If you look at the block diagram above the NVLINK interconnects are cross GPU in the 2016 timeframe)

    And hey; if it doesn't work it doesn't work.
  • l.set - Wednesday, March 26, 2014 - link

    Nvidia aren't over-confident at all, this helps to solve a massive issue in the mainstream server market. Getting GPU density (in terms of size and linking them together) in normal x86 servers is a big problem and Nvidia are the only company really doing anything in this space. There is hardly any support for AMD cards in servers and they don't have anything like vGPU or GRID (AMD are also saying goodbye to their x86 server market share so it's really only Intel and the server OEMs that Nvidia needs to convince. IBM are already onboard by the sounds of things).

    Servers have already shrunk and blades use this style of mezzanine connector for their daughter boards. If AMD aren't going to do anything about giving my virtual servers and virtual desktops the graphics grunt they need then Nvidia can pretty much do what they like.
  • Yorgos - Wednesday, March 26, 2014 - link

    At least you can search wikipedia before you even post something, but it seems people don't even bother to do that, they just like to write the fairy tale they have in their head.
    Anyway, I am glad to hear that amd doesn't have vGPU, I wonder how I am using it.
    On the other hand companies are pushing amd based solutions with Seattle or Warsaw cpus and some of them with the blessings of Arm(whatever this means).
    GRID, hmmm..... what can someone say about GRID, well someone can say it is as successful as onLive.
    The CUDA passion is fading away over time and OpenCL is gaining ground in every application.
    Server market is not favoring proprietary standards with doubtful development and support, unlike open-standards.
  • l.set - Wednesday, March 26, 2014 - link

    You know what, I'll bite :)

    AMD's x86 server market share is shrinking and has been shrinking for a long time when it comes to off the shelf servers that 95% of businesses run on. This is fact. If AMD can take back market share with Warsaw and push ARM business as well then that's awesome. More competition is always welcome and is better for the customer. However my point was that Nvidia doesn't have to convince AMD to include NVLink in anything because they don't dictate enough of the market for it to be a problem.

    You are glad AMD doesn't have something like vGPU? Seriously? Just because you don't use it doesn't mean this isn't a big deal to a lot of other people. You can't vmotion/live migrate a VM that is tied down to a passed through hardware device, but you can if it's virtualised. And no AMD doesn't do true virtualised GPUs, they rely on RemoteFX.

    As for GRID, it's working its way into a market that has large project lifecycles. There are a lot of seriously large oil, gas, architecture and design firms that are currently in proof of concept stages for GRID and are taking it very seriously. It's not a one size fits all but it's proving to be really popular at the moment (with my customers at least).

    The server market favours what it favours, so we'll wait and see, but I'd rather put my money on Nvidia than AMD when it comes to the server GPU space.
  • phoenix_rizzen - Wednesday, March 26, 2014 - link

    You need to slow down, and actually read what he wrote:

    "Anyway, I am glad to hear that amd doesn't have vGPU, I wonder how I am using it."

    Note the sarcasm in the first part of the sentence, and the acknowledgement of the feature in the second part. IOW, he's using the vGPU features of an AMD chip right now.
  • l.set - Thursday, March 27, 2014 - link

    I noted the sarcasm, but the other way around. Maybe because AMD doesn't have a form of functionality that is the same as the vGPU being implemented by Nvidia. I'm willing to admit if I'm massively wrong here, but the only way to get shared graphics using AMD cards through any of the main hypervisors at the moment is through stuff like RemoteFX or vSGA.
  • MrSpadge - Wednesday, March 26, 2014 - link

    The first version of NVLink is "just" attached to a PCIe crossbar (like PLX bridges), so there's no need for explicit support in the chipset. Current ones wouldn't know what to do with this bandwidth anyway.
  • BMNify - Thursday, March 27, 2014 - link

    that's like saying plug a GFX card into a USB1 logic IC and then plug that into a USB3.1 slot; it may not be "explicit" but you know full well it's slowing it down to a crawl
  • vgray35@hotmail.com - Wednesday, March 26, 2014 - link

    To downplay the slow progression of PCIe over the next decade, and failure to acknowledge the vast disparity between GPU bandwidth needs as compared with rudimentary PCIe and CPU communications, would be a big mistake. Intel cannot afford not to participate because of the risks of competing CPUs rapidly overtaking the capability of modern PCs. If Intel does not provide a high performance CPU link then GPU manufacturers will simply add this to ARM, with competing PC platforms using ARM processors. The risk to Intel is really enormous. The advent of ARM processors running windows 8 as an alternative to INTEL and AMD is just around the corner. GPUs will have their high speed BUS or all hell will ensue.
  • andrewaggb - Wednesday, March 26, 2014 - link

    all I can think is history repeats itself.... we had isa (general), vesa local bus (mostly video cards), pci replaced it (general), we got agp (video cards), replaced by pci-e (general), if we get nvlink (video cards), presumably it will be replaced by something general again. but maybe not. the days of add-in cards are quickly going away, maybe specialized busses for the remaining few add-on devices will be the future.
