NVIDIA 400M: DX11 Top to Bottom Solutions Now Available
by Jarred Walton on September 3, 2010 12:02 AM EST

Introducing the GeForce 400M Family
Back in May, NVIDIA surprised us by announcing their first mobile DX11 GPU, the GTX 480M. What was surprising is that they were using a full GF100 chip, only harvested and downclocked relative to the desktop GPUs. In fact, GTX 465M would have been a more accurate name, as the 480M shipped with the same number of cores as the desktop GTX 465. Power requirements were understandably quite high (100W), but there's no arguing that the 480M is now the fastest mobile GPU on the block. Whether it's worth the price of admission is another story, of course, which segues nicely into today's announcement.
NVIDIA is filling out the rest of their mobile lineup with a slew of new chips. What they're not telling us is precisely which core each part uses, so there will potentially be some overlap with harvesting going on (the 445M in particular looks like it will use two different chips). NVIDIA also didn't give us any figures for power requirements, though Optimus Technology means that when paired with an IGP-enabled CPU they can "idle" at 0W. Anyway, here's what we do know, starting with the high-end offerings. (We've split the other parts out on the next page to keep our tables manageable.)
NVIDIA High-End 400M Specifications

| | GeForce GTX 480M | GeForce GTX 470M | GeForce GTX 460M |
|---|---|---|---|
| Codename | GF100 | GF104 | GF106 |
| CUDA Cores | 352 | 288 | 192 |
| Graphics Clock (MHz) | 425 | 535 | 675 |
| Processor Clock (MHz) | 850 | 1070 | 1350 |
| Memory Clock (MHz) | 1200 | 1250 | 1250 |
| Standard Memory Configuration | GDDR5 | GDDR5 | GDDR5 |
| Memory Interface Width | 256-bit | 192-bit | 192-bit |
| Memory Bandwidth (GB/sec) | 76.8 | 60 | 60 |
| SLI Ready | Yes | Yes | Yes |
We eliminated several rows of supported features, which we'll summarize here: all of the 400M GPUs, from the lowly 415M up to the top 480M, include support for DX11, OpenGL 4.0, PhysX, Optimus, CUDA, DirectCompute, OpenCL, H.264/VC1/MPEG2 1080p video decoding, and full spec Blu-ray decode. They also support the HDMI 1.4a spec, so hopefully that means all the new cards will include 1.4a ports; now we just need 1.4a HDMI displays to go along with the GPUs.
The more interesting spec is the number of CUDA cores in the various models, which allows us to make guesses as to the base chip. (Update: NVIDIA also included images of the chips, though it looks like they used the same image for many chips and just changed the logo via Photoshop, so we have a pretty good idea of what's going on. We have updated the tables after looking at the images, as one reader suggested we do.) We already know the 480M uses a harvested GF100. The GF104 was introduced on the desktop with the GTX 460, and it contains up to 384 CUDA cores—which means a future revision of the 480M could potentially switch to GF104 as well. The 470M, meanwhile, will use GF104. In the past, NVIDIA has chopped off about half of their halo product for the next level of GPUs, then half of that again for the lower midrange parts, and finally one third/fourth of that for the entry-level parts. Thus, GT200 had up to 240 cores, GT215 had 128, GT216 had 48, and GT218 came with a lowly 16 cores. Right now it looks like we don't have that final cut yet, so perhaps we'll see a G 410M at some point in the future.
The good news is that with 400M, we get roughly twice as many cores at every level compared to the previous generation 200M/300M parts, but typically slightly lower clocks. Theoretical computational power is nearly double, but the catch is that our testing of the desktop GTX 480 suggests that clock-for-clock, GF100 cores aren't as potent as GT200 cores. So looking at clocks and core counts, GTX 480 has 90% more computational power available relative to GTX 285, but in actual games it's more like 50% faster—though memory bandwidth and other areas also come into play. Even with that said, here's how things break down in the various performance segments.
At the very top, we've gone from 285M with 128 cores at 1500MHz to 480M with 352 cores at 850MHz. That represents a computational power increase of about 55%, but memory bandwidth is relatively close—only 18% higher. In our testing 480M beat 285M by around 20%, so the computational power isn't likely the bottleneck and memory bandwidth is playing a major role. What we'd like to see is a shift to the smaller (and presumably less power hungry) GF104 while still keeping the same specs, but perhaps that's not possible. Either way, 480M is the mobile performance champion but with a 100W TDP it's also very hot and will only be found in larger notebooks.
The next step down gives us the 470M, which replaces the GTX 260M. The 260M had a TDP of around 55W (75W max, but that was more for the 285M), so presumably the 470M will target a similar power envelope. Core count goes from 112 at 1375MHz up to 288 at 1070MHz, an increase of 100% in theoretical shader throughput. As we saw with the 285M and 480M, however, memory bandwidth may be the bigger factor; here the 260M and 470M are essentially equal (60.8GB/s for the 260M vs. 60GB/s for the 470M), so it will be interesting to see how performance plays out. It's also very possible that future games will stress shaders more than memory bandwidth and thus show greater performance improvements.
The 460M replaces the GTS 360M and GTS 350M, neither of which saw much use in notebooks. (We'll actually look at our first GTS 350M notebook in the near future, just in time for replacements to arrive.) The GTS 360M has 96 cores at 1325MHz with 57.6GB/s of bandwidth; the GTS 350M has slightly lower shader and RAM clocks. The new 460M checks in with 192 cores at 1350MHz and slightly more memory bandwidth. Again, computationally we're looking at roughly double the performance potential. If TDP is similar to the outgoing parts, we're also looking at around 40W for the 460M.
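If you want to sanity-check the generational math in the last few paragraphs, here's a minimal back-of-the-envelope sketch (plain host-side C++). The core counts and clocks are the ones quoted above; the GTX 285M's ~65.3GB/s bandwidth figure is our own estimate rather than a number NVIDIA provided here. Theoretical shader throughput is simply cores multiplied by shader clock, which is how we arrived at the percentages.

```cpp
// Back-of-the-envelope generational comparison. Core counts and clocks are
// those quoted in the article; the GTX 285M bandwidth (~65.3GB/s) is an
// estimate (256-bit GDDR3), not an official figure from this announcement.
#include <cstdio>

// Relative theoretical shader throughput: cores x shader clock (MHz).
static double shader(double cores, double mhz) { return cores * mhz; }

static double pct_gain(double newer, double older) { return (newer / older - 1.0) * 100.0; }

int main()
{
    printf("GTX 285M -> GTX 480M: %+.0f%% shader throughput\n",
           pct_gain(shader(352, 850), shader(128, 1500)));   // ~+56%
    printf("GTX 260M -> GTX 470M: %+.0f%% shader throughput\n",
           pct_gain(shader(288, 1070), shader(112, 1375)));  // ~+100%
    printf("GTS 360M -> GTX 460M: %+.0f%% shader throughput\n",
           pct_gain(shader(192, 1350), shader(96, 1325)));   // ~+104%
    printf("GTX 285M -> GTX 480M: %+.0f%% memory bandwidth\n",
           pct_gain(76.8, 65.3));                            // ~+18%
    return 0;
}
```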
39 Comments
Roland00 - Friday, September 3, 2010 - link
The reason I ask is that this is the most popular "mainstream" gaming card in current laptop designs right now. Yes, the GTX 480M may be the fastest, but if nobody uses it besides select Alienwares and Clevos, what does it matter?

JarredWalton - Friday, September 3, 2010 - link
The GT 335M is roughly at the same performance level as HD 5650, but obviously without DX11 support. Based on that, I would venture that even 420M might match the 5650; certainly 425M and up will be faster (unless you're bandwidth limited, which is entirely possible).

marraco - Friday, September 3, 2010 - link
Intel's CPU-integrated video will rule out all the "entry level" discrete chips. NVIDIA and AMD will be forced to offer juicier entry-level cards.

I guess that the next generation of entry-level parts will have at least GeForce 8600 power.
JarredWalton - Friday, September 3, 2010 - link
Our initial look suggests that the 12 EU version of the Sandy Bridge IGP will be at roughly the level of the G 310M. In other words, even the GT 415M looks to be around twice as fast, plus you get CUDA, OpenCL, DirectCompute, OpenGL 4.0, DX11, PhysX, etc. I'd say Sandy Bridge will make a direct replacement for the G 310M (i.e. "G 410M") pointless, so perhaps that's why there's no castrated 24-core 400M chip with a 64-bit interface.

tviceman - Friday, September 3, 2010 - link
Looks like a solid lineup for everything below the GTX 480M. Hopefully, like others have suggested, they update the 480M to be a little more powerful at the same TDP or offer the same performance at a lower TDP.

I'm personally still waiting on any word or significant rumors regarding a full 384-core GF104 desktop part.
blah238 - Friday, September 3, 2010 - link
Most interested in the 12-14" size range with the fastest possible GPU and decent screen options. The Sony Vaio Z is the only machine that fits the bill currently, but it's way out of my price range.

Here's hoping something from this refresh gets stuck into a chassis that gives Sony some competition in this space.
patrickjchase - Saturday, September 4, 2010 - link
Jarred makes reference to the fact that the GF1xx series strikes a different balance between compute (shader) and memory bandwidths. I think that part of NVIDIA's motivation is indeed an expectation that future games will skew towards requiring more shader performance, but I think there's another factor: Fermi has an L1 data cache in each SM.

I do OpenCL and CUDA programming professionally, and I think it's important not to underestimate the impact of cache. Many algorithms in graphics and elsewhere have access patterns that are best described as "localized but unpredictable". This means that the algorithm's data accesses tend to "cluster" spatially and/or temporally, but it's very difficult to predict *where* they'll cluster, and it's therefore impractical to explicitly pre-load data into local memory.

When running such algorithms, Fermi needs less DRAM bandwidth for any given performance level than any other GPU on the market (and again, I say this as somebody who develops for and benchmarks these things day in and day out).
This is actually a bit of a repeat of how general-purpose CPUs and their associated memory systems progressed, beginning with the IBM 360/85 all the way back in 1969...
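To make the "localized but unpredictable" pattern more concrete for readers who don't write GPU code, here's a minimal CUDA sketch (a hypothetical example, not code from the commenter; the kernel name and index scheme are made up for illustration). The gather offsets are data dependent, so they can't be pre-staged into shared memory, but neighboring threads touch nearby addresses, which is exactly the kind of reuse Fermi's per-SM L1 cache can absorb instead of going out to DRAM.

```cuda
// Minimal sketch of a "localized but unpredictable" gather (hypothetical).
// Each thread reads a few values at a data-dependent offset; the offsets
// cluster spatially but aren't known in advance, so explicit staging into
// shared memory is impractical. On Fermi (GF1xx) the per-SM L1 cache can
// capture the reuse between neighboring threads.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void gather_blend(const float* __restrict__ src,
                             const int* __restrict__ idx,   // data-dependent offsets
                             float* __restrict__ dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int base = idx[i];          // unpredictable, but neighbors land close together
    float sum = 0.0f;
    for (int k = 0; k < 4; ++k)
        sum += src[base + k];   // overlapping reads across threads -> cache hits
    dst[i] = 0.25f * sum;
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> src(n + 4, 1.0f);
    std::vector<int> idx(n);
    for (int i = 0; i < n; ++i)
        idx[i] = (i / 32) * 32 + (i * 7) % 32;  // clustered pseudo-random offsets

    float *d_src, *d_dst; int *d_idx;
    cudaMalloc(&d_src, (n + 4) * sizeof(float));
    cudaMalloc(&d_dst, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemcpy(d_src, src.data(), (n + 4) * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    gather_blend<<<(n + 255) / 256, 256>>>(d_src, d_idx, d_dst, n);
    cudaDeviceSynchronize();

    float result;
    cudaMemcpy(&result, d_dst, sizeof(float), cudaMemcpyDeviceToHost);
    printf("dst[0] = %f\n", result);  // expect 1.0, since src is all ones

    cudaFree(d_src); cudaFree(d_dst); cudaFree(d_idx);
    return 0;
}
```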
JackNSally - Monday, September 6, 2010 - link
"OpenGL 40." in the supported features. I believe you mean "OpenGL 4.0" unless nVidia is jumping far, FAR into the future.JarredWalton - Monday, September 6, 2010 - link
The Alienware M15x has already done 1080p in a 15.4" chassis I think, and there are several other 1080p ~15" laptops around. ASUS G51JX-X1, ThinkPad W510, and MSI's GX660R-060US are all 15.6" 1080p. But then, they're also non-3D and cost $1300 minimum.