Today must be the busiest day in the world of NAND. Earlier today, Toshiba announced that it has begun sampling of its 48-layer 128Gbit 3D NAND part and now a few hours later Intel and Micron held a joint webinar that revealed a few new details about the companies' 3D NAND process. Intel-Micron originally unveiled their 3D NAND in November last year and disclosed that the first generation product will be a 32-layer 256Gbit (32GB) MLC part, which can also operate in TLC mode to bring the capacity per die to 384Gbit (48GB).

The initial part will be a 4-plane design, which is necessary for retaining high performance at such a high die capacity. For understanding why the number of planes is important, I suggest you read this page from our Crucial M550 review, but in short the number of planes translates to the number of pages that can be programmed in tandem within a single die (i.e. one page can be programmed in one plane at a time, so that's four simultaneous page programs in a 4-plane die). There's some additional latency from multi-plane programming, but with a 4-plane design you can get roughly two times the write (and read/erase too) throughput compared to a 2-plane design. That's a vital element because as the die capacity increases, less die is required to build a drive with fixed capacity (e.g. with 256Gbit die, a 256GB SSD only consists of eight dies, whereas with a 128Gbit die there would be 16 dies operating in parallel). 

As I thoroughly explained in our Samsung 850 Pro review, one of the key issues with planar NAND is the shrinking number of electrons. Because 3D NAND can utilize a much larger cell structure due to the fact that scaling is done vertically rather than horizontally, the number of electrons is considerably higher, which improves both endurance and performance. The slide Intel and Micron shared shows that their 3D NAND will have roughly the same number of electrons as their 50nm process did (or actually slightly more), which is over a tenfold improvement compared to the latest 16nm node. The companies weren't willing to share the exact lithography that's used for manufacturing, but I was told that the process relies on single patterning and thus I would estimate the lithography to be somewhere between 35nm and 50nm given the limits of argon fluoride patterning. Ultimately the lithography on its own is rather useless anyway because it only measures the smallest pitch in the die and there's more than a couple of pitches that need to be known for any accurate analysis. 

In terms of endurance, Micron told me that the parts will initially be rated at 3,000 P/E cycles. That may sound low, but Micron explained that the reason behind this is that all Micron's client-grade MLC has been rated at 3,000 for several years now and frankly that's more than enough for client applications. Both companies are confident that their 3D NAND technology is capable of delivering far more than that, but as validation takes time and money the first batch won't be rated at more than 3,000 cycles. It will be interesting to see what the enterprise-focused 3D NAND is rated at once it arrives to the market, but given the increased number of electrons and other endurance improvements it's safe to assume that Intel-Micron's 3D NAND will be capable of +10,000 P/E cycles as the process matures. 

The companies also shared some high-level details of their 3D NAND structure, which has remained a secret until now. To be honest, the above illustration I was given is far from easy to understand (even the Intel/Micron engineers I talked had trouble understanding it), but what I was able to understand is that the purple tubes are the channels and the individual cells are between the channels and the green wordline (i.e. the actual cells are not really shown in the picture). Obviously, the graphs leave many questions unanswered (like how a single cell is accessed and where and how the wordlines are connected), so we'll have to wait for further details before we can fully understand how the structure differs from Samsung's and Toshiba-SanDisk's.

That said, Intel-Micron did disclose that their design utilizes a traditional floating gate, whereas the other 3D NAND designs we have seen use a newer charge trap technology. There's inherently several benefits to charge trap (e.g. less electron leakage), but Intel and Micron told me that they decided to use floating gate because it's a decades old design and the physics are well known, while charge trap is much newer and more unproven. It's impossible to outright say that one cell structure is better than the other because in the end it all boils down to cost where floating gate design is probably more cost efficient for Intel-Micron given their deep knowledge of its functionality. 

All in all, we now know a few more bits about Intel-Micron's 3D NAND, but there's still lots of details to be unveiled and investigated to fully understand the differences to other 3D NAND technologies. 256Gbit MLC samples are now shipping to select customers and mass production will begin in the second half of this year, so the first products with Intel-Micron 3D NAND will likely hit the shelves in the first half of 2016.



View All Comments

  • zepi - Thursday, March 26, 2015 - link

    Multilayer CPU'd would allow huge L4 caches possibly making even DRAM unnecessary... Reply
  • zepi - Thursday, March 26, 2015 - link

    Oh, and just piling memory layers on top of each other and then CPU on top of that should allow reasonably easy way to disable bad blocks.

    And nothing really prevents piling EDRAM / SRAM whatnot there...
  • jjj - Thursday, March 26, 2015 - link

    Except costs and the huge difference in manufacturing between RAM and CPUs.
    For a long time some people expected RAM and CPU on the same die and it didn't and won't happen because it's not cost effective. Advanced packaging is another matter but that's not a monolithic 3D IC.
  • jjj - Thursday, March 26, 2015 - link

    For anyone curious Qualcomm was talking about monolithic 3D ICs last year but it feels more like a call to arms than them having a practical solution soon.
  • earl colby pottinger - Friday, March 27, 2015 - link

    Don't forget they can use the stacking to make bigger caches. At present a cache miss is costly in terms of CPU cycles wasted to get the right data. Increasing the cache sizes by 8-32 times their present sizes will make a measurable difference. Reply
  • Vlad_Da_Great - Thursday, March 26, 2015 - link

    In my view 3D processors are not in the cards, because you have heat leakage on a stocked skyscraper platform. Please, note read/write are not emitting heat(or very little), but logical calculations and transistors switches do. It appears for the moment, 10nm is the last achievable node for the human race, since many of the silicons are not sustainable. "A more significant limitation comes from plasma damage to low-k materials. The extent of damage is typically 20 nm thick,[1] but can also go up to about 100 nm.[2] The damage sensitivity is expected to get worse as the low-k materials become more porous." wikipedia So after 10nm, the manufacturing might be impossible to us. Reply
  • azazel1024 - Friday, March 27, 2015 - link

    With silicon is key here. Intel has already talked about using other materials, like indium gallium arsenide for nodes below 10nm. There are several research univeristies (and I assume fabs too) that have working sub-10nm lithographies running on research scales.

    We'll hit below 10nm, it just probably won't be on silicon.
  • jjj - Friday, March 27, 2015 - link

    The hardest problem is design.
    You take a SoC and what do you do, even if you put diff compute units on diff layers and you end up with 3-4 layers,then what? How do you scale it further, if it's a 1 time only and you can't scale it further, it has minimal relevance.
    NAND went 24 layers, 32, 48 and will keep going. It has many repetitive structures and you can keep scaling. For a SoC you can't do that and there is no need for now for lots of CPU cores to just add that. For GPU it would work, at least when adding more layers it's easy to figure out what to put on them, more cores.
    In the end even the monolithic 3D ICs that we aim for are just layered (as in ,lest say a core, is still planar and on one layer not a cube using multiple layers) and going beyond that is a lot harder and way more interesting.
    Folks will find a way to go around all the other problems in the early stages but how to scale beyond a few layers in a way that makes sense ...
  • azazel1024 - Monday, March 30, 2015 - link

    It can possibly make some sense for a SoC. Once you get beyond thermal issues, you can reduce latencies within the SoC by layering. Place the SRAM on top (or bottom), memory controller on the other side and you've potentially reduced the distance that signals have to travel from the cores/system bus to memory. Sure, we are talking picosecond latency reductions...but it does reduce latency. It also possibly opens things up (if it reduces overall cost) to larger L3 and L2 caches, possibly larger iGPUs and so on. Reply
  • willis936 - Thursday, March 26, 2015 - link

    Nevermind the manufacturing difficulties. You have a physical limitation you can't get around. You can't just double or triple the number of active transistors in a given area without doubling or tripling the size without melting the chip. They could do clever and complicated things to use more transistors and stay in the same power budget but you won't see incredible leaps in performance like the incredible leaps in density we're seeing with storage. Reply

Log in

Don't have an account? Sign up now