Analyzing Intel-Micron 3D XPoint: The Next Generation Non-Volatile Memory
by Kristian Vättö, Ian Cutress & Ryan Smith on July 31, 2015 11:00 AM EST
The Technology: How Does 3D XPoint Work?
The way 3D XPoint works is fundamentally very different from NAND. Whereas NAND defines bit values by trapping a varying number of electrons in an insulated floating gate, 3D XPoint is a resistance based technology that works by a bulk property change to alter the resistance level of a cell and thus differentiate between a 0 and 1.
The 3D XPoint structure is very simple. It consists of a selector and memory cell, which sit in between a wordline and bitline (hence the "crosspoint" name). Applying a specific voltage on the wordline and bitline activates a single selector and enables the cell underneath to be either written (i.e. a bulk property change in the memory cell material) or read (i.e. current is allowed through to check whether the memory cell is in a high or low resistance state). I suspect that write operations require a higher voltage than reads because otherwise there might be a risk of triggering a bulk material change (i.e. a write operation) while reading a cell. Intel and Micron wouldn't disclose any internal read/write voltages, but we were told that the voltages are considerably lower than in NAND, which requires ~20V for program/erase to create an electric field strong enough to tunnel electrons through an insulator. The lower voltages should, in turn, result in lower power consumption compared to DRAM and NAND.
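To make the addressing idea concrete, here is a minimal toy model of a crosspoint array. It is only a sketch: the class name, the voltages and the resistance values are all invented for illustration (Intel and Micron disclosed none of them), and the switching threshold simply encodes the suspicion above that writes need a higher bias than reads.

```python
# Toy crosspoint array: one resistive cell at every wordline/bitline crossing.
# Every number below is a made-up illustration value, not a disclosed figure.

V_READ = 1.0     # hypothetical read bias (volts)
V_WRITE = 3.0    # hypothetical write bias (volts)
V_SWITCH = 2.0   # hypothetical bias above which the cell material changes state
R_LOW = 1e3      # hypothetical low-resistance state (ohms), read as 1
R_HIGH = 1e6     # hypothetical high-resistance state (ohms), read as 0


class CrosspointArray:
    def __init__(self, wordlines, bitlines):
        # Every cell starts in the high-resistance state (logical 0).
        self.cells = [[R_HIGH] * bitlines for _ in range(wordlines)]

    def apply(self, wl, bl, volts, bit=None):
        """Bias one wordline/bitline pair and return the bit that is sensed."""
        # Only the selected cell sees the bias, and its state changes only
        # if the bias reaches the switching threshold.
        if bit is not None and volts >= V_SWITCH:
            self.cells[wl][bl] = R_LOW if bit else R_HIGH
        current = volts / self.cells[wl][bl]
        # Compare against the current at the geometric midpoint resistance.
        threshold = volts / (R_LOW * R_HIGH) ** 0.5
        return 1 if current > threshold else 0

    def write(self, wl, bl, bit):
        return self.apply(wl, bl, V_WRITE, bit)

    def read(self, wl, bl):
        # The read bias is below V_SWITCH, so reading cannot disturb the cell.
        return self.apply(wl, bl, V_READ)


array = CrosspointArray(4, 4)
array.write(1, 2, 1)
print(array.read(1, 2), array.read(1, 3))  # 1 0
```

The point of the sketch is that selecting one wordline and one bitline touches exactly one cell, and that keeping the read bias under the switching threshold is what keeps reads non-destructive.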
As the name suggests, the cells can be stacked in a 3D fashion to further improve density, and the first generation die that is currently sampling is a two-layer design. Two layers may sound like few compared to 3D NAND, which is already at 32 layers with 48 being close to production, but the way a 3D XPoint array is built is fundamentally quite different.
3D NAND is manufactured by first depositing alternating layers of conducting and insulating materials on top of each other. It's only after all layers have been deposited that the "cell towers" are lithographically defined, followed by a high aspect ratio etch with the hole being filled by channel material to gain access to a memory cell in each layer. In 3D XPoint, however, each layer needs to be lithographically patterned and etched (i.e. repeating the same process for each layer) before another layer can be deposited, which does take away some of the economic benefits that 3D NAND has (i.e. very few lithography steps), but the 3D XPoint approach still provides higher density than what pure lithography based scaling would.
Intel and Micron said that scaling in future will happen through both lithography and 3D stacking of layers. Being scalable both horizontally and vertically is the key in enabling future-proof scalability because traditional Argon Fluoride based immersion lithography with multi-patterning is getting quite uneconomical at ~10nm and there is still no clear successor in the pipeline. The big bets seem to be on EUV and Intel-Micron confirmed that 3D XPoint will (unsurprisingly) be compatible with EUV lithography and the cell design can scale down to single digit nanometers without major impact on endurance/reliability (in fact, some aspects actually get better as the lithography is scaled down), but we are still at least a few years away from EUV being ready for use in high volume mass production. Initial EUV production will also focus on logic given the high equipment cost and because logic cannot be scaled vertically as easily as memory can due to heat issues.
In theory, 3D XPoint also supports multiple bits per cell, but that's not a path Intel and Micron are pursuing at the moment. While it's relatively easy to demonstrate multiple resistance levels in a lab, it's far more difficult to produce tens of thousands of wafers with each die having the necessary characteristics for proper multi-level cell operation. For comparison, it took nearly two decades before a second bit per cell was introduced to NAND, so for now Intel and Micron will focus on lithography and 3D scaling to increase density and cost efficiency, but multi-level cells may become a viable alternative in the future.
One of the big architectural differences to NAND is the fact that 3D XPoint is accessible at the bit level. In NAND a whole page (16KB for the latest nodes) has to be programmed at once in order to save just one bit of data, and to make matters worse you can only erase at the block level (a couple hundred pages at least). As a result, NAND requires sophisticated garbage collection algorithms for efficient performance, but regardless of the level of sophistication there is still performance degradation as a drive enters steady-state because of the inherent read-modify-write cycle that is needed to erase invalid pages within a block. With each cell being individually accessible, 3D XPoint doesn’t necessarily require any garbage collection to work effectively, which simplifies the controller and firmware architecture, and even more importantly enables higher performance and lower power consumption.
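A rough back-of-the-envelope sketch shows how large the gap between page/block granularity and bit-level access can be. The 16KB page size comes from the text; the 256 pages per block is an assumption standing in for "a couple hundred pages", and the function names are invented for this example. Real SSDs write out of place and amortize the block rewrite through garbage collection, so the worst case below is deliberately simplified.

```python
# Toy model: bytes physically programmed in order to change a few bytes of data.

PAGE_BYTES = 16 * 1024   # page size quoted in the text
PAGES_PER_BLOCK = 256    # assumption standing in for "a couple hundred pages"


def nand_bytes_programmed(bytes_changed, needs_block_rewrite=False):
    """Bytes programmed on page-programmed, block-erased media."""
    if needs_block_rewrite:
        # Simplified worst case: the block has no free pages, so its contents
        # are read out, the block is erased, and every page is programmed again.
        return PAGES_PER_BLOCK * PAGE_BYTES
    pages_touched = -(-bytes_changed // PAGE_BYTES)  # ceiling division
    return pages_touched * PAGE_BYTES


def bit_addressable_bytes_written(bytes_changed):
    """Bytes written when each cell can be updated individually."""
    return bytes_changed


print(nand_bytes_programmed(1))                            # 16384
print(nand_bytes_programmed(1, needs_block_rewrite=True))  # 4194304
print(bit_addressable_bytes_written(1))                    # 1
```

Under these assumptions a one-byte update costs a full 16KB page program in the best case and a 4MB block rewrite in the worst, against a single byte on media that can be written in place.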
I suspect that end products, especially storage focused ones, may still employ logical pages to reduce the tracking overhead because tracking data at a bit level would require a massive cache. However, Intel and Micron made it clear that today’s announcement was solely a technology announcement, so the companies declined to comment on upcoming products based on the new technology. That said, both are working on their own products with first commercial shipments scheduled for next year.
The Memory Cell: What's the Secret Behind 3D XPoint?
Section by Ian Cutress
At an array level, 3D XPoint operation is rather easy to understand, but what happens inside the memory cell during a bulk property change is a more complex issue. Personally, what comes to mind is that there are two ways to do this: physically adjust the properties of the cell with an external stimulus that changes the crystal structure, or chemically adjust the properties of the material used in the cell. During the discussions after the announcement, we were told categorically that this is not a phase change material, which eliminates the possibility that a change in the crystal structure of the cell produces the resistance change. This also makes a lot of sense given the claims of high durability, since a constantly changing crystal structure could affect the metal bonding between different parts of the cell as the interatomic distances within the structure keep adjusting. This leaves a chemical adjustment, or specifically a realignment in the electron structure of the bit in the cell, as the cause of the resistance difference.
After some research, it’s worth talking about the types of technologies that could be being used here.
Spin-Crossover
From previous experience, my fingers automatically point to spintronics and the application of spin-crossover compounds. The short version of this means that a material can have two different resistance levels depending where exactly the electrons are in the electron levels of the structure, and an external stimulus (temperature, high voltage, magnetic field) can be used to switch between the two.
For the long answer, it is best to consider individual transition metal atoms. Depending on the local arrangement around the metal, the bonding orbitals of the metal are partially filled with electrons:
Here is an iron atom in a 2+ environment, with its main bonding orbitals called t2g (lower energy) and eg (higher energy). Due to Pauli’s exclusion principle, the six electrons are paired into three orbitals, and each pair has one ‘spin-up’ and one ‘spin-down’ electron. This is the ground state, also known as the low-spin state because the total spin, S, is equal to zero: each electron of spin 1/2 is cancelled out by its partner of spin -1/2.
The other mode for this atom is the high-spin state where two electrons move to the higher bonding orbitals giving a total spin of S = 2:
Due to an external stimulus, two of the electrons have flipped their spin and occupied the higher energy eg orbitals and this is officially a ‘metastable’ state. Depending on the arrangement around the atom, this state could fundamentally be very stable and yet have wildly differing properties to the original ground state.
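The two spin totals quoted above can be checked with a few lines of counting. This is only an illustrative sketch with a made-up helper name: it assumes three t2g and two eg orbitals, fills each set singly before pairing, and assumes all unpaired electrons point the same way.

```python
from fractions import Fraction


def total_spin(t2g_electrons, eg_electrons):
    """Total spin S for a given t2g/eg occupation (illustrative helper)."""

    def unpaired(n, orbitals):
        # Electrons occupy each orbital singly first; any extras must pair up.
        return n if n <= orbitals else 2 * orbitals - n

    n_unpaired = unpaired(t2g_electrons, 3) + unpaired(eg_electrons, 2)
    # Each unpaired electron contributes spin 1/2.
    return Fraction(n_unpaired, 2)


print(total_spin(6, 0))  # low-spin: all six electrons paired in t2g, S = 0
print(total_spin(4, 2))  # high-spin: two electrons promoted to eg, S = 2
```

With six electrons in t2g every spin is paired and S is zero; moving two of them into eg leaves four unpaired electrons, which gives S = 2.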
Extrapolating this out into a bulk material is fundamentally difficult. A quick look at research papers suggests that spin-crossover compounds can be directly applied between junctions and their resistance changed, but the chemical base for such operations in many papers is carbon nanotubes, graphene layers or organic chains.
Image from M. Urdampilleta et al., Nature Materials, 10, 502 (2011)
In this paper, the orientation of the low-spin/high-spin states either affords conductivity between the two edges or does not, and the metal in question determines the exact nature, resistance, characteristics and/or stability of the platform. Intel would have had to develop a material that is essentially programmed by voltage rather than any other stimulus, which adds to the complexity. Normally spin-crossover compounds also have a specific temperature window, which has a direct impact on stability as electrons may migrate from the high-spin state to the low-spin state.
At this point, the scalability and fundamental material characteristics are the major barriers to large scale output of spin-crossover, particularly if nanotubes are required. If it were somehow extended to bulk metallic materials, as you bring an individual metal environment to bulk, band lines (and gaps) start to blur between simple orbital theories, so how it can be incorporated into a cell is unknown. Intel is also stating that their technology is capable of multiple bits per cell, and the reality of spin-crossover arrangements can be blurred with electron tunnelling to achieve that effect.
Spin-Transfer Torque
The STT method of memory cells relies on both the resistance between the states of the cell, as well as the ability to control magnetic fields within a cell itself. The short version is that if you adjust the magnetic alignment of a material, you can adjust the resistance and use it as a form of memory.
The long form for STT relies on a feature known as spin-polarised current. Electrons intrinsically have spin in two directions, up or down, and electric current typically consists of an even mix of these two, giving an overall un-polarised current (i.e. no overall spin direction majority). When an unpolarized current is passed through a thick layer of magnetic material, it causes the current to become polarised. If this polarised current then passes through a thin magnetic layer with a different density of electron states, the polarization of the current can cause the magnet to flip due to more electrons relaxing into states of opposite spin.
With the thick layer having a constant magnetic orientation and the thin layer (or the free layer) being able to flip, depending on the material, the resistance between the two combinations can act as a memory cell.
From Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative, Kültürsay et al.
Claimed benefits from STT methodologies revolve around energy consumption, with it being lower than DRAM but with equivalent performance. Barriers to the technology depend on the ability to essentially create permanent magnets at such a small nanolithography node, and how placing many small magnets near each other (similar to bits on a HDD) might cause some of them to flip unexpectedly. Also with this method, it is unclear if it can scale beyond a simple binary bit per cell, and available research suggests that it requires a control transistor, which Intel has stated that 3D XPoint does not use.
Public history of STT development includes a 1-megabit IC from Qualcomm on 45nm in 2011, and a 64-megabit module from Everspin in 2012, although with no mention of a node.
Conductive Bridging
One of the easiest ways to adjust the resistance of a pathway is to block the path of electrons almost completely with a physical change. Conductive bridging technologies (or programmable metallization cells) rely on the creation of a nano-bridge between electrodes to reduce the resistance of a cell using a similar technique to electrolysis.
Within a CB cell, a thin film of electrolyte (historically a liquid, but it can be a solid) sits between an active and an inert electrode. When a negative bias is applied to the inert electrode, the metal ions in the electrolyte are reduced to form metal atoms. When enough of these metal atoms are reduced and connected, they form a wire between the two electrodes. In order to break the wire, the reverse potential difference is applied and the atoms in the wire are oxidized to become part of the electrolyte again. Ultimately the electrode-electrolyte-electrode combination is still conductive, but its resistance is higher than when a wire has been created.
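The set/reset behaviour described above can be sketched as a very small state machine. Everything here is hypothetical: the class name, the resistance values and the idea that each pulse adds or removes exactly one atom are simplifications for illustration, not a description of any real cell.

```python
R_ON = 1e4    # hypothetical resistance with a complete bridge (ohms)
R_OFF = 1e7   # hypothetical resistance through the bare electrolyte (ohms)


class ConductiveBridgeCell:
    def __init__(self, gap_atoms=5):
        self.gap_atoms = gap_atoms  # atoms needed to span the electrolyte
        self.filament = 0           # atoms currently reduced into the wire

    def pulse(self, inert_bias):
        """Apply one bias pulse, signed relative to the inert electrode."""
        if inert_bias < 0:
            # Negative bias: one more metal ion is reduced onto the wire.
            self.filament = min(self.gap_atoms, self.filament + 1)
        elif inert_bias > 0:
            # Reverse bias: one atom is oxidized back into the electrolyte.
            self.filament = max(0, self.filament - 1)

    @property
    def resistance(self):
        # The cell is in the low-resistance state only once the wire is complete.
        return R_ON if self.filament >= self.gap_atoms else R_OFF


cell = ConductiveBridgeCell(gap_atoms=3)
for _ in range(3):
    cell.pulse(-1.0)
print(cell.resistance == R_ON)   # True: the bridge is formed
cell.pulse(+1.0)
print(cell.resistance == R_OFF)  # True: the bridge is broken
```

The model ignores the growth direction and forking problems discussed next; it only captures that the bridge is either complete (low resistance) or not (high resistance), and that the bias polarity decides which way it moves.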
Even a brief overview of CB, for anyone familiar with electrolysis, brings up a large number of questions. First might be the use of liquids in a cell, but we should assume here that we might be dealing with mobile ions in a solid moving between intercalation points (spaces within a lattice/framework) anyway. But for me the main question is the growth of this ‘wire’. Typically growth via electrolysis is somewhat undirected: you are relying on the activity of various crystal planes to drive growth and ion diffusion is multi-directional, though there will be a slight bias towards the electrode depending on the crystal plane of growth. That leads to forked wires, similar to the way lightning works. When you do make a connection between the electrodes, or at least get within electron tunneling distance, the resistance differential with the wire (from high resistance to low resistance) is large, but as the wire grows further it will decrease the resistance again. This makes the wire method potentially multi-bit per cell, but as I mentioned before, growth control is hideously difficult. The other factor is that of reversing the process: typically the active electrode is made from the same material as the ions, meaning that the electrode itself could essentially dissolve. This creates a few headaches in research, at least for durability.
To the benefit of CB, it theoretically scales to smaller than floating gate cells and the layout is simpler, with reported benefits in both power and performance by several orders of magnitude over current NAND.
Despite these issues and benefits, I might suggest that CB is currently the leading contender for the technology used in 3D XPoint. There are subtle hints, such as the official licensing of the technology by Micron in 2002, and this slide from Micron’s analyst conference in 2014 shows how they’ve overcome some of the issues I’ve listed above:
At the bottom here is the image above of a working cell along with a diagram showing how the bridging works. In order to facilitate the active electrode not being ‘eaten’ in reverse programming, we have a large ion reservoir to call on. The other electrode is also small in order to direct wire growth. As long as the electrolyte layer is small (several monolayers), read/write speeds can be quick and easy.
The Future
With all that being said, if we fast forward to Feb 2015, Micron released this slide at an analyst meeting:
In the bottom half of this slide, it’s clear that Micron is looking into spin-transfer torque based memory on the left, and several analyst reports have suggested that the RRAM cell on the right is most likely that of CB technology. Within the 3D XPoint slides presented at the announcement this week, a number of analysts have also concluded that the selector per bit is most likely a diode in order to enable the appropriate characteristics for probing the resistance.
Given the fact that the announcement has stated that what is being done here is fundamentally different to previous attempts, and that the hardest part of producing 3D XPoint is the materials, I ultimately can’t judge the exact specifics of what Intel and Micron are doing here, only that conductive bridging is most likely the method by which it is being done. Chances are the exact details will be kept close to Micron and Intel’s chests for a while, as they are quoting a full decade of research time from concept to product, which aligns with the 2002 licensing deal mentioned above for the conductive bridging method.
80 Comments
FunBunny2 - Friday, July 31, 2015 - link
If you want to know what's being sold, go back and look up Unity Semiconductor's CMOx tech. Rambus bought them, then Rambus and Micron settled, including a patent sharing arrangement. The last Unity CEO said, just before Rambus bought them, that 2015 was production year. Could be.
nwarawa - Friday, July 31, 2015 - link
I can't wait for this to be a normal conversation:
A:"How much storage do you have?"
B:"256GB"
A:"RAM or on your drive?"
B:"Yes."
ajp_anton - Friday, July 31, 2015 - link
10^15 P/E cycles for DRAM? How does this work?, as typical DRAM does on the order of 10^16 cycles in a year. I'm assuming a P/E cycle is the same as a clock cycle because of the constant refreshing, is this wrong?
Crazy1 - Saturday, August 1, 2015 - link
I had to look this up, but the DDR3 standard calls for at least 8 refresh commands every 7.8 usec. Rounding down to the nearest 50ns, means to one refresh every 950 ns. When calculated out, that equals roughly 3.32x10^13 cycles/year. That means DDR3 should survive up to 30 years with a 10^15 P/E cycles rating, while never turning off your computer or putting it in hibernate.
In a refresh cycle, the information in a cell is read, then rewritten. There is no erase. I'm not sure the speed a typical P/E cycle occurs when erasing and writing new data is required. If it is significantly quicker than 950ns, there may be a decrease in lifespan from 30 years. However, unless you run intensive programs that delete and write new information to all memory cells every 32ns, you are not going to exceed the 10^15 P/E cycles in a year.
TallestJon96 - Friday, July 31, 2015 - link
Excellent work. Anandtech always has the best information and reviews, even if they are the last.
This is pretty exciting stuff. If storage can become fast enough, then perhaps we will not need memory. Theoretically this would be a massive improvement to efficiency and performance. I would argue that the perfect computer would only have a processor and extremely fast storage. This is not enough to fill the gap, but storage is certainly catching up.
As a gamer, the idea of having my game loaded onto storage that is fast enough to not need to load into the memory is pretty appealing. Zero load time, no texture streaming issues, and potentially larger scale.
I have to wonder about bandwidth with this tech. Latency is clearly between ram and SSDs, but is closer to ram. But I haven't seen any solid bandwidth stats.
Freakie - Friday, July 31, 2015 - link
In the article they mention that gamers already can by-pass slow NAND and HDD speeds by just creating a RAMDisk. If you have 32GB of RAM, you could take 8GB of it for your system memory, turn the other 24GB into a RAM disk, and put all of your game files onto it and then your games will load their resources at the speed of your RAM.
And DDR4 is coming down in price very quickly so it isn't such a crazy idea. The cheapest 32GB DDR4 kit I can find is $176 which means 64GB will cost you $350 for games that have 40GB of resources. While not incredibly cheap, it's also not totally unreasonable especially if you're already complaining about SSD's not loading game resources fast enough.
Friendly0Fire - Saturday, August 1, 2015 - link
Sadly, 24GB is a bit short for modern games and 8GB for the OS and the game is also a bit on the low side. Games are finally taking advantage of 64-bit executables (and thus far larger memory cap) and it's showing up as a dramatic increase in asset size, both on disk and in memory.
64GB of RAM might get you there, but I think 32's on the short-ish side. 3D XPoint would side-step the issue by providing far more storage than contemporary games would likely need.
lordken - Sunday, August 2, 2015 - link
As said by Friendly0Fir 24GB is unfortunately nothing today, many games today have 20-50GB disk requirments (not sure if devs are plain lazy to optimize or they really need that much space for stuff)
Plus dont forget that you need to first fetch data into ramdisk after boot, and wait it to flush it out before shutdown. So personally I would not bother with ramdisks, and probably load times doesnt solely depend on read time from storage only. On some games I didnt seen much difference between HDD and SSD load performance (which shows either bad game engine/coding or some other bottleneck, maybe my CPU).
And not to say leaving only 8GB for OS is really not that great.
JKflipflop98 - Monday, August 3, 2015 - link
Not to mention it's a giant pain in the butt to have to create the ram drive, copy all the files over, and then create all the links needed to actually run the game. By the time you're done futzing around with all that crap, you've cost yourself 10x the time you've saved in loading screens.
lordken - Sunday, August 2, 2015 - link
"This is pretty exciting stuff. If storage can become fast enough, then perhaps we will not need memory. "
imho this will "never" be true, RAM will always be faster, no matter how much you make storage faster you can still also improve RAM which in turn will always keep ahead of storage. Plus as shown in article it is much closer to CPU and thus better perf/latencies etc.
Maybe in case when Xpoint v3 reach performance level of DDR3/4 then diminishing returns could start to kick in , but still by that time we will probably have DDR5/6 or HBM3. So I think RAM will stick around, even if it could perhaps shift into CPU L4 like cache with HBM for example.