14 Comments
shelbystripes - Tuesday, September 27, 2016 - link
So this enables putting 128 ARM cores on a single piece of silicon? Even with little cores and a 14nm process, that's going to be a pretty large die.

This would be pretty cool for building a big.LITTLE supercomputer though. A small block of 4 big cores to manage the OS and task scheduling, and then 124 little cores in parallel... Add a high-speed interconnect to talk to other nodes and external storage servers, and you've got an entire HPC node as an SoC. Want a half million ARM cores in a single 19" rack?
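A rough sanity check on that last figure (a sketch; the 42U rack height and one-SoC-per-node framing are my assumptions, not from the comment):

```python
# Back-of-the-envelope: what would half a million cores in one
# 19" rack actually require? (Hypothetical figures throughout.)
CORES_PER_SOC = 128
TARGET_CORES = 500_000
RACK_UNITS = 42          # a standard full-height rack (assumption)

socs_needed = TARGET_CORES / CORES_PER_SOC   # ~3906 SoCs
socs_per_u = socs_needed / RACK_UNITS        # ~93 SoCs per 1U slot

print(f"SoCs needed: {socs_needed:.0f}")
print(f"SoC density: {socs_per_u:.0f} per rack unit")
```

Roughly 93 SoCs per rack unit is around an order of magnitude denser than typical microserver chassis, so the half-million figure would take very aggressive multi-node sleds rather than conventional 1U servers.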
Arnulf - Tuesday, September 27, 2016 - link
A quad A72 is 8 mm^2 in size (this includes L2 cache) on TSMC 16FF+.

128 A72 cores would come out to 256 mm^2, not accounting for the interconnects and the rest of the chip. TSMC is manufacturing bigger GPU chips on this process, so this is not unfeasible at all ...
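Spelling out that area math (a sketch using the figures in the comment; the uncore overhead is a pure assumption for illustration, not a measured number):

```python
# Area estimate for a 128-core A72 die on TSMC 16FF+,
# from the 8 mm^2 quad-cluster figure above.
QUAD_CLUSTER_MM2 = 8.0   # quad A72 + L2, per the comment
CORES = 128

clusters = CORES // 4                      # 32 quad clusters
core_area = clusters * QUAD_CLUSTER_MM2    # 256 mm^2

# Hypothetical overhead for the mesh interconnect, memory
# controllers, and IO -- my assumption, not a known figure.
UNCORE_FACTOR = 1.5
print(f"Cores + L2:       {core_area:.0f} mm^2")
print(f"With ~50% uncore: {core_area * UNCORE_FACTOR:.0f} mm^2")
```

Even with a generous 50% uncore allowance, ~384 mm^2 sits well below the 600mm^2-class GPU dies TSMC already ships on this process, which is the point being made.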
ddriver - Tuesday, September 27, 2016 - link
Half the die of consumer chips is graphics; remove the graphics and suddenly you have plenty of room for cores.

jjj - Tuesday, September 27, 2016 - link
An A73 on 10nm is under 0.65mm^2, and a quad cluster with 2MB L2 is some 5mm^2; add a large L3 and it's still small.

But the real push in server is for 7nm HPC and bigger cores.
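To flip that math around (a sketch using the comment's 10nm cluster figure; the die budget and uncore split are arbitrary assumptions of mine):

```python
# Inverse framing at 10nm: how many A73 cores fit in a modest
# die budget, using the ~5 mm^2 quad-cluster figure above?
CLUSTER_MM2 = 5.0        # 4x A73 + 2MB L2, per the comment
DIE_BUDGET_MM2 = 150.0   # arbitrary low/mid-range die (assumption)
CLUSTER_SHARE = 0.5      # leave half the die for L3, mesh,
                         # memory controllers, IO (assumption)

clusters = int(DIE_BUDGET_MM2 * CLUSTER_SHARE / CLUSTER_MM2)
print(f"{clusters} clusters -> {clusters * 4} A73 cores")  # 15 -> 60
```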
name99 - Tuesday, September 27, 2016 - link
Yeah, people are totally confused about how large (or rather small) CPU cores are.

Even the A10 core, eyeballing it from the die shot, is perhaps 14 to 16mm^2 (including 2 fast cores, 3MiB L2, and two companion cores; L3 adds maybe 30% more). Not as small as A72s, but again it would be totally feasible to put 16, maybe even 24, of these (2+2) units, plus L3, on a die if you had the urge to do so. The details would start to depend on how much else you also want to add: memory controllers, what IO, etc.
The high end of die size is up at 650mm^2 or so, as opposed to 100 to 150mm^2 at the low-mid end (e.g. the range of Apple's mobile and iPad SoCs).
Obviously you pay serious money for that sort of large size, but it is technically quite feasible and is used.
patrickjp93 - Thursday, October 6, 2016 - link
Cache is the expensive part, and you're going to need an L3 cache to keep such a cluster fed, whether it's a victim cache or a primary.

TeXWiller - Tuesday, September 27, 2016 - link
The ARM press release talks only about the five-fold interconnect bandwidth increase, not about a similar increase in memory bandwidth. They do support 3D-stacked RAM though (HBM, HMC?), so that might explain the number.

T1beriu - Wednesday, September 28, 2016 - link
Have you even read anything from this announcement?!

TeXWiller - Wednesday, September 28, 2016 - link
Is there a source for those simulated memory controller results somewhere, a release event or an interview? The news release talks about throughput related to interconnect bandwidth as a whole, not specifically about the memory controller as this article suggests.

jjj - Tuesday, September 27, 2016 - link
Don't forget that on the slide the A57 is on the same process, so a real-world A57 on 28nm is not the baseline.

Pork@III - Tuesday, September 27, 2016 - link
I see the first new competitor in a server market that has been too uniform for many years. Time for a change!

patrickjp93 - Thursday, October 6, 2016 - link
Cavium and Qualcomm both just tried and failed against Xeon D and Avoton. Intel is not losing that market without quite a fight.Communism - Wednesday, September 28, 2016 - link
Certainly looks like this will put a hurt'n on Atom servers.

Perhaps this will be the straw that breaks the camel's back on 10 Gbit servers, switches, and adapters.
Communism - Wednesday, September 28, 2016 - link
In terms of breaking their absurdly high pricing, of course.