After Swift Comes Cyclone Oscar

I was fortunate enough to receive a tip last time that pointed me at some LLVM documentation calling out Apple’s Swift core by name. Scrubbing through those same docs, it seems like my leak has been plugged. Fortunately I came across a unique string looking at the iPhone 5s while it booted:

I can’t find any other references to Oscar online, in LLVM documentation or anywhere else of value. I also didn’t see Oscar references on prior iPhones, only on the 5s. I’d heard that this new core wasn’t called Swift, referencing just how different it was. Obviously Apple isn’t going to tell me what it’s called, so I’m going with Oscar unless someone tells me otherwise.

Oscar is a CPU core inside the M7; Cyclone is the name of the Swift replacement.

Cyclone likely resembles a beefier Swift core (or at least a Swift-inspired one) more than a new design from the ground up. That means we’re likely talking about a 3-wide front end, and somewhere in the 5 - 7 range of execution ports. The design is likely also capable of out-of-order execution, given the performance levels we’ve been seeing.

Cyclone is a 64-bit ARMv8 core and not some Apple designed ISA. Cyclone manages to not only beat all other smartphone makers to ARMv8 but also key ARM server partners. I’ll talk about the whole 64-bit aspect of this next, but needless to say, this is a big deal.

The move to ARMv8 comes with some of its own performance enhancements. More registers, a cleaner ISA, improved SIMD extensions/performance as well as cryptographic acceleration are all on the menu for the new core.

Pipeline depth likely remains similar (maybe slightly longer) as frequencies haven’t gone up at all (1.3GHz). The A7 doesn’t feature support for any thermal driven CPU (or GPU) frequency boost.

The most visible change to Apple’s first ARMv8 core is a doubling of the L1 cache size: from 32KB/32KB (instruction/data) to 64KB/64KB. Along with this larger L1 cache comes an increase in access latency (from 2 clocks to 3 clocks from what I can tell), but the increase in hit rate likely makes up for the added latency. Such large L1 caches are quite common with AMD architectures, but unheard of in ultra mobile cores. A larger L1 cache will do a good job keeping the machine fed, implying a larger/more capable core.
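The latency-versus-hit-rate trade-off can be sketched with the textbook average memory access time (AMAT) formula. Only the 2-clock and 3-clock hit latencies come from the text above; the miss rates and the 40-cycle miss penalty are hypothetical placeholders chosen for illustration, not measurements of Swift or Cyclone.

```python
def amat(hit_latency_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time: hit time plus the expected miss cost."""
    return hit_latency_cycles + miss_rate * miss_penalty_cycles


def break_even_miss_rate_drop(extra_hit_cycles, miss_penalty_cycles):
    """Miss-rate reduction needed to pay for a slower hit."""
    return extra_hit_cycles / miss_penalty_cycles


# Hit latencies are from the article; everything else is an assumption.
MISS_PENALTY = 40          # hypothetical average cost of an L1 miss, in cycles
SMALL_L1_MISS_RATE = 0.05  # hypothetical miss rate for a 32KB L1
LARGE_L1_MISS_RATE = 0.02  # hypothetical miss rate for a 64KB L1

small_l1 = amat(2, SMALL_L1_MISS_RATE, MISS_PENALTY)
large_l1 = amat(3, LARGE_L1_MISS_RATE, MISS_PENALTY)
needed = break_even_miss_rate_drop(3 - 2, MISS_PENALTY)

print(f"32KB L1: {small_l1:.2f} cycles per access")
print(f"64KB L1: {large_l1:.2f} cycles per access")
print(f"Break-even miss-rate drop: {needed:.1%}")
```

With these made-up inputs the larger cache only comes out ahead because its miss rate falls by more than the break-even amount. An out-of-order core can also hide part of the extra hit latency, which this simple model ignores.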

The L2 cache remains unchanged in size at 1MB shared between both CPU cores. L2 access latency is improved tremendously with the new architecture. In some cases I measured L2 latency at half of what I saw with Swift.

The A7’s memory controller sees big improvements as well. I measured 20% lower main memory latency on the A7 compared to the A6. Branch prediction and memory prefetchers are both significantly better on the A7.

I noticed large increases in peak memory bandwidth on top of all of this. I used a combination of custom tools as well as publicly available benchmarks to confirm all of this. A quick look at Geekbench 3 (prior to the ARMv8 patch) gives a conservative estimate of memory bandwidth improvements:

Geekbench 3.0.0 Memory Bandwidth Comparison (1 thread)
                  Stream Copy   Stream Scale   Stream Add   Stream Triad
Apple A7 1.3GHz   5.24 GB/s     5.21 GB/s      5.74 GB/s    5.71 GB/s
Apple A6 1.3GHz   4.93 GB/s     3.77 GB/s      3.63 GB/s    3.62 GB/s
A7 Advantage      6%            38%            58%          57%

We see anywhere from a 6% improvement in memory bandwidth to nearly 60% running the same Stream code. I’m not entirely sure how Geekbench implemented Stream and whether or not we’re actually testing other execution paths in addition to (or instead of) memory bandwidth. One custom piece of code I used to measure memory bandwidth showed nearly a 2x increase in peak bandwidth. That may be overstating things a bit, but needless to say this new architecture has a vastly improved cache and memory interface.
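For reference, the four Stream kernels are simple enough to sketch. The following is a plain Python illustration of what each kernel does and how a bandwidth figure falls out of bytes moved over elapsed time. It is not Geekbench's implementation, and the figures an interpreted loop produces reflect interpreter overhead rather than real memory bandwidth. The array length, the scalar and all function names are arbitrary choices made for this sketch.

```python
import time
from array import array

BYTES_PER_DOUBLE = 8


def stream_copy(a, b, c, q):
    for i in range(len(a)):
        c[i] = a[i]


def stream_scale(a, b, c, q):
    for i in range(len(a)):
        b[i] = q * c[i]


def stream_add(a, b, c, q):
    for i in range(len(a)):
        c[i] = a[i] + b[i]


def stream_triad(a, b, c, q):
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


# (name, kernel, arrays read or written per element)
KERNELS = [
    ("Copy", stream_copy, 2),
    ("Scale", stream_scale, 2),
    ("Add", stream_add, 3),
    ("Triad", stream_triad, 3),
]


def run_stream(n=100_000, q=3.0):
    """Run each kernel once; return (GB/s per kernel, final arrays)."""
    a = array("d", [1.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [0.0]) * n
    rates = {}
    for name, kernel, arrays_touched in KERNELS:
        start = time.perf_counter()
        kernel(a, b, c, q)
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero
        bytes_moved = arrays_touched * BYTES_PER_DOUBLE * n
        rates[name] = bytes_moved / elapsed / 1e9
    return rates, (a, b, c)


rates, _ = run_stream()
for name, gbs in rates.items():
    print(f"{name:<6} {gbs:.3f} GB/s")
```

Copy and Scale touch two arrays per element (16 bytes with doubles) while Add and Triad touch three (24 bytes), and Scale, Add and Triad also do arithmetic. That is one reason the four results can diverge even on the same memory system.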

Looking at low level Geekbench 3 results (again, prior to the ARMv8 patch), we get a good feel for just how much the CPU cores have improved.

Geekbench 3.0.0 Compute Performance
                  Integer (ST)   Integer (MT)   FP (ST)   FP (MT)
Apple A7 1.3GHz   1065           2095           983       1955
Apple A6 1.3GHz   750            1472           588       1165
A7 Advantage      42%            42%            67%       67%
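
The A7 Advantage row is the relative gain of each A7 score over the matching A6 score. Here is a short sketch of that arithmetic using the scores in the table. The function name is hypothetical, and truncating to a whole percent (rather than rounding) is an inference from the fact that it reproduces the published row, not something the source states.

```python
def advantage_percent(new_score, old_score):
    """Relative gain of new over old, in whole percent (truncated)."""
    return (new_score - old_score) * 100 // old_score


# Geekbench 3.0.0 scores from the table: (A7, A6)
SCORES = {
    "Integer (ST)": (1065, 750),
    "Integer (MT)": (2095, 1472),
    "FP (ST)": (983, 588),
    "FP (MT)": (1955, 1165),
}

advantages = {name: advantage_percent(a7, a6) for name, (a7, a6) in SCORES.items()}
for name, pct in advantages.items():
    print(f"{name}: {pct}%")
```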

Integer performance is up 42% going by the scores above, while floating point performance is up by 67%. Again this is without 64-bit or any other enhancements that go along with ARMv8. Memory bandwidth improves by 35% across all Geekbench tests. I confirmed with Apple that the A7 has a 64-bit wide memory interface, and we’re likely talking about LPDDR3 memory this time around so there’s probably some frequency uplift there as well.
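
To put the 64-bit interface in context, theoretical peak bandwidth is the bus width in bytes multiplied by the transfer rate. The sketch below assumes an LPDDR3 data rate of 1600 MT/s purely for illustration; the text above only says LPDDR3 is likely and gives no memory frequency. The measured figure is the single-threaded Stream Triad result from the Geekbench table.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_sec):
    """Theoretical peak = bytes per transfer * transfers per second."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * transfers_per_sec / 1e9


BUS_WIDTH_BITS = 64          # confirmed in the article
ASSUMED_DATA_RATE = 1600e6   # ASSUMPTION: 1600 MT/s, not stated in the article
MEASURED_TRIAD_GBS = 5.71    # A7 Stream Triad, 1 thread, from the table

peak = peak_bandwidth_gbs(BUS_WIDTH_BITS, ASSUMED_DATA_RATE)
fraction = MEASURED_TRIAD_GBS / peak

print(f"Assumed theoretical peak: {peak:.1f} GB/s")
print(f"Measured Triad as a share of that peak: {fraction:.0%}")
```

A single thread rarely saturates a memory interface, so a measured figure well below the theoretical ceiling is expected under any reasonable assumption about the data rate.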

The result is something Apple refers to as desktop-class CPU performance. I’ll get to evaluating those claims in a moment, but first, let’s talk about the other big part of the A7 story: the move to a 64-bit ISA.


464 Comments


  • akdj - Thursday, September 19, 2013

    I'm with ya on the Note. I bought the original GNote and the contract can't end quick enough. It's a dog! Slow as molasses, and my wife and I each own one as our 'business' phones. Made sense, our clients can sign with the stylus their credit card authorization. We use the Square system for CCs and we won't be buying the Notes again. I think you're right. TouchWiz is a killer
  • coolhardware - Tuesday, September 17, 2013

    Very true, however, it is my understanding that sometimes Apple can use their volume to (A) get things a bit before everyone else (like when Apple gets Intel CPUs before others) or (B) get something special added/tweaked/improved on an existing component (batteries, displays, materials).

    Sorry to not have more definitive references/examples for (A) & (B) but here's a recent illustration:
    http://www.macrumors.com/2013/07/26/intel-to-suppl...
    How much this really happens I do not know, but I imagine suppliers want to keep Apple happy :-)
  • akdj - Tuesday, October 8, 2013

    They sold more iOS devices last year (200,000,000) than vehicles sold in the world. They're still the world's number one selling 'phone'. Samsung makes a dozen...maybe two? Their flagships tend to sell well (wasn't the S3 close to 30mil @ some point?)---but nowhere near the iPhone specific sales figures....when you're dealing in that quantity--ya betcha....you'll have access, pricing and typically 'pick of the litter'
  • melgross - Wednesday, September 18, 2013

    Display density is now nothing more than a marketing tool. It no longer serves any purpose. Displays with ppi's of over 350 don't give us any apparent extra sharpness, as we can't see it. The Galaxy S4 has a much higher Rez display, but it still uses PenTile, so that extra Rez is only allowing the screen to look as good as a lower Rez display. I'm wondering what Apple will do with the iPad Retina. If they do what they've been doing, then the display will have four times the number of pixels, and will be one of, if not the highest ppi displays on the market. They do that to make it easy for developers, but it's obviously unnecessary. No one has ever been able to see the pixels on my 326 ppi iPhone display. In fact, no one has ever seen the pixels on my 266 ppi iPad Retina display. Hopefully we'll find out in a month.
  • melgross - Wednesday, September 18, 2013

    Oops! I meant what will they be doing with the iPad Mini Retina display of course.
  • ESC2000 - Saturday, September 21, 2013

    There isn't one definitive cutoff above which extra pixels are useless since people hold their phones different distances from their faces and people have varying eyesight. 'Retina' is pure marketing - first apple used it to emphasize how great their high rez (for the time) screens were (and they did look a lot better than 15" 1366x768 screens) and now they're using it to disguise the fact that this is the same low rez (for the time) screen that they've had on the iPhone 5 for a year.

    I don't even have good eyesight but even I can see that the LG G2 screen (441 PPI) is better than my nexus 7 2013 screen (323 PPI) which is better than the iPad 3 screen (264 PPI - I don't have the 4 for comparison) which is miles better than the iPad mini screen (163 PPI). Personally I'd slot the iPhone 5 after the nexus 7 2013 on that list even though the PPI are about the same. Obviously other factors, often subjective, affect our preferences. I find most apple screens washed out but many people feel they are the only true color reproductions.

    Regardless, the idea of an arbitrary cutoff beyond which extra PPI supposedly makes no difference is a copout.
  • tuxRoller - Tuesday, September 17, 2013

    Great review. I wish we could see this for the other architectures/socs.
    If you want to see the code for the benchmarks (and you should) there are plenty of oss suites you can choose from. You could use linaro's if you want, but for the stream benchmark you could grab http://www.cs.virginia.edu/stream/FTP/Code/ and compile it in Xcode.
  • abrowne1993 - Tuesday, September 17, 2013

    Not a single Lumia in the camera comparison? Why? The people who are really concerned about their smartphone's camera performance will put up with the WP8 platform's downsides.
  • A5 - Tuesday, September 17, 2013

    I would guess Anand doesn't have a 1020 handy to compare with. Probably have to wait for Brian on that one.
  • Anand Lal Shimpi - Wednesday, September 18, 2013

    This. I only compared to what I had on hand.
