After Swift Comes Cyclone Oscar

I was fortunate enough to receive a tip last time that pointed me at some LLVM documentation calling out Apple’s Swift core by name. Scrubbing through those same docs, it seems like my leak has been plugged. Fortunately I came across a unique string looking at the iPhone 5s while it booted:

I can’t find any other references to Oscar online, in LLVM documentation or anywhere else of value. I also didn’t see Oscar references on prior iPhones, only on the 5s. I’d heard that this new core wasn’t called Swift, referencing just how different it was. Obviously Apple isn’t going to tell me what it’s called, so I’m going with Oscar unless someone tells me otherwise.

Update: Oscar is a CPU core inside the M7; Cyclone is the name of the Swift replacement.

Cyclone more likely resembles a beefier Swift core (or at least a Swift-inspired one) than a new design from the ground up. That means we’re likely talking about a 3-wide front end, and somewhere in the range of 5 to 7 execution ports. The design is likely also capable of out-of-order execution, given the performance levels we’ve been seeing.

Cyclone is a 64-bit ARMv8 core and not some Apple-designed ISA. Cyclone manages to beat not only all other smartphone makers to ARMv8 but also key ARM server partners. I’ll talk about the whole 64-bit aspect of this next, but needless to say, this is a big deal.

The move to ARMv8 comes with some of its own performance enhancements. More registers, a cleaner ISA, improved SIMD extensions/performance as well as cryptographic acceleration are all on the menu for the new core.

Pipeline depth likely remains similar (maybe slightly longer) as frequencies haven’t gone up at all (1.3GHz). The A7 doesn’t feature support for any thermally driven CPU (or GPU) frequency boost.

The most visible change to Apple’s first ARMv8 core is a doubling of the L1 cache size: from 32KB/32KB (instruction/data) to 64KB/64KB. Along with this larger L1 cache comes an increase in access latency (from 2 clocks to 3 clocks from what I can tell), but the increase in hit rate likely makes up for the added latency. Such large L1 caches are quite common with AMD architectures, but unheard of in ultra mobile cores. A larger L1 cache will do a good job keeping the machine fed, implying a larger/more capable core.

The L2 cache remains unchanged in size at 1MB shared between both CPU cores. L2 access latency is improved tremendously with the new architecture. In some cases I measured L2 latency at half of what I saw with Swift.

The A7’s memory controller sees big improvements as well. I measured 20% lower main memory latency on the A7 compared to the A6. Branch prediction and memory prefetchers are both significantly better on the A7.
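Latency figures like the ones above are typically obtained with a pointer-chasing loop, where every load depends on the result of the previous one. The sketch below is a minimal illustration of that idea in Python. It is not the custom tooling used for this review, which isn't described, and the names `build_chain`, `chase` and `ns_per_load` are made up for this example. Interpreter overhead in Python swamps the actual cache latency, so a real measurement would use C or assembly; the structure of the test is what matters here.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a list where chain[i] is the next index to visit.

    Sattolo's algorithm yields a permutation made of one single cycle,
    so the walk touches every slot before it repeats. The random order
    is meant to defeat simple stride prefetchers.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what forces a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain for `steps` dependent loads; return the final index."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]  # next address depends on this load's result
    return idx


def ns_per_load(n, steps=200_000):
    """Average wall-clock time per dependent load for a working set of n slots."""
    chain = build_chain(n)
    t0 = time.perf_counter()
    chase(chain, steps)
    return (time.perf_counter() - t0) / steps * 1e9


if __name__ == "__main__":
    # Growing the working set past each cache level is what exposes
    # the L1, L2 and main memory latency steps in a native version.
    for n in (1 << 10, 1 << 14, 1 << 18):
        print(f"{n:>8} slots: {ns_per_load(n):6.1f} ns per load")
```

In a native implementation, dividing the time per load by the clock period (about 0.77 ns at 1.3GHz) gives latency in clocks, which is how a 2-clock versus 3-clock L1 difference would show up.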

I noticed large increases in peak memory bandwidth on top of all of this. I used a combination of custom tools as well as publicly available benchmarks to confirm all of this. A quick look at Geekbench 3 (prior to the ARMv8 patch) gives a conservative estimate of memory bandwidth improvements:

Geekbench 3.0.0 Memory Bandwidth Comparison (1 thread)
                  Stream Copy   Stream Scale   Stream Add   Stream Triad
Apple A7 1.3GHz   5.24 GB/s     5.21 GB/s      5.74 GB/s    5.71 GB/s
Apple A6 1.3GHz   4.93 GB/s     3.77 GB/s      3.63 GB/s    3.62 GB/s
A7 Advantage      6%            38%            58%          57%

We see anywhere from a 6% improvement in memory bandwidth to nearly 60% running the same Stream code. I’m not entirely sure how Geekbench implemented Stream and whether or not we’re actually testing other execution paths in addition to (or instead of) memory bandwidth. One custom piece of code I used to measure memory bandwidth showed nearly a 2x increase in peak bandwidth. That may be overstating things a bit, but needless to say this new architecture has a vastly improved cache and memory interface.
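For reference, the four kernels in the classic STREAM benchmark are simple enough to write down. The sketch below shows them in Python along with the usual bytes-moved accounting that turns a runtime into a GB/s figure. How Geekbench 3 implements its Stream tests is unknown, so this should be read as the textbook definition and not as a description of Geekbench; the helper names (`gb_per_s`, `ARRAYS_TOUCHED`) are invented for this example. Python is far too slow to saturate a memory interface, so the printed numbers are only meaningful as a demonstration of the arithmetic.

```python
import time
from array import array

BYTES_PER_ELEMENT = 8  # array('d') stores 8-byte doubles


def copy(a, c):
    for i in range(len(a)):
        c[i] = a[i]


def scale(c, b, q):
    for i in range(len(c)):
        b[i] = q * c[i]


def add(a, b, c):
    for i in range(len(a)):
        c[i] = a[i] + b[i]


def triad(b, c, a, q):
    for i in range(len(b)):
        a[i] = b[i] + q * c[i]


# Arrays touched per element: the reads plus the single write.
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def gb_per_s(kernel, n, seconds):
    """Bandwidth implied by running `kernel` over n elements in `seconds`."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n / seconds / 1e9


if __name__ == "__main__":
    n, q = 200_000, 3.0
    a = array("d", [1.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [0.0]) * n
    runs = [
        ("copy", lambda: copy(a, c)),
        ("scale", lambda: scale(c, b, q)),
        ("add", lambda: add(a, b, c)),
        ("triad", lambda: triad(b, c, a, q)),
    ]
    for name, run in runs:
        t0 = time.perf_counter()
        run()
        elapsed = time.perf_counter() - t0
        print(f"{name:>5}: {gb_per_s(name, n, elapsed):.3f} GB/s")
```

Copy does no arithmetic at all, while Scale, Add and Triad each add floating point work per element. That is one plausible reason the kernels could respond differently to a faster core, though it does not by itself explain the spread in the table.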

Looking at low level Geekbench 3 results (again, prior to the ARMv8 patch), we get a good feel for just how much the CPU cores have improved.

Geekbench 3.0.0 Compute Performance
                  Integer (ST)   Integer (MT)   FP (ST)   FP (MT)
Apple A7 1.3GHz   1065           2095           983       1955
Apple A6 1.3GHz   750            1472           588       1165
A7 Advantage      42%            42%            67%       67%

Integer performance is up 44% on average, while floating point performance is up by 67%. Again this is without 64-bit or any other enhancements that go along with ARMv8. Memory bandwidth improves by 35% across all Geekbench tests. I confirmed with Apple that the A7 has a 64-bit wide memory interface, and we're likely talking about LPDDR3 memory this time around so there's probably some frequency uplift there as well.
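The "A7 Advantage" rows in the two tables above are relative gains of the A7 result over the A6 result. The short Python sketch below recomputes them from the published figures. The function name `advantage_percent` is made up for this example, and the choice to truncate to a whole percent is an assumption: the article doesn't say how the rows were derived, but truncation happens to match every entry, whereas rounding would give 58% for Stream Triad and 68% for FP (MT). Exact fractions are used so that a value like 1065/750 doesn't pick up binary floating point error before truncation.

```python
from fractions import Fraction


def advantage_percent(new, old):
    """Whole-percent gain of `new` over `old`, truncated toward zero.

    Values are passed as strings or ints and converted to exact
    fractions, so "5.24" means exactly 5.24.
    """
    new, old = Fraction(str(new)), Fraction(str(old))
    return int((new - old) * 100 / old)


if __name__ == "__main__":
    results = {
        "Stream Copy": ("5.24", "4.93"),
        "Stream Scale": ("5.21", "3.77"),
        "Stream Add": ("5.74", "3.63"),
        "Stream Triad": ("5.71", "3.62"),
        "Integer (ST)": (1065, 750),
        "Integer (MT)": (2095, 1472),
        "FP (ST)": (983, 588),
        "FP (MT)": (1955, 1165),
    }
    for test, (a7, a6) in results.items():
        print(f"{test:<13} A7 advantage: {advantage_percent(a7, a6)}%")
```

The overall averages quoted in the text (44% for integer, 35% for memory bandwidth) cannot be reproduced from these rows alone, so they presumably come from aggregate scores that are not listed in the tables.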

The result is something Apple refers to as desktop-class CPU performance. I’ll get to evaluating those claims in a moment, but first, let’s talk about the other big part of the A7 story: the move to a 64-bit ISA.


464 Comments


  • darkcrayon - Wednesday, September 18, 2013 - link

    The 5c maybe. The 5s looks like a pretty awesome upgrade, especially from the 4s.
  • abbati - Wednesday, September 18, 2013 - link

    I think the names of the iPhone are going to be more streamlined now... there'll be an iPhone 6s and 6c. It wouldn't make sense to have an iPhone 6c and an iPhone 6.

    Great Review as always.... Thanks for all your effort!
  • ddriver - Wednesday, September 18, 2013 - link

    Hate to be that guy again, but isn't anyone else gonna touch the fact that iOS and Android use entirely different JS engines? Comparing apples to oranges much?

    How about more native benches comparing to other arm chips? The tegra 4 bench on Engadget shows tegra 4 being faster than this chip. Conveniently enough, geekbench only compares the old apple chips to the new and the new chip between 32 and 64 bit modes.

    Nice try anand... For a moment there you almost fooled me ;)
  • A5 - Wednesday, September 18, 2013 - link

    Shield has active cooling. I'd be shocked if it puts up those numbers in a smartphone form factor.

    Are there even any announced Tegra 4 smartphones coming?
  • ddriver - Wednesday, September 18, 2013 - link

    But still, tegra 4 is still ancient v7 32bit architecture. You mean a tiny fan is all it takes to diminish the advantage of apple's great chips?
  • darkcrayon - Wednesday, September 18, 2013 - link

    You mean a larger device running a higher clocked chip and using more power is all it takes? Yes, that is all it takes. The fact anyone would compare what's in the physical limitations of the 5s vs the Shield is pretty telling in favor of how great the A7 is.
  • ddriver - Wednesday, September 18, 2013 - link

    Said the guy who named himself after an apple chip LOL.
  • UpSpin - Wednesday, September 18, 2013 - link

    I agree with you. It's stupid to use browser benchmarks as a measure of the CPU performance. It heavily depends on the used browser, version, and OS. You can't even use browser benchmark to compare the CPU performance of devices running Android 2.3 to those running Android 4.0 with them running Android 4.3. How in the world shall it be legitimate to compare them between two totally different systems.
    And finally, iOS is closed source and totally restricted, Apple can do whatever they want with it and no one would know (like using different JS/Browser versions on iOS7 depending on the used SoC or device, like using a more optimized version for A7 than they use for A6)

    To measure the raw CPU core! performance one's only option is number crunching benchmarks like it gets done on the PC (Prime, ...)
    To test the whole SoC, which is a collection of memory, I/O, CPU, GPU, ... across different devices with totally different software, one has to rely on other benchmarks, similar to how it gets done on the PC world. But there you also don't say that a Windows PC using IE is magnitudes slower than a Mac using the identical hardware but the faster Safari.

    A browser benchmark is a browser benchmark, nothing more and not in the slightest a CPU benchmark.
  • Dug - Wednesday, September 18, 2013 - link

    Apples to Oranges? Yes. They are different platforms. Does that bother you?

    Yes they do use different JS engines. They also use different OS's.
    It also shows how Apple is able to optimize its own use of it.

    It also trounced everything at Google's Octane Benchmark.
    It also beat everything in Browsermark.

    From your comment, you seem to want everyone to use the least common denominator instead of optimizing for their own system. Why?
  • Krysto - Wednesday, September 18, 2013 - link

    Anand, I think you're wrong about the reason why the new iPhone GPU sucks in physics. You said it's because it has half the CPU cores.

    But hang on a minute - isn't that a GPU test? Also, isn't it true that some GPU makers dedicate space for parts that are better at physics? I think I read something about Adreno 330 being much better at physics, kind of like those Mali T678 or whatnot. Physics is about GPGPU, too, not just CPU, is it not?
