Improving The Exynos 9810 Galaxy S9: Part 1
by Andrei Frumusanu on April 4, 2018 10:00 AM EST
Posted in: Mobile, Samsung, Smartphones, SoCs, M3, Exynos 9810, Galaxy S9
Last week we published our Galaxy S9 and S9+ review, and the biggest story (besides the device itself) was the difference between the processors used in the two variants: the Snapdragon 845 and the Exynos 9810. Based on earlier announcements, we had high expectations for the Exynos 9810, as it promised to be the first “very large core” Android SoC. However, our initial testing put a lot of questions on the table. Unfortunately, the synthetic performance of the Exynos 9810 did not translate well at all into real-life performance. As part of the testing, I discovered that the device’s scheduler and DVFS (dynamic voltage and frequency scaling) configurations were atrociously tuned.
For the full review I opted not to modify the device, as that review was aimed at consumers and a modified device would have served little purpose to readers (that, and time constraints). Still, I promised I would follow up in the coming weeks, and this is the first part of what’s hopefully a series where I try to extract as much as possible out of the Exynos 9810 and alleviate its driver situation.
After rooting the device and getting a custom kernel up and running, one of the first things I did was to play around with the hotplugging mechanism, which, as I explained in the review, is in my opinion what damages performance the most.
One of the benefits of a custom kernel is the ability to push frequencies, and so after a bit of fun playing around I achieved GeekBench multi-core records at 4x 2496MHz on the M3 cores. Trying to run the M3 cores at 2704 MHz consistently crashed the phone, so I decided it’s better to start at the bottom and work our way up the performance scale.
Configuration Overview And SPEC
Samsung Galaxy S9 (E9810) Kernel Comparison and Changelog

| Version | | Changes and Notes |
| Official Firmware | As Shipped | Stock setup and behavior; single-core M3 at 2704 MHz; dual-core M3 at 2314 MHz; quad-core M3 at 1794 MHz |
| Official Firmware | 'CPU Limited Mode' | Optional Samsung-defined CPU mode in Settings; CPU limited to 1469 MHz; memory controller at half speed; conservative scheduler |
| Custom Config 1 | | Start with 'As Shipped' firmware; remove hotplugging mechanism; limit M3 frequency peak to 1794 MHz at any load |

For the first round of testing, I judged that the hotplugging mechanism as a whole is better off completely disabled, as it just doesn’t manage to serve its purpose. The peak frequency of the M3 cores was fixed at 1794 MHz, the same frequency the shipping kernel allows when all four M3 cores are in play. Based on frequency alone, this should theoretically result in the same multi-threaded performance and worse single-core performance.
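The frequency comparison above can be sketched in a few lines. This is only an illustrative model, not the kernel's actual hotplug or DVFS logic: the names `STOCK_M3_CEILING_MHZ`, `stock_ceiling_mhz` and `custom_vs_stock_clock_ratio` are hypothetical, the values come from the changelog table, and treating a three-core load like the quad-core tier is an assumption, since the table only lists single, dual and quad tiers.

```python
# Illustrative model of the M3 frequency ceilings described in the article.
# Names are hypothetical; this is not Samsung's hotplug/DVFS implementation.

# Stock ceilings in MHz, keyed by the number of active M3 cores (from the table).
STOCK_M3_CEILING_MHZ = {1: 2704, 2: 2314, 4: 1794}

CUSTOM_M3_CEILING_MHZ = 1794  # Custom Config 1: same cap at any load


def stock_ceiling_mhz(active_cores: int) -> int:
    """Return the stock ceiling for a given number of active M3 cores.

    Assumption: a core count without its own entry (3) falls into the next
    listed tier up, i.e. the quad-core tier.
    """
    if not 1 <= active_cores <= 4:
        raise ValueError("expected between 1 and 4 active M3 cores")
    tier = min(n for n in STOCK_M3_CEILING_MHZ if n >= active_cores)
    return STOCK_M3_CEILING_MHZ[tier]


def custom_vs_stock_clock_ratio(active_cores: int) -> float:
    """Ratio of the custom ceiling to the stock ceiling at the same load."""
    return CUSTOM_M3_CEILING_MHZ / stock_ceiling_mhz(active_cores)


if __name__ == "__main__":
    for cores in (1, 2, 4):
        ratio = custom_vs_stock_clock_ratio(cores)
        print(f"{cores} active core(s): custom cap is {ratio:.0%} of the stock ceiling")
```

On clock speed alone, the custom configuration gives up roughly a third of the single-core ceiling while leaving the four-core case unchanged, which is why identical multi-threaded and worse single-threaded results would be the naive expectation.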
And here’s where the interesting things start. At 1794MHz the Exynos M3 performed extremely well and ended up above my expectations in synthetic benchmarks. Starting off with SPEC:
In the Galaxy S9 review I hypothesized that, at the equivalent peak performance of the Snapdragon 845, the Exynos 9810 would lose in efficiency. In the review, I extrapolated how energy usage would increase as the frequency rises, using the CPU limiter figures at 1469MHz. Based on those first results, I suggested that the Exynos 9810 would need to run at 2.1-2.3GHz to match the Snapdragon 845 in SPEC.
To my surprise, the results at 1794MHz closely matched the Snapdragon 845’s peak figures within a couple of percentage points. Not only did the 9810 match the 845 in performance, but the efficiency metrics were also beating expectations. In SPECint, the Exynos now matches the energy usage of the Snapdragon. In SPECfp, the Exynos even now retains a 15% efficiency lead.
This time I brought back a second graph for SPEC. I’m still conflicted about showing the data this way, as the metric is 'work per time per energy', or 'performance per energy', and I’m still struggling to come up with an immediately intuitive explanation of what is being showcased. I ended up calling it 'Performance Efficiency' as the most fitting term. Readers need to carefully distinguish between “efficiency” in the sense of total energy usage (representative of battery usage) and this “performance efficiency” metric, which is more ethereal in its meaning.
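One way to make the distinction concrete is to write the metrics out. The notation below is only an illustrative shorthand, assumed here rather than taken from SPEC or from the graphs: $S$ is the SPECspeed score, $t$ the runtime of the workload, $\bar{P}$ the average power drawn during the run, and $E$ the total energy.

```latex
\begin{align*}
E &= \bar{P} \cdot t && \text{total energy in joules (lower is better)} \\
\text{perf/W} &= \frac{S}{\bar{P}} && \text{performance per unit of average power} \\
\text{Performance Efficiency} &= \frac{S}{E} = \frac{S}{\bar{P} \cdot t} && \text{performance per unit of energy}
\end{align*}
```

Because $S$ is itself inversely proportional to $t$, the last metric effectively credits a fast run twice, which is how it can separate configurations that look similar in total energy alone.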
How this difference between the three setups comes about is something I currently attribute to three factors: removing the hotplugging mechanism eliminates ST performance noise from other threads, the M3 cores are better utilized at the lower frequencies, and we’re seeing some gains in thermally limited workloads such as libquantum.
The second reason is, I think, the most important here, as the M3 cores are better served by the conservative memory controller DVFS. This is extremely noticeable in the SPECfp results, as the 1794 MHz scores end up with a 'SPECspeed per GHz' ratio (a measure of per-clock performance) of 12.37, compared to a ratio of 9.94 for the stock kernel runs. This translates to a 24% increase in measured IPC efficiency. The SPECint results also improve that performance/GHz ratio, rising from 8.05 to 9.53, corresponding to an 18% increase.
In the earlier SPECspeed/Joules (Performance Efficiency) graph, we see how the 1794MHz results fare better than the stock and CPU limiter results. This difference isn’t visible in the total energy or perf/W metrics, so there is merit to showcasing the results this way.
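The percentages quoted above follow directly from the per-GHz ratios. As a minimal sketch of that arithmetic (the ratios are the article's figures, while the helper name `relative_gain_percent` is hypothetical):

```python
# Per-clock gains implied by the 'SPECspeed per GHz' ratios quoted in the text.
# The ratios are the article's figures; the helper name is hypothetical.

RATIOS_PER_GHZ = {
    # suite: (stock kernel, custom kernel at 1794 MHz)
    "SPECfp": (9.94, 12.37),
    "SPECint": (8.05, 9.53),
}


def relative_gain_percent(before: float, after: float) -> float:
    """Percentage increase going from `before` to `after`."""
    return (after / before - 1.0) * 100.0


if __name__ == "__main__":
    for suite, (stock, custom) in RATIOS_PER_GHZ.items():
        gain = relative_gain_percent(stock, custom)
        print(f"{suite}: {stock} -> {custom} per GHz, +{gain:.0f}%")
```

Rounded to whole percentages, this reproduces the 24% (SPECfp) and 18% (SPECint) figures.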
Overall, with these current best-case results, the M3 cores offer a 51% IPC advantage over the Kryo 385 Gold cores in the Snapdragon. This is a lot nearer to what we expected from the core based on the initial announcements and microarchitecture.
System Performance
I argued in the Galaxy S9 review that the real-world performance of the Exynos variant was hampered by software and that workloads never really got to make use of the raw performance. Let’s quickly revisit the benchmarks and see what the first kernel changes, running at 1794 MHz, deliver.
PCMark’s Web Browsing 2.0 test sees a significant 22% boost in performance over the stock configuration. Again I argued that the hotplugging mechanism was in many cases extremely detrimental to real-world performance as its coarse nature simply did not manage to efficiently control the finer-grained behaviour of threads in many workloads. The result here speaks for itself, as even though the new custom kernel significantly limits raw single threaded CPU performance compared to stock, the web test just does so much better.
The video editing test saw a small performance degradation here; I will try to investigate this more later on.
The photo editing test saw an increase in performance.
The Writing 2.0 and Data Manipulation tests saw degradations, with the latter one seeing the biggest downside of the frequency limiting. I’m extremely confident that I’ll be able to recoup these degradations with further tuning as I haven’t yet touched the scheduler and DVFS behaviours of the phone.
Finally, in the web tests the first custom configuration again provides a better score than the stock behaviour, which is ironic, as web workloads are supposed to be the one scenario where ST performance manifests itself the most. I’ll refer back to my “The Mobile CPU Core-Count Debate: Analyzing The Real World” article from a few years back, where we saw that this might not actually be the case and that today’s browsers are more multi-threaded than one might think.
The issue again with the Exynos 9810’s current mechanism in the S9 is that it’s trying too hard, and biasing too much towards ST performance at the cost of MT performance. As a result, as our results show, the whole thing falls down like a house of cards.
Battery Life Regained
Finally, the most important metric is to see how the S9 fares in terms of battery life under these settings.
The Exynos S9 was able to improve by 20% and regain 1h23m in our web browsing test, which is a good improvement given the low effort. Given that the changes made with this first custom kernel were, in the vast majority of scenarios, favourable for web browsing performance, this feels to me like another nail in the coffin for Samsung’s mechanisms. The higher frequencies on the M3 CPU as provided in the shipping kernel simply aren’t beneficial in real-world scenarios, as the software just doesn’t (and likely just can’t) make proper use of them.
Today’s results are just meant as a quick preview of the Exynos S9’s behaviour when removing what I believe to be the most offending parts of Samsung’s mechanisms. The results are very promising, as we are essentially getting real-world performance boosts while at the same time seeing efficiency and battery life improvements.
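As a rough sanity check, the two figures quoted above, a 20% improvement and 1h23m regained, together imply approximate runtimes before and after the change. The sketch below is back-of-the-envelope arithmetic only: the function names are hypothetical, the 20% is presumably a rounded figure, and the derived runtimes are estimates, not the measured results from the battery chart.

```python
# Back-of-the-envelope estimate from the two figures quoted in the text.
# The outputs are approximations, not the measured battery results.
from typing import Tuple

GAIN_FRACTION = 0.20          # "improve by 20%" (presumably a rounded figure)
GAIN_MINUTES = 1 * 60 + 23    # "regain 1h23m"


def implied_runtimes_minutes(gain_fraction: float, gain_minutes: float) -> Tuple[float, float]:
    """Return (baseline, improved) runtimes implied by a relative and an absolute gain."""
    baseline = gain_minutes / gain_fraction
    return baseline, baseline + gain_minutes


def as_hours_minutes(minutes: float) -> str:
    """Format a duration in minutes as e.g. '6h55m'."""
    total = round(minutes)
    return f"{total // 60}h{total % 60:02d}m"


if __name__ == "__main__":
    baseline, improved = implied_runtimes_minutes(GAIN_FRACTION, GAIN_MINUTES)
    print(f"implied stock runtime:  ~{as_hours_minutes(baseline)}")
    print(f"implied custom runtime: ~{as_hours_minutes(improved)}")
```

Taken at face value, the quoted figures put the stock runtime a little under seven hours and the custom-kernel runtime a little over eight; rounding in the 20% figure shifts both by several minutes in either direction.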
There is still a lot to be done - I haven’t really touched the scheduler or DVFS scaling logic, but I’m confident that it’s possible to improve things further. It’s unlikely that we’ll end up matching the Snapdragon 845 variant in battery life, as there are efficiency curve considerations at lower performance points that simply cannot be changed through software. But closing the gap as much as possible seems to be an attainable short-term goal.
We have not yet had official commentary from Samsung on the situation, but the likelihood that we’ll see significant changes introduced through firmware updates seems slim to me. For now, unless you’re one of the very few enthusiasts who modify their mobile devices, the above results are relatively meaningless for a consumer and should not be used for purchasing guidance, for which I refer back to my conclusions in the S9 and S9+ review.
Original carousel image credit to Terry King
65 Comments
jjj - Wednesday, April 4, 2018 - link
But you don't have a battery life result for S9 with SD845. Comparing it with the S+ is not ok and not highlighting that is also misleading as many won't notice.
You can measure power for both like you did for SPEC but in browsing since, in this case, the goal is to look at the SoC.
Will you compare the small cores too in part 2? Just to see if part of the difference comes from cores and SoC. Somewhat related to this , the graphs with power for 1 to 4 cores were nice, maybe they stage a comeback?
Andrei Frumusanu - Wednesday, April 4, 2018 - link
> Comparing it with the S+ is not ok and not highlighting that is also misleading as many won't notice.
I'm not comparing it to the S9+ anywhere here in this article. All the highlighted devices are the E9810 S9. And we don't have a S845 S9, simple as that.
> Will you compare the small cores too in part 2?
I'll try at some point but it will involve some tricks for the Snapdragon variant as it can't be rooted.
> Somewhat related to this , the graphs with power for 1 to 4 cores were nice, maybe they stage a comeback?
The single core power posted in the S9 review seems to just scale near linearly with cores as far I've seen.
jjj - Wednesday, April 4, 2018 - link
Andrei, you do compare it when you say that you think you can't catch up. You don't claim to have the result but I haven't even noticed in the review that the SD845 result is for the + (it is highlighted there). Now I had to check 3 times and look for the S9 with SD845 to make sure I am not missing it, then went to the review to see if you list it there and forgot it here. Not saying it's intentional but it is confusing.
The core count scaling is tested with the power virus and how does memory bandwidth scale with that?
If it's not the small cores and uncore, that leads to the difference in efficiency in SPEC vs browsing, maybe it's the scheduler and DVFS or maybe the task, hopefully you figure it out.
jjj - Wednesday, April 4, 2018 - link
One unrelated question, any idea how OLED power consumption varies between the S9 and S9+? Same number of pixels but different area and never thought about this before. Are the pixels same size so you end up with some power?
jjj - Wednesday, April 4, 2018 - link
nm on the display question, luminance is expressed on a per area unit so power will scale with area at same luminance.
lopri - Wednesday, April 4, 2018 - link
They use identical components sans display and battery. It isn't much of an ask to extrapolate it yourself when the reviewer clearly stated that he did not have the device in the foregoing article.
As to multi-core power scaling, I do believe OEMs set hard limits on power per core usage, so it is more of a device test than a power or bandwidth scaling test.
eastcoast_pete - Wednesday, April 4, 2018 - link
Great work Andrei - shame on you, Samsung! Andrei just fixed or at least improved a sizable number of your self-inflicted shortfalls, maybe you, Samsung, should bring him on board as a (paid!) consultant.
Apart from that, this article confirms my view of Samsung: do the hardware, stay away from the software - please!
[ECHO] - Wednesday, April 4, 2018 - link
Amazing work Andrei!
[ECHO] - Wednesday, April 4, 2018 - link
Kinda makes you wonder if, with somewhat simple tweaking like you've done, this wouldn't make a fantastic Chromebook chip given a higher TDP and better thermals...
Spunjji - Thursday, April 5, 2018 - link
If they don't use it as such I'll be bitterly disappointed. Looks to have the chops to rival those godawful number-mangled m7/5/3 chips from Intel!