Improving The Exynos 9810 Galaxy S9: Part 1
by Andrei Frumusanu on April 4, 2018 10:00 AM EST
Posted in: Mobile, Samsung, Smartphones, SoCs, M3, Exynos 9810, Galaxy S9
Last week we published our Galaxy S9 and S9+ review, and the biggest story (besides the device itself) was the difference between the processors used inside the two variants: the Snapdragon 845 SoC and the Exynos 9810 SoC. Based on earlier announcements, we had high expectations for the Exynos 9810, as it promised to be the first “very large core” Android SoC. However, our initial testing raised a lot of questions. Unfortunately, the synthetic performance of the Exynos 9810 did not translate well at all into real-life performance. As part of the testing, I discovered that the device’s scheduler and DVFS (dynamic voltage-frequency scaling) configurations were atrociously tuned.
For the full review I opted not to modify the device, as that review was aimed at consumers and modifications would have served little purpose for readers (that, and time constraints). Still, I promised to follow up in the coming weeks, and this is the first part of what will hopefully be a series in which I try to extract as much as possible out of the Exynos 9810 and alleviate its driver situation.
After rooting the device and getting a custom kernel up and running, one of the first things I did was to play around with the hotplugging mechanism, which, as I explained in the review, is in my opinion the most damaging to performance.
One of the benefits of a custom kernel is the ability to push frequencies, and after a bit of fun playing around I achieved Geekbench multi-core records with all four M3 cores at 2496MHz. Trying to run the M3 cores at 2704 MHz consistently crashed the phone, so I decided it was better to start at the bottom and work my way up the performance scale.
Configuration Overview And SPEC
Samsung Galaxy S9 (E9810) Kernel Comparison and Changelog
| Version | Setup | Changes and Notes |
|---|---|---|
| Official Firmware | As Shipped | Stock setup and behavior; single-core M3 at 2704 MHz; dual-core M3 at 2314 MHz; quad-core M3 at 1794 MHz |
| Official Firmware | 'CPU Limited Mode' | Optional Samsung-defined CPU mode in Settings; CPU limited to 1469 MHz; memory controller at half speed; conservative scheduler |
| Custom Config 1 | Custom kernel | Start with 'As Shipped' firmware; remove hotplugging mechanism; limit M3 peak frequency to 1794 MHz at any load |
For the first round of testing, I judged that the hotplugging mechanism as a whole is better off completely disabled, as it just doesn’t manage to serve its purpose. The frequency of the M3 cores was fixed at 1794 MHz, the same frequency the shipping kernel uses when all four M3 cores are in play. Based on frequency alone, this should in theory result in the same multi-threaded performance and worse single-core performance.
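To make the frequency-only expectation concrete, here is a small sketch comparing the stock per-core-count caps from the configuration table with the fixed 1794 MHz cap of Custom Config 1. It is a hypothetical model that only looks at clock speed, not IPC, memory or thermal behaviour; the names are illustrative, and the three-core case is left out because the table only lists one, two and four active cores.

```python
# Hypothetical model of the peak M3 clock (MHz) allowed per number of active
# big cores, taken from the configuration table. Three active cores is
# omitted because the table does not list it.
STOCK_CAP_MHZ = {1: 2704, 2: 2314, 4: 1794}
CUSTOM_CAP_MHZ = 1794  # Custom Config 1: same cap at any load


def aggregate_mhz(active_cores: int, cap_mhz: int) -> int:
    """Sum of per-core clocks: a crude, frequency-only proxy for MT throughput."""
    return active_cores * cap_mhz


for cores, stock_cap in sorted(STOCK_CAP_MHZ.items()):
    stock = aggregate_mhz(cores, stock_cap)
    custom = aggregate_mhz(cores, CUSTOM_CAP_MHZ)
    print(f"{cores} active: stock {stock} MHz, custom {custom} MHz "
          f"({custom / stock:.2f}x)")
```

With four cores busy the summed clock is identical, while a lone thread loses about a third of its peak clock (1794/2704 ≈ 0.66). That is the naive expectation before any IPC or memory effects come into play.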
And here’s where things get interesting. At 1794MHz the Exynos M3 performed extremely well and exceeded my expectations in synthetic benchmarks. Starting off with SPEC:
In the Galaxy S9 review I hypothesized that, at performance equivalent to the Snapdragon 845’s peak, the Exynos 9810 would lose in efficiency. In the review, I extrapolated how energy increases as frequency rises, using the CPU Limiter figures at 1469MHz. Based on those first results, I suggested that the Exynos 9810 would need to run at 2.1-2.3GHz to match the Snapdragon 845 in SPEC.
To my surprise, the results at 1794MHz matched the Snapdragon 845’s peak figures to within a couple of percentage points. Not only did the 9810 match the 845 in performance, but its efficiency metrics also beat expectations. In SPECint, the Exynos now matches the energy usage of the Snapdragon. In SPECfp, the Exynos even retains a 15% efficiency lead.
This time I brought back a second graph for SPEC. I’m still conflicted about showing the data this way: the metric is 'work per time per energy', or 'performance per energy', and I’m still struggling to come up with an intuitive explanation of what it showcases. I ended up calling it 'Performance Efficiency' as the most fitting term. Readers need to carefully distinguish between “efficiency” in the sense of total energy usage (representative of battery usage) and this “performance efficiency” metric, whose meaning is more abstract.
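One way to relate the metrics, assuming each per-benchmark score S is the usual SPEC ratio of a reference time T_ref to the measured run time t, and writing P̄ for the average power over the run (a sketch, not necessarily the exact methodology behind these charts):

```latex
\begin{aligned}
S &= \frac{T_{\mathrm{ref}}}{t}, \qquad E = \bar{P}\,t, \\
\frac{S}{\bar{P}} &= \frac{T_{\mathrm{ref}}}{\bar{P}\,t} = \frac{T_{\mathrm{ref}}}{E}, \\
\frac{S}{E} &= \frac{T_{\mathrm{ref}}}{\bar{P}\,t^{2}} = \frac{T_{\mathrm{ref}}}{E\,t}.
\end{aligned}
```

For a fixed amount of work, perf/W is therefore just the inverse of total energy, whereas 'Performance Efficiency' (S/E) additionally rewards finishing sooner, which is part of why it is harder to interpret intuitively.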
I currently attribute the difference between the three setups to three factors: the removal of the hotplugging mechanism eliminates any single-threaded (ST) performance noise from other threads, the M3 cores are better utilized at lower frequencies, and we’re seeing some gains in thermally limited workloads such as libquantum.
I think the second factor is the most important here, as the M3 cores are better served by the conservative memory controller DVFS. This is extremely noticeable in the SPECfp results: the 1794 MHz scores end up with a 'SPECspeed per GHz' ratio (a proxy for per-clock performance) of 12.37, compared to 9.94 for the stock kernel runs. This translates to a 24% increase in measured IPC. The SPECint results also improve the performance/GHz ratio, rising from 8.05 to 9.53, an 18% increase.
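The percentages follow directly from the quoted per-GHz ratios (each ratio being a SPECspeed score divided by the clock in GHz). A quick check, where `pct_increase` is just an illustrative helper:

```python
def pct_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# SPECspeed-per-GHz ratios quoted in the text: (1794 MHz config, stock kernel)
ratios = {
    "SPECfp": (12.37, 9.94),
    "SPECint": (9.53, 8.05),
}

for suite, (custom, stock) in ratios.items():
    print(f"{suite}: {pct_increase(custom, stock):.1f}% higher score per GHz")
```

This prints roughly 24.4% for SPECfp and 18.4% for SPECint, matching the rounded figures above.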
In the SPECspeed/Joule (Performance Efficiency) graph, we see how the 1794MHz results fare better than the stock and CPU limiter results. This difference isn’t visible in the total energy or perf/W metrics, so there is merit in showcasing the results this way.
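A toy comparison with made-up numbers (not measurements from this article) shows how such a difference can stay invisible in the energy and perf/W views: two runs that consume the same energy have identical perf/W, yet the faster one scores higher per Joule. The reference time and both runs are hypothetical.

```python
from dataclasses import dataclass

REF_SECONDS = 1000.0  # hypothetical SPEC reference time for one benchmark


@dataclass
class Run:
    """One benchmark run; all numbers used below are made up for illustration."""
    seconds: float    # measured run time
    avg_watts: float  # average power over the run

    @property
    def score(self) -> float:
        # SPEC-style ratio: reference time divided by measured time
        return REF_SECONDS / self.seconds

    @property
    def energy_j(self) -> float:
        return self.avg_watts * self.seconds

    @property
    def perf_per_watt(self) -> float:
        return self.score / self.avg_watts

    @property
    def perf_per_joule(self) -> float:
        return self.score / self.energy_j


slow = Run(seconds=100.0, avg_watts=2.0)
fast = Run(seconds=80.0, avg_watts=2.5)

for name, run in (("slow", slow), ("fast", fast)):
    print(f"{name}: {run.energy_j:.0f} J, {run.perf_per_watt:.2f} score/W, "
          f"{run.perf_per_joule:.4f} score/J")
```

Both runs use 200 J and reach 5.00 score/W, but the faster run reaches 0.0625 score/J against 0.0500 for the slower one.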
Overall, with these current best-case results, the M3 cores offer a 51% IPC advantage over the Kryo 385 Gold cores in the Snapdragon 845. This is much closer to what we expected from the core based on the initial announcements and its microarchitecture.
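As a rough illustration of how a per-clock (IPC) comparison like this can be formed: normalise each score by its clock and take the ratio. The helper is hypothetical, and the example deliberately uses equal scores rather than the actual SPEC figures; 2.8 GHz is the Snapdragon 845's Kryo 385 Gold peak clock.

```python
def per_clock_ratio(score_a: float, ghz_a: float,
                    score_b: float, ghz_b: float) -> float:
    """How much more work core A does per clock than core B (1.0 = parity)."""
    return (score_a / ghz_a) / (score_b / ghz_b)


# Illustration only: with identical scores, the per-clock ratio reduces to
# the clock ratio (Kryo 385 Gold at 2.8 GHz vs the capped M3 at 1.794 GHz).
equal_scores = per_clock_ratio(1.0, 1.794, 1.0, 2.8)
print(f"{equal_scores:.2f}x")
```

With perfectly equal scores the M3 would come out about 56% ahead per clock; the 51% figure reflects the real scores, which were within a couple of percentage points of each other rather than identical.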
System Performance
I argued in the Galaxy S9 review that the real-world performance of the Exynos variant was hampered by software and that workloads never really got to make use of the raw performance. Let’s quickly revisit the benchmarks and see the results of the first kernel changes, with the M3 cores running at 1794 MHz.
PCMark’s Web Browsing 2.0 test sees a significant 22% boost in performance over the stock configuration. As I argued before, the hotplugging mechanism was in many cases extremely detrimental to real-world performance, as its coarse nature simply could not efficiently control the finer-grained behaviour of threads in many workloads. The result here speaks for itself: even though the new custom kernel significantly limits raw single-threaded CPU performance compared to stock, the web test does so much better.
The Video Editing test saw a small performance degradation; I will try to investigate this further later on.
The Photo Editing test saw an increase in performance.
The Writing 2.0 and Data Manipulation tests saw degradations, with the latter suffering the most from the frequency limit. I’m extremely confident that I’ll be able to recoup these losses with further tuning, as I haven’t yet touched the scheduler and DVFS behaviours of the phone.
Finally, in the web browser tests, this first custom configuration again scores better than the stock behaviour. This is ironic, as web workloads are supposed to be the one scenario where ST performance manifests itself the most. I’ll refer back to my article “The Mobile CPU Core-Count Debate: Analyzing The Real World” from a few years back, where we saw that this might not actually be the case and that today’s browsers are more multi-threaded than one might think.
The issue, again, with the Exynos 9810’s current mechanism in the S9 is that it’s trying too hard, biasing too much towards ST performance at the cost of multi-threaded (MT) performance. As our results show, the whole thing then falls down like a house of cards.
Battery Life Regained
Finally, the most important metric: how the S9 fares in terms of battery life under these settings.
The Exynos S9 improved by 20%, regaining 1h23m in our web browsing test, which is a good improvement given the low effort. Given that the changes made with this first custom kernel were, in the vast majority of scenarios, favourable for web browsing performance, this feels to me like another nail in the coffin for Samsung’s mechanisms. The higher M3 frequencies provided by the shipping kernel simply aren’t beneficial in real-world scenarios, as the software just doesn’t (and likely just can’t) make proper use of them.
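Assuming the 20% is measured relative to the stock runtime (and is rounded), the two quoted figures imply approximate absolute runtimes. A back-of-the-envelope check; the outputs are derived estimates, not measured results:

```python
gain_minutes = 1 * 60 + 23   # 1h23m regained in the web browsing test
relative_gain = 0.20         # "improve by 20%", presumably rounded

stock_minutes = gain_minutes / relative_gain
custom_minutes = stock_minutes + gain_minutes


def hm(minutes: float) -> str:
    """Format a duration in minutes as e.g. '6h55m'."""
    h, m = divmod(round(minutes), 60)
    return f"{h}h{m:02d}m"


print("implied stock runtime:", hm(stock_minutes))
print("implied custom runtime:", hm(custom_minutes))
```

This suggests roughly 6h55m on the stock kernel versus roughly 8h18m with the custom configuration; because the 20% is rounded, treat these as approximations.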
Today’s results are just meant as a quick preview of the Exynos S9’s behaviour after removing what I believe are the most offending parts of Samsung’s mechanisms. The results are very promising, as we are essentially getting real-world performance boosts while at the same time improving efficiency and battery life.
There is still a lot to be done - I haven’t really touched the scheduler or DVFS scaling logic, but I’m confident that it’s possible to improve things further. It’s unlikely that we’ll end up matching the Snapdragon 845 variant in battery life, as there are efficiency curve considerations at lower performance points that simply cannot be changed through software. But closing the gap as much as possible seems to be an attainable short-term goal.
We have not yet had official commentary from Samsung on the situation, but the likelihood that we’ll see significant changes introduced through firmware updates seems slim to me. For now, unless you’re one of the very few enthusiasts who modify their mobile devices, the above results are relatively meaningless for consumers and should not be used as purchasing guidance; for that, I refer back to my conclusions in the S9 and S9+ review.
Original carousel image credit to Terry King
65 Comments
ZolaIII - Wednesday, April 11, 2018
For whose work? Mine has been available for a long time now, as a script for an adjusted kernel and build tree (so that the tunables are exposed to user space this way). We never set them as defaults, nor did we build a Franken-kernel for it; it's still based on what used to be the standard sched kernel infrastructure, with the default MSM in-kernel hotplug solution (which is now a separate kernel .ko module). The kernel, build tree, device tree, vendor tree and script are all available on XDA and/or GitHub. I don't believe Andrei is interested in pushing it that far, and I am certain he will explain in detail what he did and how in a future article once he is finished.
flar2 - Wednesday, April 4, 2018
I've taken a different approach with the ElementalX kernel. Instead of disabling the hotplug mechanism, I boosted the hotplug limits a bit. With two cores, the limit is increased from 2314 to 2496MHz, and with three or more it is increased from 1794 to 1924MHz. This provides a substantial increase in multi-core performance while maintaining single-core performance. Overall, the subjective experience is a bit snappier and it doesn't destroy battery life.
I experimented with disabling hotplugging, but the phone quickly becomes unstable unless the big cluster max frequency is reduced. It looks like when all four cores run at frequencies over 2.0GHz, they draw too much power. Reducing the max frequency obviously reduces single-core performance substantially.
Although Samsung's hotplug code is a bit convoluted, they are trying to achieve a compromise, balancing battery life, performance and stability. Ultimately, this is a hardware problem, the Exynos chip is too power hungry at the highest frequencies. The hotplug mechanism allows it to run at higher frequencies for single tasks. The alternative would have been to release the phone with a max frequency of somewhere between 1794 and 2314 MHz. Given that Samsung needs to be conservative due to chip variations, they would have struggled to go much higher than 2GHz (imagine the scandal if the hardware was unstable!)
No doubt more gains can be made, so I will continue experimenting as well.
MrCommunistGen - Wednesday, April 4, 2018
Thanks for your continued work on the ElementalX kernel. I haven't used it since my Nexus 5, but it was really helpful back then.
On another note, I think you've corroborated my knee-jerk suspicion that the reason for Andrei's device crashing when running 4x M3 @ 2.7GHz was a power delivery issue -- but I don't want to be putting words in your mouth or drawing conclusions you didn't intend.
Hypothetically (assuming power is the issue), I wonder if you could run the M3 cluster at a low enough voltage that it would be stable at 4x2.7GHz. Maybe you'd need to win the silicon lottery in a big way to do it. I'm just academically curious whether it is possible with this combination of chip design/lithography/power delivery/etc.
flar2 - Wednesday, April 4, 2018
I'm quite sure it's a power delivery issue. They really had to push the voltages to reach frequencies over 2.3GHz. Heat could be a factor too, but I've had many phones that get a lot hotter than this.
My device can't even complete a benchmark without rebooting at 4x 2.4GHz.
Quantumz0d - Thursday, April 5, 2018
This is what I thought: when supplying more voltage and current to the processor, the SoC hits its ceiling (voltage for heat, current for the hardware limit), and I guess the glass build of the device can be a factor here too. The same goes for the conservative approach on the GPU architecture, limiting its power versus the power-hungry CPU architecture.
But when efficiency drops, we have to make a sacrifice in either battery life or heat. In these small, confined spaces I guess we are limited as well, but given your EX reputation I think we can get a better final product through customization. Unlike Apple's battery throttling gate and the disastrous A11.
Also, from what I remember, unlike in the SD800 days, on the 820 and newer the voltage planes are complicated and hard for end users to tune. I used to undervolt/LiveOC the Hummingbird in my i9000 and the OMAP in my Galaxy Nexus; it seems we can't tune them as much nowadays, but the fact that we can still do it on these complex chips is great!
Many thanks, flar2, for the insight and your work. Always an admirer of great devs at XDA. If I want to buy a successor to my OP3, I will look at either the 9810 S9+ or the 9810 Note9, if you develop for the latter.
ZolaIII - Wednesday, April 4, 2018
What's the maximum sustainable frequency for two big cores running at high utilisation over the long run? That is also the answer to how you should limit it, so that it doesn't reach the point where it throttles down and suffers higher thermal leakage. This is effectively the upper performance limit. The sustainable optimal frequency (or a little higher than that) is the top one Samsung used on the little cores; ideally it should be about 1.3GHz, so you could tune them together by setting a per-frequency load limit in the interactive CPU governor. Regarding hotplugging, I still find two big plus two little cores active all the time to be the best possible solution; it's not the most power-efficient one (which would be one big core and all the small ones), but it is the most suitable for all possible tasks. This is just ad hoc.
MrCommunistGen - Wednesday, April 4, 2018
WOW. Thanks Andrei! I'm looking forward to the rest of your updates! Hoping you can find a good happy medium between performance and power with your DVFS and scheduler tunings (and running the M3 cluster at peak clocks higher than 1794MHz?)
I think this really illustrates how hardware specs aren't everything. I know I'm not alone among enthusiasts in being disappointed that Qualcomm hasn't been able to compete with Apple SoCs in terms of IPC or single-threaded workloads. Nothing I do on my phone *needs* the performance, but I WANT IT ANYWAY! Similarly, I'm disappointed that their driver support for older SoCs doesn't last very long. BUT at least they seem to get the hardware/software tuning *moderately* right on day 1 -- which is no small feat for a company as large as Qualcomm. I can see why they are so appealing to OEMs as a turnkey solution.
I guess my point is: We can complain that Qualcomm is holding back the Android ecosystem, but think of where we'd be without them.
On another note:
The last time I messed around with the likes of schedulers and DVFS was before big.LITTLE on my Nexus 5. My unit would thermally throttle pretty badly under any sustained load (long sessions of Temple Run) so I lowered voltages across the board as much as I could while maintaining stability and lowered peak clockspeeds. The gains were nothing drastic like you've demonstrated here, but I was pretty proud of myself at the time.
A5 - Wednesday, April 4, 2018
Great article, looking forward to the other parts, even if I'm not in the market for this phone.
Lorem Ipsum - Wednesday, April 4, 2018
Great write-up! Will you guys be willing to release your modified kernel? As one of those enthusiast people into modifying my mobile device, I'd love to see some of these improvements.
flar2 - Wednesday, April 4, 2018
ElementalX has been available for a couple of weeks already, with the changes I describe above: https://forum.xda-developers.com/galaxy-s9/samsung...