Over the last several weeks there’s been increasing discussion in the AMD enthusiast community about how the company’s new Ryzen 3000 processors interact with Windows, and in particular how the new CPUs’ boost behaviour relates to a discrepancy between what tools such as Ryzen Master showcase as the best CPU cores, and what operating systems such as Windows interpret as the best CPU cores. Today AMD is officially commenting on the situation and why it arises, whilst also describing what the company is doing to remedy the discrepancies in the data.

AMD’s Ryzen 3000 processor line-up, based on the new Zen2 microarchitecture, differs greatly in terms of frequency behaviour compared to past products. The new chips are AMD’s first products to make use of an ACPI feature called CPPC2 (Collaborative Power and Performance Control 2), an interface between the chip’s firmware (essentially the UEFI BIOS and AGESA) and an operating system such as Windows. The interface allows the hardware to better communicate its frequency and power management characteristics and configurations to the OS, such that the OS’s software DVFS and scheduler are able to operate more optimally within the capabilities of the hardware.

AMD particularly uses the CPPC2 interface to communicate to the operating system the notion of “preferred cores”, which in essence are the CPU cores that under normal operation will be achieving the highest boost frequencies.

The issue at hand is that AMD also communicates another set of data via its own Ryzen Master tool and proprietary APIs, and it’s the relationship between these “best cores” and CPPC2’s “preferred cores” that has caused a bit of confusion ever since the original July launch.

AMD’s Ryzen Master overclocking application has had the unique characteristic of being able to showcase to the user the “best cores” in a given processor. This is done by means of a “star” and “dot” marking in the core status UI in the app – in the above example on my system, Ryzen Master marks the best CPU core within a CCX with a star, and the second-best core with a dot. Furthermore, the best CPU core in the whole processor is marked with a gold star.

Since the original launch, and AMD’s release of their SDK to allow third-party developers to poll data from AMD’s proprietary AGESA API for more detailed core information, we’ve also seen other third-party tools such as HWInfo become able to showcase the ranking of the “best cores” in a processor. The information here essentially matches what Ryzen Master currently displays.

The Discrepancy Between Ryzen Master / SMU Data And CPPC2

The discrepancy that’s been discussed by the community in recent weeks, and that’s been prevalent since the launch of the Ryzen 3000 series in July, is that in the majority of situations and setups, the actual CPU cores being loaded by the operating system under single-threaded or lightly threaded workloads almost never matched the best CPU cores as reported by Ryzen Master. This can be seen with any generic monitoring utility such as the Task Manager.

Single-Threaded Load - Note Average Frequency Load on Core 2 & 3 Instead of "Best Core" 4

The discrepancy here lies in the actual mapping between the “best cores” information in Ryzen Master and the SMU APIs, and the “preferred cores” mapping that AMD’s firmware communicates via CPPC2 to the operating system.

The easiest way to actually view the configuration settings that CPPC2 communicates to Windows is to view the corresponding “Kernel-Processor-Power” entries in the Windows System log in the Windows Event Viewer, as depicted above in the screenshot.

If we were to take my own system as an example, showcasing my 3700X’s data reported across both CPPC2 and the Ryzen Master / SMU API, we see the following layout:

Andrei's 3700X Example

Core   CPPC2 Value / CPPC2 Rank   Ryzen Master Marking   SMU / API Ranking
0      148 / #3                   –                      #6
1      144 / #4                   –                      #8
2      151 / #1/2                 Dot                    #3
3      151 / #1/2                 Star                   #2
4      140 / #5                   Gold Star              #1
5      136 / #6                   –                      #5
6      132 / #7                   Dot                    #4
7      128 / #8                   –                      #7

What AMD does here is essentially to abuse the CPPC2 per-core maximum frequency entries in order to establish a hierarchy among the CPU cores, ordering them from most to least preferred by altering their reported maximum frequency capability. The actual metric here is an arbitrary scale of frequency percentages above the base clock of the CPU. AMD explains that the scale and ratios here are arbitrary (in fixed 3% steps) and do not correspond to the actual frequency differences between the CPU cores; they are just used as a representation of the ordering of the cores.
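As a concrete illustration, the ordering an OS would derive from these entries can be sketched in a few lines of Python. This is a hypothetical reconstruction using the CPPC2 values from my 3700X table above, not AMD’s or Microsoft’s actual code:

```python
# Hypothetical sketch of how an OS can turn the CPPC2 per-core "highest
# performance" entries into a preferred-core ordering. The values are the
# CPPC2 numbers from the 3700X table above; they encode rank in arbitrary
# 3% steps above base clock, not real frequency capabilities.

CPPC2_HIGHEST_PERF = [148, 144, 151, 151, 140, 136, 132, 128]  # index = core

def preferred_core_order(perf_values):
    """Core indices from most to least preferred (higher value = more
    preferred), with ties broken by the lower core index."""
    return sorted(range(len(perf_values)),
                  key=lambda core: (-perf_values[core], core))

print(preferred_core_order(CPPC2_HIGHEST_PERF))
# → [2, 3, 0, 1, 4, 5, 6, 7] – cores 2 and 3 lead, matching the observed load
```

Cores 2 and 3, tied at the highest value of 151, come out on top – exactly the cores the Task Manager shows being loaded first.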

The oddity here is that there’s a discrepancy between the CPPC2 and Ryzen Master data. For example, even though RM showcases core 4 on my system as the best one, it’s only ranked #5 in the CPPC2 data. The ranking order of the CPPC2 data in fact corresponds better to what we’re actually seeing in real workloads on the operating system, which shouldn’t be too big a surprise, as this is the basis on which the OS makes scheduling decisions.

The Clarification Between Ryzen Master "Best Cores" and CPPC2 "Preferred Cores"

To start off, the whole situation can be summed up with the following quote from AMD’s blog post today:

“This [Ryzen Master] star does NOT necessarily mean it is the fastest booster”

AMD goes on to clarify that there’s only a very loose relationship between what Ryzen Master showcases as the “best cores”, and what the AGESA firmware decides is the “preferred cores” that are communicated to the operating system.

The “best cores” as defined by Ryzen Master correspond to the electrical characteristics of the cores, which are determined during the binning and testing phase of the chips at the factory. The physical and electrical characteristic data is fused onto each chip and is interpreted by the SMU firmware to create the ranking between the cores independently of other factors, and this ranking is what’s currently being reported by Ryzen Master as well as third-party application readouts through the custom API.

The ranking of the “preferred cores” as characterized by CPPC2 does not directly correspond to the electrical characteristics of the cores, and further takes into account many other factors of the chip layout. The biggest factor affecting the choice of the highest performing preferred cores in the system is that AMD is aiming to accommodate Windows’ scheduler core rotation policy.

Currently, when a single large CPU thread workload is running, the Windows scheduler tries to rotate this thread between a pair of physical CPU cores. The rationale for this is thermal management: switching between two cores distributes the latent heat dissipation between the cores and reduces the average temperature of each core, possibly also improving power draw and the maximum achievable frequencies.

For AMD’s CCX topology, however, this rotation policy poses an issue, as it wouldn’t be very optimal for a thread to switch between CPU cores located on different clusters: there would be a large performance penalty when migrating across two cores on different L3 caches. Taking this hardware limitation into account, AMD’s firmware “lies” about the CPPC2 data to the OS in order to better optimize the scheduler’s behavior, attempting to achieve better overall performance.

In my example above, the AGESA reports to the OS that cores 2 and 3 are the fastest in the system, even though core 4 is electrically/physically the fastest core. The choice is made by the firmware by selecting the pair of cores within a CCX with the highest average achievable frequency. In my case, this corresponds to cores 2 and 3, which are electrically ranked #2 and #3.

While AMD’s explanation does map out for the two fastest cores in the system, the actual ranking in the CPPC2 data is furthermore impacted by other aspects. Again, in my system example above, we can see that even though cores 0 and 1 are electrically quite bad, with core 1 actually being the worst in the system, they’re still ranked in CPPC2 as being faster than all the cores in the second CCX. The reasoning is that AMD is again sort of abusing the CPPC2 frequency characterization data in order to force the Windows scheduler to first fill out the CPU cores on the first CCX before scheduling activity on the second CCX.
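Putting AMD’s two stated rules together – pick the CCX whose best pair of cores averages highest, then fill that CCX first – the broad shape of the CPPC2 ordering can be modelled roughly as follows. This is an illustrative sketch only: the per-core frequency values are invented (only their ordering mirrors the SMU ranking in my table above), and the firmware’s additional thermal and topology adjustments are not modelled:

```python
# Rough model of the firmware's preferred-core logic per AMD's explanation:
# choose the CCX whose two best cores have the highest average capability,
# and rank all of its cores ahead of the other CCX so the Windows scheduler
# fills it first. The fused_mhz values are invented for illustration; only
# their ordering matches the SMU ranking from the 3700X table above.

CCX = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
fused_mhz = {0: 4350, 1: 4300, 2: 4450, 3: 4475,
             4: 4500, 5: 4375, 6: 4400, 7: 4325}

def preferred_ccx(ccx_map, mhz):
    """The CCX whose best two cores have the highest average capability."""
    def best_pair_avg(cores):
        top2 = sorted((mhz[c] for c in cores), reverse=True)[:2]
        return sum(top2) / 2
    return max(ccx_map, key=lambda ccx: best_pair_avg(ccx_map[ccx]))

def firmware_cppc2_order(ccx_map, mhz):
    """All cores of the preferred CCX first, each CCX internally ordered
    by fused capability. The firmware's extra thermal/topology tweaks are
    not modelled, so the within-CCX order won't match CPPC2 exactly."""
    first = preferred_ccx(ccx_map, mhz)
    order = []
    for ccx in sorted(ccx_map, key=lambda c: c != first):
        order += sorted(ccx_map[ccx], key=lambda core: -mhz[core])
    return order

print(firmware_cppc2_order(CCX, fused_mhz))
# CCX 0 wins (4475+4450 averages higher than 4500+4400), so cores 3 and 2
# lead the ordering even though core 4 is individually the fastest
```

The model reproduces the headline behaviour – the best pair sits in CCX 0, and all of CCX 0 outranks CCX 1 – which is exactly the pattern visible in the CPPC2 column of the table.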

To complicate things even further, there are other invisible factors at hand that impact the CPPC2 ordering. Taking my system as an example again, it doesn’t quite make sense for CPU core 5 to be ranked higher than core 6, as the electrical characteristics of the two cores are in fact described as being the opposite way around.

I had a theory that the firmware possibly prevents cores from being ranked sequentially to each other if the corresponding physical cores are adjacent to each other. Indeed, AMD confirms that local thermal management is also part of the decision-making behind the CPPC2 ranking:

"[The firmware] mixes in additional requirements to optimize user performance: individual core characteristics, overall CCX performance, cache awareness, overall CPU topology, core rotation, localized thermal management, lightly-threaded performance counters and more."

The Discrepancy "Remedy" And Conclusion - A Compromise

AMD’s official stance on the Ryzen Master discrepancy is the following:

“Ryzen Master, using [the same] firmware readings, selects the single best voltage/frequency curve in the entire processor from the perspective of overclocking. When you see the gold star, it means that is the one core with the best overclocking potential. As we explained during the launch of 2nd Gen Ryzen, we thought that this could be useful for people trying for frequency records on Ryzen.”

“Overall, it’s clear that the OS-Hardware relationship is getting more complex every day. In 2018, we imagined that the starred cores would be useful for extreme overclockers. In 2019, we see that this is simply being conflated with a much more sophisticated set of OS decisions, and there’s not enough room for nuance and context to make that clear. That’s why we’re going to bring Ryzen Master inline with what the OS is doing so everything is visibly in agreement, and the system continues along as-designed with peak performance.”

In essence, AMD will be changing the display in Ryzen Master from showing the “best cores” – as in, the electrically best performing ones – to the actual “preferred cores” as declared in CPPC2 and decided by the firmware after taking the more complex factors into consideration.

The current system sort of makes sense given the current Windows scheduler behavior. The issue I see is that the Windows scheduler is currently still relatively stupid – for example, instead of spreading processes across CCXs, it will still try to first group them onto one CCX before starting up the second CCX. Grouping makes sense for the threads of a single process, but different processes could and should be spread across more CCXs in order to maximize per-CCX performance.

There remain some other issues – for example, under the current configuration, even though there are CPU cores in the secondary CCX capable of higher frequencies, these frequencies are never achieved, as the scheduler will use the "preferred" CCX first, and when processes and threads start to spill over to the next CCX, the requested DVFS frequency will always be lower. On my system I noted a 50MHz margin left on the table just because of this.

There’s also the question of how this affects Linux users. Linux doesn’t have a core rotation policy in its default scheduler behavior, and it does very well in scheduling across more complex CPU topologies, so in essence AMD's firmware “lying” about the core performance ranking here will possibly impact peak achievable performance by a few MHz.
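Incidentally, on Linux the CPPC data can be inspected directly: kernels and platforms with ACPI CPPC support expose per-core files under /sys/devices/system/cpu/cpuN/acpi_cppc/. Here is a small sketch; the sysfs root is parameterised so the idea can be demonstrated without the actual hardware, and exact file availability depends on kernel and platform:

```python
from pathlib import Path

def cppc_preferred_order(sysfs_root="/sys/devices/system/cpu"):
    """Rank cores by their acpi_cppc highest_perf value, descending,
    mirroring how the OS derives the preferred-core ordering. Cores
    without an exposed acpi_cppc directory are skipped."""
    perf = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        hp = cpu_dir / "acpi_cppc" / "highest_perf"
        if hp.exists():
            perf[int(cpu_dir.name[3:])] = int(hp.read_text())
    return sorted(perf, key=lambda core: (-perf[core], core))
```

On a Ryzen 3000 system with these files exposed, this should reproduce the same ordering that the Kernel-Processor-Power entries show in the Windows Event Viewer.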

Generally I recommend ignoring Ryzen Master as a monitoring tool as its display abstractions of frequency and voltage just do not correspond to the actual physical characteristics of the hardware. I also hope that in the future Microsoft will be able to further improve the Windows scheduler to better deal with more complex CPU topologies such as AMD’s Ryzen processors, as there’s still a lot of optimizations that could be achieved.

Overall, the whole topic at hand was more of a storm in a teacup and a misinterpretation of data rather than a major technical issue with how the boost mechanism works.


  • npz - Friday, November 22, 2019 - link

    MS laid off a lot of core engineering people, QA teams, etc. which you can always read up on at thelayoff.com. They're too focused on frivolous stuff, Xbox entertainment, diversity training, sales and marketing, and even with the focus and hiring on cloud, that's way up the stack, very high level, not the kernel and system level needed here.
  • PeachNCream - Friday, November 22, 2019 - link

    Layoffs are not a reliable indicator of the company's loss or retention of capabilities nor does it provide much of a window (pun intended) into new hiring or corporate focus.
  • npz - Friday, November 22, 2019 - link

    Well, I don't just mean raw unqualified layoff numbers, but also anecdotal though reliable testimony from laid-off employees on thelayoff, Glassdoor, and what was told to me second-hand about who's affected. I don't know if they'll rehire people going forward, but it's been 2.5+ years now and AMD still hasn't been able to convince MS to push the necessary scheduler changes through. Part of that requires a large amount of QA / regression testing, but as I mentioned, they already laid off a large portion of the QA team (hey, public beta testing via the fast / test Windows ring should suffice, right?), so that's part of the reason IMO.
  • PeachNCream - Saturday, November 23, 2019 - link

    Ah, my apologies then. I didn't know you had good source information into the problem. If that's the case, it's concerning that Microsoft is not working to address the issue. I wonder what might be happening. You'd think that MS would want to retain that sort of talent because CPU technology is not standing still. Making those sorts of changes should be an on-going process that needs the right people in place all of the time.
  • PeachNCream - Thursday, November 21, 2019 - link

    Typo in the table. Second column reads "Rryzen" so therefore has an extra letter r that is not necessary.
  • mikato - Thursday, November 21, 2019 - link

    Wow, well finally this is better understood now. Thank you for actually asking the questions and getting some real info.

    So for the Windows scheduler - I feel like there should be a tool in Windows, either manually run at the user's option or completely automatic and unseen, that allows you to run a small test workload that includes 1,2,3,X... thread workloads and determines the optimal core usage in each scenario. It then stores that config info however needed to be used in a very basic level by the (improved) Windows scheduler. If that info exists, the scheduler loads that when starting and uses it.

    Now it wouldn't be perfect, and maybe you'd get different results with shorter/longer workload runtimes, but it would be better.
  • jospoortvliet - Friday, November 22, 2019 - link

    Problem is also that the Windows scheduler is pluggable and games for example often have their own. And all those also don't handle anything other than a normal Intel very well... it is a mess.
  • gamerk2 - Friday, November 22, 2019 - link

    The problem is different thread workloads have different performance profiles. There's really no clean way to handle this sort of situation.

    You also make the classic mistake of assuming no other program is running in the background, which would immediately break your assumptions about thread management. That's why stuff like this really has to be handled automatically by the OS.
  • mikato - Tuesday, November 26, 2019 - link

    Well hey, so yes I provided a simple view there, and I can imagine things are quite complicated. But I didn't say it shouldn't be handled automatically by the OS.

    I still think it could be useful to have some sort of pre-testing. Imagine 90% of a particular CPU's time is spent in one app's workload and the other 10% is spent in everything else. Maybe it's some distributed computing thing. Run some tests trying different configurations for placing threads on the cores/CCXs/etc. It will perform better in some situations than others and that could really save computing time. If both the app and the OS are witness to this result, then I don't see why the OS shouldn't use that info.

    Sure that is a specific example, and I definitely don't know how the details and responsibilities would be worked out for the pre-testing and what could be done to generalize it. But it shows it can be useful, right? All it takes is one example to show that.

    The above commenter jospoortvliet even said that the Windows scheduler is pluggable and games can have their own setup. If an app can suggest its preferred scheduling to Windows, then that tells me it's already a short way down this path. Perhaps the app could do its own pre-testing before suggesting its preferred scheduling, and/or perhaps it could evaluate on the fly and update its suggestion periodically throughout a workload.
  • bcronce - Thursday, November 21, 2019 - link

    Getting a "perfect is the enemy of good" feeling from this whole turbo maximizing issue. Core X is 0.5% faster than Core Y. Let's complicate the scheduling to fix this injustice. Not that any of this benefits when the system is at 100% load, since all cores are in use.
