OS Preparation and Benchmark Installation

Windows 10 Pro

Since we started using Windows 10 Pro in our last update, there has been a large opportunity for something to come in and disrupt our testing. Windows 10 is known to kick in and check for updates at any hour of the day (and we’re testing 24 hours a day), so anything that can interrupt benchmarking or take CPU time away from it is a bit of a hassle. There’s also the added element of Windows silently adjusting the update schedule and moving settings around in the registry without warning.

While we were building this latest suite, Microsoft launched Windows 10 version 2004. There is always a question as to what we should do in this regard: move to the absolute latest, or take a step back to something that is more stable and has fewer bugs, but might not be as relevant. In order to not create any level of programming debt, by which lots of work is needed to fix the smallest issues that might arise, we often choose the latter. For this suite, we are using Windows 10 version 1909 (18363.900). It has since transpired, from talking to peers, that 2004 has a number of issues that would affect benchmarking consistency, which validates our concerns.

Naturally, the first thing an OS wants to do when it starts up is connect to the internet and update. We install the OS without the internet connected, and our install image automatically sets the update period to the maximum possible. The scripts we run are continuously updated to ensure that when the benchmark starts, the ‘don’t restart’ period for the OS is resynchronized to the latest possible time. There’s nothing worse than waking up in the morning to find that the system rebooted at 1am in the middle of a scripted run.

The OS is installed manually with most of the default settings, disabling all the extra monitoring features offered on install. On entering the OS, our default strategy is multi-pronged: disable the ability to update as much as possible in the registry, disable Windows Defender, uninstall OneDrive, disable Cortana as much as possible, enable the high performance mode in the power options, and stop the platform from turning off the display. We also pull the latest version of CPU-Z from network storage, in case we are testing a very new system. Another script runs when the OS loads to check that the CPU and GPU are what we expect, and that the GPU drivers we need are in place, as Windows has a habit of updating those without saying anything. Windows Defender is disabled because, in my experience, it has historically seemed to eat CPU time for no reason when the network changes, even when the system is in use.
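The startup check described above could look something like the following Python sketch. This is not the script we actually run; the function names are made up for illustration, and it assumes the hardware names are read through the standard `Win32_Processor` and `Win32_VideoController` CIM classes via PowerShell.

```python
import subprocess


def query_powershell(expression: str) -> str:
    """Run one PowerShell expression and return its trimmed text output (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", expression],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def detect_hardware() -> dict:
    """Read CPU name, GPU name and GPU driver version from the running system.

    With several CPUs or GPUs installed, each value holds one line per device.
    """
    return {
        "cpu": query_powershell("(Get-CimInstance Win32_Processor).Name"),
        "gpu": query_powershell("(Get-CimInstance Win32_VideoController).Name"),
        "gpu_driver": query_powershell("(Get-CimInstance Win32_VideoController).DriverVersion"),
    }


def find_mismatches(expected: dict, detected: dict) -> list:
    """Return a readable message for every expected value that was not detected.

    Matching is a case-insensitive substring test, so a short expected string
    can match the longer name that Windows reports.
    """
    problems = []
    for key, want in expected.items():
        got = detected.get(key, "")
        if want.lower() not in got.lower():
            problems.append(f"{key}: expected '{want}', found '{got}'")
    return problems
```

A run script would call `find_mismatches(expected, detect_hardware())` before starting and refuse to benchmark if the returned list is not empty, which catches a silently replaced GPU driver before it costs a night of testing.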

Some of these strategies are designed to be redundant. The goal here is to attack the option needed in as many different ways as possible. There’s nothing lost by being thorough at this point and hammering the point home. This means executing registry files that adjust settings, executing batch files which do the same while installing files, and reiterating these commands before every benchmark run in order to be crystal clear. Simply put, do not implicitly trust Windows to leave the settings alone. Something invariably changes (or moves somewhere else) if it is not monitored. Some of the commands in place are also old/legacy, but they are kept because they don’t otherwise adjust the system (and can take effect if options that are continually moved around suddenly move back).
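As a rough illustration of reiterating settings before every run, here is a small Python sketch that re-applies two of the power options mentioned earlier. It is an assumption-laden example rather than our actual batch files: the list only contains two `powercfg` calls (`SCHEME_MIN` is the built-in alias for the High performance plan), and `reapply_settings` is a hypothetical helper name.

```python
import subprocess

# Commands re-applied before every benchmark run. powercfg ships with Windows.
SETTINGS_COMMANDS = [
    ["powercfg", "/setactive", "SCHEME_MIN"],            # High performance plan
    ["powercfg", "/change", "monitor-timeout-ac", "0"],  # 0 = never turn the display off
]


def reapply_settings(commands=SETTINGS_COMMANDS, run=subprocess.run):
    """Run every command and return the ones that failed.

    Failures are collected instead of raised, so one legacy command that no
    longer applies does not stop the rest from being re-applied.
    """
    failures = []
    for command in commands:
        completed = run(command)
        if completed.returncode != 0:
            failures.append(command)
    return failures
```

Collecting failures rather than aborting matches the redundant approach: old commands are allowed to fail quietly, while the returned list still shows what did not take.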

It is worth noting that some of the options, when run through a batch file, require the file to be run as Administrator. Recent versions of Windows 10 make this a frustrating task to do manually without going through a user access elevation prompt each time. The best way to ensure that the batch file always runs in admin mode seems to be to create a shortcut to the batch file, and adjust the properties of the shortcut to always enable the ‘run as admin’ mode. It is an interesting kludge for that to work, and it is frustrating that I cannot just adjust the batch file properties directly to run as admin every time.

Benchmark Installs

When choosing a benchmark, it usually falls under one of two headings: standalone, such that it can be run as is, or requiring installation. Those requiring installation are subdivided further into those with silent installers, and those that have to be installed manually.

Installing benchmarks can either be done before running the main script, or be integrated directly into the main testing script. As time has progressed, we have moved from the former to the latter, so we can wrap uninstall commands into the script if we only get limited access to a system. For the manually installed benchmarks this isn’t possible, and technically calling an install/uninstall from the script does make total testing time longer, but it also reduces requirements for SSD capacity by not having everything installed at once. Experience of doing this scripting over the past few years, and of making the benchmark scripts as portable as possible, has pointed to making the install/uninstall part of the benchmark run.

Benchmarks that can be run without installing, known as ‘standalone’ benchmarks, are the holy grail. Cinebench and others are great for this. The rest are probed for silent install methods. Certain benchmarks in the past, such as PCMark8, also have additional features to enable online registration to enable DRM through the command line. Other installers, such as .msi files, seem unable to install without the right commands if they are not in the directory from which the batch file was called. When scripting successive installs, it also becomes important to check that the previous one has finished before the next one starts, otherwise the script might jump straight to the next installer while earlier ones are still running.

For .msi files, our install code relies heavily on the following command to ensure that installs are finished before tackling the next one:

cmd /c start /wait msiexec /qb /i <file>
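For comparison, the same wait-until-finished behaviour can be sketched in Python, where `subprocess.run` blocks until the `msiexec` process exits. This is only a sketch and not our install code; the helper names are hypothetical. It passes an absolute path to avoid the working-directory problem described above, and it treats exit code 3010 (success, reboot required) as a success alongside 0.

```python
import os
import subprocess

# msiexec exit codes that mean the install worked: 0, or 3010 (reboot required).
MSI_SUCCESS_CODES = (0, 3010)


def msi_install_command(msi_path: str) -> list:
    """Build the msiexec call using the same /qb /i flags as the command above."""
    absolute = os.path.abspath(msi_path)  # do not depend on the caller's directory
    return ["msiexec", "/qb", "/i", absolute]


def install_in_order(msi_paths, run=subprocess.run):
    """Install each package in turn, waiting for one to finish before the next.

    Returns the first path that failed to install, or None if all succeeded.
    """
    for path in msi_paths:
        completed = run(msi_install_command(path))  # blocks until msiexec exits
        if completed.returncode not in MSI_SUCCESS_CODES:
            return path
    return None
```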

Most .msi files have the same flags for silent installs, however install executables can vary significantly and require probing the vendor documentation. For the most part, ‘/S’ is the silent install flag, while others require /norestart to ensure the system doesn’t restart immediately, or /quiet to get going in a silent fashion. Some installations use none of these and rely on their own definitions of what constitutes a silent install flag. I’m looking at you, Adobe. Ultimately, however, most software packages can be installed silently, with additional commands to enable licenses where required, and are then ready to be called for their respective tests.
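Because the flags differ per installer, one way to keep them manageable is a lookup table that is filled in as each vendor's documentation is checked. The Python sketch below illustrates the idea only; the installer file names are placeholders, not real packages from our suite, and `silent_command` is a hypothetical helper.

```python
import ntpath  # parses Windows-style paths regardless of the host OS

# Placeholder entries: the real table would list each installer's documented flags.
SILENT_FLAGS = {
    "example_tool_a_setup.exe": ["/S"],
    "example_tool_b_setup.exe": ["/quiet", "/norestart"],
}


def silent_command(installer_path: str, flag_table=SILENT_FLAGS) -> list:
    """Return the command line for a silent install of the given installer.

    .msi packages share the same msiexec flags; executables must have an
    entry in the table, so an unprobed installer fails loudly instead of
    opening a dialog in the middle of a scripted run.
    """
    name = ntpath.basename(installer_path).lower()
    if name.endswith(".msi"):
        return ["msiexec", "/qb", "/i", installer_path]
    if name not in flag_table:
        raise KeyError(f"no silent install flags recorded for {name}")
    return [installer_path] + flag_table[name]
```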

One benchmark is a special case: Chrome. Chrome has the amazing ability to update itself as soon as it is installed, even without being opened, or as soon as the system is booted. Stopping this from happening takes more than a simple software adjustment, purely because Google no longer offers an option to delay updates. We initially found an undocumented way to stop it from updating, which requires the install script to gut some of the files after installing the software, however the quick update cycle of Chrome means that our v56 version from last year is now out of date. To get around this, we are using a standalone version of Chromium.

The final benchmark in our install is Steam, which is a fully manual install. Valve has created Steam with a really odd interface interaction mechanism, even for AHK (AutoHotkey) scripting, which makes installing Steam a bit of a hassle. Valve does not offer a complete standalone installer here, so the base program opens after installation to download ~200MB of updates on a fresh system. We install the software over the Steam directory already present on the benchmark partition from a previous OS install, so the games do not need to be re-downloaded. (When an OS is installed, it’s installed on a specific OS partition, and all benchmarks are kept on a second partition.)

One other point to be aware of is when software checks for updates. Loading AIDA, for example, means that it will probe online for the latest version and leave a hanging message box to be answered before a script can continue. There are often two ways to deal with this, and the best is if the program allows the user to set a ‘no updates’ option in its configuration files. The fallback tactic that works is to disable internet connectivity (often by disabling all network adaptors through PowerShell) while the application is running.
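The fallback tactic can be sketched as a Python context manager that wraps the standard `Disable-NetAdapter` and `Enable-NetAdapter` PowerShell cmdlets. Treat this as an illustration under assumptions rather than our actual script: `network_disabled` is a hypothetical name, the calls need an elevated (Administrator) process, and re-enabling every adapter afterwards will also switch on any adapter that was deliberately disabled beforehand.

```python
import subprocess
from contextlib import contextmanager

# -Confirm:$false suppresses the interactive confirmation prompt.
DISABLE_ALL = "Get-NetAdapter | Disable-NetAdapter -Confirm:$false"
ENABLE_ALL = "Get-NetAdapter | Enable-NetAdapter -Confirm:$false"


def _powershell(command, run=subprocess.run):
    """Run a single PowerShell command line (Windows only, needs admin rights)."""
    return run(["powershell", "-NoProfile", "-Command", command])


@contextmanager
def network_disabled(run=subprocess.run):
    """Keep all network adaptors disabled for the duration of the with-block.

    The adaptors are re-enabled in a finally clause, so a crashing benchmark
    does not leave the system offline for the rest of the run.
    """
    _powershell(DISABLE_ALL, run)
    try:
        yield
    finally:
        _powershell(ENABLE_ALL, run)
```

A benchmark that insists on checking for updates would then be launched inside `with network_disabled():`, so it never gets the chance to raise its update dialog.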

