System Performance: Multi-Tasking

One of the key drivers of advancements in computing systems is multi-tasking. On mobile devices, this is quite lightweight: cases such as background email checks while the user is playing a mobile game are common. To optimize the user experience in such scenarios, mobile SoC manufacturers started integrating heterogeneous CPU cores, some delivering high performance for demanding workloads and others frugal in terms of power consumption and die area at the cost of performance. This trend is now slowly making its way into the desktop PC space.

Multi-tasking in typical PC usage is much more demanding than on phones and tablets. Desktop OSes allow users to launch and use a large number of demanding programs simultaneously. Responsiveness is dictated largely by the OS scheduler, which allows different tasks to move to the background. Intel's Alder Lake processors work closely with the Windows 11 thread scheduler to optimize performance in these cases. With these aspects in mind, the evaluation of multi-tasking performance is an interesting subject to tackle.

We have augmented our systems benchmarking suite to quantitatively analyze the multi-tasking performance of various platforms. The evaluation involves triggering an ffmpeg transcoding task to transform 1716 3840x1714 frames encoded as a 24fps AVC video (Blender Project's 'Tears of Steel' 4K version) into a 1080p HEVC version in a loop. The transcoding rate is monitored continuously. One complete transcoding pass is allowed to complete before starting the first multi-tasking workload, the PCMark 10 Extended bench suite. A comparative view of the PCMark 10 scores for various scenarios is presented in the graphs below. Also available for comparison are the scores from the normal case, where the benchmark was run without any concurrent load, along with a graph presenting the loss in performance.
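The article does not describe how the transcoding rate is sampled. One possible approach, sketched below in Python under stated assumptions, is to launch ffmpeg with its `-nostats -progress pipe:1` options (which make it print periodic `key=value` status lines, including `frame=`, to standard output), record a wall-clock timestamp whenever a new frame count arrives, and turn consecutive samples into per-interval rates. The rate is derived from frame-count deltas because ffmpeg's own `fps=` field is averaged over the run so far, which would smooth out short dips. The helper names (`parse_frame_count`, `collect_samples`, `instantaneous_rates`, `summarize`) are hypothetical, not part of any published test harness.

```python
import time
from statistics import mean


def parse_frame_count(line):
    """Return the frame count from an ffmpeg '-progress' line such as 'frame=240', else None."""
    key, sep, value = line.strip().partition("=")
    if sep and key == "frame" and value.isdigit():
        return int(value)
    return None


def collect_samples(lines, clock=time.monotonic):
    """Timestamp every 'frame=' line; `lines` can be ffmpeg's stdout when run with -progress pipe:1."""
    samples = []
    for line in lines:
        frame = parse_frame_count(line)
        if frame is not None:
            samples.append((clock(), frame))
    return samples


def instantaneous_rates(samples):
    """Convert (seconds, frame_count) samples into per-interval FPS values.

    Intervals where the frame count goes backwards (a new transcoding pass
    started in the loop) are skipped rather than reported as negative rates.
    """
    rates = []
    for (t0, f0), (t1, f1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0 and f1 >= f0:
            rates.append((f1 - f0) / dt)
    return rates


def summarize(rates):
    """Return (minimum, average, maximum) FPS, mirroring the columns of the results table."""
    return min(rates), round(mean(rates), 2), max(rates)


# Example with synthetic samples (not measured data): one sample per second.
samples = [(0.0, 0), (1.0, 12), (2.0, 25), (3.0, 36)]
print(summarize(instantaneous_rates(samples)))  # (11.0, 12.0, 13.0)
```

In a live run, the `stdout` of a `subprocess.Popen` call running ffmpeg (with `stdout=subprocess.PIPE, text=True`) could be passed to `collect_samples`, and the samples then split by the start and end times of each workload to produce per-segment minimum/average/maximum rows. The encoder settings for the 1080p HEVC output are not given in the article, so they are not assumed here.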

UL PCMark 10 Load Testing - Digital Content Creation Scores

UL PCMark 10 Load Testing - Productivity Scores

UL PCMark 10 Load Testing - Essentials Scores

UL PCMark 10 Load Testing - Gaming Scores

UL PCMark 10 Load Testing - Overall Scores

For all PCMark 10 workload components, the relative ordering of the systems is maintained even after the concurrent load is added.

Following the completion of the PCMark 10 benchmark, a short delay is introduced prior to the processing of Principled Technologies WebXPRT4 on MS Edge. Similar to the PCMark 10 results presentation, the graph below shows the scores recorded with the transcoding load active. Available for comparison are the scores obtained with the CPU dedicated to the benchmark (no concurrent load) and a measure of the performance loss.

Principled Technologies WebXPRT4 Load Testing Scores (MS Edge)

Despite a 50%+ loss in performance, the 40W PL1 configuration of the NUC BOX-1360P/D5 and the Arena Canyon NUC take the top spots even when the concurrent load is active.

The final workload tested as part of the multitasking evaluation routine is CINEBENCH R23.

3D Rendering - CINEBENCH R23 Load Testing - Single Thread Score

3D Rendering - CINEBENCH R23 Load Testing - Multiple Thread Score

The presence of heterogeneous cores makes it challenging to handle new multi-threaded workloads when another multi-threaded workload, such as a transcoding task, is already active. The AMD-based systems, whose CPU cores are all of the same type, do not face this challenge, and that is the primary reason they show minimal performance loss when concurrent loads of different complexities are triggered simultaneously.

After the completion of all the workloads, we let the transcoding routine run to completion. The monitored transcoding rate throughout the above evaluation routine (in terms of frames per second) is graphed below.

The behavior of the NUC BOX-1360P/D5 is very similar to that of the NUCS BOX-1360P/D4, and it is not immediately obvious if Thread Director is working as intended.

ASRock Industrial NUC BOX-1360P/D5 (Performance) ffmpeg Transcoding Rate (Multi-Tasking Test)
Task Segment         | Minimum (FPS) | Average (FPS) | Maximum (FPS)
Transcode Start Pass | 3.5           | 13.21         | 46.5
PCMark 10            | 0             | 11.69         | 39.5
WebXPRT 4            | 3.5           | 11.18         | 21
Cinebench R23        | 2.5           | 11.74         | 41
Transcode End Pass   | 4             | 13.01         | 42.5

The silver lining seems to be that the drop in transcoding performance is not as steep as that seen in the other systems.
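The size of that drop can be quantified from the average rates in the NUC BOX-1360P/D5 transcoding table. The Python sketch below assumes that performance loss is defined as (baseline - loaded) / baseline x 100 and uses the initial transcode pass as the baseline; both are choices made for illustration, since the article does not spell out how its loss figures are computed.

```python
# Average ffmpeg transcoding rates (FPS) for the NUC BOX-1360P/D5, taken from the table above
AVERAGE_FPS = {
    "Transcode Start Pass": 13.21,
    "PCMark 10": 11.69,
    "WebXPRT 4": 11.18,
    "Cinebench R23": 11.74,
    "Transcode End Pass": 13.01,
}


def percent_drop(baseline, loaded):
    """Performance loss in percent: (baseline - loaded) / baseline * 100."""
    return (baseline - loaded) / baseline * 100


def drops_vs_baseline(averages, baseline_key="Transcode Start Pass"):
    """Drop of each segment's average rate relative to the chosen baseline segment, rounded to 0.1%."""
    baseline = averages[baseline_key]
    return {
        name: round(percent_drop(baseline, fps), 1)
        for name, fps in averages.items()
        if name != baseline_key
    }


print(drops_vs_baseline(AVERAGE_FPS))
# {'PCMark 10': 11.5, 'WebXPRT 4': 15.4, 'Cinebench R23': 11.1, 'Transcode End Pass': 1.5}
```

By this measure, the WebXPRT 4 segment causes the largest dip in the average transcoding rate (about 15%), while the final pass recovers to within about 1.5% of the initial one.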


21 Comments


  • ganeshts - Wednesday, July 19, 2023

    Any links to such a 'NUC'?

    I do have a Phoenix-based GTR7 from Beelink here in my testbed, but driver issues are preventing it from completing our benchmark suite. I am waiting for a new driver release from AMD.
  • lemurbutton - Friday, July 21, 2023

    And any M2 Mac Mini would destroy any Zen4 NUC.
  • TheinsanegamerN - Monday, July 24, 2023

    Until you have to run something not in the Mac ecosystem. OOPS!
  • PeachNCream - Friday, July 21, 2023

    I don't think destruction is quite the right way to articulate your apparent thoughts. Perhaps "result in higher scores on benchmarks" or maybe "complete compute workloads sooner" would fit better in this case. Computer nerds appear to be rather detached from reality when expressing thoughts which gives all of them a bad reputation among the better positioned and more intelligent normal population.
  • Samus - Saturday, July 22, 2023

    The problem with AMD enterprise and industrial products has always been management adoption. Intel has IT depts hooked on vPro, iME, AMT, etc.
  • nicolaim - Wednesday, July 19, 2023

    It's 2023. Only two USB-C ports, none on the back.
  • Samus - Thursday, July 20, 2023

    That was my gripe. Replacing the HDMI and DP ports with two TB4-compliant USB-C ports on the rear would be the minimum modification for such an 'industrial' appliance. Seriously, why do you have to plug something into the front to use Thunderbolt?
  • PeachNCream - Thursday, July 20, 2023

    Probably because nobody uses or cares about Thunderbolt. Sure it has that usual small, insane rabid fanbase that any obscure computer standard had in the past, but outside of the inevitable idiots that inflate its utility, no one cares and no one profits from it.
  • abufrejoval - Monday, July 24, 2023

    That's a bit harsh.

    Yes, using TB to its full potential is somewhat expensive but given a choice, I'll always opt for the TB variant over pure USB, if only for 10Gbit Ethernet.

    Front vs. back: I guess they have done their studies on how people use TB and unfortunately habits vary between people.

    Most of my dual TB systems have one TB in the front, the other in the back and that works pretty well for me. The 10GBase-T NIC goes into the back port and the front port is open to anything transient, which could be just some USB media (these native SATA 10Gbit USB sticks are hard to beat via anything native TB), a temporary display (Alt-DP handy there) and in theory to things like TB networking, which is typically transient.

    The older systems just have a single TB and expect a hub connected on the back, which seems sensible.

    Two in the front and two in the back would be better still, even if you couldn't use all four at full speed for lack of PCIe lanes or a cheap enough switch.

    Yet again, when your NUC is stuck to the back of a display, who cares what's front or back, because it's all behind the screen anyway and it's only people like me, who use clusters of these NUCs as µ-servers in a "tiny-rack", who get bothered by the orientation of those ports.

    Changing port orientation in a NUC means a mainboard redesign and few would want to pay for that. So I guess they asked their volume customers and this is what those came up with.

    Very few vendors want to aggravate the customers.
  • sjkpublic@gmail.com - Thursday, July 20, 2023

    Performance comparison says it all. 1360P DOA. 7735U $100-200 cheaper for ASROCK. Even cheaper if you look at other companies.
