As a follow-up to the last post, I’d like to review one question I got after the talks I gave. It was particularly interesting because it highlights a certain aspect of the changes we face with multiple cores, and a misconception some people might have:
Doesn’t today’s multithreaded software profit from more cores automatically?
The answer is a definite yes and no. Server software like IIS or SQL Server has always been optimized for throughput (performance only secondary) and as long as the processing is CPU bound it will certainly put more cores to good use. That should result in better throughput (i.e. more requests handled at the same time), but not necessarily in better performance (i.e. the time to complete one request).
The more interesting observation, however, is on the client. Client applications use multithreading for different reasons, mainly these two:
- Avoid blocking one task while processing something else. Examples include keeping Windows UIs responsive (i.e. working around the UI’s thread affinity) and Windows Services (their service control handler is invoked by the SCM and has to return within a certain amount of time).
- Be able to spend processing time on something valuable while one task is blocked waiting for something to happen. Blocking regularly happens during some kind of I/O, especially network calls. This is the “call a web service asynchronously and do something worthwhile until the call returns” scenario.
Just different sides of the same coin actually. Please note that neither use case has been about actual performance, as in doing things really faster. The first one is about perceived performance by reducing latency for some favored task, the second one improves performance, albeit not by doing things faster but by avoiding unnecessary wait times. Anyway, this works quite well even (or rather especially) on single core machines.
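To make the second use case concrete, here is a minimal sketch in Python (the post itself is about Windows and .NET, so take this as an illustration only). `call_web_service` and `do_something_worthwhile` are made-up names; the "web service" is simulated with a sleep, which is an assumption standing in for a real network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_web_service(delay_seconds: float) -> str:
    """Hypothetical stand-in for a blocking network call; it only sleeps."""
    time.sleep(delay_seconds)
    return "response"


def do_something_worthwhile() -> int:
    """Hypothetical local work done while the call is in flight."""
    return sum(range(1000))


def fetch_and_work(delay_seconds: float = 0.2):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the blocking call on a worker thread; submit() returns at once.
        pending = pool.submit(call_web_service, delay_seconds)
        # The calling thread is free to do useful work in the meantime.
        local_result = do_something_worthwhile()
        # Only now wait for the call to finish (blocks if it has not yet).
        response = pending.result()
    return local_result, response
```

Nothing here runs faster than before; the gain comes purely from not sitting idle while the call is pending, which is why this pattern pays off on a single core as well.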
Let’s dig deeper into an example: Say a lengthy, CPU-intensive calculation is triggered by a button. The time spent on updating the UI and doing the calculation is shown in the following picture; the red line represents the executing thread:
The calculation is done on the UI thread (case A), giving it peak performance, but at the same time freezing the UI until the calculation has finished. The typical “optimization” is putting the calculation on a second thread, e.g. using the BackgroundWorker component. That way, the UI keeps updating itself rather than degenerating into one of those white and unresponsive windows:
The UI and the calculation run on different threads, yet still on one core (case B), so the UI can update itself even while the calculation is still running. However, now it has to deal with that intermediary state (denoted by the striped blocks). And every time the UI thread uses the core, that time is lost to the calculation, so the time to completion will actually be longer than with A.
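The background-thread variant can be sketched roughly like this in Python. This is not the BackgroundWorker API, just the same idea under simplifying assumptions: the `Calculation` class, the polling loop standing in for the UI message loop, and the progress counter representing the intermediary state are all hypothetical.

```python
import threading
import time


class Calculation:
    """Hypothetical lengthy calculation that publishes its progress."""

    def __init__(self, steps: int):
        self.steps = steps
        self.done_steps = 0          # intermediary state the UI can observe
        self.result = None
        self._lock = threading.Lock()

    def run(self) -> None:
        total = 0
        for i in range(self.steps):
            total += i * i           # stand-in for the real work
            with self._lock:
                self.done_steps = i + 1
        self.result = total

    def progress(self) -> float:
        with self._lock:
            return self.done_steps / self.steps if self.steps else 1.0


def run_with_responsive_ui(steps: int = 200_000):
    calc = Calculation(steps)
    worker = threading.Thread(target=calc.run)
    worker.start()
    ui_updates = 0
    # Stand-in for the message loop: keep "repainting" while the worker runs.
    while worker.is_alive():
        _ = calc.progress()          # e.g. update a progress bar
        ui_updates += 1
        time.sleep(0.001)
    worker.join()
    return calc.result, ui_updates
```

Note the price: the "UI" now has to cope with a calculation that is only partly done, and the lock around the progress counter is exactly the kind of synchronization the single-threaded version never needed.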
Now switch gears to our new, say quad core, machine…
What happens on a quad core is that both tasks now get executed in parallel:
As you can see, actual processing time is now back to what it was with A, yet the UI does not freeze, just as in B; again at the expense of having to deal with the intermediary state. However, that’s about as good as it gets. The application doesn’t get any faster than A, nor will a third or fourth core be utilized at all (other than ensuring that other applications can run there rather than interfering with the cores we just occupied).
So, while the potential in “classical multithreading” lies in shifting or arranging calculations of disparate tasks, the potential of parallelism with many cores lies in doing more calculations of CPU bound tasks at the same time. Like so:
This time the calculation has been split into parts and distributed across several threads in addition to the UI thread. This finally yields better performance than A. However, this is not the way today’s multithreaded applications are written, because on a single core it actually leads to worse performance: memory pressure if used excessively, thread context switches, synchronization and contention, etc.
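The splitting itself might look something like the following Python sketch. The work item (`sum_of_squares`) and the helper `split_range` are hypothetical, chosen only because the pieces are easy to combine. One caveat I should state loudly: the sketch uses threads to mirror the structure described above, but in standard CPython the global interpreter lock keeps pure-Python threads from running CPU-bound code on several cores at once, so there you would typically swap in a process pool. In a runtime without such a lock, such as .NET, the threads really do run side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(start: int, stop: int) -> int:
    """Hypothetical CPU-bound work item: one slice of the whole calculation."""
    return sum(i * i for i in range(start, stop))


def split_range(n: int, parts: int):
    """Cut range(n) into `parts` contiguous slices of nearly equal size."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + step + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum_of_squares(n: int, workers: int = 4) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker computes one slice; the partial results are then combined.
        partials = pool.map(lambda b: sum_of_squares(*b), split_range(n, workers))
        return sum(partials)
```

The shape is what matters: partition the input, compute the parts independently, combine the results. That only works this easily because the slices share no state.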
Today’s multithreaded client software may profit from more cores, but to a far lower degree than one might think at first. And what’s more: it might even stop working altogether, because actual parallelism opens the door to error conditions that were merely theoretical threats on a single-core system but become a reality on multiple cores. Think of one core accessing some memory while another core is still in the middle of modifying it.
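The classic instance of such an error is the lost update. Here is a small Python sketch; the `Counter` class and the `hammer` helper are invented for illustration. Whether the unsafe variant actually loses updates in a given run depends on the interpreter and on timing, so I make no claim about its result other than that it can come out too low.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self) -> None:
        # Read, add, write back: three steps that another thread can
        # interleave with, so updates may be lost.
        current = self.value
        self.value = current + 1

    def increment_safe(self) -> None:
        # The lock makes the read-modify-write sequence indivisible.
        with self._lock:
            self.value += 1


def hammer(increment, threads: int = 4, per_thread: int = 10_000) -> None:
    """Call `increment` from several threads at once."""
    def work():
        for _ in range(per_thread):
            increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```

Running `hammer(counter.increment_safe)` on a fresh `Counter` always ends at 40,000 with the defaults; with `increment_unsafe` the total may fall short, and the more truly parallel the hardware, the more likely that becomes.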
PS: And by the way: the future is now!
I just started on a new project and got my new machine last week: an HP wx6600 with 2 quad cores @ 3 GHz and 4 GB.
This is a task manager screenshot, showing 8 nice little cores, sitting under my desk, waiting for me to send them on one or the other errand…
A single core is such a lonesome entity…