Optimizing Massively Multithreaded C++ Applications - Where's the Output Going?

If you have dozens or hundreds of threads doing work they are probably producing some sort of output. I'm not referring to logger output, but to permanent storage, perhaps in a database. If you cannot store the data as quickly as you produce it, you will eventually run into problems. If you are just barely able to keep up with data storage then your scalability will be limited.

Optimizing Massively Multithreaded C++ Applications - Beware of Heap Operations

While debugging my massively multithreaded C++ application I would notice times where the application would seem to pause for a few moments. During one of these pauses I halted the application and attached to it with the debugger (GDB). From within GDB I listed (info threads), switched to (thread <num>) and looked at the stack (bt) of each thread running.

I saw something surprising and very telling. Nearly every single thread that was supposed to be performing work was actually blocked on a mutex inside of either malloc or free.

Optimizing Massively Multithreaded C++ Applications - Watch for Hidden Mutexes

If your application does not scale as your threads increase, you should check the code to make sure there are no hidden mutexes limiting your concurrency.

Logging System

Optimizing Massively Multithreaded C++ Applications - Don't Forget the Obvious

Let's not overcomplicate things here. If you have 160 threads all trying to run concurrently, even if they are doing little to no work, they are all still doing some work. There's no reason to make the threads work harder than they need to.

Optimizing Massively Multithreaded C++ Applications - Intro

This is the beginning of short series of articles related to optimizing massively multithreadded C++ applications. I'm not entirely sure what the exact definition of "massively multithreaded" is, but for our purposes, let's assume at least twice as many threads as CPU cores.

Multithreaded C++: Part 4: Futures and Other Thread Handlers

We've covered the "Assembly Language", "C" and "C++" of the C++ threading world, and now we are going to try and move beyond that.

In the "C++" example we showed a object that automatically managed a thread's life time and used the thread to calculate the Fibonacci sequence in the background. Using templates we should be able to make a generic version of the Fibonacci calculator thread.

Thread Safety Locking Strategies

I'm going to cover a thread safety strategy I have been thinking about lately. Let's look at an example for a typical "lock the variables as you use them" approach:

class Stuff
    void doSomething()

    string myString;
    boost::mutex myMutex;

    void setString()
      boost::mutex::scoped_lock l(myMutex); // Lock before touching local vars
      myString = "hello world";

    void checkString()
      boost::mutex::scoped_lock l(myMutex); // Lock before touching local vars
      assert(myString == "hello world");

I am proposing the following changes to the typical approach:

Multithreaded C++: Part 3: RAII And Threads

If boost::threads represent the C of multithreaded programming, then RAII and automatically managed threads represent the C++ of multithreaded programming.

In the last article we promised that using more RAII would allow us to get this code even smaller and better to manage. Here is the result of that:

Multithreaded C++: Part 2: Boost Threads

If pthreads represent the assembly language of multithreading programming, then boost::threads represent the C of multithreaded programming.

Boost threads introduce some handy code saving features for the creation of threads, which is nice, but not as important as the RAII techniques they put to use for mutex management. In this case an example is worth a thousand words. Here is the same code we wrote in for pthreads rewritten for boost::threads:

Don't Join on Yourself!

My apparent memory leak from the other day did in fact turn out to be leaking thread resources. The key: Don't Join on Yourself!

If a thread attempts to call pthread_join on itself it is likely to cause a deadlock in that thread that you may never know about. Particularly if you are using some sort of worker-thread model that creates and destroys threads often.

A simple way to catch this problem would be to add an assertion in your destructor.