Multithreaded C++: Part 1: Pthreads

(All articles in this series.)

This is the beginning of a series of articles on multithreaded programming in C++. In this first article we will look at pthreads, which are generally considered to be the "assembly language of threading." We will start at the bottom and work our way up to higher concepts.

Pthreads represent the lowest level multithreaded concept that is available on every major platform. On some platforms, pthreads are a wrapper for the operating system specific threads. On other platforms pthreads are native to the OS.

First, a couple of definitions to get us started.

thread
A path of execution separate from the main path of execution and running parallel to it. A thread provides a method by which you can have two parts of your code executing at the same time.
mutex
A method of preventing more than one thread from accessing a section of code at a time. If a thread holds the lock on a mutex, no other thread can acquire that lock until the first thread releases it. Look for the functions pthread_mutex_lock and pthread_mutex_unlock in the example. Mutexes are critical for traditional multithreaded programming techniques.
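
To make the two definitions concrete, here is a minimal sketch of two threads incrementing one counter under a mutex. The names shared_counter, add_many and run_two_threads are hypothetical and exist only for this illustration, and return codes are ignored for brevity.

```cpp
#include <pthread.h>

// Hypothetical shared state for this sketch: a counter and the mutex guarding it.
struct shared_counter
{
    pthread_mutex_t mutex;
    long value;
};

// Thread entry point: pthreads requires the signature void *(*)(void *).
extern "C" void *add_many(void *arg)
{
    shared_counter *counter = static_cast<shared_counter *>(arg);
    for (int i = 0; i < 100000; ++i)
    {
        pthread_mutex_lock(&counter->mutex);   // only one thread at a time gets past here
        ++counter->value;
        pthread_mutex_unlock(&counter->mutex);
    }
    return 0;
}

// Runs two threads against the same counter and returns the final value.
long run_two_threads()
{
    shared_counter counter;
    counter.value = 0;
    pthread_mutex_init(&counter.mutex, 0);

    pthread_t first, second;
    pthread_create(&first, 0, &add_many, &counter);
    pthread_create(&second, 0, &add_many, &counter);
    pthread_join(first, 0);    // wait for both threads to finish
    pthread_join(second, 0);

    pthread_mutex_destroy(&counter.mutex);
    return counter.value;
}
```

Because every increment happens while the mutex is held, run_two_threads() should return 200000. Without the lock and unlock calls the two threads could overwrite each other's updates.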

The pthread library is written in C. As such, we need a C compatible way of calling the library from inside of C++. The following example represents the common way to do this:

#include <pthread.h>
#include <cassert>
#include <vector>

class threaded_class
{
public:
    threaded_class()
        : m_stoprequested(false), m_running(false)
    {
        pthread_mutex_init(&m_mutex, 0);
    }

    ~threaded_class()
    {
        pthread_mutex_destroy(&m_mutex);
    }

    // Create the thread and start work
    // Note 1
    void go()
    {
        assert(m_running == false);
        m_running = true;
        pthread_create(&m_thread, 0, &threaded_class::start_thread, this);
    }

    void stop() // Note 2
    {
        assert(m_running == true);
        m_running = false;
        m_stoprequested = true;
        pthread_join(m_thread, 0);
    }

    int get_fibonacci_value(int which)
    {
        pthread_mutex_lock(&m_mutex); // Note 3
        int value = m_fibonacci_values.at(which); // Note 4
        pthread_mutex_unlock(&m_mutex);
        return value;
    }

private:
    volatile bool m_stoprequested; // Note 5
    volatile bool m_running;
    pthread_mutex_t m_mutex; // Variable declarations added 4/14/2010
    pthread_t m_thread;
   
    std::vector<int> m_fibonacci_values;

    // This is the static class function that serves as a C style function pointer
    // for the pthread_create call
    static void *start_thread(void *obj)
    {
        // All we do here is call the do_work() function
        static_cast<threaded_class *>(obj)->do_work();
        return 0;
    }

    int fibonacci_number(int num)
    {
        switch(num)
        {
            case 0:
            case 1:
                return 1;
            default:
                return fibonacci_number(num-2) + fibonacci_number(num-1); // Correct 4/6/2010 based on comments
        };
    }    

    // Compute and save fibonacci numbers as fast as possible
    void do_work()
    {
        int iteration = 0;
        while (!m_stoprequested)
        {
            int value = fibonacci_number(iteration++);
            pthread_mutex_lock(&m_mutex);
            m_fibonacci_values.push_back(value);
            pthread_mutex_unlock(&m_mutex); // Note 6
        }
    }                    
};

While this code works and is an all too common way of using threads in C++, it has several disadvantages. The note markers in the code are explained below.

Note 1
Because we are required to pass a C style function pointer into the pthread_create function we end up with at least 2 (and in our case 3) functions to actually kick off the thread.
  • go() (the public interface for starting the work) calls pthread_create() which calls
  • start_thread() (the C style interface for starting the thread) which finally calls
  • do_work() the object level private entry into the thread.
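
The three-step chain above can be sketched in isolation. This version uses a free function with C linkage for the middle step, which is an assumption on our part rather than what the class above does; worker, worker_trampoline and run_worker are hypothetical names.

```cpp
#include <pthread.h>

// Step 2 is declared first so the class can refer to it.
extern "C" void *worker_trampoline(void *obj);

// Hypothetical class for this sketch.
class worker
{
public:
    worker() : m_result(0) {}

    // Step 1: the public interface hands the trampoline and `this` to pthreads.
    bool go()
    {
        return pthread_create(&m_thread, 0, &worker_trampoline, this) == 0;
    }

    void wait() { pthread_join(m_thread, 0); }

    int result() const { return m_result; }

    // Step 3: the member function that does the real work on the new thread.
    // It is public here only because the trampoline is not a member.
    void do_work() { m_result = 42; }

private:
    pthread_t m_thread;
    int m_result;
};

// Step 2: recover the object from the void * and forward to the member function.
extern "C" void *worker_trampoline(void *obj)
{
    static_cast<worker *>(obj)->do_work();
    return 0;
}

// Usage: start the thread, wait for it, and read the result.
int run_worker()
{
    worker w;
    if (!w.go())
        return -1;
    w.wait();
    return w.result();
}
```

The trade-off is that do_work() has to be reachable from outside the class, whereas the static member approach keeps it private.
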
Note 2
stop() shuts down the thread and calls pthread_join(), which does not return until the thread has been fully shut down. What happens here if we forget to call stop()? In the best case we end up with a runaway thread that we have no way of shutting down. In the worst case we get a crash. The crash occurs when the threaded_class object gets destroyed by the main thread of execution while the do_work thread is still running and is now trying to work on destructed data.
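
One way to guard against a forgotten stop() is to have the destructor do it. The following is only a sketch under our own assumptions: self_stopping_worker and destroy_without_stop are hypothetical names, the stop flag is guarded by the mutex, and the worker loop does no real work.

```cpp
#include <pthread.h>

// Hypothetical class for this sketch; only the shutdown logic matters.
class self_stopping_worker
{
public:
    explicit self_stopping_worker(bool *finished)
        : m_stoprequested(false), m_running(false), m_finished(finished)
    {
        pthread_mutex_init(&m_mutex, 0);
    }

    // The destructor stops and joins the thread, so the object cannot be
    // destroyed while do_work() is still touching its members.
    ~self_stopping_worker()
    {
        if (m_running)
            stop();
        pthread_mutex_destroy(&m_mutex);
    }

    void go()
    {
        if (pthread_create(&m_thread, 0, &self_stopping_worker::start_thread, this) == 0)
            m_running = true;
    }

    void stop()
    {
        pthread_mutex_lock(&m_mutex);
        m_stoprequested = true;
        pthread_mutex_unlock(&m_mutex);
        pthread_join(m_thread, 0);
        m_running = false;
    }

private:
    bool m_stoprequested;   // guarded by m_mutex
    bool m_running;         // only touched by the owning thread
    bool *m_finished;       // set by the worker just before it exits
    pthread_mutex_t m_mutex;
    pthread_t m_thread;

    static void *start_thread(void *obj)
    {
        static_cast<self_stopping_worker *>(obj)->do_work();
        return 0;
    }

    bool stop_requested()
    {
        pthread_mutex_lock(&m_mutex);
        bool requested = m_stoprequested;
        pthread_mutex_unlock(&m_mutex);
        return requested;
    }

    void do_work()
    {
        while (!stop_requested())
        {
            // real work would go here
        }
        *m_finished = true;
    }
};

// Usage: the caller never calls stop(); the destructor does it.
bool destroy_without_stop()
{
    bool finished = false;
    {
        self_stopping_worker w(&finished);
        w.go();
    }   // destructor runs here: requests the stop and joins the thread
    return finished;
}
```
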
Note 3
pthread_mutex_lock() is used to get an exclusive lock on the variable m_mutex. The locks in both get_fibonacci_value and do_work ensure that a user trying to get a fibonacci value will not try to access m_fibonacci_values at the same time that it is being updated by do_work.

If an update and a read were to occur at the same time, a crash is likely. This is because std::vector may reallocate its storage when new data is added to it, which leaves a concurrent reader looking at memory that has just been freed.

Note 4
What happens if the vector lookup on this line throws an exception? If it does, the pthread_mutex_unlock call immediately below it never executes. If the mutex is never unlocked, a condition known as a deadlock occurs. When a deadlock occurs one or more threads are stopped and cannot do any work because they cannot acquire the resources they need (in our case a mutex lock).

In fact, this code is susceptible to an exception being thrown if the requested fibonacci value has not yet been calculated.
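
The pthread library itself offers no help here, but a small hand-written guard object can release the mutex on every exit path. This is a minimal sketch: scoped_lock, guarded_lookup and mutex_is_free_after_throw are hypothetical names introduced for illustration, not part of pthreads.

```cpp
#include <pthread.h>
#include <stdexcept>
#include <vector>

// Hypothetical helper: locks in the constructor, unlocks in the destructor.
class scoped_lock
{
public:
    explicit scoped_lock(pthread_mutex_t &mutex) : m_mutex(mutex)
    {
        pthread_mutex_lock(&m_mutex);
    }
    ~scoped_lock()
    {
        pthread_mutex_unlock(&m_mutex);
    }
private:
    pthread_mutex_t &m_mutex;
    scoped_lock(const scoped_lock &);              // not copyable
    scoped_lock &operator=(const scoped_lock &);
};

// Same idea as get_fibonacci_value(), but the unlock can no longer be skipped.
int guarded_lookup(pthread_mutex_t &mutex, const std::vector<int> &values, int which)
{
    scoped_lock lock(mutex);
    return values.at(which);   // may throw std::out_of_range; the lock is still released
}

// Usage: a failed lookup leaves the mutex unlocked, so a later trylock succeeds.
bool mutex_is_free_after_throw()
{
    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, 0);
    std::vector<int> values(3, 1);

    bool threw = false;
    try
    {
        guarded_lookup(mutex, values, 10);
    }
    catch (const std::out_of_range &)
    {
        threw = true;
    }

    bool free_again = (pthread_mutex_trylock(&mutex) == 0);
    if (free_again)
        pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return threw && free_again;
}
```

The guard's destructor runs during stack unwinding, which is why the exception no longer leaves the mutex locked.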

Note 5
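m_stoprequested is declared volatile so that the compiler re-reads it from memory on every pass through the loop. There is a chance the code would work without it, but without volatile the compiler is free to cache the value, and it is possible to create a situation where the thread would never exit because it would not know that m_stoprequested had changed to true. Note that volatile by itself does not guarantee atomicity or memory ordering between threads; strictly portable code would read and write the flag while holding the mutex.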

See the volatile article for more details.
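
Since volatile on its own does not make cross-thread access well defined in standard C++, a more conservative option that stays within pthreads is to guard the flag with a mutex. The sketch below assumes a hypothetical stop_flag helper; it is one possible approach, not the method used in the class above.

```cpp
#include <pthread.h>

// Hypothetical helper for this sketch: a stop flag guarded by its own mutex.
class stop_flag
{
public:
    stop_flag() : m_requested(false) { pthread_mutex_init(&m_mutex, 0); }
    ~stop_flag() { pthread_mutex_destroy(&m_mutex); }

    void request()
    {
        pthread_mutex_lock(&m_mutex);
        m_requested = true;
        pthread_mutex_unlock(&m_mutex);
    }

    bool requested()
    {
        pthread_mutex_lock(&m_mutex);
        bool value = m_requested;
        pthread_mutex_unlock(&m_mutex);
        return value;
    }

private:
    pthread_mutex_t m_mutex;
    bool m_requested;
};

// Worker loop: spins until another thread calls request().
extern "C" void *spin_until_stopped(void *arg)
{
    stop_flag *flag = static_cast<stop_flag *>(arg);
    while (!flag->requested())
    {
        // real work would go here
    }
    return 0;
}

// Usage: start the worker, request a stop, and confirm the thread exits.
bool worker_sees_stop_request()
{
    stop_flag flag;
    pthread_t thread;
    if (pthread_create(&thread, 0, &spin_until_stopped, &flag) != 0)
        return false;
    flag.request();
    return pthread_join(thread, 0) == 0;
}
```

The cost is a lock and unlock on every loop iteration, which is usually small next to the work the loop does.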

Note 6
If we were to forget the line pthread_mutex_unlock(&m_mutex); a deadlock would be created just as it would have if an exception were thrown in "Note 4."

Coming up next, boost::threads, the "C" of multithreaded programming.

Part 2

Comments

Why isn't volatile needed for the m_fibonacci_values STL vector? Can I be really sure that the compiler will not do any register optimization to some part of the vector?

I understand why it is needed for simple types but how should I think for more complex types?

Tobias

The short answer is that we are updating m_fibonacci_values inside of a mutex locked critical section. Technically speaking, volatile alerts the compiler that data may change in "unexpected" ways. Because of the mutex lock, every change to m_fibonacci_values is controlled and it is impossible for one thread to update it while another is trying to read it. As an optimization (and due to the fact that booleans are updated in a single instruction inside the CPU) we are able to update m_running and m_stoprequested without using a mutex. A consequence of that is that we need to mark them as volatile.

So to sum up:

  • If the variable you are using is a simple data type (e.g. int, bool) you do not need a mutex. If you do not use a mutex you must use volatile.
  • If the variable is complex (e.g. std::string, std::vector) you must use a mutex if there is any possibility at all that the variable will be updated while another thread is trying to access it.

I'm sure there is a much deeper explanation of this that is outside of my scope of knowledge.

My understanding from the code is that m_fibonacci_values is going to be shared among all the threads. But by making it a instance member aren't all threads going to get their own copy? I hope I didn't misunderstand something here.

Thanks,
Shaf

There is no such thing (currently) as thread-specific storage in C++. If you were to make m_fibonacci_values static it would result in m_fibonacci_values being shared between all object instances of the class "threaded_class." With the most basic usage case there will only be one "threaded_class" in existence in our application.

If you did make it static *and* you had more than one "threaded_class" it would almost certainly cause a crash as the various objects would step on each other with their shared data...

The confusion you are having may be coming from the function "start_thread", which is static. However, that static method calls the non-static do_work, which allows the threads to share m_fibonacci_values.

Can you please explain how m_fibonacci_values is shared by all the threads?

Because any thread created in this code is created with access to the "this" pointer, they all share the same access to m_fibonacci_values. This is normal in C++ multithreaded programming: nothing is thread-local unless you explicitly ask for it (pthreads provides pthread_key_create and pthread_setspecific for that). It's better to avoid sharing data between threads and instead send messages between threads anyhow, but the example is a common usage.

> While this code works and is an all too common way of using threads in C++, it has several disadvantages

It has a LOT of disadvantages and it doesn't always work:

1) It doesn't compile. No mutex variable "m_mutex" defined :)
2) pthread_create, pthread_mutex_init and pthread_mutex_lock may fail (because of insufficient system resources, signal delivery, ...) and it is necessary to check their return codes.
3) Of course, the class author shouldn't give us the possibility of forgetting to call the stop() method; the destructor should stop and join the running thread anyway.
4) "assert(m_running == false)" - possibly, this is a bug. IMHO, assert should be used to catch programmers' coding bugs and shouldn't replace error handling. By default, you can't expect that other programmers will not call this method twice. If you don't like the default case, then document this method's behavior ...
3) Although in practice it is almost always legal to pass the threaded_class::start_thread(void *obj) static method as a callback to pthread_create, it is not guaranteed by the C++ standard. To be fully correct, the start_thread method should have extern "C" linkage. It is probably better to create a separate static function using "C linkage" in the .cpp file.
(By saying "better" I mean that it would work for this case; the whole task implementation is broken from the start.)
4) If an optimized solution is required, it would be better to get the actual number of processor cores/processors and then create the corresponding number of threads to process the task.

Actually, I don't agree that this buggy piece of code can be described as a "common way of using threads in C++".

That's just my 2 cents, I appreciate your work ;)
Happy New Year!

It is a common way of using pthreads in C++, in my experience. But the point of the article, which you help illustrate, is that getting it right is difficult and cumbersome and using boost::threads is a much better way.

Thanks for the comment.

Since you are pedantic, I will join you:

You can't count.

1)
2)
3)
4)
3)
4)

But I appreciate your work! ;)

Happy Tuesday!

thanks..this was helpful

Maybe I'm wrong (I frequently am) but shouldn't the fibonacci function look like this? (a minute, nit-picky detail I know...just me being ocd ;) )

int fibonacci_number(int num)
{
    switch(num)
    {
        case 0:
        case 1:
            return 1;
        default:
            return fibonacci_number(num-2) + fibonacci_number(num-1);
    };
}

Thanks for the website! Has been infinitely helpful to me!

Cheers!

--Kaley

I made a couple of simple typos like that early on on this site. I'm going to try and compile each sample before I post it from now on.

Hi John,
It seems like a nice example you made here. But sorry, how do I compile this one?
I've put in some includes, but there are still "m_mutex and m_thread not declared" errors.

Thanks.

I've added the variable declarations. Just remember to include the pthread headers when compiling.

-Jason

It would be nice if the above example would compile and work as it was intended. For example, why in do_work() the variable "iteration" is always 0?

    int get_fibonacci_value(int which)   {
        pthread_mutex_lock(&m_mutex); // Note 3
        int value = m_fibonacci_values.get(which); // Note 4
        pthread_mutex_unlock(&m_mutex);
        return value;
    }

m_fibonacci_values.get(which) (aka .at) should check that m_fibonacci_values vector is populated up to the element given in the parameter. Not sure what it should do if it isn't populated, but below are the possible options:

  • Wait.
  • Throw an exception.
  • Return 0 or -1.
  • Halt.

m_fibonacci_values.at(which) throws an exception if the value (which) is out of range.
m_fibonacci_values[which] does not throw if out of range.

As you point out, this code is not exception safe, so I don't think it's ready for production.
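
The difference between at() and operator[] can be sketched as follows. at_throws and value_or are hypothetical helpers written for this illustration; value_or shows the "return -1" style option from the list above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if values.at(index) throws std::out_of_range.
bool at_throws(const std::vector<int> &values, std::size_t index)
{
    try
    {
        (void)values.at(index);   // bounds-checked access
        return false;
    }
    catch (const std::out_of_range &)
    {
        return true;
    }
}

// Checks the bound by hand and returns a fallback instead of throwing.
// operator[] does no checking of its own, so the test has to come first.
int value_or(const std::vector<int> &values, std::size_t index, int fallback)
{
    return index < values.size() ? values[index] : fallback;
}
```

Calling values[index] with an out-of-range index is undefined behavior rather than a guaranteed crash, which is why the explicit size check matters.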

What this tutorial is missing is something like scoped resource management for mutexes. By decoupling the lock action from the mutex, you can achieve exception safety very easily.

Another thing I'd like to have seen here is condition variables, which are very useful in any non-trivial multithreaded program.
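
As a rough illustration of the condition variable idea, here is a sketch of a container whose readers block until the requested element exists. waiting_values, produce_three and wait_for_third_value are hypothetical names; the pthread_cond_* calls are the standard pthread ones.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Hypothetical container for this sketch: readers block until a value exists.
class waiting_values
{
public:
    waiting_values()
    {
        pthread_mutex_init(&m_mutex, 0);
        pthread_cond_init(&m_added, 0);
    }
    ~waiting_values()
    {
        pthread_cond_destroy(&m_added);
        pthread_mutex_destroy(&m_mutex);
    }

    void push_back(int value)
    {
        pthread_mutex_lock(&m_mutex);
        m_values.push_back(value);
        pthread_cond_broadcast(&m_added);   // wake every waiting reader
        pthread_mutex_unlock(&m_mutex);
    }

    // Blocks until element `which` has been added, then returns it.
    int get(std::size_t which)
    {
        pthread_mutex_lock(&m_mutex);
        while (m_values.size() <= which)             // loop guards against spurious wakeups
            pthread_cond_wait(&m_added, &m_mutex);   // releases the mutex while waiting
        int value = m_values[which];
        pthread_mutex_unlock(&m_mutex);
        return value;
    }

private:
    pthread_mutex_t m_mutex;
    pthread_cond_t m_added;
    std::vector<int> m_values;
};

// Producer thread: adds 10, 20 and 30.
extern "C" void *produce_three(void *arg)
{
    waiting_values *values = static_cast<waiting_values *>(arg);
    for (int i = 1; i <= 3; ++i)
        values->push_back(i * 10);
    return 0;
}

// Usage: the reader waits for the third value instead of throwing.
int wait_for_third_value()
{
    waiting_values values;
    pthread_t producer;
    if (pthread_create(&producer, 0, &produce_three, &values) != 0)
        return -1;
    int third = values.get(2);   // blocks until the producer has added three values
    pthread_join(producer, 0);
    return third;
}
```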

Correct, you missed the entire point of the article. It was specifically for pthread usage, which does not have any RAII type mutex management. For more articles regarding better thread development techniques, please see the list of thread articles.

-Jason