home ‣ Strategies for Implementing POSIX Condition Variables on Win32 login
Strategies for Implementing POSIX Condition Variables on Win32
Douglas C. Schmidt and Irfan PyaraliDepartment of Computer Science
Washington University, St. Louis, Missouri
1. Introduction
The threading API provided by the Microsoft Win32 [Richter] family of operating systems (i.e., Windows NT, Windows '95, and Windows CE) provides some of the same concurrency constructs defined by the POSIX Pthreads specification [Pthreads]. For instance, they both support mutexes, which serialize access to shared state. However, Win32 lacks full-fledged condition variables, which are a synchronization mechanism used by threads to wait until a condition expression involving shared data attains a particular state.The lack of condition variables in Win32 makes it harder to implement certain concurrency abstractions, such as thread-safe message queues and thread pools. This article explores various techniques and patterns for implementing POSIX condition variables correctly and/or fairly on Win32. Section 2 explains what condition variables are and shows how to use them. Secion 3 explains alternative strategies for implementing POSIX condition variables using Win32 synchronization primitives. A subsequent article will describe how the Wrapper Facade pattern and various C++ language features can help reduce common mistakes that occur when programming condition variables.
2. Overview of POSIX Condition Variables
Condition variables are a powerful synchronization mechanism defined in the POSIX Pthreads specification. They provide a different type of synchronization than locking mechanisms like mutexes. For instance, a mutex is used to cause other threads to wait while the thread holding the mutex executes code in a critical section. In contrast, a condition variable is typically used by a thread to make itself wait until an expression involving shared data attains a particular state.In general, condition variables are more appropriate than mutexes for situations involving complex condition expressions or scheduling behaviors. For instance, condition variables are often used to implement thread-safe message queues [Schmidt], which provide a ``producer/consumer'' communication mechanism for passing messages between multiple threads. In this use-case, condition variables block producer threads when a message queue is ``full'' and block consumer threads when the queue is ``empty.''
2.1. POSIX Condition Variable Operations
The following are the four primary operations defined on POSIX condition variables:
pthread_cond_wait (pthread_cond_t *cv, pthread_mutex_t *external_mutex)-- This function waits for conditioncvto be notified. When called, it atomically (1) releases the associatedexternal_mutex(which the caller must hold while evaluating the condition expression) and (2) goes to sleep awaiting a subsequent notification from another thread (via thepthread_cond_signalorpthread_cond_broadcastoperations described next). Theexternal_mutexwill be locked whenpthread_cond_waitreturns.pthread_cond_signal (pthread_cond_t *cv)-- This function notifies one thread waiting on condition variablecv.pthread_cond_broadcast (pthread_cond_t *cv)-- This function notifies all threads that are current waiting on condition variablecv.pthread_cond_timedwait (pthread_cond_t *cv, pthread_mutex_t *external_mutex, const struct timespec *abstime)-- This function is a time-based variant ofpthread_cond_wait. It waits up toabstimeamount of time forcvto be notified. Ifabstimeelapses beforecvis notified, the function returns back to the caller with anETIMEresult, signifying that a timeout has occurred. Even in the case of timeouts, theexternal_mutexwill be locked whenpthread_cond_timedwaitreturns.
pthread_cond_init and pthread_cond_destroy,
which initialize and destroy instances of POSIX condition variables,
respectively.
2.2. Using Condition Variables to Implement Counting Semaphores
The expression waited upon by a condition variable can be arbitrarily complex. For instance, a thread may need to block until a certain expression involving one or more data items shared by other threads becomes true. To illustrate how this works, we'll implement a counting semaphore abstraction below using condition variables.Counting semaphores are a synchronization mechanism commonly used to serialize or coordinate multiple threads of control. Conceptually, counting semaphores are non-negative integers that can be incremented and decremented atomically. If a thread tries to decrement a semaphore whose value equals 0 this thread is suspended waiting for another thread to increment the semaphore count above 0.
Counting semaphores are often used to keep track of changes in the state of objects shared by multiple threads in a process. For instance, they can record the occurrence of a particular event. Unlike condition variables, semaphores maintain state. Therefore, they allow threads to make decisions based upon this state, even if the event has occurred in the past.
The standard POSIX Pthreads interface doesn't contain a counting semaphore abstraction. However, it's straightforward to create one using mutexes and condition variables, as shown below.
First, we define a struct that holds the state
information necessary to implement a counting semaphore:
struct sema_t
{
private:
u_int count_;
// Current count of the semaphore.
u_long waiters_count_;
// Number of threads that have called <sema_wait>.
pthread_mutex_t lock_;
// Serialize access to <count_> and <waiters_count_>.
pthread_cond_t count_nonzero_;
// Condition variable that blocks the <count_> 0.
};
Instances of type sema_t are created by the following
sema_init factory (error checking is omitted to save
space):
int
sema_init (sema_t *s, u_int initial_count)
{
pthread_mutex_init (&s->lock_);
pthread_cond_init (&s->count_nonzero_);
s->count_ = initial_count;
s->waiters_count_ = 0;
}
The sema_wait function shown below blocks the calling
thread until the semaphore count pointed to by s is
greater than 0, at which point the count is atomically
decremented.
int
sema_wait (sema_t *s)
{
// Acquire mutex to enter critical section.
pthread_mutex_lock (&s->lock_);
// Keep track of the number of waiters so that <sema_post> works correctly.
s->waiters_count_++;
// Wait until the semaphore count is > 0, then atomically release
// <lock_> and wait for <count_nonzero_> to be signaled.
while (s->count_ == 0)
pthread_cond_wait (&s->count_nonzero_,
&s->lock_);
// <s->lock_> is now held.
// Decrement the waiters count.
s->waiters_count_--;
// Decrement the semaphore's count.
s->count_--;
// Release mutex to leave critical section.
pthread_mutex_unlock (&s->lock_);
}
The counting semaphore implementation shown above uses a ``caller
manipulates waiter count'' idiom to simplify the synchronization
between the sema_wait and sema_post
functions. This synchronization idiom is used elsewhere in this
article.
The sema_post function atomically increments the
semaphore count pointed to by s. Only one thread will be
unblocked and allowed to continue, regardless of the number of threads
blocked on the semaphore.
int
sema_post (sema_t *s)
{
pthread_mutex_lock (&s->lock_);
// Always allow one thread to continue if it is waiting.
if (s->waiters_count_ > 0)
pthread_cond_signal (&s->count_nonzero_);
// Increment the semaphore's count.
s->count_++;
pthread_mutex_unlock (&s->lock_);
}
To further clarify the behavior of counting semaphores, assume there
are two threads, T1 and T2, both of which are
collaborating via semaphore sema_t s. If T1
calls sema_wait when s's semaphore count is
0 it will block until thread T2 calls
sema_post. When T2 subsequently calls
sema_post, this function detects if the shared data
waiters_count_ is > 0, which signifies that thread
T1 is blocked waiting to acquire semaphore
s.
When thread T2 signals the condition variable
count_nonzero_ in sema_post, thread
T1 is awakened in pthread_cond_wait. When
thread T1 is rescheduled and dispatched, it re-evaluates
its condition expression, i.e., s->count_ == 0.
If count_ is > 0 the sema_post function
returns to thread T1.
Note that a while loop is used to wait for condition variable
count_nonzero_ in sema_wait. This condition
variable looping idiom [Lea] is necessary since
another thread, e.g., T3, may have acquired the
semaphore first and decremented s->count_ back to 0
before thread T1 is awakened. In this example, the while
loop ensures that T1 doesn't return until the semaphore is
> 0. In general, the while loop idiom ensures that threads don't
return from pthread_cond_wait prematurely.
3. Implementing Condition Variables on Win32
As shown above, POSIX Pthreads defines condition variables via thepthread_cond_t type and its supporting operations. In
contrast, the Microsoft Win32 interface does not support condition
variables natively. To workaround this omission, many developers
write POSIX condition variable emulations using Win32 synchronization
primitives such as mutexes, critical sections, semaphores, and events,
which are outlined in the sidebar.
Sidebar: Overview of Win32 Synchronization Primitives
Win32 contains a range of synchronization primitives. The primitives used in this article are outlined below.
Mutexes
Mutexes are ``binary semaphores'' that serialize thread access to critical sections. Win32 mutexes are accessed viaHANDLEs, and can serialize threads within one process or
across multiple processes. A single mutex is acquired via
WaitForSingleObject and
WaitForMultipleObjects can acquire multiple mutexes. A
mutex is released via the ReleaseMutex operation.
Critical Sections
Critical sections are ``lightweight'' mutexes that can only serialize threads within a single process. They are more efficient than Win32 mutexes, however, and potentially require only a single hardware instruction on some platforms. In contrast, Win32 mutexes require a system call and and is substantially slower. A critical section is acquired viaEnterCriticalSection and released via
LeaveCriticalSection.
Although they are slower in some situations, some
experiments show that critical sections provide no performance
benefit at all when running on an SMP system and two threads are
running on different processors and contending for the same critical
section. Moreover, Win32 mutexes are also functionally richer than
critical sections. For instance, WaitForMultipleObjects
can be used to wait for multiple waitable objects (mutexes,
semaphores, threads, processes, etc). By enabling the
waitAll parameter, all waitable object
HANDLEs passed to WaitForMultipleObjects can
be acquired atomically. The waitAll feature supports
``all or nothing'' synchronization semantics that are not available
from critical sections.
Semaphores
Win32 semaphores are more general synchronization objects that implement the counting semaphores mechanism described in Section 2.2. Like mutexes, Win32 semaphores can synchronize threads within one process or across multiple processes. A single semaphore is acquired viaWaitForSingleObject and
WaitForMultipleObjects can acquire multiple semaphores.
A semaphore is released via the ReleaseSemaphore
operation, which increments the semaphore value by a user-specified
amount, which is typically 1.
Events
Win32 events come in two types: (1) auto-reset events and (2) manual-reset events. The semantics of the Win32 event operations, e.g.,SetEvent, PulseEvent,
ResetEvent, differ depending on what type of event
they're operating on. A single event of either type can be waited
upon via WaitForSingleObject and
WaitForMultipleObjects can wait for multiple events.
When a manual-reset event is signaled by SetEvent it sets
the event into the signaled state and wakes up all threads waiting on
this event. In contrast, when an auto-reset event is signaled by
SetEvent and there are any threads waiting, it wakes up
one thread and reset the event to the non-signaled state. If there
are no threads waiting the event remains signaled until a single
waiting thread waits on it and is released.
When a manual-reset event is signaled by PulseEvent it
wakes up all waiting threads and atomically resets the event. In
contrast, if an auto-reset event is signaled by
PulseEvent and there are any threads waiting, it wakes up
one thread. In either case, the auto-reset event is set to the
non-signaled state.
Both Win32 events and POSIX condition variables provide similar
waiting, signaling, and broadcasting
features. For instance, WaitForMultipleObjects can
acquire a mutex and wait on an event simultaneously via the
waitAll flag and SignalObjectAndWait can
release a mutex and wait on an event atomically. These functions
provide semantics akin to the pthread_cond_wait and
pthread_cond_signal. Thus, there are instances where
either events and condition variables can be used interchangably.
However, extreme care must be taken with Win32 events to ensure that there are no race conditions introduced when switching from one mechanism to another. Unfortunately, there's no way to release just one waiting thread with a manual-reset event. Likewise, there's no way to release all waiting threads with an auto-reset event. This limitation is a major source of difficulty when implementing condition variables, as shown in Section 3.
After years of repeatedly seeing Win32 implementations of condition
variables posted in newsgroups like
comp.programming.threads it became apparent that many
Win32 implementations are either incorrect or contain subtle problems
that can lead to starvation, unfairness, or race conditions. To help
developers avoid these problems, this article evaluates common
strategies for implementing POSIX condition variables on Win32,
illustrating common traps and pitfalls and ways to avoid them.
All these strategies described in this paper are designed for threads in the same process. Note that we use a C-style of C++ programming to focus our emphasis on the steps in each solution.
3.1. The PulseEvent Solution
The first solution we'll examine starts by defining apthread_cond_t data structure that contains a pair of
Win32 events:
typedef struct
{
enum {
SIGNAL = 0,
BROADCAST = 1,
MAX_EVENTS = 2
};
HANDLE events_[MAX_EVENTS];
// Signal and broadcast event HANDLEs.
} pthread_cond_t;
In this solution, an auto-reset event (events_[SIGNAL])
implements pthread_cond_signal and a manual-reset event
(events_[BROADCAST]) implements
pthread_cond_broadcast.
Using these synchronization mechanisms, the
pthread_cond_init initialization function is defined as
follows:
int
pthread_cond_init (pthread_cond_t *cv,
const pthread_condattr_t *)
{
// Create an auto-reset event.
cv->events_[SIGNAL] = CreateEvent (NULL, // no security
FALSE, // auto-reset event
FALSE, // non-signaled initially
NULL); // unnamed
// Create a manual-reset event.
cv->events_[BROADCAST] = CreateEvent (NULL, // no security
TRUE, // manual-reset
FALSE, // non-signaled initially
NULL); // unnamed
}
The following three functions implement the core operations of POSIX
condition variables (for simplicity, most error handling is omitted in
this and other solutions). We'll start by defining a mutual exclusion
type for the POSIX pthread_mutex_t using a Win32
CRITICAL_SECTION, as follows:
typedef CRITICAL_SECTION pthread_mutex_t;
The pthread_cond_wait function first releases the
associated external_mutex of type
pthread_mutex_t, which must be held when the caller
checks the condition expression. The function then uses
WaitForMultipleObjects to wait for either the
SIGNAL or BROADCAST event to become
signaled:
int
pthread_cond_wait (pthread_cond_t *cv,
pthread_mutex_t *external_mutex)
{
// Release the <external_mutex> here and wait for either event
// to become signaled, due to <pthread_cond_signal> being
// called or <pthread_cond_broadcast> being called.
LeaveCriticalSection (external_mutex);
WaitForMultipleObjects (2, // Wait on both <events_>
ev->events_,
FALSE, // Wait for either event to be signaled
INFINITE); // Wait "forever"
// Reacquire the mutex before returning.
EnterCriticalSection (external_mutex, INFINITE);
}
Note that the pthread_cond_timedwait function can be
implemented by mapping its struct timespec *abstime
parameter into a non-INFINITE final argument to
WaitForMultipleObject.
The pthread_cond_signal function tries to wakeup one thread
waiting on the condition variable cv, as follows:
int
pthread_cond_signal (pthread_cond_t *cv)
{
// Try to release one waiting thread.
PulseEvent (cv->events_[SIGNAL]);
}
The call to PulseEvent on the auto-reset event
cv->events_[SIGNAL] awakens one waiting thread. The
SIGNAL event is atomically reset to the non-signaled
state, even if no threads are waiting on this event.
The pthread_cond_broadcast function is similar, though it
wakes up all threads waiting on the condition variable
cv:
int
pthread_cond_broadcast (pthread_cond_t *cv)
{
// Try to release all waiting threads.
PulseEvent (cv->events_[BROADCAST]);
}
Calling PulseEvent on a manual-reset event releases all
threads that are waiting on cv->events_[BROADCAST]. As
with pthread_cond_signal, the state of the event is
atomically reset to non-signaled, even if no threads are waiting.
Evaluating the PulseEvent Solution
Although thePulseEvent solution is concise and
``intuitive,'' it is also incorrect. The root of the problem is known
as the lost wakeup bug. This bug occurs from the lack of
atomicity between (1) releasing the external mutex and (2) the start
of the wait, i.e.:
LeaveCriticalSection (external_mutex);
// Potential "lost wakeup bug" here!
WaitForMultipleObjects (2, ev->events_, FALSE, INFINITE);
The lost wakeup bug manifests itself when the calling thread is
preempted after releasing the external_mutex,
but before calling WaitForMultipleObjects. In
this case, a different thread could acquire the
external_mutex and perform a
pthread_cond_signal in between the two calls. Because
our signal implementation pulses the auto-reset event, the
event remain unsignaled. Therefore, when the preempted calling thread
resumes and calls WaitForMultipleObjects it may find both
events non-signaled. In this case, it might wait indefinitely since
its wakeup signal got ``lost.''
Note that this problem could be trivially solved if Win32 provided
some type of ``SignalObjectAndWaitMultiple'' function
that released a mutex or critical section and atomically waited for an
array of HANDLEs to be signaled.
3.2. The SetEvent Solution
One way to avoid the lost wakeup bug is to use theSetEvent Win32 functions instead of
PulseEvent. The second solution, shown below, implements
this approach using a pthread_cond_t data structure
that's similar to the first one. The difference is that it maintains
a count of the threads waiting on the condition variable and a
CRITICAL_SECTION to protect this count, as follows:
typedef struct
{
u_int waiters_count_;
// Count of the number of waiters.
CRITICAL_SECTION waiters_count_lock_;
// Serialize access to <waiters_count_>.
// Same as before...
} pthread_cond_t;
The pthread_cond_init initialization function is defined
as follows:
int
pthread_cond_init (pthread_cond_t *cv,
const pthread_condattr_t *)
{
// Initialize the count to 0.
cv->waiters_count_ = 0;
// Create an auto-reset and manual-reset event, as before...
}
The pthread_cond_wait function waits for a condition
cv and atomically releases the associated
external_mutex that it holds while checking the condition
expression:
int
pthread_cond_wait (pthread_cond_t *cv,
pthread_mutex_t *external_mutex)
{
// Avoid race conditions.
EnterCriticalSection (&cv->waiters_count_lock_);
cv->waiters_count_++;
LeaveCriticalSection (&cv->waiters_count_lock_);
// It's ok to release the <external_mutex> here since Win32
// manual-reset events maintain state when used with
// <SetEvent>. This avoids the "lost wakeup" bug...
LeaveCriticalSection (external_mutex);
// Wait for either event to become signaled due to <pthread_cond_signal>
// being called or <pthread_cond_broadcast> being called.
int result = WaitForMultipleObjects (2, ev->events_, FALSE, INFINITE);
EnterCriticalSection (&cv->waiters_count_lock_);
cv->waiters_count_--;
int last_waiter =
result == WAIT_OBJECT_0 + BROADCAST
&& cv->waiters_count_ == 0;
LeaveCriticalSection (&cv->waiters_count_lock_);
// Some thread called <pthread_cond_broadcast>.
if (last_waiter)
// We're the last waiter to be notified or to stop waiting, so
// reset the manual event.
ResetEvent (cv->events_[BROADCAST]);
// Reacquire the <external_mutex>.
EnterCriticalSection (external_mutex, INFINITE);
}
The release of the external_mutex and the subsequent wait
for either of the events_ to become signaled is still
non-atomic. This implementation avoids the lost wakeup bug, however,
by relying on the ``stickiness'' of the manual-reset event, the use of
SetEvent rather than PulseEvent, and the
waiters_count_ count. Note how the last waiter thread in
pthread_cond_wait resets the broadcast
manual-reset event to non-signaled before exiting the function. This
event is signaled in pthread_cond_broadcast, which wakes
up all threads waiting on the condition variable
cv, as follows:
int
pthread_cond_broadcast (pthread_cond_t *cv)
{
// Avoid race conditions.
EnterCriticalSection (&cv->waiters_count_lock_);
int have_waiters = cv->waiters_count_ > 0;
LeaveCriticalSection (&cv->waiters_count_lock_);
if (have_waiters)
SetEvent (cv->events_[BROADCAST]);
}
Calling SetEvent on a manual-reset event sets the
cv->events_[BROADCAST] event to the signaled state. This
releases all threads until the event is manually reset in the
pthread_cond_wait function above.
The pthread_cond_signal function wakes up a thread
waiting on the condition variable cv:
int
pthread_cond_signal (pthread_cond_t *cv)
{
// Avoid race conditions.
EnterCriticalSection (&cv->waiters_count_lock_);
int have_waiters = cv->waiters_count_ > 0;
LeaveCriticalSection (&cv->waiters_count_lock_);
if (have_waiters)
SetEvent (cv->events_[SIGNAL]);
}
Calling SetEvent on an auto-reset event will set the
cv->events_[SIGNAL] event to the signaled state. This
releases a single thread and atomically resets the event to the
non-signaled state. If there are no waiting threads, the
SetEvent function is skipped.
Evaluating the SetEvent Solution
Although the solution above doesn't suffer from the lost wakeup bug exhibited by thePulseEvent implementation, it does have
the following drawbacks:
- Unfairness -- The semantics of the POSIX
pthread_cond_broadcastfunction is to wake up all threads currently blocked inwaitcalls on the condition variable. The awakened threads then compete for theexternal_mutex. To ensure fairness, all of these threads should be released from theirpthread_cond_waitcalls and allowed to recheck their condition expressions before other threads can successfully complete a wait on the condition variable.Unfortunately, the
SetEventimplementation above does not guarantee that all threads sleeping on the condition variable whencond_broadcastis called will acquire theexternal_mutexand check their condition expressions. Although the Pthreads specification does not mandate this degree of fairness, the lack of fairness can cause starvation.To illustrate the unfairness problem, imagine there are 2 threads, C1 and C2, that are blocked in
pthread_cond_waiton condition variablenot_empty_that is guarding a thread-safe message queue. Another thread, P1 then places two messages onto the queue and callspthread_cond_broadcast. If C1 returns frompthread_cond_wait, dequeues and processes the message, and immediately waits again then it and only it may end up acquiring both messages. Thus, C2 will never get a chance to dequeue a message and run.The following illustrates the sequence of events:
- Thread C1 attempts to dequeue and waits on CV
non_empty_ - Thread C2 attempts to dequeue and waits on CV
non_empty_ - Thread P1 enqueues 2 messages and broadcasts to CV
not_empty_ - Thread P1 exits
- Thread C1 wakes up from CV
not_empty_, dequeues a message and runs - Thread C1 waits again on CV
not_empty_, immediately dequeues the 2nd message and runs - Thread C1 exits
- Thread C2 is the only thread left and blocks forever since
not_empty_will never be signaled
pthread_cond_broadcast. Howver, there are many cases where fair implementations of condition variables can simplify application code. - Thread C1 attempts to dequeue and waits on CV
- Incorrectness -- A variation on the unfairness problem
described above occurs when a third consumer thread, C3, is
allowed to slip through even though it was not waiting on condition
variable
not_empty_when a broadcast occurred.To illustrate this, we will use the same scenario as above: 2 threads, C1 and C2, are blocked dequeuing messages from the message queue. Another thread, P1 then places two messages onto the queue and calls
pthread_cond_broadcast. C1 returns frompthread_cond_wait, dequeues and processes the message. At this time, C3 acquires theexternal_mutex, callspthread_cond_waitand waits on the events inWaitForMultipleObjects. Since C2 has not had a chance to run yet, theBROADCASTevent is still signaled. C3 then returns fromWaitForMultipleObjects, and dequeues and processes the message in the queue. Thus, C2 will never get a chance to dequeue a message and run.The following illustrates the sequence of events:
- Thread C1 attempts to dequeue and waits on CV
non_empty_ - Thread C2 attempts to dequeue and waits on CV
non_empty_ - Thread P1 enqueues 2 messages and broadcasts to CV
not_empty_ - Thread P1 exits
- Thread C1 wakes up from CV
not_empty_, dequeues a message and runs - Thread C1 exits
- Thread C3 waits on CV
not_empty_, immediately dequeues the 2nd message and runs - Thread C3 exits
- Thread C2 is the only thread left and blocks forever since
not_empty_will never be signaled
- Thread C1 attempts to dequeue and waits on CV
- Increased serialization overhead -- The implementation
shown above uses the
waiters_count_lock_critical section to protect thewaiters_count_from being corrupted by race conditions. This additional serialization overhead is not necessary as long aspthread_cond_signalandpthread_cond_broadcastare always called by a thread that has locked the sameexternal_mutexused bypthread_cond_wait. For instance, application code that always uses the following idiom does not require this additional serialization:
In contrast, application code that uses the following optimizationpthread_mutex_t external_mutex; pthread_cond_t cv; void release_resources (int resources_released) { // Acquire the lock. pthread_mutex_lock (&external_mutex); // Atomically modify shared state here... // Could also use <pthread_cond_broadcast>. pthread_cond_signal (&cv); // Release the lock. pthread_mutex_unlock (&external_mutex); }doesrequire the additional serialization:
The code shown above puts thevoid release_resources (int resources_released) { // Acquire the lock. pthread_mutex_lock (&external_mutex); int can_signal = 0; // Atomically modify shared state here... // Keep track of whether we can signal or not // as a result of the updated shared state. can_signal = 1; // Release the lock. pthread_mutex_unlock (&external_mutex); if (can_signal) // Could also use <pthread_cond_broadcast>. pthread_cond_signal (&cv); }pthread_cond_signalafter thepthread_mutex_unlockto minimize the number of trips through the condition variable scheduler. The problem with this optimization, of course, is that race conditions can occur on thewaiters_count_if it was tested/modified without being protected by theexternal_mutex. Note that thePulseEventssolution did not have this race condition since it did not have any directly mutable internal state.In general, race conditions can occur with condition variable implementations that use internal state if threads don't acquire the
external_mutexwhen callingpthread_cond_signalandpthread_cond_broadcast. The POSIX Pthread specification allows these functions to be called by a thread whether or not that thread has currently locked theexternal_mutex. Therefore, although the use of the internal lock can increase overhead, it is necessary to provide robust, standard-compliant condition variable programming model.
3.3. The Generation Count Solution
There are several ways to solve the fairness and correctness problems. The following example illustrates one strategy for alleviating these problems. We start by creating a more sophisticatedpthread_cond_t struct.
typedef struct
{
int waiters_count_;
// Count of the number of waiters.
CRITICAL_SECTION waiters_count_lock_;
// Serialize access to <waiters_count_>.
int release_count_;
// Number of threads to release via a <pthread_cond_broadcast> or a
// <pthread_cond_signal>.
int wait_generation_count_;
// Keeps track of the current "generation" so that we don't allow
// one thread to steal all the "releases" from the broadcast.
HANDLE event_;
// A manual-reset event that's used to block and release waiting
// threads.
} pthread_cond_t;
As shown in the pthread_cond_t struct above,
this solution uses just one Win32 event. This manual-reset event
blocks threads in pthread_cond_wait and releases one or
more threads in pthread_cond_signal and
pthread_cond_broadcast, respectively. To enhance
fairness, this scheme uses (1) a wait_generation_count_
data member to track real signals/broadcasts and (2) a
release_count_ to control how many threads should be
notified for each signal/broadcast.The following function creates and initializes a condition variable using the generation count implementation:
int
pthread_cond_init (pthread_cond_t *cv,
const pthread_condattr_t *);
{
cv->waiters_count_ = 0;
cv->wait_generation_count_ = 0;
cv->release_count_ = 0;
// Create a manual-reset event.
cv->event_ = CreateEvent (NULL, // no security
TRUE, // manual-reset
FALSE, // non-signaled initially
NULL); // unnamed
}
The pthread_cond_wait implementation shown below waits
for condition cv and atomically releases the associated
external_mutex that must be held while checking the
condition expression:
int
pthread_cond_wait (pthread_cond_t *cv,
pthread_mutex_t *external_mutex)
{
// Avoid race conditions.
EnterCriticalSection (&cv->waiters_count_lock_);
// Increment count of waiters.
cv->waiters_count_++;
// Store current generation in our activation record.
int my_generation = cv->wait_generation_count_;
LeaveCriticalSection (&cv->waiters_count_lock_);
LeaveCriticalSection (external_mutex);
for (;;) {
// Wait until the event is signaled.
WaitForSingleObject (cv->event_, INFINITE);
EnterCriticalSection (&cv->waiters_count_lock_);
// Exit the loop when the <cv->event_> is signaled and
// there are still waiting threads from this <wait_generation>
// that haven't been released from this wait yet.
int wait_done = cv->release_count_ > 0
&& cv->wait_generation_count_ != my_generation;
LeaveCriticalSection (&cv->waiters_count_lock_);
if (wait_done)
break;
}
EnterCriticalSection (external_mutex);
EnterCriticalSection (&cv->waiters_count_lock_);
cv->waiters_count_--;
cv->release_count_--;
int last_waiter = cv->release_count_ == 0;
LeaveCriticalSection (&cv->waiters_count_lock_);
if (last_waiter)
// We're the last waiter to be notified, so reset the manual event.
ResetEvent (cv->event_);
}
This function loops until the event_ HANDLE
is signaled and there are still threads from this ``generation'' that
haven't been released from the wait. The
wait_generation_count_ field is incremented every time
the event_ is signal via
pthread_cond_broadcast or
pthread_cond_signal. It tries to eliminate the fairness
problems with the SetEvents solution by not responding to
signal or broadcast notifications that have occurred in a previous
``generation,'' i.e., before the current group of
threads started waiting. The following function notifies a single thread waiting on a condition variable:
int
pthread_cond_signal (pthread_cond_t *cv)
{
EnterCriticalSection (&cv->waiters_count_lock_);
if (cv->waiters_count_ > cv->release_count_) {
SetEvent (cv->event_); // Signal the manual-reset event.
cv->release_count_++;
cv->wait_generation_count++;
}
LeaveCriticalSection (&cv->waiters_count_lock_);
}
Note that we only signal the event if there are more waiters than
threads in the generation that is in the midst of being released.
The following implementation of pthread_cond_broadcast
notifies all threads waiting on a condition variable:
int
pthread_cond_broadcast (pthread_cond_t *cv)
{
EnterCriticalSection (&cv->waiters_count_lock_);
if (cv->waiters_count_ > 0) {
SetEvent (cv->event_);
// Release all the threads in this generation.
cv->release_count_ = cv->waiters_count_;
// Start a new generation.
cv->wait_generation_count_++;
}
LeaveCriticalSection (&cv->waiters_count_lock_);
}
Evaluating the Generation Count Solution
The unfairness problem of theSetEvent solution is
nominally avoided by having each thread test whether there was
actually a signal since it waited or if there are only left-over
releases. This prevents one thread from taking more than one release
without waiting in a new ``generation.'' However, the following are
the subtle traps and pitfalls with this approach:
- Busy-waiting -- This solution can result in busy-waiting
if a waiter has the highest priority thread. The problem is that once
pthread_cond_broadcastsignals the manual-resetevent_it remains signaled. Therefore, the highest priority thread may cycle endlessly through theforloop inpthread_cond_waitand never sleep on the event. - Unfairness -- The
forloop inpthread_cond_waitleaves the critical section before callingWaitForSingleObject. Thus, it's possible for another thread to acquire theexternal_mutexand callpthread_cond_signalorpthread_cond_broadcastagain during this unprotected region. If this situation occurs, thewait_generation_count_will increase, which may cause the waiting thread to break out of the loop prematurely. In this case, the waiting thread can steal a release that was intended for another thread. Therefore, the Generation Count solution isn't entirely fair after all! - Serialization overhead -- Because there is additional
state in the
pthread_cond_timplementation, the internal serialization logic is more complex.
3.4. The SignalObjectAndWait Solution
A more complete, albeit more complex, way to achieve fairness is to use a broader set of Win32 synchronization mechanisms. The solution shown below leverages the Windows NT 4.0SignalObjectAndWait function, in conjunction with the
following three synchronization mechanisms:
- A Win32 semaphore (
sema_) -- which is used to queue up threads waiting for the condition to become signaled. - A Win32 auto-reset event (
waiters_done_) -- which is used by the broadcast or signaling thread to wait for all the waiting thread(s) to wake up and be released from the semaphore. - A Win32 Critical Section (
waiters_count_lock_) -- which serializes access to the count of waiting threads.
struct to implement
POSIX condition variables on Win32:
typedef struct
{
int waiters_count_;
// Number of waiting threads.
CRITICAL_SECTION waiters_count_lock_;
// Serialize access to <waiters_count_>.
HANDLE sema_;
// Semaphore used to queue up threads waiting for the condition to
// become signaled.
HANDLE waiters_done_;
// An auto-reset event used by the broadcast/signal thread to wait
// for all the waiting thread(s) to wake up and be released from the
// semaphore.
size_t was_broadcast_;
// Keeps track of whether we were broadcasting or signaling. This
// allows us to optimize the code if we're just signaling.
} pthread_cond_t;
For variety's sake, this solution optimizes the serialization required
by assuming that the external_mutex is held when
pthread_cond_signal or
pthread_cond_broadcast is called. If this optimization
is not possible, an extra mutex can be added to
pthread_cond_t and held for the entire duration of the
pthread_cond_signal, pthread_cond_broadcast,
pthread_cond_wait, and
pthread_cond_timedwait calls.
The remainder of this section illustrates how to implement condition
variables on Windows NT 4.0 using the SignalObjectAndWait
function. This function atomically signals one HANDLE
and waits for another HANDLE to become signaled. Since
mutexes are accessed via HANDLEs, whereas
CRITICAL_SECTIONS are not, we'll need to typedef a Win32
mutex for the POSIX pthread_mutex_t instead of a
CRITICAL_SECTION:
typedef HANDLE pthread_mutex_t;
The pthread_cond_init function is defined as follows:
int
pthread_cond_init (pthread_cond_t *cv,
const pthread_condattr_t *)
{
cv->waiters_count_ = 0;
cv->was_broadcast_ = 0;
cv->sema_ = CreatedSemaphore (NULL, // no security
0, // initially 0
0x7fffffff, // max count
NULL); // unnamed
InitializeCriticalSection (&cv->waiters_count_lock_);
cv->waiters_done_ = CreateEvent (NULL, // no security
FALSE, // auto-reset
FALSE, // non-signaled initially
NULL); // unnamed
}
The pthread_cond_wait function uses two steps:
- It waits for the
sema_semaphore to be signaled, which indicates that apthread_cond_broadcastorpthread_cond_signalhas occurred. - It sets the
waiters_done_auto-reset event into the signaled state when the last waiting thread is about to leave thepthread_cond_waitcritical section.
pthread_cond_broadcast
function described below to ensure fairness. But first, here's the
implementation of pthread_cond_wait:
int
pthread_cond_wait (pthread_cond_t *cv,
pthread_mutex_t *external_mutex)
{
// Avoid race conditions.
EnterCriticalSection (&cv->waiters_count_lock_);
cv->waiters_count_++;
LeaveCriticalSection (&cv->waiters_count_lock_);
// This call atomically releases the mutex and waits on the
// semaphore until <pthread_cond_signal> or <pthread_cond_broadcast>
// are called by another thread.
SignalObjectAndWait (*external_mutex, cv->sema_, INFINITE, FALSE);
// Reacquire lock to avoid race conditions.
EnterCriticalSection (&cv->waiters_count_lock_);
// We're no longer waiting...
cv->waiters_count_--;
// Check to see if we're the last waiter after <pthread_cond_broadcast>.
int last_waiter = cv->was_broadcast_ && cv->waiters_count_ == 0;
LeaveCriticalSection (&cv->waiters_count_lock_);
// If we're the last waiter thread during this particular broadcast
// then let all the other threads proceed.
if (last_waiter)
// This call atomically signals the <waiters_done_> event and waits until
// it can acquire the <external_mutex>. This is required to ensure fairness.
SignalObjectAndWait (cv->waiters_done_, *external_mutex, INFINITE, FALSE);
else
// Always regain the external mutex since that's the guarantee we
// give to our callers.
WaitForSingleObject (*external_mutex);
}
This implementation of pthread_cond_wait ensures that the
external_mutex is held until all threads waiting on the
cv have a chance to wait again on the
external_mutex before returning to their callers. This
solution relies on the fact that Windows NT mutex requests are queued
in FIFO order, rather than in, e.g., priority order
[MSMutex]. Because the external_mutex
queue is serviced in FIFO order, all waiting threads will acquire the
external mutex before any of them can reacquire it a second time.
This property is essential to ensure fairness.
We use the SignalObjectAndWait function to ensure
atomicity when signaling one synchronization object
(external_mutex) and waiting for another synchronization
object to become signaled (cv->sema_). This is
particularly important for the last waiter thread, i.e., the
one that signals the broadcasting thread when all waiters are done.
If we don't leverage the atomicity of
SignalObjectAndWait, our solution is potentially unfair
since the last waiter thread may not get the chance to wait on the
external_mutex before one of the other waiters gets it,
performs some work, releases and then immediately reacquires
external_mutex.
The pthread_cond_signal function is straightforward since
it just releases one waiting thread. Therefore, it simply increments
the sema_ semaphore by 1:
int
pthread_cond_signal (pthread_cond_t *cv)
{
EnterCriticalSection (&cv->waiters_count_lock_);
int have_waiters = cv->waiters_count_ > 0;
LeaveCriticalSection (&cv->waiters_count_lock_);
// If there aren't any waiters, then this is a no-op.
if (have_waiters)
ReleaseSemaphore (cv->sema_, 1, 0);
}
The pthread_cond_broadcast function is more complex and
requires two steps:
- It wakes up all the threads waiting on the
sema_semaphore, which can be done atomically by passing thewaiters_count_toReleaseSemaphore; - It then blocks on the auto-reset
waiters_done_event until the last thread in the group of waiting threads exits thepthread_cond_waitcritical section.
int
pthread_cond_broadcast (pthread_cond_t *cv)
{
// This is needed to ensure that <waiters_count_> and <was_broadcast_> are
// consistent relative to each other.
EnterCriticalSection (&cv->waiters_count_lock_);
int have_waiters = 0;
if (cv->waiters_count_ > 0) {
// We are broadcasting, even if there is just one waiter...
// Record that we are broadcasting, which helps optimize
// <pthread_cond_wait> for the non-broadcast case.
cv->was_broadcast_ = 1;
have_waiters = 1;
}
if (have_waiters) {
// Wake up all the waiters atomically.
ReleaseSemaphore (cv->sema_, cv->waiters_count_, 0);
LeaveCriticalSection (&cv->waiters_count_lock_);
// Wait for all the awakened threads to acquire the counting
// semaphore.
WaitForSingleObject (cv->waiters_done_, INFINITE);
// This assignment is okay, even without the <waiters_count_lock_> held
// because no other waiter threads can wake up to access it.
cv->was_broadcast_ = 0;
}
else
LeaveCriticalSection (&cv->waiters_count_lock_);
}
As mentioned above, this solution requires that the
pthread_cond_signal and
pthread_cond_broadcast are only called by a thread that's
locked the same external_mutex.
Evaluating the SignalObjectAndWait Solution
Our implementation ofpthread_cond_wait relied on the
SignalObjectAndWait function, which was first released
with Windows NT 4.0. Although this solution provides fairness, it has
more overhead than the earlier solutions. For instance, the
external_mutex is a mutex rather than a critical section,
which increases locking and unlocking overhead. Moreover, the
pthread_cond_wait and pthread_cond_broadcast
implementations require additional synchronization operations to
ensure fairness when broadcasting.
Unfortunately, the SignalObjectAndWait function is not
available in Windows CE, Windows '95, or Windows NT 3.51. Therefore,
emulating condition variables can be trickier, less ``fair,'' and more
computationally expensive on these Win32 platforms. The next article
in this series will show how to develop a portable condition variable
implementation for platforms that lack
SignalObjectAndWait.
5. Concluding Remarks
This article illustrates why developing condition variables on Win32 platforms is tricky and error-prone. There are several subtle design forces that must be addressed by developers. In general, the different implementations we've examined vary according to their correctness, efficiency, fairness, and portability. No one solution provides all these qualities optimally.
The SignalObjectsAndWait solution in Section 3.4 is a
good approach if fairness is paramount. However, this approach is not
as efficient as other solutions, nor is it as portable. Therefore, if
efficiency or portability are more important than fairness, the
SetEvent approach described in Section 3.2 may be more
suitable. Naturally, the easiest solution would be for Microsoft to
simply provide condition variables in the Win32 API.
The C++ source code for POSIX condition variable on Win32 described in
this article is freely available with the ACE framework at
//www.cs.wustl.edu/~schmidt/ACE.html.
Acknowledgements
Thanks to David Holmes <dholmes@mri.mq.edu.au>, Bil Lewis <Bil@LambdaCS.com>, Alan Conway <aconway@iona.ie>, Kenneth Chiu <chiuk@cs.indiana.edu>, Jason Rosenberg <jason@erdas.com>, James Mansion <james@wgold.demon.co.uk>, Saroj Mahapatra <saroj@bear.com>, Yaacov Fenster <fenster@zk3.dec.com>, John Hickin <John.Hickin.hickin@nt.com> and Patrick Taylor <patrick.taylor@template.com> for many suggestions and constructive comments that helped to improve this paper.
References
[Cargill] Tom Cargill, ``Specific Notification for Java Thread Synchronization,'' Proceedings of the Pattern Languages of Programming Conference, Allerton Park, Illinois, August, 1996.[GoF] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Software Architecture, Addison-Wesley, 1995.
[Lea] Doug Lea, Concurrent Programming with Java, Addison Wesley, 1996.
[MSMutex] ``Mutex Wait is FIFO but Can Be Interrupted,'' in Win32 SDK Books Online, Microsoft Corporation, 1995.
[Pthreads] ``Threads Extension for Portable Operating Systems,'' IEEE P1003.4a, February, 1996.
[Richter] Jeffrey Richter, Advanced Windows, 1997, Microsoft Press, Redmond, WA.
[Schmidt] Douglas C. Schmidt and Steve Vinoski, ``Comparing Alternative Programming Techniques for Multi-threaded CORBA Servers: Thread Pool,'' C++ Report, Volume 8, No. 4, April, 1996.