Windows Thread-Pool

How do you perform an action asynchronously?
While spawning a new thread is a convenient way to achieve this, it doesn’t come cheap: there is a practical limit on the number of actively running threads your system can sustain. Unless a thread is waiting, the OS will work hard to give it a fair share of CPU time. This will most likely be to no one’s benefit, since the actual concurrency the hardware can provide is limited by the number of processor cores. Even waiting threads take up resources: for instance, every thread reserves about 1-2 MB of stack memory (depending on the architecture).

However, when we need to do some work asynchronously, we are often not concerned with the identity of the thread that performs it: we don’t care how many threads are around, and we don’t mind some latency if no free threads are available.

The Thread-Pool Pattern emerges naturally to solve this common engineering problem:
You keep a pool of threads ready and waiting for work, and send “work-items” to them through a concurrent queue (you are the producer, and the worker-threads act as consumers).
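
To make the pattern concrete, here is a minimal sketch in portable standard C++. It does not use the Win-API, and the names SimpleThreadPool, Submit and WorkerLoop are made up for this illustration. It is meant to show the producer/consumer mechanics, not to be a production-ready pool:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration of the pattern; not the Win-API pool discussed below.
class SimpleThreadPool
{
public:
    explicit SimpleThreadPool(std::size_t threadCount)
    {
        assert(threadCount > 0);
        for (std::size_t i = 0; i < threadCount; ++i)
            _workers.emplace_back([this] { WorkerLoop(); });
    }

    // Signals shutdown, lets the workers drain the queue, then joins them.
    ~SimpleThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopping = true;
        }
        _wakeUp.notify_all();
        for (auto& worker : _workers)
            worker.join();
    }

    SimpleThreadPool(const SimpleThreadPool&) = delete;
    SimpleThreadPool& operator=(const SimpleThreadPool&) = delete;

    // Producer side: enqueue a work-item and wake one worker.
    void Submit(std::function<void()> workItem)
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _queue.push(std::move(workItem));
        }
        _wakeUp.notify_one();
    }

private:
    // Consumer side: each worker pops work-items until the pool shuts down.
    void WorkerLoop()
    {
        for (;;)
        {
            std::function<void()> workItem;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _wakeUp.wait(lock, [this] { return _stopping || !_queue.empty(); });
                if (_queue.empty())
                    return; // shutting down and nothing left to do
                workItem = std::move(_queue.front());
                _queue.pop();
            }
            workItem(); // run outside the lock so other workers can proceed
        }
    }

    std::mutex _mutex;
    std::condition_variable _wakeUp;
    std::queue<std::function<void()>> _queue;
    bool _stopping = false;
    std::vector<std::thread> _workers; // declared last: threads use the members above
};
```

Even this toy version has to get shutdown, locking and wake-ups right, which is a good reason to prefer an existing implementation.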

Unlike threads, which have std::thread, the Thread-Pool Pattern has no implementation in the standard library yet. You should only consider writing your own if:
A. Your system has some wildly exotic constraints and you know exactly what you are doing
or B. You are preparing for a job interview and want to strengthen your concurrency skills.
If you are a Windows developer, there is a convenient Thread-Pool implementation built into the Win-API. Let’s check it out.

The Thread-Pool Object
Instead of jumping straight to usage, we will first inspect what makes up a Thread-Pool. A Thread-Pool is a resource, and as a resource it has to be managed. That’s where RAII comes in handy:

class ThreadPool
{
public:
    ThreadPool();
    ThreadPool(unsigned minThreads, unsigned maxThreads);
    ~ThreadPool();

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;
    ThreadPool(ThreadPool&& other);
    ThreadPool& operator=(ThreadPool&& other);

    // Useful API will go here

private:
    PTP_POOL _pool;
    _TP_CALLBACK_ENVIRON_V3 _environment;
    PTP_CLEANUP_GROUP _cleanupGroup;
};

The _pool pointer refers to the actual Thread-Pool resource.
We will use _environment to connect work-items to our custom Thread-Pool.
The _cleanupGroup pointer will help us clean things up neatly when we are done.

Our Thread-Pool wrapper will be default-constructible, with an option to specify the min/max number of threads it will contain:

ThreadPool::ThreadPool()
{
    // 1. Create Thread-Pool:
    _pool = CreateThreadpool(nullptr);
    if (_pool == nullptr)
        throw std::exception("Could not create a thread pool!");

    // 2. Initialize the Thread-Pool Environment struct:
    InitializeThreadpoolEnvironment(&_environment);
    SetThreadpoolCallbackPool(&_environment, _pool);
    SetThreadpoolCallbackLibrary(&_environment, GetCurrentModule());

    // 3. Create Thread-Pool Clean-up Group:
    _cleanupGroup = CreateThreadpoolCleanupGroup();
    if (_cleanupGroup == nullptr)
    {
        // The destructor will not run if we throw, so release the pool here
        CloseThreadpool(_pool);
        throw std::exception("Could not create a thread pool clean-up group!");
    }
    SetThreadpoolCallbackCleanupGroup(&_environment, _cleanupGroup, nullptr);
}

ThreadPool::ThreadPool(unsigned minThreads, unsigned maxThreads)
    : ThreadPool()
{
    SetThreadpoolThreadMaximum(_pool, maxThreads);
    SetThreadpoolThreadMinimum(_pool, minThreads);
}

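The line SetThreadpoolCallbackLibrary(&_environment, GetCurrentModule()) connects the Thread-Pool environment to our executing module. (GetCurrentModule is not a Win-API function; it is assumed to be a small helper that returns the handle of this module.) The call tells the Thread-Pool to keep the module loaded for as long as there are outstanding callbacks. Otherwise, if our library got unloaded for any reason, the callbacks would no longer be valid, causing undefined behavior.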

In order to avoid aliasing, the type will be non-copyable. Instead, it will support moving the underlying Thread-Pool around. I will omit the implementation of the move operations. Here is how you are supposed to clean things up:

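    // Wait for any previously scheduled tasks to complete, but stop accepting new ones
    CloseThreadpoolCleanupGroupMembers(_cleanupGroup, false, nullptr);
    // Clean-up resources:
    CloseThreadpoolCleanupGroup(_cleanupGroup);
    DestroyThreadpoolEnvironment(&_environment);
    CloseThreadpool(_pool);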
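
The move operations omitted above could look roughly like the following sketch. To keep it portable, it uses a stand-in resource instead of the real PTP_POOL: PoolHandle, CreatePool, DestroyPool and MovablePool are hypothetical names invented for this example. The idea is that a move transfers the pointer and nulls the source, so the destructor of the moved-from object has nothing left to release:

```cpp
#include <cassert>
#include <utility>

// Stand-in for an OS resource; these names are hypothetical, not Win-API.
struct PoolHandle { int id; };

static int g_livePools = 0; // counts resources that are currently alive

PoolHandle* CreatePool()
{
    ++g_livePools;
    return new PoolHandle{ 1 };
}

void DestroyPool(PoolHandle* pool)
{
    assert(pool != nullptr);
    --g_livePools;
    delete pool;
}

class MovablePool
{
public:
    MovablePool() : _pool(CreatePool()) {}
    ~MovablePool() { Reset(); }

    MovablePool(const MovablePool&) = delete;
    MovablePool& operator=(const MovablePool&) = delete;

    // Move construction: steal the pointer and leave the source empty.
    MovablePool(MovablePool&& other) noexcept : _pool(other._pool)
    {
        other._pool = nullptr;
    }

    // Move assignment: release our own resource first, then steal the other one.
    MovablePool& operator=(MovablePool&& other) noexcept
    {
        if (this != &other)
        {
            Reset();
            _pool = other._pool;
            other._pool = nullptr;
        }
        return *this;
    }

    bool IsValid() const { return _pool != nullptr; }

private:
    // Releases the resource if we still own one; safe on a moved-from object.
    void Reset()
    {
        if (_pool != nullptr)
        {
            DestroyPool(_pool);
            _pool = nullptr;
        }
    }

    PoolHandle* _pool;
};
```

The real wrapper also has to deal with _environment and _cleanupGroup, which this sketch leaves out, and its clean-up code has to skip a moved-from (null) pool.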

Abstracting Work
If you are a Win-API developer, you are used to representing user-defined work as callback functions. For the Thread-Pool, a work-item is a pointer to a function of type:

VOID CALLBACK WorkCallback(
  _Inout_     PTP_CALLBACK_INSTANCE Instance,
  _Inout_opt_ PVOID                 Context,
  _Inout_     PTP_WORK              Work
);

It cannot be a non-static member function (a free function or a static member function works), but it comes with a pointer to “whatever”; you are expected to use this pointer to pass any necessary data into the callback.

If, like me, you come from a language like C#, then to you a work-item is a lambda:

ThreadPoolWork SubmitCallback(std::function<void()> workItem);

What I have in mind is to store the lambda on the heap, pass it as the context, and inside the actual callback, run and delete it:

ThreadPoolWork ThreadPool::SubmitCallback(std::function<void()> workItem)
{
    auto context = new std::function<void()>(std::move(workItem));

    auto work = CreateThreadpoolWork(Callback, context, &_environment);
    if (work == nullptr)
    {
        delete context;
        throw std::exception("Can't create thread-pool work!");
    }

    // Hand the work-item over to the pool
    SubmitThreadpoolWork(work);
    return ThreadPoolWork(work);
}

void CALLBACK Callback(PTP_CALLBACK_INSTANCE instance, PVOID context, PTP_WORK work)
{
    // Copy the function out of the heap, free the heap copy, then run it
    auto functionPtr = static_cast<std::function<void()>*>(context);
    auto functionCopy = *functionPtr;
    delete functionPtr;

    functionCopy();
}
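
The same hand-off can be tried out without the Win-API. The sketch below is a portable illustration under stated assumptions: RawCallback, Trampoline, PendingWork and MakeWork are hypothetical names, and PendingWork merely stands in for whatever the pool stores between submission and execution.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// C-style callback type, standing in for the Thread-Pool callback signature.
using RawCallback = void (*)(void* context);

// Trampoline: recover the std::function from the context, free the heap
// object, then run the function that was moved out of it.
void Trampoline(void* context)
{
    assert(context != nullptr);
    auto functionPtr = static_cast<std::function<void()>*>(context);
    auto function = std::move(*functionPtr);
    delete functionPtr;
    function();
}

// Hypothetical stand-in for a queued work-item: a raw callback plus its context.
struct PendingWork
{
    RawCallback callback;
    void* context;

    // Must be called exactly once: the trampoline deletes the context.
    void Run() const { callback(context); }
};

// Packs a callable into a heap-allocated std::function; ownership of that
// allocation passes to the trampoline.
PendingWork MakeWork(std::function<void()> workItem)
{
    return PendingWork{ &Trampoline, new std::function<void()>(std::move(workItem)) };
}
```

For example, MakeWork([&] { result = 42; }) captures a local by reference, and nothing happens until Run() is called. With a real pool the callback runs later on another thread, so anything captured by reference has to outlive the work-item.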


For the sake of abstraction, I had to make an extra heap allocation. There may be more inside the std::function implementation. If it matters to the performance of your application, you can use the raw API directly. However, I find it hard to avoid heap allocations when using the Thread-Pool anyway, since the callback is getting initialized and executed in two completely unrelated call-stacks.

The class ThreadPoolWork is there to hold on to the PTP_WORK pointer, but it does not own it. In my experience, most Thread-Pool requests are fire & forget, so it makes more sense for the work-item to clean up after itself. Instead, ThreadPoolWork lets you block until the work is done:

void ThreadPoolWork::Wait()
{
    if (_work != nullptr)
    {
        // Wait for the work to be done
        WaitForThreadpoolWorkCallbacks(_work, false);
        _work = nullptr;
    }
}

Advanced Features
The Win-API implementation provides much more than what we’ve seen so far. There are two features I find particularly useful: Timers and Waits.

Thread-Pool Timer lets you schedule work to be done not “as-soon-as-possible”, but rather at certain points in time. Whenever you have code like:

Sleep(x);
// ... do the work ...

You can replace it with:

After(x) Do-On-Thread-Pool(
    []() { /* ... do the work ... */ });

These two approaches are similar, but the second lets the original thread do other productive things instead of just sleeping.

ThreadPoolTimer ThreadPool::SubmitTimer(unsigned delayMs, unsigned periodMs,
        std::function<void()> workItem)
{
    auto context = new std::function<void()>(workItem);

    auto timer = CreateThreadpoolTimer(TimeoutCallback, context, &_environment);
    if (timer == nullptr)
    {
        delete context;
        throw std::exception("Can't create thread-pool timer!");
    }

    auto fileDelayTime = MsToFiletime(delayMs);
    SetThreadpoolTimer(timer, &fileDelayTime, periodMs, 0);
    return ThreadPoolTimer(timer, context);
}
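
The helper MsToFiletime is not shown here. A plausible sketch of the conversion it needs, assuming it produces a relative due time: the Thread-Pool timer API treats a negative FILETIME value as a delay relative to now, counted in 100-nanosecond units. The names FileTimeParts, MsToRelativeTicks and TicksToFileTimeParts are hypothetical, and FileTimeParts only mirrors the two 32-bit halves of the real FILETIME struct so that the example builds anywhere:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the layout of the Win-API FILETIME struct: a 64-bit count of
// 100-nanosecond ticks split into two 32-bit halves. Hypothetical stand-in.
struct FileTimeParts
{
    std::uint32_t low;  // corresponds to dwLowDateTime
    std::uint32_t high; // corresponds to dwHighDateTime
};

// Converts a delay in milliseconds into a relative due time,
// expressed as a negative (or zero) number of 100-nanosecond ticks.
std::int64_t MsToRelativeTicks(unsigned delayMs)
{
    const std::int64_t ticksPerMs = 10000; // 1 ms = 10,000 ticks of 100 ns
    const std::int64_t ticks = -static_cast<std::int64_t>(delayMs) * ticksPerMs;
    assert(ticks <= 0);
    return ticks;
}

// Splits a signed tick count into the two unsigned halves of a FILETIME.
FileTimeParts TicksToFileTimeParts(std::int64_t ticks)
{
    const auto bits = static_cast<std::uint64_t>(ticks); // two's complement bit pattern
    return FileTimeParts{ static_cast<std::uint32_t>(bits & 0xFFFFFFFFu),
                          static_cast<std::uint32_t>(bits >> 32) };
}
```

On Windows, the two halves would be copied into a FILETIME before calling SetThreadpoolTimer. A positive value would be read as an absolute time instead, which is why the sign matters.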

Thread-Pool Wait also lets you delay the execution of your callback, but instead of waiting for a timeout, you can wait for a Kernel-Object to be signaled (and possibly a timeout).

It is extremely common to see threads implementing the following basic pattern:

    res = WaitForMultipleObjects(A, B, C, ...);
    switch (res)
    {
    case A:
        // Respond to A
        break;
    case B:
        // Respond to B
        break;
    case C:
        // Respond to C
        break;
    }

This is where the Thread-Pool Wait comes in handy:

ThreadPoolWait ThreadPool::WaitForKernelObject(HANDLE handle, unsigned timeoutMs,
        std::function<void()> workItem, std::function<void()> onTimeout)
{
    auto context = new Context();
    context->WorkItem = workItem;
    context->OnTimeout = onTimeout;

    auto waitObject = CreateThreadpoolWait(WaitCallback, context, &_environment);
    if (waitObject == nullptr)
    {
        delete context;
        throw std::exception("Can't create thread-pool wait!");
    }

    auto fileTimeoutTime = MsToFiletime(timeoutMs);
    SetThreadpoolWait(waitObject, handle, &fileTimeoutTime);
    return ThreadPoolWait(waitObject, context);
}

void CALLBACK WaitCallback(PTP_CALLBACK_INSTANCE instance, PVOID context,
        PTP_WAIT waitObject, TP_WAIT_RESULT waitResult)
{
    auto typedContext = static_cast<Context*>(context);
    auto workItem = typedContext->WorkItem;
    auto onTimeout = typedContext->OnTimeout;

    // Dispatch on the reason the wait completed
    if (waitResult == WAIT_OBJECT_0)
        workItem();
    else if (waitResult == WAIT_TIMEOUT)
        onTimeout();
}

Unlike simple work-items, Timers and Waits are rarely fire & forget: they might trigger one callback, multiple callbacks or none at all, depending on the circumstances.
If we want to be 100% sure the underlying object gets cleaned up, we can’t leave it to the callback. In this case, the resulting ThreadPoolWait and ThreadPoolTimer classes are proper RAII wrappers for the underlying resources:

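void ThreadPoolWait::Abort()
{
    if (_wait != nullptr)
    {
        SetThreadpoolWait(_wait, nullptr, nullptr);   // stop waiting on the handle
        WaitForThreadpoolWaitCallbacks(_wait, true);  // cancel pending callbacks, wait for running ones
        CloseThreadpoolWait(_wait);
        delete _context;
        _wait = nullptr;
    }
}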


The Default Thread-Pool
Every process comes with a default thread-pool. To use it, all you need to do is to pass NULL as the environment pointer parameter. The default thread-pool can’t have clean-up behavior because it is destroyed when the process dies. In addition, if your library gets unloaded you must manually cancel / wait for all work-items you have scheduled on the default thread-pool.

Combining the powerful Win-API implementation with modern C++11 semantics will let you write more robust and scalable concurrent code. Getting the semantics right is challenging, but my ideas might help you get started. I’m particularly excited to see how this API can be combined with Visual C++ 15 Resumable Functions.
For more comprehensive information on the Windows Thread-Pool, including code samples, please see Windows via C/C++, 5th Edition by Jeffrey Richter.
As usual, the full code is available here and here.
