Boost.Asio Stackful Coroutines May Resume In A Different Thread

I'm currently learning C++, and specifically how to write binary Python modules with Boost.Python. As part of that, I decided to develop a PEP 3333-compliant multithreaded WSGI server in C++ using Boost.Asio and Boost.Python.

Initially, stackful coroutines looked like a good choice because of their promise of "asynchronous logic in a synchronous manner". I implemented the server logic, including parsing an HTTP request, creating an environ dictionary, calling a WSGI application and then walking over the iterable that it returns. Everything worked fine until I added threads. Then the Python interpreter started to crash, complaining about the thread state not being current.

I guess everyone who knows Python well enough has heard of the infamous Global Interpreter Lock, or GIL. A thread that works with the Python-C API must hold the GIL, which prevents other threads from running and accessing the same Python objects, as these are inherently not thread-safe. However, the GIL can be released while a program works at the pure C/C++ level without involving Python, which allows other Python threads to run while the current thread works with C data.

The initial plan was simple: release the GIL on server start, acquire it each time the server needs to call the Python-C API, either directly or wrapped with Boost.Python, and release it again after the job is done. But, as I said, when I created a thread pool by calling boost::asio::io_service::run() in several threads, the Python interpreter started to crash. Since you cannot debug Python extensions directly in Visual Studio (or can you?), I added plenty of logging messages to find out at which point my server crashed. Eventually I found that after an asynchronous call (async_write sending the next chunk of data from a WSGI app, to be exact) the request handler function resumed in a different thread than the one it had started in. As a result, the request handler tried to release the GIL in a thread other than the one that had acquired it, which crashed the Python interpreter.
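To make the failure mode concrete, here is a minimal sketch of a lock that, like the GIL's thread state, may only be released by the thread that acquired it. It uses only the C++ standard library. ThreadAffineLock and release_from_other_thread are hypothetical names invented for this illustration; this is a model of the problem, not how Python implements the GIL.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of a thread-affine lock: it remembers which thread
// acquired it and refuses a release attempted from any other thread.
class ThreadAffineLock {
public:
    void acquire() {
        std::unique_lock<std::mutex> guard(state_);
        free_.wait(guard, [this] { return !held_; });
        assert(!held_);  // the wait predicate guarantees the lock is free here
        held_ = true;
        owner_ = std::this_thread::get_id();
    }

    // Returns false and leaves the lock held when the caller is not the owner.
    bool release() {
        std::lock_guard<std::mutex> guard(state_);
        if (!held_ || owner_ != std::this_thread::get_id()) return false;
        held_ = false;
        free_.notify_one();
        return true;
    }

private:
    std::mutex state_;               // protects held_ and owner_
    std::condition_variable free_;   // signalled when the lock becomes free
    bool held_ = false;
    std::thread::id owner_;
};

// Attempts the release on a second thread, which is what effectively happens
// when a handler acquires a lock and is then resumed on another thread.
bool release_from_other_thread(ThreadAffineLock& lock) {
    bool released = true;
    std::thread other([&] { released = lock.release(); });
    other.join();
    return released;
}
```

If one thread calls acquire() and release_from_other_thread() is then called on the same lock, the release is refused and the lock stays held. The model reports the mismatch politely by returning false; in my server the interpreter reacted to the same kind of mismatch by crashing.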

My initial reaction was "WTF?", because the official documentation is not clear enough about the thread safety of stackful coroutines. After googling it, I found out that holding some kind of mutex (and the GIL is a mutex too) across an asynchronous call in a coroutine running on a thread pool can indeed lead to undefined behavior. This means that after each asynchronous operation (async_read, async_write and so on) that accepts boost::asio::yield_context as a handler, the coroutine may resume in a different thread than the one it was called from, even though the code looks synchronous and executes in consecutive order.

From a technical point of view, this is not surprising: with a thread pool, asio::io_service dispatches a callback in any thread that is currently "free", and in a stackful coroutine all the code after an asynchronous operation is essentially a callback for that operation.
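The dispatching behavior described above can be imitated without Boost. The sketch below is a simulation under stated assumptions, not Boost.Asio code: TinyPool is a hypothetical two-thread pool standing in for several threads calling io_service::run(), and the "asynchronous operation" is simply a second post() of the rest of the handler. To make the outcome deterministic, the first worker is deliberately kept busy until the continuation has run; in a real server, which thread happens to be free is a matter of timing.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool: every worker takes the next task
// from a shared queue, whichever worker happens to be free.
class TinyPool {
public:
    explicit TinyPool(unsigned count) {
        for (unsigned i = 0; i < count; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~TinyPool() {
        { std::lock_guard<std::mutex> g(m_); stopping_ = true; }
        ready_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> g(m_);
            assert(!stopping_);  // posting during shutdown would be a bug
            queue_.push_back(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> g(m_);
                ready_.wait(g, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();
        }
    }
    std::mutex m_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> queue_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest
};

// Returns true if the code "after the asynchronous call" ran on a different
// thread than the code before it.
bool continuation_switched_threads() {
    std::thread::id before, after;
    std::promise<void> done;
    std::shared_future<void> finished = done.get_future().share();
    {
        TinyPool pool(2);
        pool.post([&, finished] {
            before = std::this_thread::get_id();
            // The "asynchronous operation": the rest of the handler becomes
            // a callback that the pool is free to run on any worker.
            pool.post([&] {
                after = std::this_thread::get_id();
                done.set_value();
            });
            finished.wait();  // keep this worker busy (demo device only)
        });
        finished.wait();      // do not shut the pool down too early
    }                         // pool destructor joins both workers
    return before != after;
}
```

Calling continuation_switched_threads() reports that the two halves of the "handler" ran on different workers, even though in the source they sit next to each other like ordinary sequential code.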

All this means that if you are planning to use Boost.Asio stackful coroutines with a thread pool, you must avoid accessing non-thread-safe objects, and thread locks in particular, within the same code block both before and after an asynchronous operation, because the code block may end in a different thread than the one it started in.
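One possible way to follow this rule, sketched here with the standard library only, is to confine every acquire/release pair to a scope that contains no asynchronous operation. In this sketch interpreter_lock is a hypothetical std::mutex standing in for the GIL, fetch_chunk and handle_request_in_two_threads are made-up names, and the two std::thread objects imitate a handler whose second half is resumed elsewhere. This is an illustration of the rule, not the approach my server ended up using.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>

std::mutex interpreter_lock;  // hypothetical stand-in for the GIL

// The lock is taken and released inside one function call, so both
// operations are guaranteed to happen on the same thread.
std::string fetch_chunk(int index) {
    assert(index >= 0);
    std::lock_guard<std::mutex> hold(interpreter_lock);
    return "chunk-" + std::to_string(index);
}   // lock released here, before any "asynchronous" send could start

std::string handle_request_in_two_threads() {
    std::string sent;

    // Code before the asynchronous operation, running on one thread.
    std::thread first([&] { sent += fetch_chunk(0); });
    first.join();

    // Code after it, resumed on another thread. Nothing is held across
    // the gap, so there is nothing to release from the wrong thread.
    std::thread second([&] { sent += "," + fetch_chunk(1); });
    second.join();

    return sent;
}
```

Here handle_request_in_two_threads() returns "chunk-0,chunk-1". With the real GIL, I assume the analogous scope would be bracketed by PyGILState_Ensure and PyGILState_Release, with no call that takes a yield_context in between.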

In the end I decided to stick to synchronous reads and writes while processing requests to a WSGI application. My WSGI server is almost ready and I plan to publish it in the near future.