OSRE: The threading model

When I started to work on my OSRE engine, I was curious about one question: how can I work with a separate render thread without going mad all day long? And what the heck is a thread, anyway? So some years ago I decided to implement my own multi-threaded renderer. Especially when you want to learn something like this, you need some time to think about it. As it happens, I also became a father during this period (6 years ago), so it took quite a while to get a good understanding of it.

After 6 years I have the feeling that someone else might be interested in my design as well, so I decided to write a blog post about it. I hope you enjoy it:

Before C++11, threading was not a standard feature of C/C++. If you wanted to use threads, you had to build on the threading API of your operating system. I started with the Win32 API, because my plan was to run a separate render thread for my own RenderDevice based on DirectX 9. You can find some documentation about threading here.

When you start an application, the operating system first creates a process for it. A process is an OS-specific container for your application. When you write a simple Hello World app, your compiler generates an executable which contains an entry point for the new process (in C/C++ this entry point is called main()). A process owns its own address space; simplified, it encapsulates the context (handles, memory, resources) of your application. Other applications cannot simply access the memory of your process.

Within a process you can run several threads. A thread is the OS-specific way to run code. When a process starts, one thread is already running: the one executing your Hello World. So why would you run more than one thread? There are a lot of reasons:

  • You have to run a time-consuming computation and you do not want to block your main thread, because it controls your UI (and you do not want a blocked UI)
  • You want to run several computations at the same time on different cores
  • You are running a server and you don’t want one client to be blocked by other clients, so each client is handled in its own thread
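To make the first reason concrete, here is a minimal sketch using std::thread from C++11 rather than the Win32 API discussed in this post. The functions sumUpTo and sumInWorkerThread are made-up names for illustration; the loop only stands in for a time-consuming computation.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a time-consuming computation.
std::uint64_t sumUpTo(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

// Runs sumUpTo() on a second thread, so the calling thread is not blocked
// while the computation is in progress.
std::uint64_t sumInWorkerThread(std::uint64_t n) {
    std::uint64_t result = 0;
    std::thread worker([&result, n]() { result = sumUpTo(n); });

    // The calling thread is free here, e.g. to keep the UI responsive.

    worker.join();              // wait until the worker has finished
    assert(!worker.joinable()); // after join() the object owns no thread anymore
    return result;              // reading result is safe after join()
}
```

Calling sumInWorkerThread(100) returns 5050. The important detail is join(): the result is only read after the worker has finished, so no further locking is needed in this small case.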

In my case, the plan was to have one main thread that runs the main engine code, and to spawn one render thread that encapsulates the rendering completely, so that the draw calls are not blocked by any other computation. I started a simple Win32 thread and implemented a D3D9-based renderer for it. This is simple: just take a triangle render example and start it in a separate thread:

class RendererD3D9 {
public:
    void renderScene(); // do the render magic here
};

static void RenderFunc( void *data ) {
    RendererD3D9 *renderer = static_cast<RendererD3D9*>( data );
    bool stop = false; // nothing sets stop yet, so the loop renders forever
    while ( !stop ) {
        renderer->renderScene(); // do the rendering here
    }
}

int main( int argc, char *argv[] ) {
    auto renderer = CreateRenderer();
    RunThread( renderer ); // spawns RenderFunc in a separate thread

    return 0;
}

That looks simple: you use the Win32 API to spawn the function RenderFunc in a separate thread, and your triangle gets rendered. Unfortunately, we also need a way to communicate with this renderer at runtime:

  • I want to update the uniform buffers, because my triangle needs to move
  • I want to change the scene

You need a way to communicate with this thread: a concurrent queue. Once the thread has been spawned, it checks for anything that was enqueued and handles it:

class RendererD3D9 {
public:
    void updateScene( void *data );
    void renderScene();
};

class ConcurrentQueue {
public:
    void enqueue( void *data );
    void *dequeue();
    bool isEmpty() const;

private:
    Mutex m_mutex;      // to lock the access
    Condition m_finish; // to terminate the thread
};

struct ThreadData {
    RendererD3D9 *renderer;
    ConcurrentQueue *queue;
};

static void RenderFunc( void *data ) {
    ThreadData *threadContext = static_cast<ThreadData*>( data );
    bool stop = false;
    while ( !stop ) {
        // apply pending updates before rendering the next frame
        if ( !threadContext->queue->isEmpty() ) {
            threadContext->renderer->updateScene( threadContext->queue->dequeue() );
        }
        threadContext->renderer->renderScene();
    }
}

Both the main thread and the render thread can access the concurrent queue. If the main thread wants to push any updates, it enqueues the update data for the renderer. Before rendering the next frame, the render thread checks whether there are any updates for the scene it is rendering. If so, the data is dequeued from the concurrent queue and updateScene is called. Afterwards the next frame is rendered.
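For readers who want to try this without the Win32 API, the following is a minimal sketch of such a queue on top of std::mutex. It is not OSRE's ConcurrentQueue: SimpleConcurrentQueue, tryDequeue and runRenderLoopDemo are names invented for this example, the "scene" is just an int, and std::atomic<bool> stands in for the stop condition.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Minimal thread-safe queue: every access to m_items is guarded by the mutex.
template <typename T>
class SimpleConcurrentQueue {
public:
    void enqueue(T item) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push(std::move(item));
    }

    // Non-blocking: returns false if nothing is queued, so the render
    // loop can go on drawing the next frame.
    bool tryDequeue(T &out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.empty()) {
            return false;
        }
        out = std::move(m_items.front());
        m_items.pop();
        return true;
    }

    bool isEmpty() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_items.empty();
    }

private:
    mutable std::mutex m_mutex;
    std::queue<T> m_items;
};

// The calling thread enqueues three updates; the render thread applies them.
int runRenderLoopDemo() {
    SimpleConcurrentQueue<int> updates;
    std::atomic<bool> stop(false);
    int sceneValue = 0; // only touched by the render thread until join()

    std::thread renderThread([&]() {
        int update = 0;
        while (!stop.load()) {
            while (updates.tryDequeue(update)) {
                sceneValue += update;  // stands in for updateScene()
            }
            std::this_thread::yield(); // stands in for renderScene()
        }
        while (updates.tryDequeue(update)) { // apply what is still queued
            sceneValue += update;
        }
    });

    for (int i = 1; i <= 3; ++i) {
        updates.enqueue(i);
    }
    stop.store(true);
    renderThread.join();
    assert(updates.isEmpty()); // every update was consumed
    return sceneValue;
}
```

Combining the check and the removal in one tryDequeue call avoids the small gap between isEmpty() and dequeue(). With a single consumer, as in the render thread here, that gap is harmless, but it matters as soon as several threads dequeue.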

This concept works fine for long-running tasks like rendering a frame 60 times a second. But so far the design only works for a renderer. If you want to be able to run arbitrary work in a thread, you need a way to install your own handlers. So I introduced an abstract interface called AbstractEventHandler. The data is encapsulated in an event (a small class / struct which contains the payload for the next handling step):

class Event {};     // contains the id of the event
class EventData {}; // contains the data assigned to the event

class AbstractEventHandler {
public:
    virtual ~AbstractEventHandler() {}
    virtual void onEvent( const Event &event, const EventData *data ) = 0;
};

struct ThreadData {
    Condition *finished;
    AbstractEventHandler *handler;
    ConcurrentQueue *queue;
};

static void ThreadFunc( void *data ) {
    ThreadData *threadContext = static_cast<ThreadData*>( data );
    bool stop = false;
    while ( !stop ) {
        if ( !threadContext->queue->isEmpty() ) {
            // dequeue once, so the event and its data come from the same entry
            auto entry = threadContext->queue->dequeue();
            threadContext->handler->onEvent( entry.getEvent(), entry.getEventData() );
        }
        stop = threadContext->finished->isSignaled();
    }
}

So a developer can install their own handler code by deriving their own event-handler classes from AbstractEventHandler. The event describes what has to be handled, and the corresponding data is stored in EventData. If you want to stop the thread execution, you can use the Condition, which is a thread-safe way to use a flag.
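Here is a small sketch of what deriving such a handler could look like. Threading is left out on purpose, so that only the dispatch is visible. The members of Event and EventData, the LoggingHandler class and the dispatchPending function are assumptions for this example and not OSRE's actual types.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Event { int id; };                  // identifies what happened
struct EventData { std::string payload; }; // data that belongs to the event

class AbstractEventHandler {
public:
    virtual ~AbstractEventHandler() = default;
    virtual void onEvent(const Event &event, const EventData *data) = 0;
};

// Hypothetical user-defined handler: it only records what it was asked to do.
class LoggingHandler : public AbstractEventHandler {
public:
    void onEvent(const Event &event, const EventData *data) override {
        assert(data != nullptr);
        log.push_back(std::to_string(event.id) + ":" + data->payload);
    }

    std::vector<std::string> log;
};

using QueueEntry = std::pair<Event, EventData>;

// Hands every pending entry to the handler. Each entry is removed exactly
// once, so an event always arrives together with its own data.
void dispatchPending(std::queue<QueueEntry> &pending, AbstractEventHandler &handler) {
    while (!pending.empty()) {
        QueueEntry entry = std::move(pending.front());
        pending.pop();
        handler.onEvent(entry.first, &entry.second);
    }
}

std::vector<std::string> runDispatchDemo() {
    std::queue<QueueEntry> pending;
    pending.emplace(Event{1}, EventData{"update-uniforms"});
    pending.emplace(Event{2}, EventData{"change-scene"});

    LoggingHandler handler;
    dispatchPending(pending, handler);
    return handler.log;
}
```

runDispatchDemo() returns the two entries "1:update-uniforms" and "2:change-scene". The dispatch code never needs to know which concrete handler it is talking to, which is the whole point of the abstract interface.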

The next step was to build an interface for working with threads on different APIs. I called it AbstractThread. It encapsulates the implementation details for running a thread:

class OSRE_EXPORT AbstractThread {
public:
    /// The function pointer for a user-specific thread-function.
    typedef ui32 (*threadfunc) ( void * );

    /// @brief  This enum describes the priority of the thread.
    enum class Priority {
        Low,    ///< Low prio thread.
        Normal, ///< Normal prio thread.
        High    ///< High prio thread.
    };

    /// @enum   ThreadState
    /// @brief  Describes the current state of the thread.
    enum class ThreadState {
        New,        ///< In new state, just created
        Running,    ///< thread is currently running
        Waiting,    ///< Awaits a signal
        Suspended,  ///< Is suspended
        Terminated  ///< Thread is terminated, will be destroyed immediately
    };

    virtual ~AbstractThread();
    virtual bool start( void *pData ) = 0;
    virtual bool stop() = 0;
    virtual ThreadState getCurrentState() const;
    virtual bool suspend() = 0;
    virtual bool resume() = 0;
    virtual void setName( const String &name ) = 0;
    virtual const String &getName() const = 0;
    virtual void waitForTimeout( ui32 ms ) = 0;
    virtual void wait() = 0;
    virtual AbstractThreadEvent *getThreadEvent() const = 0;
    virtual void setPriority( Priority prio ) = 0;
    virtual Priority getPriority() const = 0;
    virtual const String &getThreadName() const = 0;
    virtual AbstractThreadLocalStorage *getThreadLocalStorage() = 0;
    virtual void setThreadLocalStorage( AbstractThreadLocalStorage *tls ) = 0;
    virtual void setThreadId( const ThreadId &id ) = 0;
    virtual ThreadId getThreadId() = 0;
};

It contains some more functionality:

  • Each thread has its own id, so you can identify it
  • The interface offers a way to define a priority for the thread execution
  • You can assign a name to your thread if your operating system offers this feature (with the Win32 API there is a way to assign a name to a thread, which is then shown in your debugger)
  • There is a state machine which exposes the internal state of the thread. This helps a user to monitor the state for debugging at runtime.
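To illustrate the state machine from the last point, here is a reduced sketch built on std::thread. It only knows three of the states and stores a name without handing it to the operating system, because the C++ standard library has no portable call for thread names or priorities. TrackedThread and runStateDemo are hypothetical names and not part of OSRE.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <utility>

enum class ThreadState { New, Running, Terminated };

// Wraps std::thread and tracks a small state machine that can be
// queried from any thread.
class TrackedThread {
public:
    explicit TrackedThread(std::string name) : m_name(std::move(name)) {}
    ~TrackedThread() { wait(); }

    // Starts the work once; any further call returns false.
    bool start(std::function<void()> work) {
        assert(work != nullptr);
        ThreadState expected = ThreadState::New;
        if (!m_state.compare_exchange_strong(expected, ThreadState::Running)) {
            return false; // not in state New anymore
        }
        m_thread = std::thread([this, work]() {
            work();
            m_state.store(ThreadState::Terminated);
        });
        return true;
    }

    // Blocks until the thread function has returned.
    void wait() {
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }

    ThreadState getCurrentState() const { return m_state.load(); }
    const std::string &getName() const { return m_name; }

private:
    std::string m_name;
    std::atomic<ThreadState> m_state{ThreadState::New};
    std::thread m_thread;
};

bool runStateDemo() {
    TrackedThread thread("render");
    const bool sawNew = thread.getCurrentState() == ThreadState::New;

    int counter = 0;
    const bool firstStart = thread.start([&counter]() { counter = 42; });
    thread.wait();
    const bool secondStart = thread.start([]() {}); // rejected: already terminated

    return sawNew && firstStart && !secondStart && counter == 42 &&
           thread.getCurrentState() == ThreadState::Terminated;
}
```

Keeping the state in a std::atomic means a monitoring or debugging thread can read it at any time without taking a lock.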

To define how a thread is executed together with its assigned event-handler instance, there is a class called SystemTask:

class OSRE_EXPORT SystemTask : public AbstractTask {
public:
    virtual bool start( Platform::AbstractThread *pThread );
    virtual bool stop();
    virtual bool isRunning() const;
    virtual bool execute();
    virtual void setThreadInstance( Platform::AbstractThread *pThreadInstance );
    virtual void onUpdate();
    virtual void awaitUpdate();
    virtual void awaitStop();
    virtual void attachEventHandler( Common::AbstractEventHandler *pEventHandler );
    virtual void detachEventHandler();
    virtual bool sendEvent( const Common::Event *pEvent, const Common::EventData *pEventData );
    virtual ui32 getEvetQueueSize() const;
};

You can start a system task by assigning it a thread instance and an event handler. Of course you can stop it as well. And you can send events to your thread. The events are enqueued in the thread-specific concurrent queue.
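The interplay of thread, queue and handler can be sketched in a few lines of standard C++. This is a strongly simplified stand-in and not the real SystemTask: MiniTask, CollectingHandler and runTaskDemo are invented names, the event carries only an id, and start() and stop() are assumed to be called from the owning thread only.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Event { int id; };

class AbstractEventHandler {
public:
    virtual ~AbstractEventHandler() = default;
    virtual void onEvent(const Event &event) = 0;
};

// Owns a worker thread and an event queue; forwards events to the handler.
class MiniTask {
public:
    ~MiniTask() { stop(); }

    bool start(AbstractEventHandler *handler) {
        assert(handler != nullptr);
        if (m_thread.joinable()) return false; // already started
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_handler = handler;
            m_stop = false;
        }
        m_thread = std::thread([this]() { run(); });
        return true;
    }

    // Returns false if the task is not running.
    bool sendEvent(const Event &event) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_stop) return false;
            m_events.push(event);
        }
        m_wakeUp.notify_one();
        return true;
    }

    // Lets the worker handle everything that is still queued, then joins it.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_wakeUp.notify_one();
        if (m_thread.joinable()) m_thread.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            m_wakeUp.wait(lock, [this]() { return m_stop || !m_events.empty(); });
            if (m_events.empty()) return; // only reached once m_stop is set
            Event event = m_events.front();
            m_events.pop();
            lock.unlock(); // do not hold the lock while the handler runs
            m_handler->onEvent(event);
            lock.lock();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wakeUp;
    std::queue<Event> m_events;
    AbstractEventHandler *m_handler = nullptr;
    bool m_stop = true;
    std::thread m_thread;
};

// Hypothetical handler that collects the ids it receives.
class CollectingHandler : public AbstractEventHandler {
public:
    void onEvent(const Event &event) override { ids.push_back(event.id); }
    std::vector<int> ids;
};

std::vector<int> runTaskDemo() {
    CollectingHandler handler;
    MiniTask task;
    task.start(&handler);
    for (int id = 1; id <= 3; ++id) task.sendEvent(Event{id});
    task.stop();
    return handler.ids;
}
```

runTaskDemo() returns the ids 1, 2 and 3 in the order they were sent. Unlike the polling loops shown earlier, this worker sleeps on a condition variable while the queue is empty. That suits a generic task, whereas a render thread that has to draw every frame would rather poll.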

So I reached my goals:

  • I was able to run a dedicated thread for my render device by implementing a render-specific event-handler class
  • I can communicate with the thread by using a concurrent queue
  • I can start / stop the thread execution
  • I can use this concept to define other tasks by implementing different event handlers

One nice side effect: by defining only the event-based protocol for rendering a scene, you decouple the user from the way the renderer is implemented. The implementation is encapsulated in the render-thread-specific event handler. The user only sees the events needed to work with it.