Saturday, August 18, 2018

Mana Engine: Achieving thread safety

I've recently been talking more about the engine I've spent a few years developing. It's been through a lot of revisions, entirely dumping the core of how it works a time or two.

Its most recent iteration has been developed by Rob Brink (@rob_brink on twitter) and me. We both have a bit of experience developing game engines at this point and wanted to really push a core-neutral, thread-safe, easy-to-use engine. As Rob has often said (in some variation or another), "It needs to be easier to use than falling on your face."

There is a great talk in the GDC vault by the tech director of Overwatch, where he goes over the ECS of their engine. At some point he mentioned that some systems can run in parallel, because you can statically know that they won't ever touch the same component data.

We wanted to take this to the next level by letting the engine handle all of that for you.

What we came up with (and Rob did most of the implementation of) was a way to generate a graph of dependent tasks. We would statically enforce access to components by type, so that we could guarantee that the engine knew when a component type was being read from or written to. Using this knowledge, it could then build the graph.

A system ends up looking something like this:

class MySystem : public AuthorizedSystem< const MyComponent1, const MyComponent2, MyComponent3 >
{
public:
    // MyComponent1 and MyComponent2 are const (read-only); MyComponent3 is writable.
    void Update()
    {
        // Operate on the data.
    }
};

There's a bit more to it than that. For instance, accessing components requires some special syntax so that the engine can know whether this system is allowed to use the data or not. But ignoring those details for a moment, let's see how this helps us write thread-safe code.

AuthorizedSystem is a special type of system that is templated in a similar way to a tuple. Using some type traits similar to std::is_const and some other static helpers, we can identify which component types are allowed to be written to and which may only be read. The scheduler knows that when two systems don't access the same data, or only read from that data, they can be run in parallel. It also knows that any writers need to go before any readers. There end up being some possibilities for circular dependencies, and I'll discuss more about resolving those later.
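As a rough illustration of the idea (not the engine's actual code), here is a minimal C++17 sketch of how const-qualification in a type list can be turned into a compile-time "can these two systems run in parallel?" check. Access, Touches, Writes, Conflicts, and the component and alias names are all made up for this example.

```cpp
#include <type_traits>

// Hypothetical stand-ins for component types.
struct MyComponent1 {};
struct MyComponent2 {};
struct MyComponent3 {};

// Hypothetical access list: a const type means "read", a non-const type means "write".
template <typename... Components>
struct Access {};

// True if T appears in the list at all, ignoring const.
template <typename T, typename... Cs>
constexpr bool Touches(Access<Cs...>)
{
    return (std::is_same_v<std::remove_const_t<T>, std::remove_const_t<Cs>> || ...);
}

// True if the list contains T as a writable (non-const) entry.
template <typename T, typename... Cs>
constexpr bool Writes(Access<Cs...>)
{
    return (std::is_same_v<std::remove_const_t<T>, Cs> || ...);
}

// Two systems conflict if either one writes a component the other touches.
template <typename... As, typename... Bs>
constexpr bool Conflicts(Access<As...>, Access<Bs...> b)
{
    return (((!std::is_const_v<As> && Touches<As>(b)) || Writes<As>(b)) || ...);
}

using MySystemAccess = Access<const MyComponent1, const MyComponent2, MyComponent3>;
using ReaderAccess   = Access<const MyComponent1, const MyComponent2>;
using OtherAccess    = Access<const MyComponent3>;

// Both only read MyComponent1/2, so they could run in parallel.
static_assert(!Conflicts(MySystemAccess{}, ReaderAccess{}), "read/read should not conflict");
// One writes MyComponent3 while the other reads it, so they must be ordered.
static_assert(Conflicts(MySystemAccess{}, OtherAccess{}), "write/read should conflict");
```

Because everything here is resolved at compile time, a scheduler built on checks like these would never need to inspect component access at runtime to decide what may overlap.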

Currently, in the test game we're making to prove out the engine, this is what our dependency graph looks like.


Sidebar: If you're not familiar with webgraphviz.com, it's a very worthwhile and easy-to-use tool.
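For anyone who wants to try it, webgraphviz.com renders Graphviz DOT text. The sketch below shows one way a list of "runs before" edges could be dumped as DOT. The Edge struct and the ToDot function are hypothetical and not taken from the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical edge: the system at index "from" must run before the one at "to".
struct Edge
{
    std::size_t from;
    std::size_t to;
};

// Builds Graphviz DOT text with one node per system and one arrow per edge.
std::string ToDot(const std::vector<std::string>& systems, const std::vector<Edge>& edges)
{
    std::ostringstream out;
    out << "digraph Systems {\n";
    for (const std::string& name : systems)
    {
        out << "    \"" << name << "\";\n";
    }
    for (const Edge& edge : edges)
    {
        // Edges must refer to systems that exist.
        assert(edge.from < systems.size() && edge.to < systems.size());
        out << "    \"" << systems[edge.from] << "\" -> \"" << systems[edge.to] << "\";\n";
    }
    out << "}\n";
    return out.str();
}
```

Pasting the returned string into the site draws the systems as nodes, with an arrow for each ordering constraint.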

In reality, this is a pruned version of our dependency graph. We have some other systems that I didn't include in this run because I'm not testing them. Rob wrote a nice Boids system that I didn't want to use in this run because I was testing other things.

And here's a delicious bonus to all this. Want to know what it took to remove the 5 Boids systems? In my main initialization function, I just had to comment out this line:

Boid::RegisterSystems(world);

We also have some auto-generated systems that help do some bookkeeping around components. I've omitted them from the graph for the sake of simplicity.


Okay, so this looks neat and all, but why does it matter?

One thing I've noticed over the years as a developer is that some programmers tend to have an affinity for the harder problems, often within a specific domain. We have graphics programmers, engine programmers, gameplay programmers, networking programmers, audio programmers. The list of specializations goes on. Often enough, especially since understanding multi-threaded programming has not yet become a must-have for writing game code, programmers of all sorts of specialties have not needed to know how to deal with multi-threaded problems.

And in the Mana Engine, they don't have to. The engine is built to help you deal with multi-threaded problems and, whenever possible, to statically assert that you'll never run into race conditions or performance problems caused by things like false sharing.

Recently, Monster Hunter World on PC was revealed (unconfirmed by the developer) to spend a quarter of its processing time just managing thread overhead, and had around 100 threads. Lock-free and wait-free programming isn't easy. And even if you've solved it before, that doesn't mean you won't mess up if you need to do it again.

Riot's recent engineering blog post about performance shows a graph where the main thread is waiting on the particle thread before it can continue. It'd be nice if, in cases like this, instead of waiting, the main thread could assist with some of the work the particle thread is doing. But then you get into other issues where particle work is now being done on the main thread and locking might get screwed up. Rob and I have both seen our fair share of bugs and deadlocks caused by the pattern of letting threads "assist" with another thread's work.

In the Mana Engine, we don't even have a main thread. We've chosen to solve the problem before it even happens.

In the Mana Engine, there are never more threads than the computer has logical cores, and the overhead is absolutely minimal. We solved that problem before it even happened, too.
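One common way to keep the thread count at or below the logical core count is to size the worker pool from std::thread::hardware_concurrency(). This is a minimal sketch under that assumption. ClampWorkerCount and WorkerCount are hypothetical names, and the engine's own approach may differ.

```cpp
#include <thread>

// Hypothetical helper: clamps a requested worker count to [1, logical_cores].
// A request of 0 means "use every logical core".
constexpr unsigned ClampWorkerCount(unsigned requested, unsigned logical_cores)
{
    // hardware_concurrency() may report 0 when the core count is unknown,
    // so fall back to a single worker in that case.
    const unsigned limit = (logical_cores == 0) ? 1u : logical_cores;
    if (requested == 0)
    {
        return limit;
    }
    return (requested < limit) ? requested : limit;
}

// Hypothetical helper: the number of worker threads to spawn on this machine.
inline unsigned WorkerCount(unsigned requested = 0)
{
    return ClampWorkerCount(requested, std::thread::hardware_concurrency());
}
```

With a fixed pool of this size and no dedicated main thread, every worker can pull the next ready task instead of blocking on another thread.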


In future posts, I hope to cover some more details about the Mana Engine, some of the problems we've run into, and how we've solved them.

In the meantime, if you have any interest in the Mana Engine and its details, message me on twitter and I'll be happy to share what we've learned.