
Avoiding GPU Timeout via Dynamic Load Balancing

December 18, 2012 | General News

There's really nothing better in life than when you conceive of something, imagine that it's probably going to be quite difficult to do, then end up getting it to work in five minutes. Seriously. Best feeling ever. It's what just happened with one of the core pieces of the Limit Theory Engine: the GPU texture generator.

I've always known that the engine would crash on under-powered GPUs if it tried to generate really high-res textures (for example, 2048-resolution skyboxes), because the job would take too long, and Windows would assume the driver had hung, then kill and restart it. I believe the default timeout is 2 seconds. So if your texture can't generate in 2 seconds, you're dead. Needless to say, a really high-quality, procedural skybox needs more than 2 seconds to generate on integrated chips. So to alleviate the problem, I split the job up into several pieces (rendering only a certain portion of the texture at once), forcing a GPU sync (glFinish) after each one. In theory, this ensures that each piece takes less than 2 seconds to generate, so Windows doesn't get angry. But splitting the job is inefficient, because every extra piece adds overhead. For powerful GPUs that can generate it all in one go, you don't want to split the job at all.

The solution? Very simple, really: define job size n. Initialize it to 1. Now, generate n columns of the texture and time the operation. Use the stencil test to effortlessly select n columns of the texture for rendering without having to do tricky quad math. Make sure to force a GPU sync after each job (glFinish), so that the timing covers the actual GPU work. Now, use the elapsed time to adjust the job size, then repeat until all columns have been generated. It's a no-brainer, really, but one might expect complications to creep in during implementation.
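A minimal sketch of that loop, in Python rather than engine code. Everything here is an assumption made for illustration: the names `generate_in_jobs`, `render_columns`, and `SimulatedGpu` are hypothetical, the job size is rescaled from the measured per-column time (one plausible reading of "use the elapsed time to adjust job size"), and the stencil setup and glFinish call are assumed to live inside the `render_columns` callback.

```python
import time
from typing import Callable, List, Tuple


def generate_in_jobs(
    total_columns: int,
    render_columns: Callable[[int, int], None],
    target_seconds: float = 1.0,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[int, int]]:
    """Render columns [0, total_columns) in timed jobs.

    render_columns(start, count) is a hypothetical callback that must block
    until the GPU has finished (for example by ending with glFinish).
    Returns the (start, count) of every job that was issued.
    """
    jobs: List[Tuple[int, int]] = []
    job_size = 1  # n: start with a single column
    start = 0
    while start < total_columns:
        count = min(job_size, total_columns - start)
        began = clock()
        render_columns(start, count)
        elapsed = clock() - began
        jobs.append((start, count))
        start += count
        # Scale the job so the next one should take about target_seconds.
        seconds_per_column = max(elapsed / count, 1e-9)
        job_size = max(1, int(target_seconds / seconds_per_column))
    return jobs


class SimulatedGpu:
    """Stand-in for a real GPU: every column costs a fixed amount of fake time."""

    def __init__(self, seconds_per_column: float) -> None:
        self.seconds_per_column = seconds_per_column
        self.now = 0.0

    def clock(self) -> float:
        return self.now

    def render_columns(self, start: int, count: int) -> None:
        self.now += count * self.seconds_per_column
```

With `SimulatedGpu(1 / 1024)`, a 1-second target, and 2048 columns, the loop issues a 1-column probe followed by jobs of 1024 and 1023 columns. A faster simulated GPU collapses everything after the probe into a single job.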

Nope. Worked the first time. I'm now able to generate 2048 skyboxes on my laptop without crashing! The scheme, in theory, will never crash, because it uses actual timings to adjust the load on the GPU. For now, I've set the target time at 1 second, just to be safe. So my powerful desktop machine will pretty much do the whole thing in one job (after it times the initial size-1 job), while my laptop determines that 609 columns is optimal, so it uses about 4 jobs per cubemap face.

Now, it scares me a bit to think that the first job, which is only 1 column wide, would be used to make an initial guess at the maximal load. You might imagine that if the timer accuracy isn't great, we could end up overestimating the GPU's capability and crashing. So it might be wise to implement a gradual scale-up, such that the job size can't change too dramatically during the first few iterations. I did this at first, but the naive scheme has yet to crash on me, so I backed off and am happily using it for now.
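For reference, a gradual scale-up could look something like the sketch below. This is not the version that was tried and dropped, just one way to express the idea: the function name `next_job_size`, the growth cap of 4x, and the 3-job warm-up window are all made-up values for illustration. Only growth is capped, since shrinking a job is always safe.

```python
def next_job_size(
    current: int,
    elapsed: float,
    job_index: int,
    target_seconds: float = 1.0,
    max_growth: float = 4.0,  # hypothetical cap on growth per warm-up job
    warmup_jobs: int = 3,     # hypothetical number of jobs that get capped
) -> int:
    """Pick the next job size from the last job's size and elapsed time."""
    elapsed = max(elapsed, 1e-9)  # guard against a zero reading from a coarse timer
    ideal = current * target_seconds / elapsed
    if job_index < warmup_jobs:
        # During warm-up, don't trust a single noisy timing too much.
        ideal = min(ideal, current * max_growth)
    return max(1, int(ideal))
```

So a 1-column probe that finishes in 1/1024 of a second only grows to 4 columns on the next job, not 1024, and the full jump is allowed once the warm-up jobs are over.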

Procedural Space Stations II

A little bit over a year ago, I spent an hour or so whipping up a really silly/stupid algorithm for building radial space stations. Over the past few days, I've been cooking up some more advanced geometric technology to use in my second attempt. Today, I started building a new algorithm for procedural space stations. I'm very pleased to say that my results in one day are a whole lot better than they were last year!! Good to know I'm making progress 🙂

Right now, the procedural building blocks that the algorithm is using are extremely simple, just beveled boxes and a radial wheel-like thing (borrowing from my first algorithm). But this repertoire can obviously be greatly expanded, and the algorithm is able to automatically analyze the pieces and figure out how to build a structure out of them! It's pretty cool stuff...I must say, this new algorithm is simple but elegant, just the way I like them 🙂

Here's an example schematic:

And here's me exploring it in-game:

Again, notice how structurally simple everything is. There's definitely nothing fancy going on here yet, but you can already see some nice visuals, and it feels quite cool to be able to duck in and out of those spokes and fly around this superstructure! I can't wait to see how good this algorithm can get. I'm actually way more excited about these results than the ship results! Although, in the end, I think the algorithms can be unified...because a ship and a space station aren't really that different, are they?