Avoiding GPU Timeout via Dynamic Load Balancing

December 18, 2012 General News 0 Comments

There's really nothing better in life than when you conceive of something, imagine that it's probably going to be quite difficult to do, then end up getting it to work in five minutes. Seriously. Best feeling ever. It's what just happened with one of the core pieces of the Limit Theory Engine: the GPU texture generator.

I've always known that the engine would crash on under-powered GPUs if it tried to generate really high-res textures (for example, 2048 resolution skyboxes), because the job would take too long, and Windows would think that the driver had hung, then kill and restart it. I believe the default timeout is 2 seconds. So if your texture can't generate in 2 seconds, you're dead. Needless to say, a really high-quality, procedural skybox needs more than 2 seconds to generate on integrated chips. So to alleviate the problem, I split the job up into several pieces (rendering only a certain portion of the texture at once), forcing a GPU sync (glFinish) after each one. In theory, this ensures that each piece takes less than 2 seconds to generate, so Windows doesn't get angry. But it's inefficient to split up the job, as it increases overhead significantly. For powerful GPUs that can generate it all in one go, you don't want to split the job at all.

The solution? Very simple, really: define job size n. Initialize it to 1. Now, generate n columns of the texture and time the operation. Use the stencil test to effortlessly select n columns of the texture for rendering without having to do tricky quad math. Make sure to force GPU sync after each job (glFinish). Now, use the elapsed time to adjust job size, then repeat until all columns have been generated. It's a no-brainer, really, but one might expect unexpected complexities to creep up during implementation.
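Here's a minimal sketch of that loop in Python, just to make the bookkeeping concrete. It is not the engine's actual code: render_columns is a hypothetical callback standing in for "draw these columns, then glFinish", and the clock parameter is only there so the timing source can be swapped out. The resize rule (target time divided by measured seconds per column) is one plausible reading of "use the elapsed time to adjust job size".

```python
import time


def generate_columns_adaptive(total_columns, render_columns,
                              target_seconds=1.0, clock=time.perf_counter):
    """Render every column in timed jobs, resizing each job from the last timing.

    render_columns(start, count) is a hypothetical callback. It must block
    until the GPU has finished the work (for example by calling glFinish
    before returning), otherwise the timing below is meaningless.
    Returns the list of job sizes that were used.
    """
    job_size = 1   # start with a single column
    start = 0
    jobs = []
    while start < total_columns:
        count = min(job_size, total_columns - start)

        t0 = clock()
        render_columns(start, count)
        elapsed = clock() - t0

        jobs.append(count)
        start += count

        # Estimate seconds per column from this job, then size the next job
        # so that it should take roughly target_seconds.
        per_column = max(elapsed / count, 1e-9)
        job_size = max(1, int(target_seconds / per_column))
    return jobs
```

With a callback that simply sleeps in proportion to count, the returned list starts with a 1-column probe and then settles on whatever job size fits the target time.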

Nope. Worked the first time. I'm now able to generate 2048 skyboxes on my laptop without crashing! The scheme, in theory, will never crash, because it uses actual timings to adjust the load on the GPU. For now, I've set the target time at 1 second, just to be safe. So my powerful desktop machine will pretty much do the whole thing in one job (after it times the initial size-1 job), while my laptop determines that 609 columns is optimal, so it uses about 4 jobs per cubemap face.

Now, it scares me a bit to think that the first job, which is only 1 column wide, would be used to make an initial guess at the maximal load. You might imagine that if the timer accuracy isn't great, we could end up overestimating the GPU's capability and crashing. So it might be wise to implement a gradual scale-up, such that the job size can't change too dramatically during the first few iterations. At first, I did this, but I have yet to see the naive scheme crash on me, so I backed off and am happily using the naive scheme for now.
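For reference, a gradual scale-up could look something like the sketch below. The function name, the max_growth parameter, and its value of 4 are all assumptions made for illustration. Unlike the version described above, this sketch caps growth on every step rather than only during the first few iterations, which keeps the code shorter.

```python
def next_job_size(current_size, elapsed_seconds,
                  target_seconds=1.0, max_growth=4.0):
    """Pick the next job size, never growing by more than max_growth per step.

    Shrinking is not limited: if a job ran long, the next one drops
    straight to the size the timing suggests.
    """
    per_column = max(elapsed_seconds / current_size, 1e-9)
    ideal = target_seconds / per_column
    capped = min(ideal, current_size * max_growth)
    return max(1, int(capped))
```

The idea is that a noisy 1-column timing can only push the second job to 4 columns, the third to 16, and so on, so a bad first measurement gets corrected before the jobs become large enough to hit the timeout.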

Procedural Space Stations II

A little bit over a year ago, I spent an hour or so whipping up a really silly/stupid algorithm for building radial space stations. Over the past few days, I've been cooking up some more advanced geometric technology to use in my second attempt. Today, I started building a new algorithm for procedural space stations. I'm very pleased to say that my results in one day are a whole lot better than they were last year!! Good to know I'm making progress 🙂

Right now, the procedural building blocks that the algorithm is using are extremely simple, just beveled boxes and a radial wheel-like thing (borrowing from my first algorithm). But this repertoire can obviously be greatly expanded, and the algorithm is able to automatically analyze the pieces and figure out how to build a structure out of them! It's pretty cool stuff...I must say, this new algorithm is simple but elegant, just the way I like them 🙂

Here's an example schematic:

And here's me exploring it in-game:

Again, notice how structurally simple everything is. There's definitely nothing fancy going on here yet, but you can already see some nice visuals emerging...it feels quite cool to be able to duck in and out of those spokes and fly around this superstructure! I can't wait to see how good this algorithm can get. I'm actually way more excited about these results than the ship results! Although, in the end, I think the algorithms can be unified...because a ship and a space station aren't really that different, are they?

Stable Exponential Smoothing Under Variable Framerate

November 26, 2012 General News 2 Comments

The title might sound fancy, but this post is about a very practical thing that you might encounter quite often. A great way to smooth any kind of changing feature in your game is to use linear interpolation each frame to interpolate between the current value and the target value, like so:

myVar = lerp(myVar, target, 0.5)

It's so easy! And it creates a very nice, smooth transition between the old value of myVar and the new value, target. If you think about it for a bit, you'll realize that this thing is somehow exponential, because the rate at which myVar approaches target is proportional to the difference between myVar and target. Hence the "exponential smoothing" in the title.

Now, there's a bit of a problem: where did I get the value 0.5? I could have just played with the constant until I was pleased with the rate of smoothing. But there's a bigger problem than arbitrary constants here: this code is highly framerate-dependent. Why? Well, if we're running at 60fps, then this interpolation happens 60 times in a second, which will obviously leave myVar a lot closer to target than if we were running at 10fps and the interpolation happened 10 times. The end result? You will notice things happening faster when your framerate is higher, even if you're using the proper time differential for physics. The problem is that we didn't account for the time differential when we wrote this function.

I'm not going to put the full derivation in this post, since my blog doesn't support LaTeX yet...but I will give you the answer to this problem:

myVar = lerp(myVar, target, 1. - pow(.5, dt))

What exactly is going on here? You can see that I'm now using dt, which is the time differential between frames in seconds (hence, we are taking into account framerate). But what of the pow and the .5? It turns out that the .5 now has a very precise meaning - it means that for every second that passes, half of the remaining difference between myVar and target will decay. So after 3 seconds, myVar will have come 87.5% of the way to target (measuring from the initial value of myVar). This gives you a very easy way to think about the constant!
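A quick way to convince yourself is to run both versions at two different framerates. The sketch below does that in Python; the helper names (smooth_naive, smooth_dt, remaining_per_second) are made up for this example, and lerp is the usual a + (b - a) * t. The reason the corrected form works is that each frame multiplies the remaining difference by pow(.5, dt), and those factors multiply out to pow(.5, total time) no matter how the time is sliced.

```python
def lerp(a, b, t):
    """Standard linear interpolation between a and b."""
    return a + (b - a) * t


def smooth_naive(value, target, frames, factor=0.5):
    """Framerate-dependent version: one fixed-factor lerp per frame."""
    for _ in range(frames):
        value = lerp(value, target, factor)
    return value


def smooth_dt(value, target, seconds, fps, remaining_per_second=0.5):
    """Framerate-independent version: the lerp factor depends on dt."""
    dt = 1.0 / fps
    for _ in range(round(seconds * fps)):
        value = lerp(value, target, 1.0 - pow(remaining_per_second, dt))
    return value


# One second of the naive version at 10fps versus 60fps gives different results.
print(smooth_naive(0.0, 1.0, 10), smooth_naive(0.0, 1.0, 60))

# Three seconds of the dt version at 10fps versus 60fps should agree.
print(smooth_dt(0.0, 1.0, 3, 10), smooth_dt(0.0, 1.0, 3, 60))
```

Both calls to smooth_dt should land at roughly 0.875, up to floating-point rounding, which matches the 87.5% figure above.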

Not only is this a great little math trick to have up your sleeve, but it turns out that it is intimately related to the solution for making linear drag framerate-independent, which is something that you will not achieve if you just do the obvious discrete physics. I'll probably post on that later 🙂
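Until that post exists, here is one guess at what the connection looks like, sketched in Python. It assumes "linear drag" means a deceleration proportional to velocity (dv/dt = -k * v) and that "the obvious discrete physics" means a plain Euler step. The function names and the drag coefficient k are illustrative only. The exact per-step factor exp(-k * dt) is the same thing as pow(r, dt) with r = exp(-k), which is the form used in the smoothing formula.

```python
import math


def drag_euler(v, k, dt):
    """Obvious discrete step: apply acceleration -k*v for one frame."""
    return v - k * v * dt


def drag_exact(v, k, dt):
    """Exact solution of dv/dt = -k*v over one frame of length dt."""
    return v * math.exp(-k * dt)


def simulate(step, v0, k, seconds, fps):
    """Advance the velocity for the given duration at a fixed framerate."""
    v = v0
    dt = 1.0 / fps
    for _ in range(round(seconds * fps)):
        v = step(v, k, dt)
    return v


# Euler gives framerate-dependent results; the exact step does not.
print(simulate(drag_euler, 10.0, 2.0, 1, 10), simulate(drag_euler, 10.0, 2.0, 1, 60))
print(simulate(drag_exact, 10.0, 2.0, 1, 10), simulate(drag_exact, 10.0, 2.0, 1, 60))
```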

Limit Theory

November 20, 2012 General News 5 Comments

And here we have it, the last few months of my life, distilled into one video. Now, what I'd like to do is turn that into a full year and beyond of work...who knows, maybe people will be as excited about procedural space games as I am? We'll find out 🙂

Beam Weapons and Epic Battles

October 15, 2012 General News 5 Comments

Beam weapons are now fully implemented (the previous implementation was just a hacky prototype). A nice detail of my current weapon implementation is that turrets on ships actually rotate to track their target. It's pretty cool seeing those massive turrets on capital ships swivel towards an enemy ship, then rip it apart with a huge beam of energy.

I'm doing some stress tests with loads of ships in a free-for-all just to see where the bottlenecks in the engine are located. Not surprisingly, almost all of the time is spent in the code for pulse and beam weapons, performing the necessary scene raycasts in order to make collision work. I've already accelerated this a lot with a spatial hashing structure, enabling fast raymarching for short rays (pulses and beams qualify for this). Then, there's a world-space AABB test and, finally, OPCODE does its thing with those AABB-trees.
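To give a rough idea of why a spatial hash helps with short rays, here is a small Python sketch of the broad phase only. It is not the engine's structure: the SpatialHash class, its method names, and the cell size are all made up for illustration. It also takes a shortcut, gathering every cell touched by the segment's bounding box instead of marching cell by cell along the ray. For short segments that box covers only a handful of cells, so the result is a small, conservative candidate set that the later AABB and mesh tests can narrow down.

```python
from collections import defaultdict
from itertools import product
from math import floor


class SpatialHash:
    """Uniform grid keyed by integer cell coordinates (illustrative only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)

    def _cell_ranges(self, lo, hi):
        # Integer cell indices covered by the box [lo, hi] on each axis.
        return [range(floor(a / self.cell_size), floor(b / self.cell_size) + 1)
                for a, b in zip(lo, hi)]

    def insert(self, obj, lo, hi):
        """Register obj in every cell overlapped by its AABB (lo, hi)."""
        for cell in product(*self._cell_ranges(lo, hi)):
            self.cells[cell].add(obj)

    def candidates_for_segment(self, p0, p1):
        """Return objects in cells touched by the segment's bounding box."""
        lo = [min(a, b) for a, b in zip(p0, p1)]
        hi = [max(a, b) for a, b in zip(p0, p1)]
        found = set()
        for cell in product(*self._cell_ranges(lo, hi)):
            found |= self.cells.get(cell, set())
        return found


grid = SpatialHash(cell_size=10.0)
grid.insert("ship_a", (0.0, 0.0, 0.0), (5.0, 5.0, 5.0))
grid.insert("ship_b", (100.0, 100.0, 100.0), (105.0, 105.0, 105.0))
print(grid.candidates_for_segment((1.0, 1.0, 1.0), (8.0, 2.0, 3.0)))
```

Only the candidates returned here would need the more expensive per-object tests.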

Apparently I've done a pretty good job of optimizing so far, because I can handle about 50 ships in an all-out, epic war before my fps drops below 60. Still, I'd like to push that number to at least 100. I'm not going to settle for limiting the number of ships that can be in an engagement in this game - if you've got the cash to hire a hundred wingmen, you should be able to use them!

Coming Together

October 10, 2012 General News 0 Comments

The past few weeks have been a flurry of small changes, incremental improvements, and endless tweaking to hone the look of the game. Everything's starting to come together! With a decent ship algorithm, a good metal shader, and some decent procedural textures, things are starting to look pretty nice.

I honestly never thought it would end up looking this good. I guess I got carried away...then again, that's why I'm in graphics. It's just far too fun and gratifying, trying to make everything look as good as possible. Still, at a certain point, I am going to seriously shift my focus back to gameplay. For now, though, I need to impress for Kickstarter!

Metal and Ambient Occlusion

October 6, 2012 General News 0 Comments

I've spent a LOT of time on the look of metal lately. Finally, I've managed to make ships look like they're made of metal!!! This is the first time I've ever had any success with metal BRDFs (even in my offline rendering experiments). I think the main key that I was missing is that the distribution really needs to be exponential rather than power-based to look convincing, otherwise the surface looks like plastic. Another key is to modulate the specular amplitude in some interesting way (just modulating it with albedo looks great).

On top of good metal shaders, I now have new-and-improved SSAO in the engine. Together, the metal and AO are making my ships look better than ever before! In addition, I think I'm finally getting closer to a good procedural ship algorithm. It's taken forever, and I'm not there yet...but I'm getting closer.

Here's a cool shot that I took tonight. This has a bit of post-process on it for dramatic effect; I wanted to use it as a wallpaper. Still, you can get a nice sense of the metallic surface.

Clouds

October 2, 2012 General News 1 Comment

They turned out way better than I expected! 🙂

Terrestrial-like planets: solved?

Specularity

September 28, 2012 General News 0 Comments

Turns out it makes a pretty big difference...

🙂

Better Planets

September 28, 2012 General News 0 Comments

I really don't want to spend all my time on planets, because I see how often these kinds of projects get wrapped up in planetary generation and then never come back...still, I guess I need to have good planets since pretty much everybody does procedural planets these days.

They're getting better, although I constrained them to looking "Earth-like" for now. I think instead of picking random colors like before, I'm going to have to constrain the generator to picking from a list of palettes that I will create, since some of the planets before just looked plain ugly...purple land and green water don't really go well together.

The lighting model still needs a lot of work. In particular, the bumps are too exaggerated (and actually based on luminance instead of height), and the water does not react to light differently than the ground, which is a terrible thing. Nonetheless, it's a step forward!