
## Ambient Occlusion II

January 25, 2013

Last time I talked about AO, but I left out a teensy little detail: although per-vertex AO is very easy to implement, and extremely fast to render, it's extremely slow to compute during the pre-process.  To get high-quality, noise-free AO requires somewhere in the vicinity of 1000 samples of the density field per vertex.  Not exactly a cheap operation!  On the CPU, it quickly becomes prohibitively expensive as either the complexity of the density field or the resolution of the mesh increases.

Today, I moved the computation to the GPU, and have once again been blown away by the computational abilities of modern GPUs.  Now that I have every piece of the mesh computation process - the field evaluation, the gradient evaluation, and, finally, the AO evaluation - running on the GPU, it's simply mind-blowing how high I can push the quality of the mesh and the complexity of the density field.

Here's a mesh consisting of 50 unioned and subtracted round boxes (round boxes are very expensive compared to sharp-edged ones), contoured on a grid of 300 x 300 x 300 (that's an insane level of detail, FYI), resulting in half a million vertices, each of which takes 1024 AO samples.  The GPU performs this work in ~3 seconds.  Incredible.

High-Quality, Per-Vertex AO Computed on the GPU

But that's not even the most amazing part.  The amazing thing is that, after profiling, it would seem that the GPU actually takes less than 1 second to complete this work.  It is OpenGL's shader compiler (which, of course, is running on the CPU) that takes the majority of the time.  This isn't too surprising, as the shaders to compute these things are massive, since I actually bake the field equation into the shader.  I'm sure GL spends a long time analyzing and optimizing the equation, which is a good thing, because the shader runs absurdly fast.

Unfortunately, this brings up a few unwanted issues - I now have a CPU bottleneck that can't be easily offloaded to another thread.  Since the bottleneck is inside GL, I will need to explore multithreaded GL contexts in order to compile the shaders in another thread while the game runs, because I can't have the game stalling every time a new asset enters the region and the corresponding shaders have to be compiled.  Sadly, this probably won't be too easy, but I'm sure I'll learn a lot...!

Another, less tractable, problem is that the shader compiler flat-out crashes after a certain field complexity is hit.  I will need to explore this some more.  It might just be the fact that my field function dumps an incredibly ugly equation into the shader (it's literally a single line, with hundreds of functions wrapped together).  Perhaps breaking it up will prevent the crash.  Or maybe I've hit some kind of hard limit on the allowed complexity of pixel shaders.  If that's the case, I could explore a solution that uploads the equation as a texture, with a shader that understands how to parse an equation from a texture.  But that would no doubt be significantly slower than baking the equation into the shader...probably at least an order of magnitude slower :/

But for now, I will allow myself to be happy with these results, and am most definitely looking forward to working on ships again with this technology in hand!

## Ambient Occlusion

January 24, 2013

Finally, just like every other modern game, Limit Theory now has highly-exaggerated ambient occlusion.  At least mine isn't screen-space, though.  A while back I dropped SSAO because I was tired of it.  It's expensive, requires absurd levels of tweakery just to get right, varies with distance to the object, etc.  It's bad, let's face it.  Per-vertex AO probably isn't an option for many games due to the tessellation level of their meshes: for per-vertex AO to work, you need a consistent level of tessellation across the whole surface.  Hey, guess which really stupid method of creating surfaces provides just that?  Yep, uniform-grid contouring of distance fields.  Marching Cubes, Dual Contouring, you name it.  I used to think that the uniform level of tessellation, even on really flat surfaces, was a big problem.  Now I realize that we can actually use it to our advantage by storing useful information at the vertex level!

Oh, another bonus: the occlusion factor is almost too easy to compute with a scalar field.  As in, the function that I wrote to take a mesh and bake in AO data is literally 15 lines of very readable, simple code.  Man, I love fields 🙂
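The 15-line function itself isn't shown, so here's a hedged Python sketch of what such a bake might look like (the function names, the hemisphere-sampling scheme, and the sample counts are all my assumptions, not the LT code): fire random directions from each vertex, sample the scalar field along them, and count samples that land inside the surface.

```python
import math
import random

def sphere_field(p):
    """Example scalar field: a unit sphere (negative inside, positive outside)."""
    return math.sqrt(p[0]*p[0] + p[1]*p[1] + p[2]*p[2]) - 1.0

def bake_vertex_ao(vertices, normals, field, n_dirs=64, n_steps=4, max_dist=0.5):
    """Bake per-vertex ambient occlusion by sampling a scalar field.

    For each vertex, cast n_dirs random directions in the hemisphere around
    its normal, sample the field at n_steps distances along each direction,
    and count a direction as occluded if any sample lands inside the surface
    (field < 0).  Returns one occlusion value in [0, 1] per vertex.
    """
    rng = random.Random(42)
    ao = []
    for v, n in zip(vertices, normals):
        occluded = 0
        for _ in range(n_dirs):
            # Random unit direction, flipped into the normal's hemisphere.
            d = [rng.gauss(0.0, 1.0) for _ in range(3)]
            mag = math.sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]) or 1.0
            d = [c / mag for c in d]
            if d[0]*n[0] + d[1]*n[1] + d[2]*n[2] < 0.0:
                d = [-c for c in d]
            for step in range(1, n_steps + 1):
                t = max_dist * step / n_steps
                p = (v[0] + d[0]*t, v[1] + d[1]*t, v[2] + d[2]*t)
                if field(p) < 0.0:        # sample is inside the surface
                    occluded += 1
                    break
        ao.append(occluded / n_dirs)
    return ao
```

A vertex on an open convex surface comes out nearly unoccluded, while one facing into the solid comes out heavily occluded, which is exactly the corner-darkening effect shown in the screenshots.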

The result is beautiful! (At least I think.)

Per-Vertex Ambient Occlusion

I love this kind of occlusion, because, unlike SSAO, it does not vary with distance.  So you can zoom in as far as you like, and those darkly-shaded corners are still dark (a result that is impossible with screen-space ambient occlusion techniques).  Note that I exaggerated it a lot for this shot; it probably won't be that heavy in the game!

Anyway, I'm very excited about using this in LT.  I can't wait to see how good ships will look once I get the ship generator converted to use scalar fields rather than polygons!!!

PS ~ You might be concerned about performance, given the large number of vertices that methods like MC and DC output.  Indeed, at first it seems like a valid concern.  But then again, vertices are extremely cheap these days.  Vertex shaders are significantly lighter than fragment shaders, and also get run significantly less often.  This means that in almost every modern game, you will find yourself very much bottlenecked by the fragment shader.  Furthermore, with field methods, you get perfect LOD meshes for free: just turn down the contouring resolution.  So even if you're putting more vertices on screen, you can easily control this by setting up LOD levels.  With analytic normals, even low-resolution meshes tend to look quite good!  So I don't think vertex density is something to be too worried about.

## Distance Fields for Smoothed, Convex Polyhedra

January 23, 2013

Today was a very fun math day, and I want to share a bit of my experimentation.  Of all the things I did today, one of the most useful was to come up with a formula for the distance field of a convex polyhedron, defined by some set of planes.  I also implemented a smoothed version, which is the one I'll post, because it's quite interesting.

Disclaimer: I'm calling it a distance field because it looks like one and because I don't see many people properly differentiate (teehee?) between distance fields and things that look like distance fields but aren't actually.  So I'll join that club.  The 0-isosurface is the convex polyhedron (or smoothed version thereof), and it does exhibit a nice differentiable behavior, but I don't think that the scalar value actually represents a distance.  Certainly not a Euclidean one if it does.

Random, Smooth Convex Polyhedron

Suppose we have a set of $k$ planes defined by normals $\{n_1, ..., n_k\}$ and d-values (the dot product of some point on the plane with the plane normal) $\{d_1, ..., d_k\}$.  Define the shape factor, $p$, to represent how sharp the edges of the polyhedron are, where $p = 2$ corresponds to sphere-like edges, and $p = \infty$ corresponds to hard edges.  This definition is a reference to the p-norm, to which the following formula is closely related.  The field value at a point $x$ is given by:
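In symbols (my reconstruction from the description that follows; taking the absolute value inside the sum is an assumption, to keep odd $p$ well-behaved):

$$f(x) = \left( \sum_{i=1}^{k} \left| \frac{n_i \cdot x}{d_i} \right|^{p} \right)^{1/p} - 1$$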

Just so we're clear, that inner bit is the dot product of the position in the field, $x$, and the $i$th plane's normal, $n_i$. Intuitively, you can see that, with the $\infty$-norm, this gives the maximal (scaled) plane distance, minus 1, which will be 0 if we are on the polyhedron, and will fall off with the closest plane's distance.  This corresponds to our idea of what should happen for a hard-edged polyhedron.  For other norms, you can verify that it gives interesting, smooth results.

Now comes the real work.  This is great, but we also want analytic gradients for our fields, so that we can compute perfect normals for the surfaces rather than using ugly central differences.  Perfect normals go a long way towards making the polygonized result look good.  It's a bit painful, but if you just remember your multivariable calculus (I'll admit I struggled with trying to remember it for quite a while), it's not too terrible.  It's just repeated application of the chain rule.  I'm going to spare the derivation and give the result:
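My reconstruction, from the description that follows (writing $u_i = (n_i \cdot x)/d_i$, so that the field value is $(\sum_i |u_i|^p)^{1/p} - 1$):

$$\nabla f(x) = \left( \sum_{j=1}^{k} |u_j|^{p} \right)^{\frac{1}{p} - 1} \; \sum_{i=1}^{k} \operatorname{sign}(u_i) \, |u_i|^{p-1} \, \frac{n_i}{d_i}$$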

The left side is a scalar factor, closely related to the field value.  The right side is the actual vector component, which is just a sum of scaled plane normals.  All of it can be easily implemented in a single loop over the planes!
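The post doesn't include the actual code, but a minimal Python sketch of that single loop might look like this (the function name, the plane representation, and the use of absolute values are all my assumptions):

```python
import math

def polyhedron_field(x, planes, p):
    """Smoothed convex-polyhedron field and its analytic gradient.

    planes is a list of (n, d) pairs: a unit normal n and plane offset d > 0.
    Computes, in a single loop over the planes,
        f = (sum_i |n_i.x / d_i|^p)^(1/p) - 1
    and its gradient: a scalar factor times a sum of scaled plane normals.
    (The gradient is undefined where every n_i.x is zero, e.g. the center.)
    """
    s = 0.0                       # running sum of |u_i|^p
    g = [0.0, 0.0, 0.0]           # running sum of sign(u_i)|u_i|^(p-1) n_i/d_i
    for n, d in planes:
        u = (n[0]*x[0] + n[1]*x[1] + n[2]*x[2]) / d
        s += abs(u) ** p
        w = math.copysign(abs(u) ** (p - 1), u) / d
        for j in range(3):
            g[j] += w * n[j]
    f = s ** (1.0 / p) - 1.0
    scale = s ** (1.0 / p - 1.0)  # the scalar factor, closely related to f
    return f, [scale * gj for gj in g]
```

A handy sanity check is to compare the analytic gradient against central differences at a few points; they should agree closely away from the degenerate center.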

With these formulas in hand, you can create lots of cool, smooth shapes. Boxes, discrete cylinders, prisms, etc...or just random convex shapes!

Smooth Convex Cylinder

Smoother Convex Cylinder

If you're wondering how I came up with these formulas, it was by studying and generalizing the field formula for a smooth box, which is pretty obvious. This polyhedron formula is less obvious, in my opinion, but is still basically just a straightforward generalization of the box formula.

PS ~ As usual, I invite any lurking mathies to speak up if I've made a blunder 🙂

PPS ~ I hope you like the LaTeX plugin that I installed solely for the purpose of this post!

## Triplanar Bumpmapping

January 13, 2013

The first article in the latest GPU Gems is brilliant for so many reasons. It's no wonder they chose it for the first article. Not only does Ryan Geiss basically blow apart all previous heightmap-based work as far as terrain generation goes, but, on the side, he slips in an elegant (and blindingly-simple) little method for texturing arbitrary surfaces that don't seem easily-texturable. To cap it off, he mentions how to generate tangent bases for performing good-looking bumpmapping on the resultant surface. He could have stopped after GPU terrain generation with geometry shaders and still had a killer article...but I'm oh so glad that he didn't 🙂

Incidentally, I missed the bit about tangent bases on the first few reads, which is why I've been doing them wrong until now. In my defense, there's literally only one paragraph dedicated to the trick. But it's so, so simple: compute an approximate tangent basis separately for each projection, and blend the resulting bump information. How? Easy...you know the axis of the normal for each plane, so the intuitive thing to do is rotate the normal 90 degrees about the other two axes to obtain two approximate tangents. In practice, this means we can just define:

```glsl
vec3 tanX = vec3( normal.x, -normal.z,  normal.y);
vec3 tanY = vec3( normal.z,  normal.y, -normal.x);
vec3 tanZ = vec3(-normal.y,  normal.x,  normal.z);
```

Then for the x projection, you use tanY and tanZ as your tangent frame, tanX and tanZ for the y projection, etc. Now, if you stare at it a bit, you may quickly realize why it's a hack: these tangent vectors are not necessarily orthogonal to the normal!! Not much of a tangent plane if it's not...tangent -_- Only in the case of a perfectly axis-aligned normal do we get an actually-tangent tangent frame. Otherwise, we get something that is not-tangent, where the not-tangency of each tangent vector is proportional to the square of the normal's component along that axis. Let me try to make an even more precise run-on sentence: we get a frame where the deviation of the cosine of each tangent vector to a true tangent vector is proportional to the square of the normal's component along the corresponding axis of rotation. Pull that one out at a dinner party and see how many Twitter followers you acquire. But what I was trying to say, intuitively, is that the rotations we performed only yield orthogonal vectors when the vector being rotated is orthogonal to the axis of rotation. Right.

Is this a problem? Not really. In practice it looks good. If you want to get fancy, I guess you could do Gram-Schmidt orthogonalization or something of that nature. Better yet, you can probably come up with a compact little closed-form answer for each tangent frame based on axis of projection (because if we restrict to each hemisphere, we can chart it with a continuous frame...diff geom experts correct me if I am mistaken). But this looks good to me! And it's so simple!!
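For the curious, the Gram-Schmidt fix amounts to one projection per tangent. Here's a hedged sketch in plain Python standing in for the shader code (the names and structure are mine, not from the article):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    m = math.sqrt(dot(v, v))
    return [x / m for x in v]

def orthogonalize(t, n):
    """Project tangent t onto the plane perpendicular to unit normal n,
    then renormalize (one Gram-Schmidt step)."""
    d = dot(t, n)
    return normalize([ti - d * ni for ti, ni in zip(t, n)])

def triplanar_tangents(n):
    """Approximate tangents from the rotate-90-degrees trick, then make
    them exactly tangent via Gram-Schmidt.

    Caveat: for an exactly axis-aligned normal, that axis's "tangent" is
    parallel to n and cannot be orthogonalized; but that tangent is never
    used for that projection anyway, so skip it in real shader code.
    """
    tan_x = ( n[0], -n[2],  n[1])
    tan_y = ( n[2],  n[1], -n[0])
    tan_z = (-n[1],  n[0],  n[2])
    return [orthogonalize(t, n) for t in (tan_x, tan_y, tan_z)]
```

After the projection, each tangent really is perpendicular to the normal, so the frame is a genuine (if still hand-picked) tangent basis.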

Thanks, NVIDIA, for an excellent article in an excellent book 🙂

## Fast, Accurate Collision Detection for Arbitrary Triangle Meshes

January 8, 2013

This is probably the single most exciting post in my three years of blogging. It's definitely what I feel to be the biggest technical achievement, in terms of what people usually think of as being technical achievements. At first I wrote an exceedingly long post and a mini-history of my experiences with collision detection, and the development of this new algorithm. But it was too long and boring, so I'm just going to get to the facts.

Tonight, frustrated by the poor performance of Bullet's concave-concave solver (GIMPACT), I've developed a general-purpose intersection algorithm that works for any kind of mesh (convex, concave, degenerate, ... pretty much whatever you want), allows any kind of world transform (including scaling, shearing, ... pretty much whatever you want), and is remarkably fast. Maybe it's not that great, maybe everyone already knows about it...but I feel like I would have heard about it by now. I mean, it takes less than 200 lines to implement this incredibly simple, powerful algorithm that blows GIMPACT out of the water. Am I missing something?? Maybe so. I need to do some more tests to make sure I'm not being premature, but initial tests show that it takes an absurdly small amount of time, even for insane meshes. For example, I tried checking two full-quality asteroids against one another...a test that should be prohibitively expensive (they're concave and roughly 50000 polys)...and it takes almost no time at all (concrete numbers to come in the future). Wow. I can run full physics on my full meshes now...! And best of all, I can quit worrying about this convexity nonsense, and also drop all the hacks that I had to use to get other physics engines to cooperate with scaling.

I've never been more excited about an algorithm than I am tonight. This may mean ditching the 3rd-party physics library entirely, which would be a dream come true!!! After three years, I've finally gotten collision detection right. I hope to write something on this algorithm soon, so that I can get it out there, and people can tell me whether it's actually novel or not.

If this algorithm is really as good as initial tests show it to be, my belief that everything is fundamentally simple with the right insights...will just be cemented into my mind beyond any shadow of a doubt. If concave-concave collision detection is actually trivial...then can there be anything that isn't?!?

## Computing Shield Shapes

January 6, 2013

Today I decided to give shields a shot. The question of how to compute the shield mesh has been in my mind for quite some time. At first I thought about not having any shape at all, and simply using a distance offset when raytracing weaponry to simulate shield hits (the result would be an implicitly-expanded mesh). But then I wouldn't be able to do any sort of global graphics effect, which would be a shame...shields definitely lend themselves to cool graphics, and I like cool graphics!

To that end, I spent part of the day trying to compute shield meshes for arbitrary shapes. The first thing that comes to your mind is probably a convex hull, or some smooth and expanded version thereof. I'm sure that would work just fine, but it's a little heavy-handed for my tastes, so I wanted to try a more intuitive solution.

The first attempt was really an obvious one: point-repulsion. Create an expanded sphere that surrounds the ship, then try to collapse it, while applying repulsion forces from the ship's vertices. In theory it sounds pretty great, and intuition tells me that the resulting shape would be lovely and smooth. We could also apply spring forces between edges on the sphere to ensure a nice vertex distribution in the result. Unfortunately, this method failed pretty miserably, mainly due to computation time. It's expensive. To collapse everything nicely, you need a lot of iterations, and each iteration is very expensive without acceleration. Here's the result after only a few iterations:

Smooth, but quite ugly, and doesn't really capture the shape of the ship. To make it better, you'd want to decrease the repulsion force. But then you'd have to integrate very slowly, otherwise you end up missing the vertices and collapsing the shield into the ship, resulting in a shield that won't do a very good job of shielding! I saw this happen a lot with this model. Again, I think it's a fantastic idea in theory, and would probably provide the highest quality of mesh, but it's not feasible for real-time...it just needs too many iterations and a very small integration step.

Next thought: raytrace it. Take the sphere, raytrace to the center of the ship's bounding box, and add some offset to expand the result. This is much faster, since it's not an iterative method. Unfortunately, the results are still mediocre:

Better, since it definitely captures the shape of the ship in only one iteration. Sadly, it's not very smooth. To make it smooth, you could smooth out the mesh with a mass-spring system postprocess, but then you're going back to iterative. However, that's not the main problem - like the last method, this one often results in shields that are collapsed into the ship, especially when the ship is thin and has long pieces sticking out (like engines). We just can't accept degenerate shields, as they'll not only look terrible, but will prevent your shields from doing their job!

Ok, third time's the charm, right? Right!!!

This one's fast, smooth, non-iterative, and, most importantly, non-degenerate! Great! What's the algorithm? Well...can't give all the Limit Theory implementation away! But, if you look very carefully at the shield vertex distribution, you can probably guess it 😉

## Avoiding GPU Timeout via Dynamic Load Balancing

December 18, 2012

There's really nothing better in life than when you conceive of something, imagine that it's probably going to be quite difficult to do, then end up getting it to work in five minutes. Seriously. Best feeling ever. It's what just happened with one of the core pieces of the Limit Theory Engine: the GPU texture generator.

I've always known that the engine would crash on under-powered GPUs if it tried to generate really high-res textures (for example, 2048-resolution skyboxes), because the job would take too long, and Windows would conclude that the driver had hung, kill it, and restart it.  I believe the default timeout is 2 seconds.  So if your texture can't generate in 2 seconds, you're dead.  Needless to say, a really high-quality, procedural skybox needs more than 2 seconds to generate on integrated chips.  So to alleviate the problem, I split the job up into several pieces (rendering only a certain portion of the texture at once), forcing a GPU sync (glFinish) after each one.  In theory, this ensures that each piece takes less than 2 seconds to generate, so Windows doesn't get angry.  But splitting up the job is inefficient, as it increases overhead significantly.  For powerful GPUs that can generate it all in one go, you don't want to split the job at all.

The solution? Very simple, really: define job size n.  Initialize it to 1.  Now, generate n columns of the texture and time the operation.  Use the stencil test to effortlessly select n columns of the texture for rendering without having to do tricky quad math.  Make sure to force a GPU sync (glFinish) after each job.  Now, use the elapsed time to adjust the job size, then repeat until all columns have been generated.  It's a no-brainer, really, but one might expect unexpected complexities to creep in during implementation.
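In pseudocode-ish Python, the loop might look like this (a sketch of the scheme as described; the function names, the proportional controller, and the timing hook are my assumptions, not LT code):

```python
import time

def generate_texture(total_columns, render_columns, target_time=1.0,
                     min_size=1, timer=time.perf_counter):
    """Generate a texture in variable-sized column jobs, sizing each job so
    it stays under target_time seconds and the OS's GPU watchdog never fires.

    render_columns(start, count) must render that range of columns and block
    until the GPU finishes (i.e. it ends in a glFinish-style sync).  Returns
    the list of job sizes used, for inspection.
    """
    sizes = []
    size = min_size                       # start with a tiny 1-column job
    done = 0
    while done < total_columns:
        n = min(size, total_columns - done)
        t0 = timer()
        render_columns(done, n)           # includes the forced GPU sync
        elapsed = timer() - t0
        sizes.append(n)
        done += n
        if elapsed > 0.0:
            # Scale the next job so it should take about target_time.
            size = max(min_size, int(n * target_time / elapsed))
    return sizes
```

On a fast GPU the second job covers essentially the whole texture; on a slow one, the job size settles wherever each chunk fits under the target time.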

Nope. Worked the first time. I'm now able to generate 2048 skyboxes on my laptop without crashing! The scheme, in theory, will never crash, because it uses actual timings to adjust the load on the GPU. For now, I've set the target time at 1 second, just to be safe. So my powerful desktop machine will pretty much do the whole thing in one job (after it times the initial size-1 job), while my laptop determines that 609 columns is optimal, so uses about 4 jobs per cubemap face.

Now, it scares me a bit to think that the first job, which is only 1 column wide, would be used to make an initial guess at the maximal load. You might imagine that if the timer accuracy isn't great, we could end up overestimating the GPU's capability and crashing. So it might be wise to implement a gradual scale-up, such that the job size can't change too dramatically during the first few iterations. At first, I did this, but have yet to have the naive scheme crash on me, so I backed off and am happily using the naive scheme for now.

## Procedural Space Stations II

December 13, 2012

A little bit over a year ago, I spent an hour or so whipping up a really silly/stupid algorithm for building radial space stations. Over the past few days, I've been cooking up some more advanced geometric technology to use in my second attempt. Today, I started building a new algorithm for procedural space stations. I'm very pleased to say that my results in one day are a whole lot better than they were last year!! Good to know I'm making progress 🙂

Right now, the procedural building blocks that the algorithm is using are extremely simple, just beveled boxes and a radial wheel-like thing (borrowing from my first algorithm). But this repertoire can obviously be greatly expanded, and the algorithm is able to automatically analyze the pieces and figure out how to build a structure out of them! It's pretty cool stuff...I must say, this new algorithm is simple but elegant, just the way I like them 🙂

Here's an example schematic:

And here's me exploring it in-game:

Again, notice how structurally simple everything is. There's definitely nothing fancy going on here yet, but you can already see some nice visuals emerging...it feels quite cool to be able to duck in and out of those spokes and fly around this superstructure! I can't wait to see how good this algorithm can get. I'm actually way more excited about these results than the ship results! Although, in the end, I think the algorithms can be unified...because a ship and a space station aren't really that different, are they?

## Stable Exponential Smoothing Under Variable Framerate

November 26, 2012

The title might sound fancy, but this post is about a very practical thing that you might encounter quite often. A great way to smooth any kind of changing feature in your game is to use linear interpolation each frame to interpolate between the current value and the target value, like so:

```
myVar = lerp(myVar, target, 0.5)
```

It's so easy! And it creates a very nice, smooth transition between the old value of myVar and the new value, target.  If you think about it for a bit, you'll realize that this smoothing is exponential: the rate at which myVar approaches target is proportional to the remaining difference between them.  Hence the "exponential smoothing" in the title.

Now, there's a bit of a problem: where did I get the value 0.5? I could have just played with the constant until I was pleased with the rate of smoothing. But there's a bigger problem than arbitrary constants here: this code is highly framerate-dependent. Why? Well, if we're running at 60fps, then this interpolation happens 60 times in a second, which will obviously leave myVar a lot closer to target than if we were running at 10fps and the interpolation happened 10 times. The end result? You will notice things happening faster when your framerate is higher, even if you're using the proper time differential for physics. The problem is that we didn't account for the time differential when we wrote this function.

I'm not going to put the full derivation in this post, since my blog doesn't support Latex yet...but I will give you the answer to this problem:

```
myVar = lerp(myVar, target, 1. - pow(.5, dt))
```

What exactly is going on here? You can see that I'm now using dt, which is the time differential between frames (hence, we are taking into account framerate). But what of the pow and the .5? It turns out that the .5 now has a very precise meaning - it means that for every second that passes, .5 of the difference between myVar and target will decay. So after 3 seconds, myVar will have come 87.5% of the way to target (measuring from the initial value of myVar). This gives you a very easy way to think about the constant!
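Here's a quick Python sketch of the trick (the names are mine); running it at different framerates for the same wall-clock time lands on the same value:

```python
def smooth(value, target, dt, keep=0.5):
    """Framerate-independent exponential smoothing: per second of elapsed
    time, only `keep` of the remaining difference to `target` survives."""
    return value + (target - value) * (1.0 - keep ** dt)

def simulate(fps, seconds=1.0):
    """Step the smoothing from 0.0 toward 1.0 at a fixed framerate."""
    v, dt = 0.0, 1.0 / fps
    for _ in range(round(seconds * fps)):
        v = smooth(v, 1.0, dt)
    return v
```

At any framerate, one second closes exactly half the gap, and three seconds close 87.5% of it, matching the figures above.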

Not only is this a great little math trick to have up your sleeves, but it turns out that it is intimately related to the solution for making linear drag framerate-independent, which is something that you will not achieve if you just do the obvious discrete physics. I'll probably post on that later 🙂

## Limit Theory

November 20, 2012

And here we have it, the last few months of my life, distilled into one video. Now, what I'd like to do is turn that into a full year and beyond of work...who knows, maybe people will be as excited about procedural space games as I am? We'll find out 🙂