Ambient Occlusion II

Last time I talked about AO, but I left out a teensy little detail: although per-vertex AO is very easy to compute, and also extremely fast to render, it's extremely slow to compute during the pre-process.  To get high-quality, noise-free AO requires somewhere in the vicinity of 1000 samples of the density field per vertex.  Not exactly a cheap operation!  On the CPU, it quickly becomes prohibitively expensive as either the complexity of the density field or the resolution of the mesh increase.

Today, I moved the computation to the GPU, and have once again been blown away by the computational abilities of modern GPUs.  Now that I have every piece of the mesh computation process - the field evaluation, the gradient evaluation, and, finally, the AO evaluation - running on the GPU, it's simply mind-blowing how high I can push the quality of the mesh and the complexity of the density field.

Here's a mesh consisting of 50 unioned and subtracted round boxes (round boxes are very expensive compared to sharp-edged ones), contoured on a grid of 300 x 300 x 300 (that's an insane level of detail, FYI), resulting in half a million vertices, each of which takes 1024 AO samples.  The GPU performs this work in ~3 seconds.  Incredible.

But that's not even the most amazing part.  The amazing thing is that, after profiling, it would seem that the GPU actually takes less than 1 second to complete this work.  It is OpenGL's shader compiler (which, of course, is running on the CPU) that takes the majority of the time.  This isn't too surprising, as the shaders to compute these things are massive, since I actually bake the field equation into the shader.  I'm sure GL spends a long time analyzing and optimizing the equation, which is a good thing, because the shader runs absurdly fast.

Unfortunately, this brings up a few unwanted issues - I now have a CPU bottleneck that can't be easily offloaded to another thread.  Since the bottleneck is inside GL, I will need to explore multithreaded GL contexts in order to compile the shaders in another thread while the game runs, because I can't have the game stalling every time a new asset enters the region and the corresponding shaders have to be compiled.  Sadly, this probably won't be too easy, but I'm sure I'll learn a lot...!

Another, less-tractable problem is that the shader compiler flat-out crashes after a certain field complexity is hit.  I will need to explore this some more.  It might just be the fact that my field function dumps an incredibly-ugly equation into the shader (it's literally a single line, with hundreds of functions wrapped together).  Perhaps breaking it up will prevent the crash.  Or maybe I've hit some kind of hard limit on the allowed complexity of pixel shaders.  If that's the case, I could explore a solution that uploads the equation as a texture, and create a shader that understands how to parse an equation from a texture.  But that would no doubt be significantly slower than baking the equation into the shader...probably at least an order of magnitude slower :/

But for now, I will allow myself to be happy with these results, and am most definitely looking forward to working on ships again with this technology in hand!

Ambient Occlusion

Finally, just like every other modern game, Limit Theory now has highly-exaggerated ambient occlusion.  At least mine isn't screen-space, though.  A while back I dropped SSAO because I was tired of it.  It's expensive, requires absurd levels of tweakery just to get right, varies with distance to the object, etc.  It's bad, let's face it.  Per-vertex AO probably isn't an option for many games due to tessellation level on the mesh.  For per-vertex to work, you need a consistent level of tessellation across the whole surface.  Hey, guess which really stupid method of creating surfaces provides just that?  Yep, uniform-grid contouring of distance fields.  Marching cubes, Dual Contouring, you name it.  I used to think that the uniform level of tessellation, even on really flat surfaces, was a big problem.  Now I realize that we can actually use it to our advantage by storing useful information at the vertex level!

Oh, another bonus: occlusion factor is almost too easy to compute with a scalar field.  As in, the function that I wrote to take a mesh and bake AO data in is literally 15 lines of very-readable, simple code.  Man, I love fields

The result is beautiful! (At least I think.)

I love this kind of occlusion, because, unlike SSAO, it does not vary with distance.  So you can zoom in as far as you like, and those darkly-shaded corners are still dark (a result that is impossible with screen-space ambient occlusion techniques).  Note that I exaggerated it a lot for this shot, it probably won't be that heavy in the game!

Anyway, I'm very excited about using this in LT.  I can't wait to see how good ships will look once I get the ship generator converted to use scalar fields rather than polygons!!!

PS ~ You might be concerned about performance given the high tessellation of surfaces because of the large number of vertices that methods like MC and DC output.  Indeed, at first it seems like a valid concern.  But then again, vertices are extremely cheap these days.  Vertex shaders are significantly lighter than fragment shaders, and also get run significantly less frequently.  This means that in almost every modern game, you will find yourself very much bottlenecked by the fragment shader.  Furthermore, with field methods, you get perfect LOD meshes for free, you just turn down the contouring resolution.  So even if you're putting more vertices on screen, you can easily control this by setting up LOD levels.  With analytic normals, even low-resolution meshes tend to look quite good!  So I don't think vertex density is something to be too worried about.

Distance Fields for Smoothed, Convex Polyhedra

Today was a very fun math day, and I want to share a bit of my experimentation.  Of all the things I did today, one of the most useful was to come up with a formula for the distance field of a convex polyhedron, defined by some set of planes.  I also implemented a smoothed version, which is the one I'll post, because it's quite interesting.

Disclaimer: I'm calling it a distance field because it looks like one and because I don't see many people properly differentiate (teehee?) between distance fields and things that look like distance fields but aren't actually.  So I'll join that club.  The 0-isosurface is the convex polyhedron (or smoothed version thereof), and it does exhibit a nice differentiable behavior, but I don't think that the scalar value actually represents a distance.  Certainly not a Euclidean one if it does.

Suppose we have a set of $k$ planes defined by normals ${n_1, ..., n_k}$ and d-values (the dot product of some point on the plane with the plane normal) ${d_1, ..., d_k}$.  Define the shape factor, $p$, to represent how sharp the edges of the polyhedron are, where $p = 2$ corresponds to sphere-like edges, and $p = \infty$ corresponds to hard edges.  This definition is a reference to the p-norm, to which the following formula is closely related.  The field value at a point $x$ is given by:

Just so we're clear, that inner bit is the dot product of the position in the field $x$ and $i$th plane's normal, $n_i$. Intuitively, you can see that, with the $\infty$-norm, this gives the maximal (scaled) distance to any plane - 1, which will be 0 if we are on the polyhedron, and will fall off with the closest plane's distance.  This corresponds to our idea of what should happen for a hard-edged polyhedron.  For other norms, you can verify that it gives interesting, smooth results.

Now comes the real work.  This is great, but we also want analytic gradients for our fields, so that we can compute perfect normals for the surfaces rather than using ugly central differences.  Perfect normals go a long way towards making the polygonized result look good.  It's a bit painful, but if you just remember your multivariable calculus (I'll admit I struggled with trying to remember it for quite a while), it's not too terrible.  It's just application of a bunch of  rules.  I'm going to spare the derivation and give the result:

The left side is a scalar factor, closely related to the field value.  The right side is the actual vector component, which is just a sum of scaled plane normals.  All of it can be easily implemented in a single loop over the planes!

With these formulas in hand, you can create lots of cool, smooth shapes. Boxes, discrete cylinders, prisms, etc...or just random convex shapes!

If you're wondering how I came up with these formulas, it was by studying and generalizing the field formula for a smooth box, which is pretty obvious. This polyhedron formula is less obvious, in my opinion, but is still basically just a straightforward generalization of the box formula.

PS ~ As usual, I invite any lurking mathies to speak up if I've made a blunder

PPS ~ I hope you like the Latex plugin that I installed, solely for the purpose of this post..

Procedural Nebulae III

What an exciting night. If you search the tag "procedural nebulae" on this blog, you'll find no shortage of posts. Indeed, I've been fascinated with nebulae for as long as I've been a graphics programmer, and have been trying to crack them procedurally for as long as I've known about procedural generation. My attempts have always been spirited, but lackluster. For Limit Theory, I once again revisited the topic, starting from scratch and trying to come up with some new ways to attack the problem. The solution was fairly good, and certainly my best so far (well, not counting the path-traced nebula, which was cheating). But, as you can see from the screenshots, they really don't look that much like nebulae. I've been quite "content" with this solution...at least, content enough to ship the game with it. But not truly satisfied.

Tonight, I am truly satisfied. I allotted a few hours for revisiting the nebula code, just to see if I could push the quality a little further. With a few new creative ideas and a wee bit heavier math (but not much), I've succeeded. They aren't perfect, and, of course, they're still obviously not real. But they are far more real than any of my previous work, and I would be more than happy if this was the algorithm that shipped with the game.

The nice thing is that I've only just started to explore the possibilities of this new approach, so I'm sure I'll be able to push it even a bit further than this with a few more hours of tweaking!

Triplanar Bumpmapping

The first article in the latest GPU gems is brilliant for so many reasons. It's no wonder they chose it for the first article. Not only does Ryan basically blow apart all previous heightmap-based work as far as terrain generation goes, but, on the side, he slips in an elegant (and blindingly-simple) little method for texturing arbitrary surfaces that don't seem easily-texturable. To cap it off, he mentions how to generate tangent bases for performing good-looking bumpmapping on the resultant surface. He could have stopped after GPU terrain generation with geometry shaders and still had a killer article...but I'm oh so glad that he didn't

Incidentally, I missed the bit about tangent bases on the first few reads, which is why I've been doing them wrong until now. In my defense, there's literally only one paragraph dedicated to the trick. But it's so, so simple: compute an approximate tangent basis separately for each projection, and blend the resulting bump information. How? Easy...you know the axis of the normal for each plane, so the intuitive thing to do is rotate the normal 90 degrees about the other two axes to obtain two approximate tangents. In practice, this means we can just define:

vec3 tanX = vec3( normal.x, -normal.z, normal.y); vec3 tanY = vec3( normal.z, normal.y, -normal.x); vec3 tanZ = vec3(-normal.y, normal.x, normal.z);

Then for the x projection, you use tanY and tanZ as your tangent frame, tanX and tanZ for y projection, etc. Now, if you stare at it a bit, you may quickly realize why it's a hack: these tangent vectors are not necessarily orthogonal to the normal!! Not much of a tangent plane if it's not..tangent -_- Only in the case of a perfectly-axis-aligned normal do we get an actually-tangent tangent frame. Otherwise, we get something that is not-tangent, where the not-tangency of each tangent vector is proportional to the square of the normal's component along that axis. Let me try to make an even more precise run-on sentence: we get a frame where the deviation of the cosine of each tangent vector to a true tangent vector is proportional to the square of the normal's component along the corresponding axis of rotation. Pull that one out at a dinner party and see how many twitter followers you acquire. But what I was trying to say, intuitively, is that, if you think about the rotations we performed, they only yield orthogonal vectors when the vector to be rotated is orthogonal to the axis of rotation. Right.

Is this a problem? Not really. In practice it looks good. If you want to get fancy, I guess you could do G-S orthogonalization or something of that nature. Better yet, you can probably come up with a compact little closed-form answer for each tangent frame based on axis of projection (because if we restrict to each hemisphere, we can chart it with a continuous frame...diff geom experts correct me if I am mistaken). But this looks good to me! And it's so simple!!

Thanks, NVIDIA, for an excellent article in an excellent book

Fast, Accurate Collision Detection for Arbitrary Triangle Meshes

This is probably the single most exciting post in my three years of blogging. It's definitely what I feel to be the biggest technical achievement, in terms of what people usually think of as being technical achievements. At first I wrote an exceedingly long post and a mini-history of my experiences with collision detection, and the development of this new algorithm. But it was too long and boring, so I'm just going to get to the facts.

Tonight, frustrated by the poor performance of Bullet's concave-concave solver (GIMPACT), I've developed a general-purpose intersection algorithm that works for any kind of mesh (convex, concave, degenerate, ... pretty much whatever you want), allows any kind of world transform (including scaling, shearing, ... pretty much whatever you want), and is remarkably fast. Maybe it's not that great, maybe everyone already knows about it...but I feel like I would have heard about it by now. I mean, it takes less than 200 lines to implement this incredibly simple, powerful algorithm that blows GIMPACT out of the water. Am I missing something?? Maybe so. I need to do some more tests to make sure I'm not being premature, but initial tests show that it takes an absurdly small amount of time, even for insane meshes. For example, I tried checking two full-quality asteroids against one another...a test that should be prohibitively expensive (they're concave and roughly 50000 polys)...and it takes almost no time at all (concrete numbers to come in the future). Wow. I can run full physics on my full meshes now...! And best of all, I can quit worrying about this convexity nonsense, and also drop all the hacks that I had to use to get other physics engines to cooperate with scaling.

I've never been more excited about an algorithm than I am tonight. This may mean ditching the 3rd-party physics library entirely, which would be a dream come true!!! After three years, I've finally gotten collision detection right. I hope to write something on this algorithm soon, so that I can get it out there, and people can tell me whether it's actually novel or not.

If this algorithm is really as good as initial tests show it to be, my belief that everything is fundamentally simple with the right insights...will just be cemented into my mind beyond any shadow of a doubt. If concave-concave collision detection is actually trivial...then can there be anything that isn't?!?

Computing Shield Shapes

Today I decided to give shields a shot. The question of how to compute the shield mesh has been in my mind for quite some time. At first I thought about not having any shape at all, and simply using a distance offset when raytracing weaponry to simulate shield hits (the result would be an implicitly-expanded mesh). But then I wouldn't be able to do any sort of global graphics effect, which would be a shame...shields definitely lend themselves to cool graphics, and I like cool graphics!

To that end, I spent part of the day trying to compute shield meshes for arbitrary shapes. The first thing that comes to your mind is probably a convex hull, or some smooth and expanded version thereof. I'm sure that would work just fine, but it's a little heavy-handed for my tastes, so I wanted to try a more intuitive solution.

The first attempt was really an obvious one: point-repulsion. Create an expanded sphere that surrounds the ship, then try to collapse it, while applying repulsion forces from the ship's vertices. In theory it sounds pretty great, and intuition tells me that the resulting shape would be lovely and smooth. We could also apply spring forces between edges on the sphere to ensure a nice vertex distribution in the result. Unfortunately, this method failed pretty miserably, mainly due to computation time. It's expensive. To collapse everything nicely, you need a lot of iterations, and each iteration is very expensive without acceleration. Here's the result after only a few iterations:

Smooth, but quite ugly, and doesn't really capture the shape of the ship. To make it better, you'd want to decrease the repulsion force. But then you'd have to integrate very slowly, otherwise you end up missing the vertices and collapsing the shield into the ship, resulting in a shield that won't do a very good job of shielding! I saw this happen a lot with this model. Again, I think it's a fantastic idea in theory, and would probably provide the highest quality of mesh, but it's not feasible for real-time...it just needs too many iterations and a very small integration step.

Next thought: raytrace it. Take the sphere, raytrace to the center of the ship's bounding box, and add some offset to expand the result. This is much faster, since it's not an iterative method. Unfortunately, the results are still mediocre:

Better, since it definitely captures the shape of the ship in only one iteration. Sadly, it's not very smooth. To make it smooth, you could smooth out the mesh with a mass-spring system postprocess, but then you're going back to iterative. However, that's not the main problem - like the last method, this one often results in shields that are collapsed into the ship, especially when the ship is thin and has long pieces sticking out (like engines). We just can't accept degenerate shields, as they'll not only look terrible, but will prevent your shields from doing their job!

Ok, third time's the charm, right? Right!!!

This one's fast, smooth, non-iterative, and, most importantly, non-degenerate! Great! What's the algorithm? Well...can't give all the Limit Theory implementation away! But, if you look very carefully at the shield vertex distribution, you can probably guess it