Ambient Occlusion II

January 25, 2013 | General News

Last time I talked about AO, but I left out a teensy little detail: although per-vertex AO is very easy to implement and extremely fast to render, it's extremely slow to compute during the pre-process. Getting high-quality, noise-free AO requires somewhere in the vicinity of 1000 samples of the density field per vertex. Not exactly a cheap operation! On the CPU, it quickly becomes prohibitively expensive as either the complexity of the density field or the resolution of the mesh increases.
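To make the cost concrete, here is a minimal CPU-side sketch in Python of what sampling a density field per vertex could look like. It is not the code from this project: the sign convention (positive means solid), the uniform hemisphere sampling, the `max_dist` reach, and every function name are assumptions chosen for illustration.

```python
import math
import random

def sphere_density(p, radius=1.0):
    """Hypothetical density field: positive inside a sphere at the origin."""
    x, y, z = p
    return radius - math.sqrt(x * x + y * y + z * z)

def random_hemisphere_dir(normal, rng):
    """Uniform random unit vector in the hemisphere around `normal`."""
    while True:
        # Rejection-sample a point inside the unit ball, then normalize it.
        d = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        len2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
        if 1e-12 < len2 <= 1.0:
            break
    inv = 1.0 / math.sqrt(len2)
    d = (d[0] * inv, d[1] * inv, d[2] * inv)
    # Flip directions that point into the surface.
    if d[0] * normal[0] + d[1] * normal[1] + d[2] * normal[2] < 0.0:
        d = (-d[0], -d[1], -d[2])
    return d

def vertex_ao(field, position, normal, samples=1024, max_dist=0.5, seed=0):
    """Fraction of nearby hemisphere samples in empty space (1.0 = unoccluded)."""
    rng = random.Random(seed)
    open_count = 0
    for _ in range(samples):
        d = random_hemisphere_dir(normal, rng)
        t = rng.uniform(0.0, max_dist)
        q = (position[0] + d[0] * t,
             position[1] + d[1] * t,
             position[2] + d[2] * t)
        # One field evaluation per sample: this is the expensive part.
        if field(q) <= 0.0:
            open_count += 1
    return open_count / samples
```

With half a million vertices and 1024 samples each, the inner loop runs roughly half a billion field evaluations, which is why this gets out of hand so quickly on the CPU.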

Today, I moved the computation to the GPU, and have once again been blown away by the computational abilities of modern GPUs.  Now that I have every piece of the mesh computation process - the field evaluation, the gradient evaluation, and, finally, the AO evaluation - running on the GPU, it's simply mind-blowing how high I can push the quality of the mesh and the complexity of the density field.

Here's a mesh consisting of 50 unioned and subtracted round boxes (round boxes are very expensive compared to sharp-edged ones), contoured on a grid of 300 x 300 x 300 (that's an insane level of detail, FYI), resulting in half a million vertices, each of which takes 1024 AO samples.  The GPU performs this work in ~3 seconds.  Incredible.

High-Quality, Per-Vertex AO Computed on the GPU

 

But that's not even the most amazing part. Profiling suggests that the GPU actually takes less than 1 second to complete this work. It is OpenGL's shader compiler (which, of course, is running on the CPU) that takes the majority of the time. This isn't too surprising: the shaders that compute these things are massive, since I actually bake the field equation into the shader. I'm sure GL spends a long time analyzing and optimizing the equation, which is a good thing, because the shader runs absurdly fast.
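For readers wondering what "baking the field equation into the shader" might look like, here is a small Python sketch that turns a tree of unions and subtractions into a single GLSL expression. The tuple-based tree layout, the `sdRoundBox` helper name, and the signed-distance convention (union as `min`, subtraction as `max(a, -b)`) are all assumptions for illustration, not the actual generator.

```python
def glsl_vec3(v):
    """Format a 3-tuple as a GLSL vec3 literal."""
    return "vec3({:.4f}, {:.4f}, {:.4f})".format(*v)

def emit(node):
    """Recursively turn a field tree into one nested GLSL expression."""
    kind = node[0]
    if kind == "round_box":
        _, center, half_size, radius = node
        # sdRoundBox is a hypothetical helper assumed to exist in the shader.
        return "sdRoundBox(p - {}, {}, {:.4f})".format(
            glsl_vec3(center), glsl_vec3(half_size), radius)
    if kind == "union":
        return "min({}, {})".format(emit(node[1]), emit(node[2]))
    if kind == "subtract":
        return "max({}, -({}))".format(emit(node[1]), emit(node[2]))
    raise ValueError("unknown node kind: {!r}".format(kind))

def bake_field_function(tree):
    """Wrap the generated expression in a GLSL function body."""
    return "float field(vec3 p) {\n    return " + emit(tree) + ";\n}\n"
```

Every shape's constants end up as literals in the source, so the compiler can fold and optimize them, but the generated expression grows with every primitive added to the field.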

Unfortunately, this brings up a few unwanted issues - I now have a CPU bottleneck that can't be easily offloaded to another thread.  Since the bottleneck is inside GL, I will need to explore multithreaded GL contexts in order to compile the shaders in another thread while the game runs, because I can't have the game stalling every time a new asset enters the region and the corresponding shaders have to be compiled.  Sadly, this probably won't be too easy, but I'm sure I'll learn a lot...!
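The general shape of that idea, sketched in Python with no real GL involved: a worker thread does the slow compile while the main loop polls once per frame and never blocks. `slow_compile_stub` and `BackgroundShaderCache` are hypothetical names, and the stub only simulates the delay. In actual OpenGL the worker would also need its own context, sharing objects with the main one and made current on that thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_compile_stub(source):
    """Stand-in for the driver's shader compile; NOT a real OpenGL call."""
    time.sleep(0.05)
    return "compiled:" + str(len(source))

class BackgroundShaderCache:
    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}   # key -> Future still compiling
        self._ready = {}     # key -> finished result

    def request(self, key, source):
        """Queue a compile unless it is already pending or finished."""
        if key not in self._pending and key not in self._ready:
            self._pending[key] = self._pool.submit(self._compile, source)

    def poll(self):
        """Call once per frame on the main thread; never blocks."""
        done = [k for k, fut in self._pending.items() if fut.done()]
        for k in done:
            self._ready[k] = self._pending.pop(k).result()

    def get(self, key):
        """Return the compiled result, or None if it is not ready yet."""
        return self._ready.get(key)

    def close(self):
        self._pool.shutdown(wait=True)
```

The game would call `request` when an asset enters the region, keep drawing a placeholder while `get` returns `None`, and swap in the real thing once the compile has finished.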

Another, less tractable problem is that the shader compiler flat-out crashes after a certain field complexity is hit. I will need to explore this some more. It might just be the fact that my field function dumps an incredibly ugly equation into the shader (it's literally a single line, with hundreds of functions wrapped together). Perhaps breaking it up will prevent the crash. Or maybe I've hit some kind of hard limit on the allowed complexity of pixel shaders. If that's the case, I could explore a solution that uploads the equation as a texture, and create a shader that understands how to parse an equation from a texture. But that would no doubt be significantly slower than baking the equation into the shader... probably at least an order of magnitude slower :/
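As a rough illustration of the "breaking it up" option, here is a Python sketch that emits one GLSL statement per node, each stored in a temporary, instead of one giant nested expression. The tree layout, the `leaf` nodes carrying ready-made GLSL snippets, and the `min`/`max` operators are assumptions for illustration, and whether this would actually avoid the compiler crash is untested.

```python
def flatten_field(tree):
    """Emit a GLSL function with one temporary per node of the field tree."""
    lines = []

    def visit(node):
        kind = node[0]
        if kind == "leaf":
            # A leaf carries a ready-made GLSL expression (hypothetical).
            expr = node[1]
        elif kind == "union":
            a = visit(node[1])
            b = visit(node[2])
            expr = "min({}, {})".format(a, b)
        elif kind == "subtract":
            a = visit(node[1])
            b = visit(node[2])
            expr = "max({}, -{})".format(a, b)
        else:
            raise ValueError("unknown node kind: {!r}".format(kind))
        # Store this node's value in its own temporary and return the name.
        name = "t{}".format(len(lines))
        lines.append("    float {} = {};".format(name, expr))
        return name

    result = visit(tree)
    return ("float field(vec3 p) {\n" + "\n".join(lines)
            + "\n    return " + result + ";\n}\n")
```

The output computes the same thing as the single-line version, but no individual statement is more than one operation deep.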

But for now, I will allow myself to be happy with these results, and am most definitely looking forward to working on ships again with this technology in hand!