
GPU Particle System

August 25, 2012 · General News

Some recent profiling has made it very clear to me that a CPU-side particle system just isn't going to work out for large numbers of particles. It's more of a bottleneck than I anticipated, largely because it has to ship a fresh vertex/index array to the GPU every frame.

Although there's no real conceptual difficulty in moving a particle system to the GPU, the real trick is getting it to run fast (no surprises there). With a vanilla GPU particle system, if you want to support 100K particles, then you have to draw the vertices for all 100K particles each and every frame, barring any geometry shader fanciness. Not cool. Naturally, you can check in the vertex shader whether the particle is active, and cull it if it isn't. This saves pixel processing. Still, that's a lot of vertices burning through the GPU. When I got this far in implementing my GPU particle system, it wasn't doing much better than the CPU one.

Well, wouldn't it be great if we could draw only the active particle vertices? But that would require having all active particles coalesced in a linear region of the texture. Hmmm...defragmenter, perhaps? Wait a minute...there's nothing stopping us from using stack-based allocation! By that, I mean that we always allocate the next free block, and when we go to free a particle, we swap it with the last one in use. This is a classic memory scheme for maintaining a tightly-packed linear array. And it works perfectly in this situation, because the particles do not care about order! Using this strategy, we can specify the exact size of the active vertex buffer, so that no inactive particle vertices are drawn in the draw call.
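To make the swap-with-last idea concrete, here's a minimal CPU-side sketch of the scheme. The names (ParticlePool, posX/posY) are illustrative stand-ins for the engine's actual data-texture rows, not its real code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal sketch of stack-based (swap-with-last) allocation. Particles live
// in flat arrays; slots [0, active) are alive, everything after is free, so
// the live data is always tightly packed at the front.
struct ParticlePool {
    std::vector<float> posX, posY;  // per-particle data (stand-ins for texture rows)
    std::size_t active = 0;

    explicit ParticlePool(std::size_t capacity)
        : posX(capacity), posY(capacity) {}

    // Allocate: the next free slot is always index `active`.
    std::size_t allocate(float x, float y) {
        assert(active < posX.size());
        posX[active] = x;
        posY[active] = y;
        return active++;
    }

    // Free: overwrite the dead slot with the last live particle and shrink
    // the live range. Order is not preserved, which is fine for particles.
    void free(std::size_t i) {
        assert(i < active);
        --active;
        posX[i] = posX[active];
        posY[i] = posY[active];
    }
};
```

With this invariant, the draw call can simply submit exactly `active` vertices and never touch a dead particle.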

To break up all this text, here's a somewhat cheesy picture of loads of particles! It'll look a lot better once I apply textures to them.

Another hidden benefit of stack-based allocation is that we can now save a LOT of time on particle allocation by coalescing allocations per frame. Without stack-based allocation, you'd have to call TexSubImage2D once per particle creation. TexSubImage2D is expensive, and will be a HUGE bottleneck if called for each particle creation (trust me, I've profiled it). In fact, it will quickly become the bottleneck of the whole system. However, if we know that particles are packed tightly in a stack, we can defer the allocation to a once-per-frame coalesced allocation, uploading whole rows at a time to the data textures. This is actually easy and intuitive to implement, but yields massive savings on TexSubImage2D calls! With a 256x256 particle texture, you can now perform (in theory) 256 allocations at once! A little probability theory tells us that, on average, we'll make 128 times fewer calls to TexSubImage2D. Another big win for stack allocation!
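The row-coalescing logic is just interval arithmetic over the packed range. Here's a sketch with the TexSubImage2D call mocked out (the function and struct names are mine, for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of coalescing one frame's allocations into whole-row uploads.
// Each RowUpload stands in for a single glTexSubImage2D call.
constexpr std::size_t kTexWidth = 256;

struct RowUpload { std::size_t row, x0, width; };

// Given the live range before and after this frame's allocations, return the
// minimal set of row-aligned uploads covering just the new particles.
std::vector<RowUpload> coalesceUploads(std::size_t oldActive, std::size_t newActive) {
    std::vector<RowUpload> uploads;
    std::size_t i = oldActive;
    while (i < newActive) {
        std::size_t row = i / kTexWidth;
        std::size_t x0  = i % kTexWidth;
        std::size_t end = std::min(newActive, (row + 1) * kTexWidth);
        uploads.push_back({row, x0, end - i});  // one TexSubImage2D call
        i = end;
    }
    return uploads;
}
```

However many particles spawn in a frame, the number of upload calls is bounded by the number of texture rows the new range touches, rather than by the particle count.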

One final detail: when freeing a particle and performing the swap, you need to be able to swap texture data. I couldn't find a way to read back a subregion of a texture in OGL, so I opted to shadow the GPU data with a CPU copy. One might argue that doing so defeats the purpose of having a GPU PS, but that's not true: the real savings (at least for my application) come from the draw call. Having the vertex/index data already on the GPU is a huge win. To avoid shadowing, you could use a shader and a pixel-sized quad (or the scissor test) to copy the data GPU-side. But that's messier and requires double-buffering. So I opted for the easy way. As always, to back my decision, I profiled, and indeed found that keeping a CPU shadow of the data is virtually free. Even with having to perform Euler integration on both the CPU and the GPU, the cost is nowhere close to the cost of drawing the particles, which at this point should be the bottleneck.
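A quick sketch of why the shadow copy makes the free-swap trivial: the CPU copy lets us read the last particle's texel without any GPU readback, and only a tiny upload goes back up. The upload is mocked by a counter here; the real thing would be a small glTexSubImage2D (or could be folded into the per-frame row uploads):

```cpp
#include <cstddef>
#include <vector>

// Sketch of a CPU shadow of the particle data texture (illustrative names).
// Each texel holds 4 floats, mirroring an RGBA float texture.
struct ShadowedTexture {
    std::vector<float> shadow;      // CPU copy of the data texture
    std::size_t uploadsIssued = 0;  // stands in for glTexSubImage2D calls

    explicit ShadowedTexture(std::size_t texels) : shadow(texels * 4) {}

    // Free texel `dst` by copying live texel `src` over it, on both copies.
    void swapFree(std::size_t dst, std::size_t src) {
        for (int c = 0; c < 4; ++c)
            shadow[dst * 4 + c] = shadow[src * 4 + c];  // CPU side: plain copy
        ++uploadsIssued;                                 // GPU side: one tiny upload
    }
};
```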

Now, for something totally different, here's a (questionably) pretty picture of my dumb AI friends and me flying around planet Roldoni. Yes, the terrain looks terrible. But now that I've turned my attention back to planetary surfaces, it won't stay that way for long. I've also ported my sky shader over from the XDX engine, and it's still looking nice.

PS: You might think that you can shave even more time off particle allocation by detecting the case in which multiple rows can be uploaded at once (i.e., you have a massive number of particles to upload in one frame), coalescing them into a single TexSubImage2D call covering a large rectangle. It turns out that this optimization really doesn't save much time at all, so there's no need to go that far unless you regularly expect to upload hundreds of thousands of particles (I was testing with allocations of ~10,000 at a time, and the savings were not noticeable). Any way you slice it, the vertex cache is going to get hit really hard if you allocate a huge number of new particles at once, and that, if anything, is going to cause a stall - not the uploading.

AI, Particles, Normal Maps

August 23, 2012 · General News

It's been a great week for progress. Using the path finding algorithm discussed in the last post, I've implemented some rudimentary AI players that simply navigate to a random planet using the trade lane network, then select a new destination and repeat upon arrival. Although simple, the AI sure livens up the bleak little star system. I also implemented a basic, CPU-side particle system for creating engine trails and explosions. Finally, I've implemented GPU-based normal map generation for heightmaps, and used it to make planets and asteroids look a whole lot better.
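The normal-map generation mentioned above runs in a shader, but the underlying math is just central differences over the heightmap. Here's a hedged CPU sketch of that idea (the function and the `strength` parameter are my illustrative choices, not the engine's code):

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of heightmap -> normal generation via central differences.
// `strength` scales how pronounced the bumps appear (an assumed parameter).
// Assumes (x, y) is an interior texel so the neighbors exist.
std::array<float, 3> normalAt(const std::vector<float>& h,
                              std::size_t w, std::size_t x, std::size_t y,
                              float strength = 1.0f) {
    auto H = [&](std::size_t px, std::size_t py) { return h[py * w + px]; };
    float dx = (H(x + 1, y) - H(x - 1, y)) * strength;  // slope along +x
    float dy = (H(x, y + 1) - H(x, y - 1)) * strength;  // slope along +y
    float nx = -dx, ny = -dy, nz = 2.0f;                // unnormalized normal
    float len = std::sqrt(nx * nx + ny * ny + nz * nz);
    return {nx / len, ny / len, nz / len};
}
```

On the GPU the same differencing maps directly onto neighboring texture fetches, one output texel per normal.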

Here's the new and improved world. The streaks of yellow are AI players up ahead that have been instructed to pick a random asteroid and orbit it. I'm following them as they journey towards their respective rocks.

And here you can see me chilling with my AI entourage (they were told to protect me). The engine trails show that the AI paths are always smooth, thanks to some nice math that controls the AI movement. They look very natural while flying around me. Now all we need are some enemies for them to attack!

It's been a really exciting week for this project, and I can't wait to see what happens when I start playing with enemies/attacking, police, real traders, etc! Oh, and I guess someday I should stop being lazy and fix the fact that the skydome is mirrored vertically...kind of an immersion-breaker.

It Looks Like Chaos, But...

January 23, 2012 · Algorithmic Art

Trace any individual element, and you see the order.

Sounds deep. It's really not.

Procedural Nebulae

January 19, 2012 · Algorithmic Art

As a follow-up to the last post, I just couldn't wait to see what I could do in terms of procedural nebulae with the particle system. The initial results aren't jaw-dropping, but they do beat previous attempts by a fair amount. The particle-sim nebulae definitely have a more volumetric feel to them (not surprising, considering that they are volumetric, unlike the 2D-noise-based previous attempts).

I actually did a bit of cheating in this shot: I've composited the original screenshot with several blurred versions of itself. In principle, this is easy enough to do on the GPU with textures. I just didn't want to go write that code in the simulator, since I want to keep it as a real-time application. There's still a long way to go in the procedural nebula department, but I believe this represents a step in the right direction.

Complex Field Simulator

January 19, 2012 · Algorithmic Art

I've been having a bit of fun with particles and the GUI engine lately.  I experimented with a basic particle simulator in which particles are subjected to a force, where the force's vector field is defined by functions of complex variables.  My rationale behind this was that equations in complex variables have an uncanny ability to produce interesting behavior even with simple operations.  Indeed, I witnessed some pretty cool results!  Thanks to the new-and-improved GUI engine, I was able to control loads of parameters of the simulation in real time, and had quite a bit of fun playing with the math 🙂 Of course, particles do have a tendency to make everything seem cooler than it really is!
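The core of the idea fits in a few lines: treat each particle's 2D position as a complex number z and let some f(z) define the force field. Here's a minimal sketch; f(z) = z² - 1 is an arbitrary illustrative choice, not the formula I actually used:

```cpp
#include <complex>
#include <vector>

// Sketch of driving particles with a complex-valued force field.
using cplx = std::complex<double>;

struct Particle { cplx pos, vel; };

// The force field: any function of a complex variable works here.
cplx force(cplx z) { return z * z - 1.0; }

// One explicit Euler step for every particle.
void step(std::vector<Particle>& ps, double dt) {
    for (auto& p : ps) {
        p.vel += force(p.pos) * dt;
        p.pos += p.vel * dt;
    }
}
```

Swapping in different f(z) expressions (powers, reciprocals, exponentials) is what made the real-time parameter tweaking so much fun: tiny changes to f reshape the whole field.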

I'd call the experiment a success, both as the first real test of the new GUI engine and as the first particle system implemented in XDX.  Oh, and the math was a success as well 😀 I'm thinking this kind of work may even lead to realistic procedural nebula rendering at some point.  That could be exciting.