Since last week I’ve continued working on the steering behaviors to see if I could improve their performance. The initial demo I shared with you was able to run about 40 boids (bird + android cuz it’s flocking and a computer) at once. I’ve since tried a couple micro and macro optimizations to varying degrees of success and thought I’d share them with you. Check out the results, updated source code, and some explanation below.
First off the numbers. Originally I was able to run the flocking test case (now case #3) with forty boids. Today it’s running comfortably with one hundred boids! Even more exciting, for the Grid and Dithering tests I am able to run the test case with four hundred boids! For the grid test case I am able to maintain a steady 30 FPS on my macbook. I’d love to hear what FPS you’re seeing and what type of machine you have in the comments.
In order to reach this number I experimented with a number of optimizations. I started off with two micro optimizations which let me comfortably run one hundred boids at once in the flocking test. The first was to combine the three sub-behaviors (alignment, separation, and cohesion) into a single behavior so that the final steering force is calculated in a single loop instead of three. The second was to reduce the number of new points created when calculating the steering force. It turns out that it’s expensive to create new objects. As a result whenever possible I did vector math in place.
Of course, one hundred boids is good but I wanted to create a massive swarm effect. To do this I needed to attack the complexity of the steering algorithm from a different angle. I had two ideas about how to solve the problem. The first was to update the boids in a schedule. For example, instead of trying to update all four hundred boids each T milliseconds I would try to update one hundred boids each T / 4 milliseconds. Obviously the net result would be the same but by spreading the load more evenly I thought it might be possible to improve the experience. This culminated in the dithering test which updates boids in batches every hundred milliseconds (so in this case there are four hundred milliseconds between updates for a batch). I find that the result is a little more performant but also suffers from the sparser updates. Simply put the boids no longer appear to be as in sync. That said it may be possible to tweak the parameters driving the test in order to achieve a better effect.
After trying the dithering approach I wanted to try to break down the complexity of the problem. The originally flocking algorithm has O(n^2) complexity due to the inner loop which traverses every other boid in order to determine the steering force. However at any given time the steering algorithm only cares about a subset of the boids — the nearby ones — so I wanted to find a way to reduce the complexity to O(n * c). From testing I knew that for reasonable values of c (read ~20) the performance was acceptable. In order to accomplish this I modified the Crash library slightly (changing it to be a loose grid) and then combined it with the steering algorithm. Now, instead of passing the entire list of boids to each boid in order to determine the correct steering force I pass only the set of neighboring boids.
Alright, well that turned into a rather lengthy look at the latest changes. Hopefully you’ve found it interesting or even useful for a problem you’re working on. Without further ado here’s the latest version of the code:
Hello, my name is Alex Schearer. I grew up in New York and currently live in Seattle.
One Comment
Man it runs super smooth. my fps is about 70-80 on a quadcore with 8 gigs gf9800. Im currently rewriting my java flockings to as3, at first i was totally shocked how slow it ran in as, but seeing this…BAM PEWPEWPEW! Keep up the good work;)