Cargo Space devlog #5 - Adventures in physics engine development with bevy_ecs

I started filling in the gaps in my home-made physics implementation. In this post I go painstakingly detailed about my adventures in implementing one-way platforms and prematurely optimizing broadphase collision detection. Bonus yak-shaving sections on bevy_screen_diagnostics, and bevy_sparse_grid_2d

Ignore common sense and make a physics engine?

Why on earth am I making a physics engine when there is Rapier?

As I've now experienced first hand, and we'll see in the rest of this post, making physics can be a deep rabbit hole, with so many yaks to shave you never get out.

To be perfectly honest, the real answer is probably: "because I want to".

Although I do have a couple of half-good reasons:

1. Determinism and rollback

While rapier_2d itself has a deterministic mode, and its state can be serialized and restored, bevy_rapier_2d does not (yet) provide good support for ensuring determinism and convenient rollback. Christopher Corley has made a really promising deterministic example with GGRS and Rapier, but it still requires patching bevy_rapier.

I'm sure Christopher's work will make its way upstream eventually, and it would probably be better for the Bevy community if I spent my efforts helping out the bevy_rapier rollback story better.

Still, it feels like even if I were try to use Rapier, I would have to spend some time tinkering with physics engine anyway.

2. 2D platformers and their "non-physical" hacks

When you look for advice for how to make a 2D platformer that feels good, people will often tell you to not use a physics engine.

To be fair: Some people have made good platformers with traditional physics engines. For instance Box2D rigidbodies are used in Hollow Knight, and that feels great, so it's definitely doable. So again, on its own this is not the best reason to roll my own physics.

However, in general, it seems like if you try to make the player a rigid body in a traditional impulse-based physics engine, you will fight the engine a lot.

I'm also somewhat scared of rapier's warning that modifying body positions may result in odd behaviors. And force/impulse-based physics engines do have a reputation of exploding in your face when you do weird things.

Put another way, it kind of seems safer to implement something I understand completely myself, so I can make the adjustments necessary to support the weird features and behavior I want.

3. A physics engine implemented with bevy_ecs is cool

I just like the idea, and it's a familiar rabbit whole I've been down before

TL;DR: I'm making a physics engine because NIH, if you will.

Ship-to-asteroid collisions

Ship-to-asteroid collisions turned out to be tricker than I expected. The upside was that I got to make lots of punny commits and unit test about "corner cases".

What I implemented was basically an AABB (axis-aligned box) to tilemap collision check with contact normal. It turned out to be particularly tricky on "inner" corners. Here is my initial naive attempt at writing the collision checker:

As you can see, it teleports an entire tile when it gets into the corner. The problem is that I only considered collision on each axis individually. That meant if I had a collision on both axes at once it would just choose the "best" one and spit out the normal and penetration for that axis alone.

So being a pedantic yak-shaver, I wrote a unit test for this:

fn tilemap_aabb_inner_corner() {
    let tilemap_size = UVec2::splat(4).into();
    let mut tile_storage = TileStorage::empty(tilemap_size);

    //  3
    // 12
    tile_storage.set(&TilePos::new(0, 0), Entity::from_raw(1));
    tile_storage.set(&TilePos::new(1, 0), Entity::from_raw(2));
    tile_storage.set(&TilePos::new(1, 1), Entity::from_raw(3));

    let aabb = Aabb {
        min: vec2(0.0, 0.9),
        max: vec2(1.1, 1.9),
    let c = tilemap_aabb_contact(&tile_storage, aabb).unwrap();

    // the exact thing that happens isn't that important, just make sure there isn't a crazy high response!
    assert!(c.penetration < 0.2);
    // and that the normal isn't pointing in the wrong direction
    assert!(c.normal.x >= 0.);
    assert!(c.normal.y <= 0.);

The fix, however wasn't particularly pretty. When there is a collision on both axes at once, I simply try to check if the collision could instead be resolved by moving the box to the closest corner instead. While it's not pretty, it works well in practice, and I have a test if I want to revisit it later:

I'm quite happy with how the the characters keep their momentum and slide around when the ship collides; it's not something I actually intended, just a result of my existing physics and platformer controller code, but it definitely feels like the appropriate response.

One-way platforms

I was missing an important platformer feature: the ability to jump up on platforms from below:

Marking tiles as one-way platforms

First, I needed to add some metadata to my LDTK "levels" about which tiles of ship modules were one-way collision platforms. LDTK provides a plethora of ways to add metadata, so it was a bit hard to know where to decide where exactly to put it. I ended up using the "Enum for tile marking" feature on the tileset itself. The name is somewhat confusing though, as it allows for more than one "enum tag" on each tile, so it's more like a bitfield or flags feature.

Accessing that metadata with bevy_ecs_ldtk also turned out to be relatively simple:

fn process_tile_enum_tags(
    mut commands: Commands,
    new_tile_enums: Query<(Entity, &TileEnumTags), Added<TileEnumTags>>,
) {
    for (entity, tile_enum_tag) in new_tile_enums.iter() {
        for tag in tile_enum_tag.tags.iter() {
            if "OneWayPlatform" == tag {

I simply add a marker component to all tiles that have a matching TileEnumTags. Having the marker component lets me easily make queries for tiles with or without one-way platforms, and also decouples my game from LDTK, which is nice, as I eventually want to generate ship modules procedurally.

One downside, though, is that due to the Added query and how bevy_ecs_ldtk spawns levels, it's yet another thing that will have a one-frame delay. Ideally, I'd like the levels to be spawned with everything ready the first frame.


So now that my bevy_ecs_tilemaps had the appropriate metadata, I needed to figure out how best to collide with them...

Previously, I'd implemented a tilemap_aabb_contact function for the ships colliding with asteroids. However, it had no concept of one-way platforms; it simply treated all collision tiles as solid tiles. So I somehow needed to make it work with one-way platforms as well.

So what exactly is a one-way platform? It might seem obvious, but its nice to actually think though it explicitly:

  1. When the player is above, it should block the player from falling down through it
  2. When the player's feet is below it, it should not block the player from falling down
  3. It should never block "horizontal" movement
  4. If the player stands on a one-way platform it should be possible to "drop down" from it by holding down and the jump button at the same time.


I found a a great article about how to implement classic 2D platformers by Emanuele Feronato. They suggest using the previous player position to determine whether the player was completely above the platform the previous frame and if so, treat the tile as solid. This seemed to me like a solid approach, and since my physics implementation is using position-based dynamics I already store the previous positions. Perfect fit :)

However, since my ships are also moving (and can collide with things), it means that rather than just comparing the previous player position, I needed to also take the previous ship position into account (by just subtracting the two). So my high-level approach is basically:

// rust pseudo-code

// convert character position to tilemap space
let character_tile_pos = character_pos - tilemap_pos;

let overlapping_tiles = tilemap_aabb_overlapping_tiles(tile_storage, character_tile_pos, character_size);

if overlapping_tiles.is_empty() {
    return; // no collisions can occur, return early

let overlapping_one_way_tiles: HashSet<_> = overlapping_tiles
    .filter(|t| one_way_platforms.contains(*t))

let ignored_one_way_tiles = if overlapping_one_way_tiles.is_empty() {
} else {
    // convert previous character position to (previous) tilemap space
    let prev_character_tile_pos = character_prev_pos - tilemap_prev_pos;
    let prev_feet_y = prev_character_tile_pos.y - character_size.y / 2.;

        .filter(|t| > prev_feet_y)

let contact = tilemap_aabb_contact(
    // this is a lambda that returns a tile entity and the directions in which it blocks
    |tile_pos| {
            // requirement 2 (don't block if we were already below it)
            .filter(|t| !ignored_one_way_tiles.contains(t))
            .map(|t| {
                let dirs = if overlapping_one_way_tiles.contains(t) {
                    // requirement 3 (one-way tiles should only block downward movement)
                } else {
                (t, dirs)

if let Some(contact) = contact {
    // handle collision response as before

I modified and generalized my tilemap/aabb contact function so it now takes a lambda which determines if a given position should block in the given direction. This removed its dependency on bevy_ecs_tilemap, and also made it much more flexible if I need further special-casing (one-way doors? Doors that only block certain characters?).

With this in place, I could finally jump up on the ledge from below:

Ship-to-ship collisions

I also implemented ship-to-ship collisions, this actually took up nearly half of my time this month.

The tricky thing about ships, is that I wanted them to behave as tilemaps when colliding with players and "small" objects, but as bigger, simplified boxes when colliding with each other and the environment. In order to support this, I needed to implement what is often called compound or composite colliders and collision layers. Getting this right took quite some time, and triggered several other refactors and rewrites of physics stuff to reduce duplication, and I'm still not sure if the solution I came up with is the best, so I'll not go through it all here. Maybe in a later post.

Still, it works okay for now: ships collide, job done.

Yak 1: Premature optimization of collision broadphase

As I implemented ship-to-ship collision—or in physics engine terms—refactored and generalized dynamic-to-dynamic rigidbody collisions, there was one yak I really couldn't leave unshaved:

// collision broadphase, collect all potential rigidbody contacts in a resource
fn collect_collision_pairs(
    query: Query<(Entity, &Rollback, &Aabb)>,
    mut collision_pairs: ResMut<CollisionPairs>,
) {

    // perf: here we are iterating over all combinations of bodies (in n*n time)
    for [(entity_a, rollback_a, aabb_a), (entity_b, rollback_b, aabb_b)] in
        if aabb_a.intersects(aabb_b) {
                .push(if < {
                    ((,, entity_a, entity_b)
                } else {
                    ((,, entity_b, entity_a)

    // sort collision pairs for determinism

This is a (deterministic) "collision broadphase" implemented in the most stupidly simple way possible.

n² complexity means any simulation with a significant number of rigidbodies would break down pretty quickly. i.e. even with just 1,000 rigidbodies, we would have to loop through $$1,000 \times 1,000 = 1,000,000$$ combinations every frame.

So while I didn't actually need 1,000 dynamic bodies for my gameplay (yet?), I wanted to improve on this, so I set up a "test scene". Spawning a thousand bright green debug boxes.

I was prepared to see framerate drop to a crawl, however, to my surprise, Bevy in release mode actually handled this quite well. I still got a steady 60 FPS.

While I could probably have let it go at this point, I was determined to do some premature optimization, so I cranked the numbers up a bit. At 2,500 boxes (6,250,000 combinations), the frame rate dropped below 1, and I finally had something to work with.

So this brought me back to the offending line:

    // perf: here we are iterating over all combinations of bodies (in n*n time)
    for [(entity_a, rollback_a, aabb_a), (entity_b, rollback_b, aabb_b)] in

So how do we avoid iterating over all combinations of rigidbodies? One of the simplest ways to deal with this is a spatial grid hash.

The algorithm is pretty simple. First you build the spatial grid:

  • Initialize a standard HashMap, with integer grid keys, i.e. (i32, i32) and values HashSet<Entity>.
  • Figure out which grid cells each AABB overlaps.
  • For each overlapping cell, look up the cell in the hash map and add the entity to that cell's set of entities.

Then, when iterating instead of looping over all combinations, you can simply check which cells the aabb overlaps and loop through all the entities in those cells.

I've implemented this sort of thing once before, for a (non-physics) game, so I just copied the code over, and I added a system to build the hash:

#[derive(Resource, Reflect, From, Deref, DerefMut, Default, Debug)]
pub struct DynamicsDb(SparseGrid2d);

fn update_dynamics_db(query: Query<(Entity, &Aabb), With<Vel>>, mut db: ResMut<DynamicsDb>) {
    for (entity, aabb) in query.iter() {
        db.insert_aabb(*aabb, entity);

And updated my broadphase system to use the spatial hash:

fn collect_collision_pairs(
    query: Query<(Entity, &Rollback, &Aabb)>,
    mut collision_pairs: ResMut<CollisionPairs>,
    mut diagnostics: ResMut<Diagnostics>,
    db: Res<DynamicsDb>,
) {

    for (entity_a, rollback_a, aabb_a) in query.iter() {
        for entity_b in db.query_aabb(*aabb_a) {
            if entity_a == entity_b {
                // we don't want self-collisions

            let (entity_b, rollback_b, aabb_b) = query.get(entity_b).unwrap();

            if aabb_a.intersects(aabb_b) {
                collision_pairs.push(if < {
                    ((,, entity_a, entity_b)
                } else {
                    ((,, entity_b, entity_a)

    // sort collision pairs for determinism

With this simple change, I was able to handle 5,000 bodies at 60 FPS. 5x more than I started out with. While not completely breathtaking, it's definitely a lot better.

And looking at its performance trace...

...we see that the physics stage, which is within the ggrs_update stage to the left, is taking up a lot of our frame time, and the broadphase is still the biggest part of it. However, it's now not completely out of proportion compared to the parts of the update loop. In fact, rendering is now taking up much more time than physics. And the editor is now the main hog.

By disabling the editor, I was able to crank it up to 20,000 before the frame rate started dropping again.

As you can see physics is still an expensive part of the frame, but so is rendering, and it increases with the number of bodies. There is no way I could get another 5-10x improvement just by optimizing the physics alone, I'd have to start removing sprites for bodies in invisible sectors.

That didn't stop me, though, I was on a premature optimization trip.

First, I avoided allocating a temporary hashmap for the sparse grid query results by instead returning an iterator and passing that directly to Bevy's query.iter_many method:

for (entity_a, rollback_a, aabb_a) in query.iter() {
    for (entity_b, rollback_b, aabb_b) in
        query.iter_many(db.aabb_iter(*aabb_a).filter(|e| e != &entity_a))

That got me to 25k bodies, a small but significant improvement.

Then, I parallelized the loop that queries the spatial hash. I used Bevy's ParallelSlice::par_splat_map in order to spread the work on multiple cores:

fn collect_collision_pairs(
    query: Query<(Entity, &Rollback, &Aabb)>,
    mut collision_pairs: ResMut<CollisionPairs>,
    db: Res<DynamicsDb>,
) {
    let data: Vec<_> = query.iter().collect();
        data.par_splat_map(ComputeTaskPool::get(), None, |chunk| {
            let mut chunk_pairs = Vec::new();
            for (entity_a, rollback_a, aabb_a) in chunk.iter() {
                for (entity_b, rollback_b, aabb_b) in
                    query.iter_many(db.aabb_iter(**aabb_a).filter(|e| *e != *entity_a))
                    if aabb_a.intersects(aabb_b) {
                        chunk_pairs.push(if < {
                            ((,, *entity_a, entity_b)
                        } else {
                            ((,, entity_b, *entity_a)

And that really cut it down:

However, it also made my code uglier, and since I'm trying to also support WASM, it's not going to run in parallel there anyway, and will just cause unnecessary overhead... So I decided to take it out again.

Then I started looking into whether there was something more I could do about making the actual querying of the spatial hash quicker. The internal structure I'd for my spatial hash was a HashMap<(i32, i32), Vec<Entity>>. I had a strong suspicion that this was slow because each cell's Vec stores its data somewhere else on the heap, causing a lot of indirection and cache misses.

So I tried replacing Vec<Entity> with SmallVec<[Entity; 4]>, which means the first four entities of each cell will be stored directly in the hash maps value, instead of going straight to another place on the heap.

This turned out to help quite a bit:

About a 2-3x improvement compared to my previous single-threaded attempt.

For fun I also tried the improved spatial hash in the parallel version:

It's safe to say that in either case, the collect_collision_pairs system is no longer the bottleneck of my physics engine, and definitely not my game.

Yak 2: More premature optimization

In fact, I'd done all of these optimizations while running the game in single-player mode. So I tried running a two-player session, and I found I could "only" get to around 1000 bodies before the frame rate started to drop.

As the trace, shows, the broadphase is now just a very tiny portion of the ggrs_stage.

What's actually slowing it down is the rest of the ggrs_update stage, in particular the RollbackPreStage and update_archetypes within my Physics stage. There's also a lot of empty white space, with no tracing information. This is because bevy_ggrs does not currently offer any tracing instrumentation, so it's really hard to tell what's going in the other two thirds of the ggrs stage.

Fortunately, I remembered seeing a PR for adding tracing spans to bevy_ggrs, so I checked it out and ran the trace again.

We see that it's actually creating the ggrs snapshots that take up the majority of the frame time. This is likely because bevy_ggrs currently uses reflection to make its snapshots.

Ironically, this is an indicator that using Rapier for physics instead would probably have been faster, as it uses a much saner form of snapshotting.

As stated in the bevy_ggrs readme, though, there is a planned rewrite to solve the slow snapshot issues after the Bevy stageless changes goes in, so hopefully this will become less of an issue sometime after Bevy 0.10 is released.

So I finally decided to park it there... for now, at least ;)

Here's a video of the final result: 500 rigidbodies synced by p2p on wasm.

Yak 3: Releasing the spatial hash as a crate

So while I had finally stopped it with the premature optimization stuff, I did not stop yak shaving.

While I don't like copying code, I normally try to follow a "rule of three" (don't generalize something until you've copied it 3 times). However, the sparse grid / spatial hash implementation was an almost clean copy paste from another and very different project. And I didn't really change the API much, just optimized the internals, so I felt it would still make sense to put it into its own crate.

So bevy_sparse_grid_2d was born. You might find it useful if you want to do a similar kind of thing as I just did in the previous section. Or any other kind of checking-things-that-are-close-ish-together-without-n*n-complexity in a simple way.

It should be trivial to convert to 3D as well, if that's your thing.

Yak 4: Screen diagnostics

While I was optimizing, I also wanted an easy way to tell how framerate was affected by my various attempts (without going through the hassle of creating and opening a trace).

While it would have sufficed to just use the built-in bevy diagnostics, which can log the framerate to the console every few seconds, I remembered seeing a neat crate made by laundmo, called bevy_screen_diagnostics. It's a crate that integrates with the existing bevy diagnostics framework and makes it ridiculously simple to show diagnostics directly on the screen:

I'm really impressed by how neatly it integrates with bevy_diagnostics, and how little code was needed.

I added a diagnostic measurement for the number of collision pairs in my physics simulation, i.e. the size of the output of the collision broadphase: the bodies that have to be checked in detail for collisions.

What's next?

So I've been really deep down the physics and optimization rabbit hole, and kind of lost track of what I was doing.

If I backtrack, it takes me back to implementing one-way platforms, which was essentially about improving platformer game feel.

Picking up that thread, the next steps would be to:

  1. Implement dropping down from platforms
  2. Make sure walking off edges feels good (right now the collision box is too wide)
  3. Add more poses and animations (run cycle etc.)
  4. Add coyote time and a couple of other tweaks

However, I also feel like I should get some actual gameplay in place soon. So perhaps I'll look into prototyping mechanics for attaching ships together or installing modules in the ships (turrets, engines etc.).


For future updates, join the discord server or follow me on Mastodon


Loading comments...