Cargo Space devlog #3 - Sprites, de-syncs, cross-platform p2p, kayak_ui, cargo-deny

I drew a couple of sprites, added cross-platform multiplayer, added some UI using kayak_ui, and fixed a lot of network-related bugs. Last time, I thought I would be able to take a break from the technical side. It turned out I was wrong. I had a lot of tough bugs, both with de-syncs and with NAT traversal. I guess that's one of the reasons they tell you not to make a networked multiplayer game.

Debugging de-syncs with GGRS' synctest session

Last time I thought I'd managed to get a completely deterministic simulation. As I tested more thoroughly with friends over the internet, that turned out not to be the case. After a long time of playing, positions of players and the ship would go slightly out of sync.

I'd run out of obvious non-determinism bugs to fix (registering gameplay components for rollback, solving system order ambiguities, see my previous post). So I thought I'd make use of GGRS' synctest session, which automatically rolls back and then forwards a couple of frames each display frame, and compares checksums of frame snapshots each time it gets to the same frame number.
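To make the idea concrete, here's a toy sketch of what a sync test does. It is not GGRS' actual implementation; the `State` struct, the `hidden` counter and all function names are made up for illustration. The `hidden` value stands in for any piece of state that affects gameplay but isn't part of the snapshot:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy game state; this is everything that gets saved and restored.
#[derive(Clone, Hash, Debug, PartialEq)]
struct State {
    frame: u32,
    player_x: i32,
}

fn checksum(state: &State) -> u64 {
    let mut hasher = DefaultHasher::new();
    state.hash(&mut hasher);
    hasher.finish()
}

/// Advances the simulation by one frame. `hidden` is state that is NOT part
/// of the snapshot; when `use_hidden` is true it leaks into gameplay.
fn step(state: &mut State, input: i32, hidden: &mut i32, use_hidden: bool) {
    state.frame += 1;
    state.player_x += input;
    if use_hidden {
        *hidden += 1;
        state.player_x += *hidden;
    }
}

/// Simulates `frames` frames. After each one, restores the snapshot from
/// `check_distance` frames back, re-simulates, and compares checksums.
/// Returns the first frame whose checksum differs, if any.
fn sync_test(frames: u32, check_distance: u32, use_hidden: bool) -> Option<u32> {
    let mut state = State { frame: 0, player_x: 0 };
    let mut hidden = 0;
    let mut snapshots: HashMap<u32, State> = HashMap::new();
    let mut checksums: HashMap<u32, u64> = HashMap::new();
    snapshots.insert(0, state.clone());

    for _ in 0..frames {
        step(&mut state, 1, &mut hidden, use_hidden);
        snapshots.insert(state.frame, state.clone());
        checksums.insert(state.frame, checksum(&state));

        if state.frame >= check_distance {
            let target = state.frame;
            // Roll back: only `State` is restored, `hidden` is not.
            let mut replay = snapshots[&(target - check_distance)].clone();
            while replay.frame < target {
                step(&mut replay, 1, &mut hidden, use_hidden);
                if checksum(&replay) != checksums[&replay.frame] {
                    return Some(replay.frame);
                }
            }
        }
    }
    None
}
```

With `use_hidden` set to false, `sync_test(10, 2, false)` finds no mismatch. With it set to true, the very first re-simulated frame already produces a different checksum, which is exactly the kind of signal I was after.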

This turned out to be problematic, however, since bevy_ecs_ldtk relies on Changed and Added queries in order to spawn levels, and bevy_ggrs triggers these each time it restores a snapshot, regardless of whether the component actually changed. I'd used an LDTK level for the ship module, and since synctest mode triggers a rollback every frame, the ship module respawned every single frame: level spawning never finished, it just restarted over and over. So I had the option of going down the rabbit hole of trying to make LDTK level spawning rollback-safe, or implementing non-ldtk ship modules a bit prematurely. I went with the second option.

This was a bit visually demotivating, since it meant I just implemented the exact same mechanics without the graphics (for now).

Also, in order to actually check for de-syncs, I needed to add #[reflect(Hash)] to the components I wanted to be part of the checksums. And that meant I had to implement the std::hash::Hash trait as well... even for components where it doesn't really make sense, e.g.:

#[derive(Component, Reflect, Default, From, Deref, DerefMut, Clone, Copy, Debug)]
#[reflect(Component, Hash)]
pub struct Pos(pub Vec2);

impl std::hash::Hash for Pos {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        self.0.x.to_bits().hash(state);
        self.0.y.to_bits().hash(state);
    }
}
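One reason Hash doesn't come for free here is that f32 doesn't implement it, so the raw bits have to be hashed instead. Below is a standalone sketch of the same pattern without Bevy; the `Vec2` struct is a stand-in for the real math type and `checksum` is a hypothetical helper, not part of bevy_ggrs:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the real Vec2 type, so the example has no dependencies.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec2 {
    x: f32,
    y: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Pos(Vec2);

impl Hash for Pos {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // f32 has no Hash impl, so hash the underlying bit patterns.
        self.0.x.to_bits().hash(state);
        self.0.y.to_bits().hash(state);
    }
}

/// Hypothetical helper: hash any value into a single u64.
fn checksum<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Hashing bits is stricter than comparing floats: `0.0 == -0.0` is true, but the two values have different bit patterns and therefore different checksums. For de-sync detection that strictness is what you want, since any bit-level difference can grow into a visible one later.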

This added a lot of boilerplate and kind of rubbed me the wrong way... When I got it up and running, however, I finally got the error message I needed to help focus my debugging efforts:

2022-12-27T15:25:43.803025Z  WARN bevy_ggrs::ggrs_stage: Detected checksum mismatch during rollback on frame 2.

So something was obviously wrong.

The problem now was figuring out what exactly caused the checksum mismatch... bevy_ggrs just tells us on which frame the game de-synced, not what de-synced it. At this point I almost started looking into implementing some sort of diffing support for bevy_ggrs, but in the end, I went with a more simple-stupid approach.

First, I tried temporarily enabling and disabling large parts of gameplay-related code. When I disabled the ship plugin, I got no de-syncs, so I knew it had to do with the ship. Then I disabled physics, and got no de-syncs either. I started logging some of the most significant gameplay state, only it was pretty hard to tell which logs belonged to which frame, so I added a little bit of logging to bevy_ggrs as well in order to tell which frame was which. In the end, I had log output like the following:

2022-12-27T15:54:53.362078Z DEBUG bevy_ggrs::ggrs_stage: advancing to frame: 2
2022-12-27T15:54:53.362159Z  WARN cargo_space::physics: GRAVITY: global transform scale: Vec3(1.0, 1.0, 1.0)
2022-12-27T15:54:53.362215Z  WARN cargo_space::physics: GRAVITY: in_ship_module: Some(Mut(InShipModule(Some(10015v0))))
2022-12-27T15:54:53.362262Z  WARN cargo_space::physics: GRAVITY: effective_gravity: [0, -1]
2022-12-27T15:54:53.362541Z DEBUG cargo_space::character_movement: update_player_state
2022-12-27T15:54:53.362588Z DEBUG cargo_space::character_movement:   player: 10014v0 Mut(Idle), EffectiveGravity(Vec2(0.0, -1.0))
2022-12-27T15:54:53.362689Z DEBUG bevy_ggrs::ggrs_stage: frame 2 completed
2022-12-27T15:54:53.363013Z  INFO cargo_space::character_movement: Adding particles to player
2022-12-27T15:54:53.375848Z DEBUG bevy_ggrs::ggrs_stage: saving snapshot for frame 2
2022-12-27T15:54:53.376440Z DEBUG bevy_ggrs::ggrs_stage: advancing to frame: 3
2022-12-27T15:54:53.376541Z  WARN cargo_space::physics: GRAVITY: global transform scale: Vec3(0.125, 0.125, 1.0)
2022-12-27T15:54:53.376595Z  WARN cargo_space::physics: GRAVITY: in_ship_module: Some(Mut(InShipModule(None)))
2022-12-27T15:54:53.376642Z  WARN cargo_space::physics: GRAVITY: effective_gravity: [0, 0]
2022-12-27T15:54:53.376975Z DEBUG cargo_space::character_movement: update_player_state
2022-12-27T15:54:53.377032Z DEBUG cargo_space::character_movement:   player: 10014v0 Mut(Idle), EffectiveGravity(Vec2(0.0, 0.0))
2022-12-27T15:54:53.377081Z  INFO cargo_space::character_movement: Idle -> Fly
2022-12-27T15:54:53.377172Z DEBUG bevy_ggrs::ggrs_stage: frame 3 completed
2022-12-27T15:54:53.390356Z DEBUG bevy_ggrs::ggrs_stage: restoring snapshot for frame 1
2022-12-27T15:54:53.390978Z DEBUG bevy_ggrs::ggrs_stage: advancing to frame: 2
2022-12-27T15:54:53.391097Z  WARN cargo_space::physics: GRAVITY: global transform scale: Vec3(0.125, 0.125, 1.0)
2022-12-27T15:54:53.391151Z  WARN cargo_space::physics: GRAVITY: in_ship_module: Some(Mut(InShipModule(None)))
2022-12-27T15:54:53.391198Z  WARN cargo_space::physics: GRAVITY: effective_gravity: [0, 0]
2022-12-27T15:54:53.391758Z DEBUG cargo_space::character_movement: update_player_state
2022-12-27T15:54:53.391806Z DEBUG cargo_space::character_movement:   player: 10014v0 Mut(Idle), EffectiveGravity(Vec2(0.0, 0.0))
2022-12-27T15:54:53.391853Z  INFO cargo_space::character_movement: Idle -> Fly
2022-12-27T15:54:53.392065Z DEBUG bevy_ggrs::ggrs_stage: frame 2 completed

At this point I had enough information to figure out that the de-sync was caused by player gravity being different on frame 2 the first and second time it was simulated. I looked at the system calculating player gravity (inside or outside ships), and found the source of the problem: it depended on the ships' GlobalTransform, which is computed from Transform and the parent-child hierarchy... The funny thing is that Transforms were actually properly synced. However, the transform propagation system, which makes sure that the GlobalTransform values correspond to the actual place in the hierarchy, only runs in the built-in Bevy PostUpdate stage, which runs in-between render frames, not on every rollback update. I couldn't just roll back GlobalTransform, as that would essentially just make it stuck on the first frame, so in order to fix this bug, I basically had two options:

1. Don't use GlobalTransforms in gameplay systems
2. Actually propagate GlobalTransforms in the rollback schedule as well

I went with option 2 for now, as it turned out to be a one-line fix:

// ggrs schedule:
    .with_system(bevy::transform::transform_propagate_system.after(tick_frame_count)),

I'm kind of worried this will subtly break something else that expects transform propagation to only happen in the PostUpdate stage. It also turned out to be quite a performance hog to propagate transforms every ggrs frame... As you can see in this profile:

However, I'll park it here for now, and keep it in mind if I run into performance issues or subtle bugs with the transform hierarchy.
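For anyone who wants to see the GlobalTransform failure mode in isolation, here's a toy model of it. Nothing in it is Bevy code: `World`, `propagate` and `step` are hypothetical names, with `global_x` playing the role of GlobalTransform (a derived value that isn't rolled back) and `propagate` playing the role of the transform propagation system:

```rust
/// Toy world: `parent_x` and `local_x` are rolled back (like Transform),
/// `global_x` is a derived cache (like GlobalTransform) that is not.
#[derive(Clone, Copy, Debug)]
struct World {
    parent_x: f32,
    local_x: f32,
    global_x: f32,
}

/// Stand-in for transform propagation: recompute the derived value.
fn propagate(w: &mut World) {
    w.global_x = w.parent_x + w.local_x;
}

/// One gameplay frame: move the parent, then make a gameplay decision based
/// on the derived value. Returns true if the player counts as "in the ship".
fn step(w: &mut World, propagate_in_step: bool) -> bool {
    w.parent_x += 1.0;
    if propagate_in_step {
        propagate(w);
    }
    w.global_x < 1.5
}

/// Simulates frame 2 twice (once normally, once after a rollback) and
/// returns both gameplay decisions.
fn frame_two_results(propagate_in_step: bool) -> (bool, bool) {
    let mut w = World { parent_x: 0.0, local_x: 0.0, global_x: 0.0 };
    step(&mut w, propagate_in_step); // frame 1
    let snapshot = (w.parent_x, w.local_x); // only rolled-back state is saved
    let first = step(&mut w, propagate_in_step); // frame 2, first time

    propagate(&mut w); // propagation that runs once per render frame

    // Rollback: restore the saved fields, the derived cache stays as-is.
    w.parent_x = snapshot.0;
    w.local_x = snapshot.1;
    let second = step(&mut w, propagate_in_step); // frame 2, re-simulated
    (first, second)
}
```

Without propagation inside the step, `frame_two_results(false)` gives `(true, false)`: the same frame produces two different gameplay outcomes depending on how stale the cache is. With propagation inside the step, both runs agree.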

Finding the actual de-sync

Now that the game wasn't immediately de-syncing, I tried running it again in regular networked mode. However, I was still seeing de-syncs after a while.

I tried playing the game for a long while in synctest session, but it detected nothing.

I had a very strong suspicion that the de-sync could be due to iteration order being non-deterministic somewhere in the physics module, as I'd heard about other people having this problem. Thanks to Zicklag and the Fish Folk project for sharing their experience with ggrs in their game, Jumpy :).

So even though I thought I knew what it was, I wanted an actual predictable error to confirm that I'd fixed it, since triggering a visible de-sync could often take a couple of minutes of gameplay. So what I did was I implemented some makeshift version of local multiplayer, where one player input is controlling several characters:

After I had this in place, I got a desync checksum error immediately after touching the ship. So I went through the systems in my physics module, and fixed the suspected system by ordering the entities before iterating:

fn solve_pos_dynamic_tilemaps(
    mut players: Query<(Entity, &Rollback, &mut Pos, &Mass, &InShipModule), With<Character>>,
    mut dynamic_tilemap_bodies: Query<(&mut Pos, &Mass), Without<Character>>,
    // ...
) {
    let mut players: Vec<_> = players.iter_mut().collect();
    // sort by rollback component to stay deterministic
    players.sort_by_key(|p| p.1.id());
    for (player_entity, _rollback, mut pos_a, mass_a, in_ship_module) in players {
        // mutate positions on tilemap collisions
    }
}

Once I'd done this, I saw no de-syncs while pushing the ship around.
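The reason iteration order matters is that the solver mutates shared state as it goes, so each step depends on the previous ones. A toy sketch of the effect, with a made-up `Player` struct and a deliberately order-dependent update (not my actual collision code):

```rust
#[derive(Clone, Copy, Debug)]
struct Player {
    rollback_id: u32,
    push: f32,
}

/// Applies each player's push to the ship in iteration order. The update is
/// intentionally order-dependent, like sequential position corrections.
fn solve(ship_x: f32, players: &[Player]) -> f32 {
    players.iter().fold(ship_x, |x, p| x * 0.5 + p.push)
}

/// Same as `solve`, but sorts by the stable rollback id first, so the result
/// no longer depends on the order the players were handed to us in.
fn solve_sorted(ship_x: f32, players: &[Player]) -> f32 {
    let mut sorted = players.to_vec();
    sorted.sort_by_key(|p| p.rollback_id);
    solve(ship_x, &sorted)
}
```

Given two players with pushes 1.0 (id 1) and 3.0 (id 2), `solve` returns 3.5 for one order and 2.5 for the other, while `solve_sorted` returns 3.5 either way. Two peers iterating the same query in different orders would drift apart in exactly this manner.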

This might seem like a very roundabout way of making triple sure the bug is actually fixed, but I'm pretty sure this isn't the last de-sync I will see during the development of this game, so I'm counting on all the trouble paying off in the long run. Being able to run synctest sessions seems like a very valuable tool for debugging.

I posted about my de-sync gotcha findings on this bevy_ggrs issue. The plan is to eventually turn it into documentation, either in bevy_ggrs or perhaps in Bevy itself.

Yak 1: Supporting cross-platform sessions

My networking library, Matchbox, is built upon WebRTC, which means its primary target was always web browsers. As it turned out, however, the WebRTC.rs project implemented the WebRTC protocol for native rust as well, which meant it was also possible to add an alternative implementation of the matchbox_socket crate using WebRTC.rs. And this is what we did in Matchbox 0.4.

This meant Matchbox could be used for native applications as well, and it turns out that's actually a pretty good way to establish direct connections between two users behind a NAT...

...the only problem was that there were some slight incompatibilities between the web and native implementations, which meant connections between native and web failed.

A couple of weeks ago, I received a PR fixing the last of these issues. I was quite excited to get this in, so after testing it locally I merged it and was pretty eager to try it out with Cargo Space.

And it worked great... at first. Although after trying to test the game online with a friend, I realized that it had introduced two regressions:

The wasm version would occasionally panic while connecting, freezing the game.
It had started failing to punch through some NAT (old router) setups that it happily pierced before. So I could no longer test the game with my friend.

The panic was quite easily fixed, but the NAT-punching issue was harder.

I couldn't reproduce the issue on my own, not even over the internet between my PC and my phone (yes, Cargo Space can actually run in my phone's browser :)). I was reliant on my friend to help test whenever I thought I'd found a fix for the issue.

And I thought I'd found the bug numerous times. I cleaned up old hacks, I added a bunch of debug logging to the project, and in the end, after I restored a piece of code that really shouldn't be needed, it started working again... So I wrote and merged a fix that I don't really understand, which rubs me the wrong way, but I also can't see how it can do any harm, and at least the regression is gone and cross-play still works.

This took a lot of time, but I now have a game that can be played across browsers and native. This opens some interesting alternatives, like letting people easily try the wasm version as a demo, then buy/download the premium version on Steam/Itch for better performance and the full experience. Maybe the native version could spit out a link that you send to friends to let them join your game?

I've also learned that I should probably test with this specific friend before I merge things that may affect NAT traversal.

Yak 2: CI and automatic deploy to web

So, when I was going to test with people over the web, I needed to distribute the game somewhere. I decided to go with the web version, since clicking a url is a lot less hassle for testers than downloading and extracting an archive, and setting up a Steam beta still seems too early.

I could have done this manually, and just used command line tools to push the build and assets to my s3 instance, but while I managed to not go down the Steam route, I decided to go full-on yak-shaving on this one and automate everything.

Rather than scripting, I used the cargo xtask approach to build and optimize the wasm binaries. The advantage of this is that I can run cargo xtask dist both locally on windows, and in CI (linux), and it will give me the same result. It also means build tool dependencies are tracked in Cargo.lock which makes the results even more predictable.
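For readers unfamiliar with the pattern, an xtask is just a small Rust binary in the workspace that runs the build steps. The following is a rough sketch of the shape, not my actual xtask: the `dist_steps` and `run` helpers are hypothetical, and shelling out to the wasm-bindgen CLI is an assumption made to keep the example short (pulling tools in as library crates is what gets them tracked in Cargo.lock):

```rust
use std::process::Command;

/// One external command to run, as (program, arguments).
type Step = (&'static str, Vec<String>);

/// Lists the commands needed to produce a web build of `crate_name`.
/// Assumption: post-processing is done with the wasm-bindgen CLI.
fn dist_steps(crate_name: &str, out_dir: &str) -> Vec<Step> {
    vec![
        (
            "cargo",
            vec![
                "build".into(),
                "--release".into(),
                "--target".into(),
                "wasm32-unknown-unknown".into(),
            ],
        ),
        (
            "wasm-bindgen",
            vec![
                "--target".into(),
                "web".into(),
                "--out-dir".into(),
                out_dir.into(),
                format!("target/wasm32-unknown-unknown/release/{crate_name}.wasm"),
            ],
        ),
    ]
}

/// Runs each step in order and stops at the first failure.
fn run(steps: &[Step]) -> std::io::Result<()> {
    for (program, args) in steps {
        let status = Command::new(program).args(args).status()?;
        if !status.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                format!("{program} failed with {status}"),
            ));
        }
    }
    Ok(())
}
```

An xtask `main` would then call something like `run(&dist_steps("cargo_space", "target/release/dist/web"))`. Because it's plain Rust using std::process, the same code runs unchanged on a Windows dev machine and a Linux CI runner.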

After I'd done this, I added a simple gitlab ci job that built the project:

build-web:
  stage: build
  image: rust:latest
  script:
    - rustup target add wasm32-unknown-unknown
    - cargo xtask dist
  artifacts:
    expire_in: 30 days
    paths:
      - "target/release/dist/web"

And another that just uploaded the files to my s3 instance and posted a link in my private Discord.

This means that each time I push to my repo, it only takes a couple of minutes before I get a link I can share with people to test the latest version.

Yak 3: Player ping overlay, exploring kayak_ui

As I was debugging connection issues, I wanted an easy way to see player ping and connection status, and while logs are perfectly fine for this, I felt an inexplicably strong desire to make some UI.

Now comes the question, how do I make that UI? I'd already used the built-in bevy_ui for the "waiting for x more player" lobby screen, which I copied from my matchbox_demo example, so that would have been a natural choice.

However, to me bevy_ui feels really awkward to use. It relies on a lot of boilerplate to do really simple things. For instance, to update text, you need to index into a sections object:

    query.single_mut().sections[0].value = if remaining > 1 {
        format!("waiting for {remaining} more players")
    } else {
        "waiting for 1 more player".to_string()
    };

And the spawning and styling of the text is not exactly succinct either:

// All this is just for spawning some centered text.
commands
    .spawn((
        LobbyUI, // marker to destroy when exiting lobby
        NodeBundle {
            style: Style {
                size: Size::new(Val::Percent(100.0), Val::Percent(100.0)),
                position_type: PositionType::Absolute,
                justify_content: JustifyContent::Center,
                align_items: AlignItems::FlexEnd,
                ..default()
            },
            background_color: Color::BLACK.into(),
            ..default()
        },
    ))
    .with_children(|parent| {
        parent.spawn((
            LobbyText, // marker to update text
            TextBundle {
                style: Style {
                    align_self: AlignSelf::Center,
                    justify_content: JustifyContent::Center,
                    ..default()
                },
                text: Text::from_section(
                    "entering lobby...",
                    TextStyle {
                        font: asset_server.load("fonts/quicksand-light.ttf"),
                        font_size: 48.,
                        color: Color::GRAY,
                    },
                ),
                ..default()
            },
        ));
    });

So I was quite excited when I saw kayak_ui on Chris Biscardi's YouTube channel. It has a declarative approach to ui similar to React, which I'm already familiar with due to my background as a consultant/web developer, so it seemed like the perfect fit.

I read its book, but ended up having some trouble getting its examples to run. I managed to get them to work after comparing against the examples in the repo's example folder, so I submitted a couple of PRs to fix the snippets in the documentation.

Once I had it running, it was quite easy to get a UI up and running. The layout concepts were similar enough and documentation good enough that it felt pretty familiar to set up a basic player stats overlay:

I really like that rendering and updating is now the same thing. For instance, this is the code for each row in the player stats table:

let (status, ping) = if let Some(info) = info {
    let status = match info.status {
        ConnectionStatus::Running => "".to_string(),
        _ => format!("{:?}", info.status)
    };
    (status, format!("{}", info.ping))
} else {
    ("".into(), "".to_string())
};
constructor! {
    <BackgroundBundle styles={KStyle {
        height: Units::Pixels(25.).into(),
        width: Units::Stretch(1.).into(),
        layout_type: LayoutType::Row.into(),
        ..default()
    }}>
        <TextWidgetBundle
            styles={KStyle {
                width: Units::Pixels(150.).into(),
                ..default()
            }}
            text={TextProps{
                content: player.to_string(),
                size: 16.,
                ..default()
            }}
        />
        <TextWidgetBundle
            styles={KStyle {
                width: Units::Stretch(1.).into(),
                ..default()
            }}
            text={TextProps{
                content: status,
                size: 16.,
                ..default()
            }}
        />
        <TextWidgetBundle
            text={TextProps{
                content: ping,
                size: 16.,
                ..default()
            }}
        />
    </BackgroundBundle>
}

One thing I don't like, though, is the XML. While I prefer it over using plain bevy spawn commands, I think a syntax more like QML or Dioxus' markup would have been more pleasant, e.g.:

cx.render(rsx!(
    Container {
        Light { color: "red", enabled: state == "red", }
        Light { color: "yellow", enabled: state == "yellow", }
        Light { color: "green", enabled: state == "green", }
    }
))

I had a lot of trouble implementing functionality to properly hide the UI after the tab button was released. In the end, it turned out to be a bug in kayak_ui. It has issues with diffing removed items properly. However, the crate maintainer, StarArawn, was really helpful, and with his help I managed to get it working using the not-yet-merged bugfixes branch and the workarounds in this comment.

I also have some issues with text rendering:

It only happens on non-hidpi screens, and it seems to depend on the actual font size. The corresponding github issue is here.

All in all, kayak_ui seems like a promising library, but as the repository's readme says, it's still in very early stages of development. I really hope development continues, though. I really like this pattern of UI development.

Yak 4: cargo-deny

As I mucked around with a lot of different git releases and patches to various crates this time, I ran into the problem of having duplicate incompatible dependencies quite a few times.

For instance, bevy_pancam has a bevy_egui feature, which depends on bevy_egui. bevy-inspector-egui depends on bevy_egui as well, which is fine. However, when a new bevy-inspector-egui version was released, it bumped its bevy_egui dependency to a newer version than the one bevy_pancam depended on. This meant I ended up having two versions of bevy_egui in my dependency tree, and the feature in bevy_pancam no longer did what it was supposed to.

In order to avoid issues like this, I decided to try out cargo-deny, which is a pretty cool cargo plugin made to avoid exactly these kinds of issues, among other things.

This means the aforementioned issue would result in an error.

As you can see, it tells me exactly what is causing it. I could easily add this check to CI, and catch both build size issues and bugs like this early.
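In cargo-deny itself, this check is configured through the multiple-versions setting in the [bans] section of deny.toml. Conceptually, what it looks for boils down to something like the toy function below. This is only an illustration of the idea, not how cargo-deny is implemented, and the function name and version numbers are made up:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Given (crate name, version) pairs from a resolved dependency graph,
/// returns only the crates that appear with more than one version.
fn duplicate_versions<'a>(
    packages: &[(&'a str, &'a str)],
) -> BTreeMap<&'a str, BTreeSet<&'a str>> {
    let mut by_name: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for &(name, version) in packages {
        by_name.entry(name).or_default().insert(version);
    }
    // Keep only the crates that resolved to several distinct versions.
    by_name.retain(|_, versions| versions.len() > 1);
    by_name
}
```

Feeding it a list where bevy_egui shows up twice with different versions returns a single entry for bevy_egui containing both versions. The real tool goes further and also reports which dependency paths pulled each version in.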

While I was out shaving yaks, I also checked out the other features of cargo-deny, and they seem really useful to have down the line. In particular, it can scan your dependencies for license incompatibilities. This is useful for me, since I (at least for now) intend my game to be closed-source, and I want to avoid GPL-licensed code.

Another cool thing is that it can also generate a web page in order to comply with attribution requirements in various licenses. This is way too early to think about for my game, but nice to know about nevertheless.

While I was testing this, I also noticed kayak_ui was missing license metadata, so I submitted a PR for that as well.

Returning to actual development

So at this point, I'd gone pretty deep yak shaving, fixing a lot of stuff I could easily have avoided or at least postponed.

It was time to return to the original plan: improving platformer game feel and player visuals.

I didn't do much about game feel, but I added a character animation system based on the one gschup made for our jam game, A Janitor's Nightmare.

And I drew a couple of character poses:

They're pretty rough, and I should probably do another pass soon, but it's starting to look and feel a bit more like an actual platformer game:

Status & plan

That's it for now. I did a lot of yak-shaving and bug-fixing this time, so my plan is more or less exactly the same as last time: continue improving the look and feel of the game, and implement the missing ship-to-asteroid and ship-to-ship collisions.

Updates

For future updates, join the discord server or follow me on Mastodon.
