On scene recordings and networking
(July 2025 note: I started writing this post a few years ago and left it as a forgotten draft. With the development of MeXanimatoR since then and the surge of summer research activity, I figured it was time to roll it out!)
In a previous post, I mentioned game recordings and how the need for them fits into the collaborative VR performance project. Here, I'll talk more about that from the standpoints of a player and a developer.
Like many gamers, I'm a sucker for games with replay support. It's immensely satisfying to have proof of how great you did in a round. Even better to enter replay mode, fiddle with the camera and playback speed, and make yourself look awesome. Or even better, make yourself look terrible. I'm a big fan of Rocket League's keyframe system, where you can hit a button during a match to indicate that 1) this replay should be saved after it's over, and 2) you just did something worth revisiting in that replay. Similarly, the PS4's video replay buffer and clip editing capabilities are incredibly impressive to me. Being able to decide after the fact that what just happened is worth preserving, even if it's an awfully cheap shot, is just nice. It's enough to make me consider playing games on PC with OBS's replay buffer always on – just in case.
It's not just the benefit of the replay that entices me. The software that goes into a replay system is inherently interesting as a topic in game development. On paper, it seems straightforward: save a copy of the entire game state every frame, then use that data to recreate the state for playback. But that's a lot of data, so one might save a copy of the inputs for every frame instead, then replay those inputs from the program start to recreate the state. This works well if the system is fully deterministic, but how important is perfect reproducibility? Imagine that a player swishes a magical bubble wand to trigger a particle effect that leaves a trail of whimsical bubbles. Does the replay system capture the particle system state sufficiently well that the randomized bubble sizes and directions are perfectly recreated as the player originally experienced them? Or does it only know enough state to turn on the effect and get something close enough, but not quite exact? If you're making a bubble wand for virtual reality that factors in velocity and rotation to feel natural and fun (and who says I'm not?), these things matter!
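Here's a minimal sketch of the two approaches, in Python rather than anything MeXanimatoR actually uses. The "game" is invented for illustration: its whole state is a position plus a list of randomly sized bubbles, and the input-only replay matches the original run only because the random number generator is reseeded the same way.

```python
# Hypothetical sketch (not MeXanimatoR code): two ways to capture a replay.
# Assumes a tiny game where state is just a position and a seeded RNG drives
# "bubble" sizes, so replaying inputs reproduces the original run exactly.
import random

def simulate(state, rng, input_value):
    """Advance one frame: move by the input, spawn a randomly sized bubble."""
    state["x"] += input_value
    state["bubbles"].append(rng.uniform(0.1, 1.0))  # randomized bubble size
    return state

# Option A would snapshot the full state every frame (accurate, but lots of data).
# Option B records only inputs, then re-simulates from the start with the same seed.
def replay_from_inputs(inputs, seed=42):
    rng = random.Random(seed)                  # same seed => same bubble sizes
    state = {"x": 0.0, "bubbles": []}
    for i in inputs:
        state = simulate(state, rng, i)
    return state

live_rng = random.Random(42)
live_state = {"x": 0.0, "bubbles": []}
recorded_inputs = [1.0, 0.0, -0.5]
for i in recorded_inputs:
    live_state = simulate(live_state, live_rng, i)

assert replay_from_inputs(recorded_inputs) == live_state  # deterministic replay matches
```

If the bubble sizes came from an unseeded source, or from anything that isn't bit-exact between runs, the input-only replay would quietly drift from what the player originally saw.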
It also gets more complicated in a multiplayer game. But if you're already making the effort to capture input for replay purposes, you have part of the puzzle for transmitting inputs, recreating state, and resolving differences in real-time multiplayer environments. Transmitting player inputs instead of positions is a technique that competitive games use because it prevents an exploit where a player runs code to circumvent the game's rules. It also reduces message sizes: if I'm holding W to walk forward, does the server really need to know about it 20 times per second? Or does it just need to know when I start and stop moving, because that's enough information to recreate the game state for all the frames in between?
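As a rough illustration (not how any particular game's netcode is actually written), here's what collapsing a held key into start/stop messages looks like; the message names are made up for the example.

```python
# Hypothetical sketch: send input *changes* instead of per-frame positions.
# Assumes a client that samples "is W held?" each frame and only emits a
# message when the answer changes; the server can fill in every frame between.
def input_events(held_per_frame):
    """Turn a per-frame boolean stream into (frame, 'start'/'stop') messages."""
    events, previously_held = [], False
    for frame, held in enumerate(held_per_frame):
        if held and not previously_held:
            events.append((frame, "start_forward"))
        elif not held and previously_held:
            events.append((frame, "stop_forward"))
        previously_held = held
    return events

# Holding W for frames 2..5 produces just two messages instead of six updates.
print(input_events([False, False, True, True, True, True, False]))
# [(2, 'start_forward'), (6, 'stop_forward')]
```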
This technique has some limits: interpolating position data from input is OK, but I've had trouble doing the same with orientation data. And from what I've seen, plenty of games just accept the client's look direction to address this. You can imagine a scenario, then, where a malicious player can't teleport through a wall (a position change that can't be reproduced on the server side using the player's transmitted input), but they can run an aimbot to help them lock onto enemy players. Even for social, non-competitive games, these are the kinds of issues worth at least a passing discussion. If your VR game includes safety mechanisms to protect a player's virtual space, you don't want someone to be able to bypass those.
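To make that asymmetry concrete, here's a hypothetical server-side check: the reported position is validated against what the player's movement could plausibly produce, while the look direction is taken at face value. The speed limit and function are invented for illustration, and a real server would reconstruct movement from the transmitted inputs rather than a flat speed cap.

```python
# Hypothetical sketch: server-side sanity check on a reported player update.
# The position is rejected if it moved farther than movement rules allow,
# but (as many games do) the reported look direction is accepted as-is.
MAX_SPEED = 5.0  # assumed maximum movement speed, units per second

def validate_update(last_pos, reported_pos, dt, reported_look):
    dx, dy = reported_pos[0] - last_pos[0], reported_pos[1] - last_pos[1]
    distance = (dx * dx + dy * dy) ** 0.5
    position_ok = distance <= MAX_SPEED * dt   # no teleporting through walls
    accepted_pos = reported_pos if position_ok else last_pos
    return accepted_pos, reported_look         # look direction taken on faith

print(validate_update((0.0, 0.0), (0.2, 0.0), dt=0.05, reported_look=(0.0, 1.0)))
print(validate_update((0.0, 0.0), (50.0, 0.0), dt=0.05, reported_look=(0.0, 1.0)))
```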
In MeXanimatoR, I want players to feel like they're sharing a virtual space with one another. As in plenty of other VR games, that means we transmit tracked device positions to capture body movement, and we spatialize voice chat. Unlike in flat multiplayer games, input capture doesn't really cut it here. We have to share positions and orientations to effectively reconstruct body movement so other players get a true representation of that movement. Hey, this sounds a lot like replay functionality! They share a lot of the same problems, funnily enough. Which objects to save (or broadcast)? How is instantiation handled at the start of replay (or when joining a game)? What framerate for recording (or broadcasting) gives the best tradeoff between accuracy and performance?
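If I were to sketch the overlap, it might look like a single per-frame pose structure that both a recorder and a network layer consume. This is purely illustrative Python; the field names aren't MeXanimatoR's, and a real implementation would worry about serialization, compression, and however many tracked devices are in play.

```python
# Hypothetical sketch: one pose "frame" format that could feed both a replay
# file and a network packet. Field names are illustrative only.
from dataclasses import dataclass, asdict

@dataclass
class PoseFrame:
    timestamp: float
    head_pos: tuple        # (x, y, z)
    head_rot: tuple        # quaternion (x, y, z, w)
    left_hand_pos: tuple
    left_hand_rot: tuple
    right_hand_pos: tuple
    right_hand_rot: tuple

frame = PoseFrame(
    timestamp=1.25,
    head_pos=(0.0, 1.6, 0.0), head_rot=(0.0, 0.0, 0.0, 1.0),
    left_hand_pos=(-0.3, 1.2, 0.2), left_hand_rot=(0.0, 0.0, 0.0, 1.0),
    right_hand_pos=(0.3, 1.2, 0.2), right_hand_rot=(0.0, 0.0, 0.0, 1.0),
)
# The same frame could be appended to a recording or serialized into a packet.
print(asdict(frame))
```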
MeXanimatoR can record a player's data (body movement and audio) to create a replayable performance. It can also transmit this data for socializing and collaborating in VR. The problems are quite similar, and yet the code that enables these two features has very little overlap. That's kind of frustrating! But understandable, to a point. Player recording might happen at 60 FPS (or maybe whatever the headset's refresh rate is set to), but that framerate might be too high for the network. On the flip side, if we want to locally record data at 60 FPS from a networked player broadcasting at 20 FPS, what do we do? Is the network movement system interpolating in between those frames? Is our interpolation "close enough" to the real thing to not matter, or do we lose anything special in the process? Are we wasting space recording interpolations that could just as well be recomputed during replay, or is the frame-to-frame stability worth the extra storage?
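For the framerate mismatch, here's a toy version of that interpolation question: upsampling 20 FPS network samples to roughly 60 FPS by linearly interpolating positions (rotations would want slerp, which I'm skipping). Everything here is a made-up example, not the actual network movement system.

```python
# Hypothetical sketch: upsample a ~20 FPS network stream to ~60 FPS by linear
# interpolation of positions. A local 60 FPS recording of a remote player
# would contain frames like these: a few real samples, and guesses in between.
def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def upsample(network_frames, factor=3):
    """Insert `factor - 1` interpolated frames between each pair of real frames."""
    out = []
    for (t0, p0), (t1, p1) in zip(network_frames, network_frames[1:]):
        for i in range(factor):
            s = i / factor
            out.append((t0 + (t1 - t0) * s, lerp(p0, p1, s)))
    out.append(network_frames[-1])
    return out

net = [(0.00, (0.0, 1.6, 0.0)), (0.05, (0.3, 1.6, 0.0))]  # ~20 FPS head samples
for t, p in upsample(net):
    print(round(t, 3), p)
```

Whether those interpolated frames deserve a place in the recording, or should just be recomputed at playback time, is exactly the storage-versus-compute question above.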
If I were writing this kind of VR software completely from scratch like a cool guy would, I might have worked on unifying the shared functionality between recording to disk and broadcasting over the network. And heck, that might be a big enough win to pursue some time in the future. But MeXanimatoR began not as a single-player game, but as an asynchronous multiplayer game that was "one single player at a time." Recording functionality came long before the use of NGO (Unity's Netcode for GameObjects) and Dissonance to bring this thing online. But who knows. If I ever enter a strange mood and decide to scrap the use of Unity plugins to make my own package that elegantly handles both of these things, I'll keep you posted.