As a single SpatialOS game is made out of hundreds of stateful game servers, we were faced with interesting challenges related to observability, not only in understanding the state of large, complex game worlds, but also in the systems-level observability that many DevOps systems already address.
The SpatialOS Runtime is not only the canonical source of truth for the entire game state; it also provides other crucial services to a running game, such as the Server Scheduler. To tackle the hard problem of understanding what the SpatialOS Runtime is really up to, we had to build a set of bespoke web tools. The most advanced of these was the Inspector, a general-purpose tool for viewing, debugging and modifying the behaviour of the Runtime and the internal game state.
As SpatialOS provides the hosting for the game servers that make up a SpatialOS game, we also need to provide the usual offering of a cloud provider – logs and metrics. A single SpatialOS game world consists of hundreds of game servers; take into account that we run hundreds of similar game worlds at the same time and you can see that the observability requirements for our platform are at a much larger scale than the standard logs and metrics needs of a web service stack.
In order to address the massive volume of SpatialOS monitoring data, we developed and open-sourced Thanos, a distributed and globally scalable Prometheus stack that allows long-term storage of metrics. Thanos is an incredible success story for us. Not only did we contribute back to the open-source community, on which we’ve built a lot of our technology, but we also had others help us make our product better by contributing to Thanos.
Whether you’re interested in building highly scalable data backends, efficient APIs or complex visualisation tools, SpatialOS Core Platform has all these challenges and much more.