We’re very lucky to have access to some incredible open source software and tools. The open source ecosystems powered by organisations such as CoreOS enable us to build a platform at very large scale, quickly, safe in the knowledge that these are tried and tested projects with many experienced contributors.
In this post, we want to go into a little detail about some of the projects we love, and some of the changes we’ve contributed to those projects.
Our Head of Infrastructure, Michał Witkowski, spoke at CoreOS Fest about how Improbable uses open source, and announced the release of Flagz, an annotation-based flags library for Java, Scala and Go. You can find out more about our open source projects, Flagz and Polyglot, here.
CoreOS is a lightweight Linux distribution, designed to make large, distributed deployments easier to scale and manage. It fully embraces containers, supporting Docker as a container runtime, as well as the CoreOS project’s own rkt.
At Improbable, we use CoreOS at scale within our platform, and have contributed back to the project through bug reports. As well as an operating system, CoreOS is an ecosystem of related components, including Fleet and etcd.
etcd is a distributed key-value store, allowing easy storage and access of data throughout a cluster of machines. We use etcd at large scale, with more than a dozen clusters, and have contributed to the integration of Prometheus-based monitoring within etcd.
Fleet is an init system for clusters: a shared systemd across many machines. As a cluster manager, fleet allows deployment of a single container anywhere in the cluster, or of multiple containers in many configurations across varied hosts. It can also protect against failure by maintaining a fixed number of instances of a given container or service. Our use case for fleet is somewhat unusual, but we’ve found it a fantastic tool, and have contributed fixes for scalability and monitoring.
Prometheus is a monitoring system built and open sourced by SoundCloud. It uses a pull model: Prometheus scrapes metrics from target applications that expose them in its exposition format, which makes it extremely flexible and scalable. Alongside monitoring, Prometheus offers a powerful expression language, and its Alertmanager component manages alerts through grouping, deduplication and other utilities.
As with most things at Improbable, we run Prometheus at unusually large scale, pushing its experimental federation support to its limit, and we work closely with the Prometheus developers on improving the scalability of the system. We also provide Prometheus metrics for SpatialOS deployments directly to users.
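To make the pull model concrete, here is a minimal sketch of an application exposing metrics for Prometheus to scrape. It uses only the Python standard library rather than an official client library, and the metric name `demo_requests_total`, the counter values and the port are all assumptions chosen for illustration, not anything taken from our platform.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real service would normally
# use a Prometheus client library instead of a plain dict.
REQUEST_COUNTS = {"GET": 3, "POST": 1}


def render_metrics(counts):
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests handled, by HTTP method.",
        "# TYPE demo_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counters on /metrics, where a scraper would fetch them."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrape requests on the given port (assumed value)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; a Prometheus server configured with this host and port as a scrape target would then pull `/metrics` on its own schedule, so the application never needs to know where the monitoring system lives.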
gRPC and gRPC Gateway
gRPC is a cross-language RPC mechanism built by Google, following the finalisation of the HTTP/2 standard. gRPC is our tool of choice for inter-service communication across all layers of our platform, using a setup very similar to that discussed here. We’ve contributed a number of bug fixes to gRPC, and have taken part in design discussions.
A project that builds on gRPC is gRPC Gateway, an automatic RESTful service generator. gRPC Gateway generates a REST API, served by a reverse proxy that sits in front of the specified gRPC service. As users of gRPC Gateway, we contributed some error handling code, and support for version 3 of protobuf, Google’s data interchange format.
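The reverse-proxy idea can be illustrated with a toy translation layer. This is a sketch of the concept only, not how gRPC Gateway is implemented: the real project generates its proxy from annotations in the `.proto` file, whereas the `EchoService` class, the `ROUTES` table and the `/v1/echo` path below are hypothetical names written by hand for this example.

```python
import json


class EchoService:
    """Stand-in for a gRPC service implementation (hypothetical)."""

    def Echo(self, request):
        return {"message": request["message"]}


# Route table: (HTTP verb, path) -> RPC method name.
# Hand-written here; a generator would derive it from the service definition.
ROUTES = {("POST", "/v1/echo"): "Echo"}


def handle_http(service, verb, path, body):
    """Translate one REST-style request into an RPC call and back to JSON."""
    method_name = ROUTES.get((verb, path))
    if method_name is None:
        return 404, json.dumps({"error": "not found"})
    try:
        request = json.loads(body)  # JSON body -> request message
        response = getattr(service, method_name)(request)
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON or a missing field becomes a client error.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps(response)  # response message -> JSON body
```

For example, `handle_http(EchoService(), "POST", "/v1/echo", '{"message": "hi"}')` returns a 200 status with the echoed message as JSON, while an unknown path yields 404 and a malformed body yields 400.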
Bazel is a build tool from Google. Bazel has an emphasis on reproducibility, and can be used for both client and server code. We love building our Scala with Bazel, so we contributed improved build mechanisms for Scala, particularly around tests.
We strongly believe in the importance of contributing back to the projects that make our lives easier every day. If you are interested in contributing to open source, GitHub have a great guide on how to get started here.
We especially welcome contributions to Flagz and Polyglot. Why not make your first open source contribution there?