Keeley Erhardt is a Software Engineer at Improbable, working on SpatialOS. She previously worked as a Graduate Research Assistant at MIT, where she developed a blockchain-based framework for verifiably safe data exchange.
Like most companies, Improbable has many secrets (certificates, credentials for databases, API keys for external services, credentials for service-oriented architecture communication, and more) and many, many ways of securing them.
We manage each type of secret differently; however, all of our certificates are generated, secured, stored, and accessed using HashiCorp Vault. Vault is an open-source tool that provides an interface to secrets stored in persistent storage; it supports a large number of storage backends, including etcd and Google Cloud Storage (GCS).
The Problem: locking our keys in the vault
When we started using Vault, we used etcd as its storage backend. However, we recently decided to migrate our Vault storage backend from etcd to GCS, and to upgrade from Vault v0.6.5 to v0.9.1. Unfortunately, as we belatedly discovered, the open-source version of Vault does not support the migration of storage backends.
The number of secrets we had stored in Vault was relatively small, consisting only of certificates. However, Vault offers no built-in mechanism for exporting the bits of critical information that we needed to migrate. Crucially, we use Vault’s PKI secrets engine to generate dynamic X.509 certificates, and the API provides no way to export the private keys associated with these certificates.
Essentially, we needed to move our Vault, but some of the keys required to enable this migration are protected by Vault itself and cannot be revealed. After some brainstorming, we devised a cunning plan that would extract the necessary data from our Vault, unblock the migration and enable the upgrade of our Vault.
First, we needed access to the Vault unseal keys. Vault starts in a sealed state, which means that the encryption key needed to read from and write to the storage backend is not yet known. To unseal Vault, we had to supply the master key, which is used to decrypt that encryption key. So we just needed to access our master key.
To mitigate the risk of a malicious actor gaining access to the master key and using it to decrypt the entire Vault, the master key is split into multiple shares using an implementation of Shamir’s Secret Sharing technique. Only a subset of these shares is needed to reconstruct the master key. These shares are revealed when Vault is first initialized and should then be securely stored. Given that we still had access to these shares, we could reconstruct our master key.
Copying the Vault
Next, we needed a copy of the Vault storage backend holding our secrets. Vault storage backends are responsible for the durable storage of encrypted data. The Vault we wanted to migrate used the etcd storage backend, which persists Vault’s data in etcd.
Copying the Vault storage backend required (1) exec-ing into the Kubernetes pod running the etcd storage backend, (2) snapshotting the keyspace from a running etcd member, (3) copying the snapshot from the pod, (4) restoring the snapshot to a local, temporary, 1-member etcd cluster, and (5) starting the local etcd cluster. This gave us the copy we needed.
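The five steps above can be sketched as a command plan. This is an illustration rather than the exact commands we ran: the pod name and file paths are hypothetical placeholders, and it assumes the etcd v3 API, an `env` binary inside the container, and no extra endpoint, TLS or namespace flags. The function only builds the command lines; it does not execute anything.

```python
import shlex


def snapshot_copy_plan(pod, remote_snapshot, local_snapshot, data_dir):
    """Return the copy procedure as a list of argv lists (nothing is run)."""
    return [
        # (1) + (2): exec into the etcd pod and snapshot the keyspace
        # of the running member.
        ["kubectl", "exec", pod, "--", "env", "ETCDCTL_API=3",
         "etcdctl", "snapshot", "save", remote_snapshot],
        # (3): copy the snapshot file out of the pod.
        ["kubectl", "cp", f"{pod}:{remote_snapshot}", local_snapshot],
        # (4): restore the snapshot into a fresh local data directory.
        ["env", "ETCDCTL_API=3", "etcdctl", "snapshot", "restore",
         local_snapshot, "--data-dir", data_dir],
        # (5): start a local single-member etcd on the restored data.
        ["etcd", "--data-dir", data_dir],
    ]


if __name__ == "__main__":
    # All four arguments are hypothetical placeholders.
    plan = snapshot_copy_plan(
        pod="etcd-0",
        remote_snapshot="/tmp/vault-snapshot.db",
        local_snapshot="./vault-snapshot.db",
        data_dir="./restored.etcd",
    )
    for argv in plan:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the plan as data makes it easy to review each command before running it by hand against a production cluster.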
Building more barriers to overcome
Now that we had a copy of the Vault storage backend running on a local etcd node, we had to get into it. However, the data in the backend is (of course) encrypted. We needed a mechanism for extracting and decrypting arbitrary key-value pairs from the backend, in particular the key-value pairs to which Vault restricts access even when provided with the unseal key. HashiCorp’s Vault implementation does not permit users to export the private keys associated with CA certificates, so we needed to devise a mechanism for circumventing this restriction.
To get around the restriction, we wrote a modified Vault frontend to run on top of our encrypted etcd backend. The frontend consists of a custom Vault barrier, unsealed using the master key recreated from the initial shares mentioned above, that permits the extraction of arbitrary key-value pairs – an operation not permitted by the standard Vault API.
First, we configured a Vault frontend to run on top of the local etcd node holding our encrypted secrets. We configured the Vault frontend to match the configuration of the old Vault instance.
Next, we constructed our new Vault barrier.
This barrier needed to be unsealed before any keys could be decrypted and extracted. To unseal the barrier, we had to reconstruct the master key using three of the five shares of the master key. The total number of shares that the master key is broken into and the number of shares required to unlock Vault are configurable; we use the default, five total shares, three of which are needed to unlock Vault. We combined these three shares using Shamir’s Secret Sharing technique, utilizing a library provided by HashiCorp.
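To make the three-of-five idea concrete, here is a minimal, self-contained sketch of Shamir’s Secret Sharing. It is not the HashiCorp library we used and is not interoperable with Vault’s shares: for brevity it treats the secret as a single integer in a prime field, and the names `split` and `combine` are our own for this example.

```python
import secrets

# A Mersenne prime comfortably larger than a 256-bit secret.
PRIME = 2**521 - 1


def split(secret, shares=5, threshold=3):
    """Split an integer secret into `shares` points; any `threshold` recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    master_key = secrets.randbits(256)
    parts = split(master_key)  # five shares, threshold three
    assert combine(parts[:3]) == master_key
    assert combine([parts[0], parts[2], parts[4]]) == master_key
    print("master key reconstructed from three of five shares")
```

Any three of the five points determine the degree-two polynomial, and therefore its constant term; two points alone are consistent with every possible secret, which is what makes holding fewer than the threshold useless to an attacker.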
Now that we had an unlocked barrier, we could extract any key in plain text! We first listed the paths to all of the keys stored in our Vault to find the ones we cared about.
Next, we passed the paths to the keys we cared about to our script to individually extract each key-value pair.
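The listing and extraction steps can be sketched as a recursive walk over a key-value store. This is a toy stand-in, not our actual script: the store is a plain dictionary of already-decrypted values, whereas the real tool read through the unsealed barrier, and every path and value shown is a hypothetical placeholder. It follows the Vault convention that a listing returns immediate children, with sub-folders marked by a trailing slash.

```python
def list_keys(store, prefix):
    """Immediate children under `prefix`; sub-folders end with '/'."""
    children = set()
    for path in store:
        if path.startswith(prefix):
            rest = path[len(prefix):]
            if rest:
                head, sep, _ = rest.partition("/")
                children.add(head + sep)
    return sorted(children)


def walk(store, prefix=""):
    """Yield the full path of every leaf key at or below `prefix`."""
    for child in list_keys(store, prefix):
        if child.endswith("/"):
            yield from walk(store, prefix + child)
        else:
            yield prefix + child


def extract(store, paths):
    """Return the key-value pairs for the paths we care about."""
    return {path: store[path] for path in paths}


if __name__ == "__main__":
    # Hypothetical layout and placeholder values, for illustration only.
    store = {
        "logical/example-mount/config/ca_bundle": "<ca bundle>",
        "logical/example-mount/certs/01": "<certificate>",
        "sys/policy/default": "<policy>",
    }
    all_paths = list(walk(store))
    wanted = [p for p in all_paths if p.endswith("ca_bundle")]
    for path, value in extract(store, wanted).items():
        print(path, value)
```

Separating the walk from the extraction mirrors the two steps in the text: list everything once, then pull out only the handful of paths that matter.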
Finally, we migrated each of these extracted key-value pairs to our new GCS-backed Vault, completing the migration!
Running our modified Vault frontend enabled us to extract the private keys we needed to upgrade and move Vault. Perhaps in the future migrating between storage backends will be supported in the open-source version of HashiCorp Vault. In the meantime, we were forced to use a hacky but effective workaround – if you want to try it out, you can find the complete modified Vault frontend on our GitHub.
(This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, you can obtain one at http://mozilla.org/MPL/2.0/.)