
Solution overview

R-VdP edited this page Jul 23, 2020 · 7 revisions

Motivation

The aim of this project is to give field staff the ability to quickly, easily, and autonomously lock the data partition of a server in case of emergency. To reach this goal, we developed a small web application, included in our standard server configuration, which allows the management of a field project or mission to deactivate, with a single click, the decryption key of all servers in their network for which this functionality has been enabled. Whenever required, a backup key kept by the HQ IT team can be used to re-enable the decryption key and resume normal operations.

Requirements

The main requirements were the following:

  1. The application should be as easy to use as possible, given that it will mainly be used during high-stress emergency situations.
  2. Securely locking the servers should be quick (a matter of seconds). We cannot afford to wipe the whole hard drive.
  3. The application should be accessible from anywhere within the local network, without authentication: we prefer an (accidental) denial of service over an unwanted disclosure of data.
  4. The application should not rely on any internet connectivity.
  5. If a site has multiple (compatible) servers deployed, they should all be locked by a single action.

Limitations

  1. We only support servers using our standard NixOS configuration, although the application could probably be integrated into other platforms as well.
  2. Servers all need to be able to reach each other over the (local) network in order to be locked with a single action.
  3. The servers to be locked are currently configured statically; this could be automated (see below).

Architecture

Principle

The data partition of all our servers, the part of the hard disk which contains all of the actual sensitive data, is encrypted with an industry-standard encryption scheme. The encrypted partition can be unlocked with either of two keys:

  1. the primary key used during normal operations which the server can obtain while booting; and
  2. a backup key which is created during the server installation and securely kept by the HQ IT unit, so that the data can be recovered and functionality restored should the primary key somehow be lost (or deliberately disabled).

A safe and very fast method to render this data partition inaccessible therefore consists in securely deactivating the primary key so that it no longer unlocks the data partition. The server will remain fully functional and remotely manageable, but the application data stored on it will not be accessible, and any applications needing this data will no longer be started.

To summarise, we rely on the encryption of the data partition to protect our data in the absence of a key to unlock the data partition, which is exactly what encryption is designed to do. When requested, the backup key kept at HQ can be used to re-activate the disabled primary key, restoring all functionality as before.
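Concretely, on a LUKS-encrypted partition (an assumption; the page only says "industry-standard encryption scheme") this two-key principle maps onto key slots: deactivating the primary key means wiping its slot, while the backup key in another slot stays valid. A minimal sketch, where the device path, slot number, and key-file path are all illustrative:

```python
import subprocess

def kill_slot_command(device: str, slot: int = 0) -> list[str]:
    """cryptsetup invocation that wipes the primary key slot.

    In batch mode (-q), luksKillSlot erases the slot without asking
    for a passphrase; the backup key in its own slot stays valid.
    """
    return ["cryptsetup", "luksKillSlot", "-q", device, str(slot)]

def add_key_command(device: str, backup_key_file: str) -> list[str]:
    """cryptsetup invocation HQ could use to re-add a primary key,
    authenticating with the backup key it keeps."""
    return ["cryptsetup", "luksAddKey", "--key-file", backup_key_file, device]

def lock(device: str) -> None:
    # Only key material in the LUKS header is overwritten -- done in
    # milliseconds, with no need to wipe the data itself.
    subprocess.run(kill_slot_command(device), check=True)
```

Because only header key material is touched, this is what makes locking "a matter of seconds" rather than requiring a full disk wipe.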

The web application can lock multiple servers at the same time: we can configure all servers in the same network such that locking one automatically locks all the others as well.

Technical

The application consists of a front-end and a back-end.

The back-end is a Python application based on Flask. It serves the front-end and exposes a set of REST endpoints allowing the front-end to obtain its configuration, send the requests to lock the servers, and verify whether locking succeeded.

Besides parameters such as the number of retries and the time between polls, the configuration contains the list of servers to lock in this project (we might consider automatically detecting these via mDNS and Avahi instead of relying on a static configuration).

The front-end is a single-page web application written in Elm. When accessed, it retrieves the configuration from the back-end and presents a first view where the user needs to enter a verification text (to prevent accidental locking) before pressing the button that launches the process. Once launched, the front-end concurrently sends a request to the back-end of each configured target server, instructing it to lock itself and retrying a pre-defined number of times when unsuccessful. Subsequently, it polls the verify endpoint of every server until that server indicates that it has:

  1. successfully disabled the primary key for the data partition;
  2. rebooted to make sure that the data partition is not mounted any more and that any applications relying on this partition have been stopped;
  3. run a couple of sanity checks (data partition no longer mounted, uptime confirming that a reboot has taken place).
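Sketched in Python for illustration (the real front-end is Elm), the retry-then-poll flow for a single target server could look like this, with the network calls injected as callables so the sketch stays self-contained:

```python
import time

def lock_and_verify(send_lock, poll_verify, retries=3,
                    poll_interval=0.0, max_polls=10):
    """Drive the lock protocol for one target server.

    `send_lock` posts the lock request and returns True on success;
    `poll_verify` returns True once the server reports the key
    disabled, the reboot done, and the sanity checks passed. Retry
    counts and intervals would come from the back-end configuration.
    """
    for _ in range(retries):
        if send_lock():
            break
    else:
        return False  # the lock request never got through

    for _ in range(max_polls):
        if poll_verify():
            return True
        time.sleep(poll_interval)
    return False
```

In the real application this loop runs concurrently for every configured server, so one click locks the whole site.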

This is done for every server configured (or detected) in the same network. The server is already securely locked once the key has been disabled, which is a matter of milliseconds once the HTTP request has reached the server. The subsequent reboot is an additional safety measure to make absolutely sure that no mount points or applications are still lingering around.
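The server-side sanity checks from step 3 could look like the following sketch; the mount point and the way the lock timestamp is obtained are illustrative:

```python
def partition_unmounted(mountpoint: str, mounts_text: str) -> bool:
    """True if `mountpoint` does not appear in the given
    /proc/mounts-style text (second column is the mount point)."""
    return not any(
        line.split()[1] == mountpoint
        for line in mounts_text.splitlines()
        if line.split()
    )

def recently_rebooted(uptime_seconds: float,
                      seconds_since_lock: float) -> bool:
    """True if the system booted after the lock request was issued,
    i.e. the uptime is shorter than the age of the lock request."""
    return uptime_seconds < seconds_since_lock
```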

We summarise this protocol in the following sequence diagram:

Deployment

The Panic Button service will be deployed as a NixOS module and will be part of our standard server configuration. The service can be enabled, disabled, and configured on a per-server basis. The NixOS module integrating this application into our config can be found here.

Re-enabling the key

More information can be found on this page.
