[PAL,LibOS,common] Add file recovery support for encrypted files
Previously, a fatal error during writes to encrypted files could cause
file corruption due to incorrect GMACs and/or encryption keys.

To address this, we introduce a file recovery mechanism using a "shadow"
recovery file that stores data about to change and a `has_pending_write`
flag in the metadata node indicating the start of a write transaction.
During file flush, all cached blocks that are about to change are saved
to the recovery file in the format of physical node numbers (offsets)
plus encrypted block data. Before saving the main file contents, the
`has_pending_write` flag is set in the file's metadata node and cleared
only when the transaction is complete. If an encrypted file is opened
and the `has_pending_write` flag is set, a recovery process starts to
revert partial changes using the recovery file, returning to the last
known good state. The "shadow" recovery file is cleaned up on file
close.
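
The flush and recovery ordering described above can be sketched as a toy,
in-memory model. This is only an illustration under stated assumptions, not
Gramine's implementation: `toy_disk_t`, `toy_flush()`, `toy_recover_on_open()`
and the 8-byte node size are all hypothetical, the flag is kept as a separate
boolean rather than inside the metadata node, and encryption and GMACs are
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_SIZE 8          /* toy value (assumption); real PF nodes are 4KB */
#define NUM_NODES 4          /* toy main file size in nodes (assumption) */
#define NO_CRASH  SIZE_MAX   /* pass as `crash_after` to let the flush complete */

/* One undo record: the node about to change plus its OLD contents. */
typedef struct {
    uint64_t node_number;
    uint8_t bytes[NODE_SIZE];
} undo_record_t;

/* Toy "disk": the main file, the pending-write flag, and the shadow recovery file. */
typedef struct {
    uint8_t nodes[NUM_NODES][NODE_SIZE];
    bool has_pending_write;
    undo_record_t recovery[NUM_NODES];
    size_t recovery_count;
} toy_disk_t;

/* Flush `count` dirty nodes; `new_data` holds `count * NODE_SIZE` bytes.
 * If `crash_after` is smaller than `count`, the flush stops after that many
 * node writes to mimic a fatal error. Returns true if the flush completed. */
bool toy_flush(toy_disk_t* disk, const uint64_t* dirty, const uint8_t* new_data,
               size_t count, size_t crash_after) {
    assert(count <= NUM_NODES);

    /* Step 1: save the old contents of every node that is about to change. */
    disk->recovery_count = 0;
    for (size_t i = 0; i < count; i++) {
        assert(dirty[i] < NUM_NODES);
        undo_record_t* rec = &disk->recovery[disk->recovery_count++];
        rec->node_number = dirty[i];
        memcpy(rec->bytes, disk->nodes[dirty[i]], NODE_SIZE);
    }

    /* Step 2: mark the start of the write transaction. */
    disk->has_pending_write = true;

    /* Step 3: overwrite the main file, possibly "crashing" midway. */
    for (size_t i = 0; i < count; i++) {
        if (i == crash_after)
            return false;
        memcpy(disk->nodes[dirty[i]], new_data + i * NODE_SIZE, NODE_SIZE);
    }

    /* Step 4: the transaction is complete, so clear the flag. */
    disk->has_pending_write = false;
    return true;
}

/* On open: if the flag is set, revert partial changes using the undo records.
 * Returns true if a recovery was performed. */
bool toy_recover_on_open(toy_disk_t* disk) {
    if (!disk->has_pending_write)
        return false;

    for (size_t i = 0; i < disk->recovery_count; i++) {
        const undo_record_t* rec = &disk->recovery[i];
        memcpy(disk->nodes[rec->node_number], rec->bytes, NODE_SIZE);
    }

    disk->has_pending_write = false;
    return true;
}
```

For example, calling `toy_flush()` with two dirty nodes and `crash_after = 1`
leaves the first node overwritten and the flag set; a subsequent
`toy_recover_on_open()` restores the old contents of both nodes and clears the
flag.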

This commit adds a new mount parameter `enable_recovery = [true|false]`
for encrypted file mounts to optionally enable this feature. We extend
the file flush logic of protected files (pf) to include the recovery
file dump and the setting/clearing of the `has_pending_write` flag. We
also change the public pf APIs: the `pf_open()` API is extended to make
the pf aware of the underlying recovery file managed by the LibOS, and
recovery information (e.g., whether the pf needs recovery) is exposed
back to the LibOS via a new `pf_get_recovery_info()` API. To let the
LibOS initiate a file recovery process on file open, a new PAL API
`PalRecoverEncryptedFile()` is introduced.
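
As a rough illustration of how a recovery file in the "node number plus node
bytes" format could be replayed on open, the sketch below applies serialized
records to an in-memory copy of the main file. It is not the actual
`PalRecoverEncryptedFile()` implementation: `replay_recovery_buffer()` and its
validation rules are assumptions made for this example, and the node size is
passed in as a parameter (in the real code it would presumably come from
`pf_get_recovery_info()`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replay recovery records onto `file_buf`.
 *
 * Each record is assumed to be a host-endian uint64_t physical node number
 * followed by `node_size` bytes of saved node contents. Returns the number of
 * restored nodes, or -1 if the recovery buffer is malformed (in which case
 * `file_buf` is left untouched). */
long replay_recovery_buffer(const uint8_t* rec_buf, size_t rec_len, uint8_t* file_buf,
                            size_t file_len, size_t node_size) {
    assert(rec_buf && file_buf);

    if (node_size == 0)
        return -1;

    const size_t record_size = sizeof(uint64_t) + node_size;
    if (rec_len % record_size != 0)
        return -1; /* truncated or otherwise inconsistent recovery data */

    const uint64_t num_nodes = file_len / node_size;

    /* Pass 1: validate every record before modifying anything. */
    for (size_t off = 0; off < rec_len; off += record_size) {
        uint64_t node_number;
        memcpy(&node_number, rec_buf + off, sizeof(node_number));
        if (node_number >= num_nodes)
            return -1; /* record points outside the main file */
    }

    /* Pass 2: copy each saved node back to its offset in the main file. */
    long restored = 0;
    for (size_t off = 0; off < rec_len; off += record_size) {
        uint64_t node_number;
        memcpy(&node_number, rec_buf + off, sizeof(node_number));
        memcpy(file_buf + (size_t)node_number * node_size,
               rec_buf + off + sizeof(node_number), node_size);
        restored++;
    }

    return restored;
}
```

Validating all records before applying any of them is a design choice of this
sketch: it avoids leaving the main file in yet another partially written state
if the recovery data itself turns out to be damaged.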

Signed-off-by: Kailun Qin <[email protected]>
kailun-qin committed Feb 11, 2025
1 parent ef48c72 commit 00a90f3
Showing 33 changed files with 6,740 additions and 49 deletions.
17 changes: 13 additions & 4 deletions Documentation/devel/encfiles.rst
@@ -508,10 +508,19 @@ Additional details
least one process writes to the file), the file may become corrupted or
inaccessible to one of the processes.

- There is no support for file recovery. If the file was only partially written
to storage when the app abruptly terminated, Gramine will treat this file as
corrupted and will return an ``-EACCES`` error. (This is in contrast to Intel
SGX SDK which supports file recovery.)
- File recovery: Gramine supports recovery for encrypted files, which can be
enabled via the ``enable_recovery`` mount parameter in the Gramine manifest.
This allows a file to be recovered from a corrupted state (caused by e.g.,
incorrect GMACs and/or encryption keys) when it was only partially written to
storage due to a fatal error (e.g., abrupt app termination). Similar to Intel
SGX SDK’s recovery mechanism, Gramine uses a "shadow" recovery file and a
``has_pending_write`` flag in the metadata node to manage write transactions.
During file flush, cached blocks about to change are saved to the recovery
file. If an encrypted file is opened with the flag set, a recovery process
reverts partial changes using the recovery file, restoring the last known good
state. The "shadow" recovery file is cleaned up on file close. Note that
enabling this feature can impact performance due to additional writes to the
shadow file on each flush.

- There is no key rotation scheme. The application must perform key rotation of
the KDK by itself (by overwriting the ``/dev/attestation/keys/``
1,074 changes: 1,073 additions & 1 deletion Documentation/img/encfiles/02_encfiles_representation.svg
1,244 changes: 1,243 additions & 1 deletion Documentation/img/encfiles/04_encfiles_write_less3k.svg
1,242 changes: 1,241 additions & 1 deletion Documentation/img/encfiles/05_encfiles_read_less3k.svg
1,357 changes: 1,356 additions & 1 deletion Documentation/img/encfiles/06_encfiles_write_greater3k.svg
1,321 changes: 1,320 additions & 1 deletion Documentation/img/encfiles/08_encfiles_read_greater3k.svg
8 changes: 7 additions & 1 deletion Documentation/manifest-syntax.rst
@@ -1088,7 +1088,7 @@ Encrypted files
::

fs.mounts = [
{ type = "encrypted", path = "[PATH]", uri = "[URI]", key_name = "[KEY_NAME]" },
{ type = "encrypted", path = "[PATH]", uri = "[URI]", key_name = "[KEY_NAME]", enable_recovery = [true|false] },
]

fs.insecure__keys.[KEY_NAME] = "[32-character hex value]"
@@ -1154,6 +1154,12 @@ Gramine:
in the application is insecure. If you need to derive encryption keys from
such a "doubly-used" key, you must apply a KDF.

The ``enable_recovery`` mount parameter (default: ``false``) determines whether
file recovery is enabled for the mount. This feature allows selective enabling
or disabling of recovery for different mounted files or directories. Note that
enabling this feature can negatively impact performance, as it writes to a
second shadow file for later recovery purposes on each flush.

.. _untrusted-shared-memory:

Untrusted shared memory
3 changes: 3 additions & 0 deletions Documentation/pal/host-abi.rst
@@ -372,3 +372,6 @@ random bits, to obtain an attestation report and quote, etc.

.. doxygenfunction:: PalFreeThenLazyReallocCommittedPages
:project: pal

.. doxygenfunction:: PalRecoverEncryptedFile
:project: pal
132 changes: 126 additions & 6 deletions common/src/protected_files/protected_files.c
@@ -429,36 +429,134 @@ static bool ipf_update_metadata_node(pf_context_t* pf) {
return true;
}

static bool ipf_write_recovery_node(pf_context_t* pf, uint64_t physical_node_number,
const void* buffer, uint64_t offset) {
assert(pf->host_recovery_file_handle);

recovery_node_t recovery_node = { .physical_node_number = physical_node_number };
memcpy(recovery_node.bytes, buffer, sizeof(recovery_node.bytes));

pf_status_t status = g_cb_write(pf->host_recovery_file_handle, (void*)&recovery_node, offset,
sizeof(recovery_node));
if (PF_FAILURE(status)) {
pf->last_error = status;
return false;
}

return true;
}

static bool ipf_write_recovery_file(pf_context_t* pf) {
assert(pf->host_recovery_file_handle);

pf_status_t status = g_cb_truncate(pf->host_recovery_file_handle, 0);
if (PF_FAILURE(status)) {
pf->last_error = status;
return false;
}

void* node;
uint64_t offset = 0;
for (node = lruc_get_first(pf->cache); node != NULL; node = lruc_get_next(pf->cache)) {
file_node_t* file_node = (file_node_t*)node;
if (!file_node->need_writing)
continue;

if (!ipf_write_recovery_node(pf, file_node->physical_node_number, &file_node->encrypted,
offset))
return false;

offset += sizeof(recovery_node_t);
}

if (!ipf_write_recovery_node(pf, /*physical_node_number=*/1, &pf->root_mht_node.encrypted,
offset))
return false;

offset += sizeof(recovery_node_t);

if (!ipf_write_recovery_node(pf, /*physical_node_number=*/0, &pf->metadata_node, offset))
return false;

return true;
}

static bool ipf_set_pending_write(pf_context_t* pf) {
pf->metadata_node.plaintext_part.has_pending_write = 1;
bool ret = ipf_write_node(pf, /*physical_node_number=*/0, &pf->metadata_node);

/* Unset the `has_pending_write` in memory, which will be cleared on disk at the end of the
* flush when we write the metadata to disk. */
pf->metadata_node.plaintext_part.has_pending_write = 0;

return ret;
}

static bool ipf_clear_pending_write(pf_context_t* pf) {
assert(pf->metadata_node.plaintext_part.has_pending_write == 0);

if (!ipf_write_node(pf, /*physical_node_number=*/0, &pf->metadata_node))
return false;

pf_status_t status = g_cb_fsync(pf->host_file_handle);
if (PF_FAILURE(status)) {
pf->last_error = status;
return false;
}

return true;
}

static bool ipf_internal_flush(pf_context_t* pf) {
if (!pf->need_writing) {
DEBUG_PF("no need to write");
return true;
}

if (pf->metadata_decrypted.file_size > MD_USER_DATA_SIZE && pf->root_mht_node.need_writing) {
if (pf->host_recovery_file_handle) {
if (!ipf_write_recovery_file(pf)) {
pf->file_status = PF_STATUS_FLUSH_ERROR;
DEBUG_PF("failed to write changes to the recovery file");
goto recoverable_error;
}

if (!ipf_set_pending_write(pf)) {
pf->file_status = PF_STATUS_FLUSH_ERROR;
DEBUG_PF("failed to set the pending write flag");
goto recoverable_error;
}
}

if (!ipf_update_all_data_and_mht_nodes(pf)) {
// this is something that shouldn't happen, can't fix this...
pf->file_status = PF_STATUS_CRYPTO_ERROR;
DEBUG_PF("failed to update data and MHT nodes");
return false;
goto unrecoverable_error;
}
}

if (!ipf_update_metadata_node(pf)) {
// this is something that shouldn't happen, can't fix this...
pf->file_status = PF_STATUS_CRYPTO_ERROR;
DEBUG_PF("failed to update metadata node");
return false;
goto unrecoverable_error;
}

if (!ipf_write_all_changes_to_disk(pf)) {
pf->file_status = PF_STATUS_WRITE_TO_DISK_FAILED;
DEBUG_PF("failed to write changes to disk");
return false;
goto recoverable_error;
}

pf->need_writing = false;
return true;

unrecoverable_error:
if (pf->host_recovery_file_handle)
(void)ipf_clear_pending_write(pf);
recoverable_error:
return false;
}

static file_node_t* ipf_get_mht_node(pf_context_t* pf, uint64_t offset) {
@@ -751,6 +849,7 @@ static bool ipf_init_fields(pf_context_t* pf) {
ipf_init_root_mht(&pf->root_mht_node);

pf->host_file_handle = NULL;
pf->host_recovery_file_handle = NULL;
pf->need_writing = false;
pf->file_status = PF_STATUS_UNINITIALIZED;
pf->last_error = PF_STATUS_SUCCESS;
@@ -852,7 +951,8 @@ static void ipf_try_clear_error(pf_context_t* pf) {
}

static pf_context_t* ipf_open(const char* path, pf_file_mode_t mode, bool create, pf_handle_t file,
uint64_t real_size, const pf_key_t* kdk_key, pf_status_t* status) {
uint64_t real_size, const pf_key_t* kdk_key,
pf_handle_t recovery_file_handle, pf_status_t* status) {
*status = PF_STATUS_NO_MEMORY;
pf_context_t* pf = calloc(1, sizeof(*pf));

@@ -892,6 +992,8 @@ static pf_context_t* ipf_open(const char* path, pf_file_mode_t mode, bool create
pf->host_file_handle = file;
pf->mode = mode;

pf->host_recovery_file_handle = recovery_file_handle;

if (!create) {
if (!ipf_init_existing_file(pf, path))
goto out;
@@ -1126,12 +1228,14 @@ void pf_set_callbacks(pf_read_f read_f, pf_write_f write_f, pf_fsync_f fsync_f,
}

pf_status_t pf_open(pf_handle_t handle, const char* path, uint64_t underlying_size,
pf_file_mode_t mode, bool create, const pf_key_t* key, pf_context_t** context) {
pf_file_mode_t mode, bool create, const pf_key_t* key,
pf_handle_t recovery_file_handle, pf_context_t** context) {
if (!g_initialized)
return PF_STATUS_UNINITIALIZED;

pf_status_t status;
*context = ipf_open(path, mode, create, handle, underlying_size, key, &status);
*context = ipf_open(path, mode, create, handle, underlying_size, key, recovery_file_handle,
&status);
return status;
}

@@ -1297,3 +1401,19 @@ pf_status_t pf_flush(pf_context_t* pf) {

return PF_STATUS_SUCCESS;
}

pf_status_t pf_get_recovery_info(pf_context_t* pf, bool* out_recovery_needed,
size_t* out_node_size) {
if (out_recovery_needed) {
// read metadata node
if (!ipf_read_node(pf, /*physical_node_number=*/0, (uint8_t*)&pf->metadata_node))
return pf->last_error;

*out_recovery_needed = (pf->metadata_node.plaintext_part.has_pending_write == 1);
}

if (out_node_size)
*out_node_size = sizeof(((recovery_node_t*)0)->bytes);

return PF_STATUS_SUCCESS;
}
32 changes: 23 additions & 9 deletions common/src/protected_files/protected_files.h
@@ -212,19 +212,21 @@ const char* pf_strerror(int err);
/*!
* \brief Open a protected file.
*
* \param handle Open underlying file handle.
* \param path Path to the file. If NULL and \p create is false, don't check path
* for validity.
* \param underlying_size Underlying file size.
* \param mode Access mode.
* \param create Overwrite file contents if true.
* \param key Wrap key.
* \param[out] context PF context for later calls.
* \param handle Open underlying file handle.
* \param path Path to the file. If NULL and \p create is false, don't check path
* for validity.
* \param underlying_size Underlying file size.
* \param mode Access mode.
* \param create Overwrite file contents if true.
* \param key Wrap key.
* \param recovery_file_handle (optional) Underlying recovery file handle.
* \param[out] context PF context for later calls.
*
* \returns PF status.
*/
pf_status_t pf_open(pf_handle_t handle, const char* path, uint64_t underlying_size,
pf_file_mode_t mode, bool create, const pf_key_t* key, pf_context_t** context);
pf_file_mode_t mode, bool create, const pf_key_t* key,
pf_handle_t recovery_file_handle, pf_context_t** context);

/*!
* \brief Close a protected file and commit all changes to disk.
@@ -302,3 +304,15 @@ pf_status_t pf_rename(pf_context_t* pf, const char* new_path);
* \returns PF status.
*/
pf_status_t pf_flush(pf_context_t* pf);

/*!
* \brief Get the recovery info of a PF.
*
* \param pf PF context.
* \param[out] out_recovery_needed (optional) Whether recovery is needed for \p pf.
* \param[out] out_node_size (optional) Size of the \p pf node.
*
* \returns PF status.
*/
pf_status_t pf_get_recovery_info(pf_context_t* pf, bool* out_recovery_needed,
size_t* out_node_size);
6 changes: 6 additions & 0 deletions common/src/protected_files/protected_files_format.h
@@ -56,6 +56,7 @@ typedef struct {
uint8_t minor_version;
pf_nonce_t metadata_key_nonce;
pf_mac_t metadata_mac; /* GCM mac */
uint8_t has_pending_write; /* flag for file recovery */
} metadata_plaintext_t;

typedef struct {
@@ -95,6 +96,11 @@ typedef struct {
} encrypted_node_t;
static_assert(sizeof(encrypted_node_t) == PF_NODE_SIZE, "sizeof(encrypted_node_t)");

typedef struct {
uint64_t physical_node_number;
uint8_t bytes[PF_NODE_SIZE];
} recovery_node_t;

static_assert(sizeof(mht_node_t) == sizeof(data_node_t), "sizes of MHT and data nodes differ");

// Data struct that wraps the 4KB encrypted-node buffer (bounce buffer) and the corresponding 4KB
3 changes: 3 additions & 0 deletions common/src/protected_files/protected_files_internal.h
@@ -17,6 +17,9 @@ struct pf_context {
pf_file_mode_t mode; // read-only, write-only or read-write
bool need_writing; // whether file was modified and thus needs writing to storage

pf_handle_t host_recovery_file_handle; // opaque recovery file handle (e.g. PAL handle) used by
// callbacks

pf_status_t file_status; // PF_STATUS_SUCCESS, PF_STATUS_CRYPTO_ERROR, etc.
pf_status_t last_error; // FIXME: unclear why this is needed

6 changes: 6 additions & 0 deletions libos/include/libos_fs.h
@@ -62,6 +62,10 @@ struct libos_mount_params {

/* Key name (used by `chroot_encrypted` filesystem), or NULL if not applicable */
const char* key_name;

/* Whether to enable file recovery (used by `chroot_encrypted` filesystem), false if not
* applicable */
bool enable_recovery;
};

struct libos_fs_ops {
@@ -532,6 +536,8 @@ struct libos_mount {

void* data;

bool enable_recovery;

void* cpdata;
size_t cpsize;
