From de1c3e69b2ae73dff7ad444be0b556c95a22c84d Mon Sep 17 00:00:00 2001
From: Eric Blake
Date: Fri, 23 Aug 2024 11:54:59 -0500
Subject: [PATCH 1/2] spec: Clarify what CreateVolume empty volume is

The spec was silent on whether a just-created volume can read back
garbage data left over in the storage from a previous use. There are
likely applications that depend on a block volume reading as all
zeroes. Empty filesystems are more likely to do what is desired (as
mkfs is the most likely step to have happened during creation), at
which point the filesystem will not allow access to any uninitialized
part of the underlying storage.

But an explicit statement that an empty block volume MUST read as
zero is good for security reasons (one tenant cannot read the garbage
left over by a previous tenant), even if the Plugin has to take longer
to guarantee it. More likely, Plugins can avoid that slowdown by
exploiting hardware with efficient zeroing operations (for example,
this is an extension in the SCSI spec, but a commonly available one)
or other techniques (any storage backed by a filesystem has a POSIX
guarantee that a newly allocated file reads as all zeroes; LVM has
modes that guarantee every allocation to a dynamically sized LV starts
life reading as all zeroes).

Even if we later add an optional mode to favor faster creation of
block volumes without explicit zeroing, it should be a non-default,
opt-in setting, used only by a CO that knows the block volume being
allocated will be consumed solely by software that doesn't read the
uninitialized portions of the disk (such as when passing that block
device to mkfs).

Signed-off-by: Eric Blake
---
 spec.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/spec.md b/spec.md
index 0d1d33f2..81f79cca 100644
--- a/spec.md
+++ b/spec.md
@@ -789,7 +789,7 @@ Plugins MAY create 3 types of volumes:
 - From an existing volume. When plugin supports cloning, and reports the OPTIONAL capabilities `CREATE_DELETE_VOLUME` and `CLONE_VOLUME`.
 
 If CO requests a volume to be created from existing snapshot or volume and the requested size of the volume is larger than the original snapshotted (or cloned volume), the Plugin can either refuse such a call with `OUT_OF_RANGE` error or MUST provide a volume that, when presented to a workload by `NodePublish` call, has both the requested (larger) size and contains data from the snapshot (or original volume).
-Explicitly, it's the responsibility of the Plugin to resize the filesystem of the newly created volume at (or before) the `NodePublish` call, if the volume has `VolumeCapability` access type `MountVolume` and the filesystem resize is required in order to provision the requested capacity.
+Explicitly, it's the responsibility of the Plugin to resize the filesystem of the newly created volume at (or before) the `NodePublish` call, if the volume has `VolumeCapability` access type `MountVolume` and the filesystem resize is required in order to provision the requested capacity. Likewise, if an empty volume is created, the Plugin must ensure that an access type `BlockVolume` exposes all bytes to initially read as zero, while an access type `MountVolume` exposes a filesystem with no files pre-populated.
 
 ```protobuf
 message CreateVolumeRequest {

From c72b656249e5c9cb664c1ce9638bdb87d53dc1d4 Mon Sep 17 00:00:00 2001
From: Eric Blake
Date: Fri, 23 Aug 2024 12:23:58 -0500
Subject: [PATCH 2/2] spec: Allow opt-in for faster block CreateVolume

Requiring a newly created block volume to read as all zeroes is good
security practice, but on some hardware, explicitly zeroing out the
storage can take a long time.

Enhance the spec with an option for a CO to request volume creation
without regard to the contents of the empty volume (safe if the volume
will be handed to something that will in turn initialize it, such as
mkfs, but risky if handed to something that will try to learn what
data a previous tenant left behind). Existing Plugins that ignore this
field (and thereby always zero the contents) remain compliant, but
adding the field allows faster allocation in a carefully controlled
environment where the uninitialized storage does not set up a data
leak.

Signed-off-by: Eric Blake
---
 spec.md | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/spec.md b/spec.md
index 81f79cca..f9a98d5f 100644
--- a/spec.md
+++ b/spec.md
@@ -789,7 +789,7 @@ Plugins MAY create 3 types of volumes:
 - From an existing volume. When plugin supports cloning, and reports the OPTIONAL capabilities `CREATE_DELETE_VOLUME` and `CLONE_VOLUME`.
 
 If CO requests a volume to be created from existing snapshot or volume and the requested size of the volume is larger than the original snapshotted (or cloned volume), the Plugin can either refuse such a call with `OUT_OF_RANGE` error or MUST provide a volume that, when presented to a workload by `NodePublish` call, has both the requested (larger) size and contains data from the snapshot (or original volume).
-Explicitly, it's the responsibility of the Plugin to resize the filesystem of the newly created volume at (or before) the `NodePublish` call, if the volume has `VolumeCapability` access type `MountVolume` and the filesystem resize is required in order to provision the requested capacity. Likewise, if an empty volume is created, the Plugin must ensure that an access type `BlockVolume` exposes all bytes to initially read as zero, while an access type `MountVolume` exposes a filesystem with no files pre-populated.
+Explicitly, it's the responsibility of the Plugin to resize the filesystem of the newly created volume at (or before) the `NodePublish` call, if the volume has `VolumeCapability` access type `MountVolume` and the filesystem resize is required in order to provision the requested capacity. Likewise, if an empty volume is created, the Plugin must ensure that an access type `BlockVolume` exposes all bytes to initially read as zero (unless `wipe_mode` is `UNINITIALIZED`), while an access type `MountVolume` exposes a filesystem with no files pre-populated.
 
 ```protobuf
 message CreateVolumeRequest {
@@ -922,7 +922,23 @@ message CreateVolumeResponse {
 message VolumeCapability {
   // Indicate that the volume will be accessed via the block device API.
   message BlockVolume {
-    // Intentionally empty, for now.
+    enum WipeMode {
+      // The Plugin MUST ensure that all bytes of the volume initially
+      // read as zero.
+      ALL_ZEROES = 0;
+
+      // Bytes in the volume may initially have unspecified contents; a CO
+      // that uses this mode for potentially faster creation times MUST
+      // ensure that the end use of the storage will not be confused by
+      // reading uninitialized data (do not use this option in a scenario
+      // where reading prior contents can constitute a security leak).
+      UNINITIALIZED = 1;
+    }
+
+    // This field is OPTIONAL; providing it with a non-zero value in order
+    // to potentially speed up volume creation should only be attempted when
+    // the security risks have been analyzed.
+    WipeMode wipe_mode = 1;
   }
 
   // Indicate that the volume will be accessed via the filesystem API.