diff --git a/hardware-isolation/README.md b/hardware-isolation/README.md
deleted file mode 100644
index 0f09692..0000000
--- a/hardware-isolation/README.md
+++ /dev/null
@@ -1,17 +0,0 @@
-# Hardware Isolation
-
-## Virtualization
-
-### Intel VMX
-
-## Memory Protection
-
-### Pages and Segments
-
-### Protection Rings
-
-### Protection Keys
-
-### IOMMU
-
-## Enclaves and Trusted Execution Environments
diff --git a/hardware-memory-isolation/README.md b/hardware-memory-isolation/README.md
new file mode 100644
index 0000000..6fb7def
--- /dev/null
+++ b/hardware-memory-isolation/README.md
@@ -0,0 +1,195 @@
+# Hardware Memory Isolation
+
+## Table of Contents
+
+## Prerequisites
+
+## Introduction
+
+Another role of the hardware, besides computing, is to provide isolation, mainly between different software components.
+You have encountered (hopefully) the main protection mechanism that the hardware uses to ensure memory isolation: pages.
+There are other, less-known mechanisms for ensuring memory isolation, which we will go through this session: segments, privilege rings, memory protection keys.
+We will also dive into virtualization, focusing on the hardware-assisted kind.
+You get to write a small hypervisor.
+
+## Memory Isolation
+
+### Pages and Segments
+
+The system needs a way to enforce ownership and permissions on memory zones.
+For example, it needs to enforce that a certain memory zone can only be read and executed, not written.
+How can you do this?
+The first answer that the CPU designers had is called **segmentation**.
+
+#### Segmentation
+
+Segmentation is the x86 CPU feature that allows assigning permissions and ownership to a certain memory zone, using segments.
+Modern systems no longer use segmentation for usual operations.
+Segmentation is only required at the early stages of booting, when pages cannot be used.
+But segmentation is still there, and it is tied to many other components of the system, so knowing how it works is useful.
+Let's start at the beginning of time, with something called the **Real mode**.
+
+##### Real Mode Memory Addressing
+
+At the beginning of time, x86 CPUs had only 16-bit registers.
+This meant that the maximum memory size that could be used, with the **flat memory addressing** model, was 64KB.
+It was thought that this memory was enough, until it wasn't.
+Instead of expanding the registers, the CPU manufacturers came up with the **segmented memory addressing** model.
+This meant that each memory address, instead of using one register, used 2: a normal register and a segment register, each fitting 16 bits.
+The addresses were calculated as follows:
+```
+addr = SEGMENT_REGISTER * 0x10 + OFFSET_REGISTER
+```
+This led to a new memory maximum of about 1MB, which was thought to be enough.
+Notice that, at that time, a segment was just a number.
+It didn't enforce any permissions on the memory zone, or ownership, because there was no user and kernel separation.
+The operating system had absolute power, and there was nothing but the operating system.
+Then came 32-bit registers, the need for applications, the need for isolation.
+The **protected mode** was born.
+
+##### Protected Mode Memory Addressing
+
+Well, all those things didn't appear at the same time.
+First, the registers were extended to 32 bits.
+Using the segmentation model, each segment would now fit 4GB of memory.
+Keeping the real-mode formula with 32-bit registers would, in theory, let a system address about 68GB of memory.
+But that memory wasn't available.
+You could barely reach 4GB of RAM.
+So, the addressing model was switched to the flat one: a memory address would be composed of only one register.
+
+The segments got another role: enforcing isolation (pages still weren't a thing).
+A segment would no longer be a value used to compute an address, but an index in a table, the **GDT** (Global Descriptor Table).
+
+###### The GDT
+
+The GDT is a system table that contains descriptors of memory zones: where each zone starts, where it ends, how it grows, whether it is readable, writeable or executable, and who can access it.
+Each entry in the GDT is 8 bytes in size, and it looks like this:
+
+TODO: insert GDT entry diagram
+
+Let's break each field down:
+  * Base: the address at which the segment begins
+  * Limit: the first wacky one; the maximum offset in the segment, counted in bytes or pages (see the G flag)
+  * Access:
+    TODO: insert Access Byte diagram
+    * P: Present - 1 if the segment is valid
+    * DPL: Descriptor privilege level, where 0 is the highest privilege level, and 3 the lowest
+    * S: Descriptor type; not interesting for us today
+    * E: Executable; if 0, the segment is a data one; if 1, the segment is a code one
+    * D/C: its significance depends on the **E** bit
+      * if E is 0, the field is Direction; not interesting for today
+      * if E is 1, the field is Conforming;
+        * if the Conforming bit is 1, the code in this segment can be executed by equal or lower privilege level code;
+          for example, code running in ring 3 can far-jump to conforming code in a segment with DPL equal to 2
+        * if the C bit is 0, the code in this segment can be executed only by code from segments with the same DPL
+    * R/W: another field depending on E
+      * if E is 0, the field is Writeable; data segments are always readable
+      * if E is 1, the field is Readable; code segments are never writeable
+    * A: Accessed; not interesting
+  * Flags:
+    * G - Granularity: is the Limit field in bytes (0), or pages (1)?
+    * DB - Size: if 1, the segment is a 32-bit one, else it is a 16-bit one
+    * L - Long-mode code: if 1, the segment is a 64-bit code one
+    TODO: insert Flags diagram
+
+As complicated as it looks, the GDT did its job of enforcing some memory protection, hence a CPU with a GDT and 32-bit addresses is in the **Protected Mode**.
+This came at the cost of the programmer's sanity.
+If you are wondering what the designers of this model were smoking, you are not alone.
+Fortunately, in modern systems (64-bit mode), **Base** and **Limit** are ignored for most segments.
+A segment always covers the entire address space.
+Does that mean that the entire address space is in both code and data segments?
+Yes?
+Then how do we ensure separation between executable and writeable memory zones?
+Also, is the entire memory accessible by all privilege levels?
+This doesn't seem right.
+Enter pages.
+But first, something about privilege levels, also known as privilege rings.
+
+##### Privilege Rings
+
+You may have heard about kernel-space and user-space.
+How do we know if a memory zone belongs to the kernel-space, or the user-space?
+That memory zone is part of a segment, or page, as we will see later, that belongs to either kernel or user.
+The kernel-space is, in fact, any memory zone that belongs to **ring 0**, or DPL 0, and the user-space belongs to **ring 3**, or DPL 3.
+What about the other rings, 1 and 2?
+They can be used, but almost no one does.
+Some drivers use those rings, but it's not a common practice.
+
+At this point, things get weird.
+What if you want software with higher privileges than the kernel, like a hypervisor?
+You get ring -1.
+But what if you want a piece of code that is run by the hardware in critical moments?
+You get ring -2.
+Ring -3?
+Someone got there.
+Fear not, we will explore these weird notions later.
+For now, let's do something practical.
+
+##### Tutorial: Reading the GDT of the Linux Kernel
+
+Go to the [`read-gdt`](./activities/read-gdt/) folder.
+There you have a simple kernel module that reads the GDT of the operating system, then prints each field.
+Run `make` to build the module, then `sudo insmod read_gdt.ko` to insert the module.
+By running `sudo dmesg` you should see 16 GDT entries listed, along with the total size of the GDT and the virtual address where it is placed.
+Only 16 entries are listed, because the ones after that are null.
+Take a look at the entries, and figure out what entries 1 to 6 represent.
+You should find 3 kernel entries, and 3 user ones.
+Notice that entries 0 and 7 are null.
+Entry 0 should always be null.
+Entries from 8 onward are TSS and LDT entries, which won't be detailed in today's session.
+
+Note that a special instruction, `sgdt`, is used to retrieve the GDT pointer descriptor.
+The opposite instruction is `lgdt`.
+
+#### Paging
+
+Soon enough, people got tired of dealing with segmentation;
+a new method to divide the memory was needed.
+Pages were born.
+Unlike segments, which can be of any size, pages have a fixed size: 4KB.
+There are also huge pages, which are usually 2MB, 4MB or 1GB in size.
+Pages are organised hierarchically, in a tree-like structure.
+A hardware component, called the MMU (Memory Management Unit), walks this structure, which is set up by the OS, to translate addresses.
+We won't go into details about how that structure is organised, so as not to turn this session into an Operating Systems design session.
+What is important to know is that each page has permissions, which are checked by the MMU at every access.
+The hardware doesn't, however, check if a memory page is accessed by the process that should be able to access it.
+That is the role of the OS.
+
+#### Memory Protection Keys
+
+We have the following scenario:
+an application wants to change an area of its memory from read-write to read-only, for reasons.
+To do this, it will call `mprotect` on that area.
+Behind the scenes, the OS will change the permissions of each page that is part of the memory area, then it will flush the TLB.
+This is costly time-wise.
+As a solution, Intel proposed the MPK set of instructions, which can quickly change permissions for an area of memory of any size.
+How does this work?
+Up to the moment when MPK was proposed, page-table entries had 4 bits that weren't used.
+These 4 bits are transformed into 16 possible `keys`.
+Furthermore, a register, `PKRU`, is added to hold the permissions for each of those keys, local to each thread.
+This allows an application to assign its pages to a `protection domain`.
+When accessing a page, instead of checking only the page permissions, the MMU will also check the protection domain permissions.
+
+Let's take a practical example:
+Application A has a page with read-write permissions.
+It allocates a `protection domain` with read-only permissions, then adds the page to that protection domain.
+When performing a write on that page, the application will receive a Segmentation Fault because, even though the page has the right permissions, the protection domain does not.
+Everything sounds nice, doesn't it?
+Well, it is not.
+The reason for this is that the instruction used to modify `PKRU`, `WRPKRU`, is unprivileged.
+So, if an attacker gains the ability to execute arbitrary code, the whole mechanism can be bypassed.
+Another problem, as detailed by [this paper](https://arxiv.org/pdf/1811.07276v1.pdf), is that, after an application frees a protection domain, the key isn't removed from the page-table entries.
+So, if the same key is allocated again, it will still cover the previous pages, which should no longer be under a protection domain.
+A classic example of `use-after-free`.
+The final problem is that there are only 16 possible keys.
+For each process.
+A process that may want to isolate hundreds, if not thousands, of memory regions, across many threads.
+You can see how this can go wrong.
+
+### Control-Flow Enforcement
+
+#### Invalid Jump Detection
+
+#### Hardware Shadow Stack
+
+### Intel MPX
diff --git a/hardware-memory-isolation/activities/read-gdt/public/Makefile b/hardware-memory-isolation/activities/read-gdt/public/Makefile
new file mode 100644
index 0000000..edc8c85
--- /dev/null
+++ b/hardware-memory-isolation/activities/read-gdt/public/Makefile
@@ -0,0 +1,7 @@
+obj-m += read_gdt.o
+
+all:
+	make -C /home/cristi/WSL2-Linux-Kernel M=$(shell pwd) modules
+
+clean:
+	make -C /home/cristi/WSL2-Linux-Kernel M=$(shell pwd) clean
diff --git a/hardware-memory-isolation/activities/read-gdt/public/read_gdt.c b/hardware-memory-isolation/activities/read-gdt/public/read_gdt.c
new file mode 100644
index 0000000..c817050
--- /dev/null
+++ b/hardware-memory-isolation/activities/read-gdt/public/read_gdt.c
@@ -0,0 +1,74 @@
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("Read GDT Kernel Module");
+MODULE_LICENSE("GPL");
+
+struct gdt_desc
+{
+	unsigned short size;	/* GDTR limit: size of the table in bytes, minus 1 */
+	unsigned long address;	/* linear address of the first GDT entry */
+} __attribute__((packed));
+
+struct gdt_entry
+{
+	unsigned short limit0;
+	unsigned short base0;
+	unsigned short base1: 8, a: 1, rw: 1, dc: 1, e: 1, s: 1, dpl: 2, p: 1;
+	unsigned short limit1: 4, res: 1, l: 1, d: 1, g: 1, base2: 8;
+} __attribute__((packed));
+
+struct gdt_system_entry
+{
+	unsigned short limit0;
+	unsigned short base0;
+	unsigned short base1: 8, type: 4, s: 1, dpl: 2, p: 1;
+	unsigned short limit1: 4, res: 1, l: 1, d: 1, g: 1, base2: 8;
+} __attribute__((packed));
+
+static void print_gdt_entry(struct gdt_entry *entry)
+{
+	pr_info("\tlimit0: %hu, limit1: %hu\n", entry->limit0, entry->limit1);
+	pr_info("\tbase0: %hu, base1: %hu, base2: %hu\n", entry->base0, entry->base1, entry->base2);
+
+	if (entry->s)	/* s == 1: code or data segment; s == 0: system segment */
+		pr_info("\te: %hu, dc: %hu, rw: %hu, a: %hu, s: %hu, dpl: %hu, p: %hu\n",
+			entry->e, entry->dc, entry->rw, entry->a, entry->s, entry->dpl, entry->p);
+	else
+		pr_info("\ttype: %hu, s: %hu, dpl: %hu, p: %hu\n",
+			((struct gdt_system_entry *)entry)->type, entry->s, entry->dpl, entry->p);
+
+	pr_info("\tl: %hu, d: %hu, g: %hu\n", entry->l, entry->d, entry->g);
+}
+
+static int __init gdt_read_init(void)
+{
+	int i;
+
+	struct gdt_desc desc;
+	struct gdt_entry *entries;
+
+	asm volatile("sgdt %0" : "=m" (desc));	/* store the GDT pointer descriptor into desc */
+
+	pr_info("GDT size: %hu\n", desc.size);
+	pr_info("GDT address: 0x%lx\n", desc.address);
+
+	entries = (struct gdt_entry *)desc.address;
+
+	for (i = 0; i < 16; i++)
+	{
+		pr_info("Entry number %d\n", i);
+		print_gdt_entry(entries + i);
+	}
+
+	return 0;
+}
+
+static void __exit gdt_read_exit(void)
+{
+	pr_debug("Bye\n");
+}
+
+module_init(gdt_read_init);
+module_exit(gdt_read_exit);