Add session content

cristian-vijelie · cristian-vijelie · commit fc35c4c7f854 · 2023-06-25T11:59:11.000+03:00
diff --git a/hardware-isolation/README.md b/hardware-isolation/README.md
diff --git a/hardware-memory-isolation/README.md b/hardware-memory-isolation/README.md
@@ -0,0 +1,195 @@
+# Hardware Memory Isolation
+
+## Table of Contents
+
+## Prerequisites
+
+## Introduction
+
+Another role of the hardware, besides computing, is to provide isolation, mainly between different software components.
+You have encountered (hopefully) the main protection mechanism that the hardware uses to ensure memory isolation: pages.
+There are other less-known mechanisms for ensuring memory isolation, through which we will go this session: segments, privilege rings, memory protection keys.
+We will also dive into virtualization, focusing on the hardware-assisted one.
+You get to write a small hypervisor.
+
+## Memory Isolation
+
+### Pages and Segments
+
+The system needs a way to enforce ownership and permissons on the memory zones.
+For example, it needs to enforce that a certain memory zone cand only be read and executed, not written.
+How can you do this?
+The first answer that the CPU designers had is called **segmentation**.
+
+#### Segmentation
+
+Segmentation is the x86 CPU feature that allows assigning permissions and ownership to a certain memory zone, using segments.
+Today, the modern systems don't use segmentation anymore, when performing usual operations.
+Segmentation is only required at the early stages of booting, when pages cannot be used.
+But segmentation is still there, and it is tied to many other components of the system, so knowledge of how segmentation works is useful.
+Let's start at the beginning of time, with something called the **Real mode**.
+
+##### Real Mode Memory Addressing
+
+At the beginning of time, x86 CPUs had only 16-bits registers.
+This meant that the maximum memory size that could be used, with **flat memory addressing** model, was 64KB.
+It was thought that this memory is enough, until it wasn't enough.
+Instead of expanding the registers, the CPU manufacturers came up with the **segment memory addressing** model.
+This meant that each memory address, instead of using one register, used 2: a normal register and a segment register, each fitting 16 bits.
+The way the addresses were calculated was the following:
+```
+addr = SEGMENT_REGISTER * 0x10 + OFFSET_REGISTER
+```
+This led to a new memory maximum of almost 1MB, which was tought to be enough.
+Notice that, at that time, a segment was just a number.
+It didn't enforce any permissions on the memory zone, or ownership, because there was no user and kernel separation.
+The operating system had absolute power, and there was nothing but the operating system.
+Then came 32-bit registers, the need for applications, the need for isolation.
+The **protected mode** was born.
+
+##### Protected Mode Memory Addressing
+
+Well, all those things didn't appear at the same time.
+First, the registers were extended to 32 bits.
+Using the segmentation model, each segment would now fit 4GB of memory.
+This meant that a system could use 68GB of memory.
+But that memory wasn't available.
+you could barely reach 4GB of RAM.
+So, the addressing model was switched to the flat one: a memory address would be composed of only one register.
+
+The segments got another role: enforce isolation (pages still weren't a thing).
+A segment would not be a value used to compute an address, but an index in a table, the **GDT** (Global Descriptor Table).
+
+###### The GDT
+
+The GDT is a system table, that contains descriptors of memory zones: where it starts, where it ends, how it grows, is it readable, writeable, executable, who can access it.
+Each entry in the GDT has 8 bytes in size, and it looks like this:
+
+TODO: insert GDT entry diagram
+
+Let's break each field down:
+  * Base: the address at which the segment begins
+  * Limit: the first wacky one; how many bytes or pages are contained in the segment
+  * Access:
+    TODO: insert Access Byte diagram  
+    * P: Present - 1 if the segment is valid
+    * DPL: Descriptor privilege level, where 0 is the highest privilege level, and 3 the lowest
+    * S: Descriptor type; not interesting for us today
+    * E: Executable; if 0, the segment is a data one; if 1, the segment is a code one
+    * D/C: its significance depends on the **E** bit
+      * if E is 0, the field is Direction; not interesting for today
+      * if E is 1, the field is Conforming;
+        * if the Conforming bit is 1, the code in this segment can be executed by equal or lower privilege level code;
+          for example, code in a segment with DPL equal to 3, with C equal to 1, can be executed by code from a segment with DPL equal to 1
+        * if the C bit is 0, the code in this segment can be executed only by code from segments with the same DPL
+    * R/W: another field depending on E
+      * if E is 0, the field is Writeable; data segments are always readable
+      * if E is 1, the field is Readable; code semgments are never writeable
+    * A: Accessed; not interesting
+  * Flags:
+    * G - Granularity: the Limit fields is in bytes (0), or pages (1)?
+    * DB - Size: if 1, the segment is a 32-bit one, else it is a 16-bit one
+    * L - Long-mode code: if 1, the segment is a 64-bit code one
+    TODO: insert Flags diagram
+
+As complicated as it looks, The GDT did its job of enforcing some memory protection, hence a CPU with a GDT and 32-bit addresses is in the **Protected Mode**.
+This came at the cost of the programmer's sanity.
+If you are wondering what the designers of this model were smoking, you are not alone.
+Fortunately, in modern systems, **Base** and **Limit** are ignored.
+A segment always covers the entire address space.
+Does that mean that the entire address space is in both code and data segments?
+Yes?
+Then how do we ensure separation between executable and writeable memory zones?
+Also, the entire memory is accesible by all privilege levels?
+This doesn't seem right.
+Enter pages.
+But first, something about privilege levels, also known as privilege rings.
+
+##### Privilege Rings
+
+You may have heard about kernel-space and user-space.
+How do we know if a memory zone belongs to the kernel-space, or the user-space?
+That memory zone is part of a segment, or page, as we will see later, that belongs to either kernel or user.
+The kernel-space is, in fact, any memory zone that belongs to **ring 0**, or DPL 0, and the user-space belongs to **ring 3**, or DPL 3.
+What about the other rings, 1 and 2?
+They can be used, but almost no one does it.
+Some drivers use those rings, but it's not a common practice.
+
+At this point, things get weird.
+What if you want software with higher privileges than the kernel, like a hypervisor?
+You get ring -1.
+But what if you want a piece of code that is run by the hardware in critical moments?
+You get ring -2.
+Ring -3?
+Someone got there.
+Fear not, we will explore these weird notions later.
+For now, let's do something practical.
+
+##### Tutorial: Reading the GDT of the Linux Kernel
+
+Go to the [`read-gdt`](./activities/read-gdt/) folder.
+There you have a simple kernel module that reads the GDT of the operating system, then prints each field.
+Run `make` to build the module, then `sudo insmod read_gdt.ko` to insert the module.
+By running `sudo dmesg` you should see 16 GDT entries listed, the total size of the GDT and the virtual address where it is placed.
+Only 16 entries are listed, because the ones after that are null.
+Take a look at the entries, and figure out what entries 1 to 6 represent.
+You should find 3 kernel entries, and 3 user ones.
+Notice that entries 0 and 7 are null.
+Entry 0 should always be null.
+Entries from 8 onward are TSS and LDT entries, which won't be detailed in today's session.
+
+Note that a special instruction, `sgdt` is used to retrieve the GDT pointer descriptor.
+The opposite instruction is `lgdt`.
+
+#### Paging
+
+Soon enough, people got tired of dealing with segmentation;
+a new method to divide the memory was needed.
+Pages were born.
+Unlike segments, that can be of any size, pages have fixed sizes: 4KB.
+There are also the huge pages, that usually have 2MB, 4MB or 1GB.
+Pages are organised hierarchically, in a tree-like structure.
+A hardware component, called the MMU (Memory Management Unit) manages this structure.
+We won't go into details about how that structure is organised, as to not transform this session into a Operating Systems design session.
+What is important to know is that each page has permissions, that are checked by the MMU at every access.
+The hardware doesn't, however, check if a memory page is accessed by the process that should be able to access it.
+That is the role of the OS.
+
+#### Memory Protection Keys
+
+We have the following scenario:
+an application wants to change an area of its memory from read-write to read-only, for reasons.
+To do this it will call `mprotect` on that area.
+What will happen behind the scenes will be that the OS will change permissions for each page that is part of the memory area, then it will flush the TLB.
+This is costly time-wise.
+As a solution, Intel proposed the MPK set of instructions, that can quickly change permissions for an area of memory of any size.
+How does this work?
+Up to the moment when MPK was proposed, page-table entries had 4 bits that weren't used.
+These 4 bits are tranformed into 16 possible `keys`.
+Furthermore, a register, `PKRU`, is added to hold the permissions for each of those keys, local to each thread.
+This allows an application to allocate its pages to a `protection domain`.
+When accessing a page, instead of checking only the page permissions, the MMU will also check the protection domain permissions.
+
+Let's take a practical example:
+Application A has a page with read-write permissions.
+It allocates a `protection domain` with read permissions, then adds the page to that protection domain.
+When performing a write on that page, a Segmentation Fault will be received, because, even though the page has the right permissions, the protection domain does not.
+Everything sounds nice, doesn't it?
+Well, it is not.
+The reason for this is that the instruction used to modify `PKRU` is unprivileged.
+So, if an attacker gains the ability to execute arbitrary code, the whole mechanism can be bypassed.
+Another problem is, as detailed by [this paper](https://arxiv.org/pdf/1811.07276v1.pdf), the fact that, after an application frees a protection domain, the key isn't deleted from the page-table entries.
+So, if the same key is allocated again, it will still cover the previous pages, that should no longer be under a protection domain.
+A classical example of `use-after-free`.
+The final problem is that there are only 16 possible keys.
+For the whole system.
+A system that can run hundreds, if not thousands of processes, with many more threads.
+You can see how this can go wrong.
+
+### Control-Flow Enforcement
+
+#### Invalid Jump Detection
+
+#### Hardware Shadow Stack
+
+### Intel MPX
diff --git a/hardware-memory-isolation/activities/read-gdt/public/Makefile b/hardware-memory-isolation/activities/read-gdt/public/Makefile
@@ -0,0 +1,7 @@
+obj-m += read_gdt.o
+
+all:
+	make -C /home/cristi/WSL2-Linux-Kernel M=$(shell pwd) modules
+
+clean:
+	make -C /home/cristi/WSL2-Linux-Kernel M=$(shell pwd) clean
diff --git a/hardware-memory-isolation/activities/read-gdt/public/read_gdt.c b/hardware-memory-isolation/activities/read-gdt/public/read_gdt.c
@@ -0,0 +1,74 @@
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+MODULE_DESCRIPTION("Read GDT Kernel Module");
+MODULE_LICENSE("GPL");
+
+struct gdt_desc
+{
+        unsigned short  size;
+	unsigned long   address;
+} __attribute__((packed));
+
+struct gdt_entry
+{
+	unsigned short  limit0;
+	unsigned short	base0;
+	unsigned short	base1: 8, a: 1, rw: 1, dc: 1, e: 1, s: 1, dpl: 2, p: 1;
+	unsigned short	limit1: 4, res: 1, l: 1, d: 1, g: 1, base2: 8;
+} __attribute__((packed));
+
+struct gdt_system_entry
+{
+	unsigned short  limit0;
+	unsigned short	base0;
+	unsigned short	base1: 8, type: 4, s: 1, dpl: 2, p: 1;
+	unsigned short	limit1: 4, res: 1, l: 1, d: 1, g: 1, base2: 8;
+} __attribute__((packed));
+
+static void print_gdt_entry(struct gdt_entry *entry)
+{
+        pr_info("\tlimit0: %hu, limit1: %hu\n", entry->limit0, entry->limit1);
+        pr_info("\tbase0: %hu, base1: %hu, base2: %hu\n", entry->base0, entry->base1, entry->base2);
+
+        if (entry->s)
+                pr_info("\te: %hu, dc: %hu, rw: %hu, a: %hu, s: %hu, dpl: %hu, p: %hu",
+                        entry->e, entry->dc, entry->rw, entry->a, entry->s, entry->dpl, entry->p);
+        else
+                pr_info("\ttype: %hu, s: %hu, dpl: %hu, p: %hu", 
+                        ((struct gdt_system_entry *)entry)->type, entry->s, entry->dpl, entry->p);
+
+        pr_info("\tl: %hu, d: %hu, g: %hu\n", entry->l, entry->d, entry->g);
+}
+
+static int __init gdt_read_init(void)
+{
+        int i;
+
+        struct gdt_desc desc;
+        struct gdt_entry *entries;
+
+        asm volatile("sgdt %0" : "=m" (desc));
+
+        pr_info("GDT size: %hu", desc.size);
+        pr_info("GDT address: 0x%lx", desc.address);
+
+        entries = (struct gdt_entry *)desc.address;
+
+        for (i = 0; i < 16; i++)
+        {
+                pr_info("Entry number %d\n", i);
+                print_gdt_entry(entries + i);
+        }
+
+        return 0;
+}
+
+static void __exit gdt_read_exit(void)
+{
+        pr_debug("Bye\n");
+}
+
+module_init(gdt_read_init);
+module_exit(gdt_read_exit);