| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "CHERIoT and Microvium" |
| 4 | +date: 2025-04-10 |
| 5 | +categories: javascript vm |
| 6 | +author: David Chisnall |
| 7 | +--- |
| 8 | + |
| 9 | +We've included a [port](https://github.com/CHERIoT-Platform/cheriot-rtos/tree/main/sdk/include/microvium) of the [Microvium](https://microvium.com) embedded JavaScript runtime. |
| 10 | +We originally did this port even before we open-sourced the CHERIoT project. |
| 11 | +We haven't talked about it much and that's something of an omission, since it is quite a nice case study in supporting a managed language on a CHERI platform. |
| 12 | + |
| 13 | +# It Just Worked™ |
| 14 | + |
| 15 | +The first thing to note is that the initial 'port' didn't require any code changes. |
| 16 | +We were able to take the Microvium codebase, compile it, and run it in a compartment, unmodified. |
| 17 | + |
| 18 | +This is a nice result because language runtimes are traditionally some of the most difficult things to port to CHERI platforms. |
| 19 | +We don't get to take credit for that: it came from the fact that Microvium was written as portable C code. |
| 20 | +Language runtimes written for larger systems often do a lot of things that are tailored for specific operating systems or architectures. |
| 21 | + |
| 22 | +# Pointers are 15 bits! |
| 23 | + |
| 24 | +In some ways, Microvium is very similar to a classic Smalltalk-80 [Blue Book](http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf) implementation. |
| 25 | +Values are 16 bits and are either numbers or pointers, differentiated by a tag bit. |
| 26 | +This means that a pointer in Microvium is a 15-bit value. |
| 27 | +This can address up to 64 KiB of RAM (pointers refer to 16-bit words, not bytes). |
| 28 | + |
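The arithmetic above can be sketched in portable C. This is a minimal illustration, not Microvium's actual encoding: using the low bit as the tag, and the names `value_t`, `value_is_pointer`, `make_pointer`, and `pointer_byte_offset`, are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit value: low bit set means pointer, clear means number. */
typedef uint16_t value_t;

/* 15 bits of word index, each word 2 bytes: 2^15 * 2 bytes = 64 KiB. */
enum { ADDRESSABLE_BYTES = (1 << 15) * 2 };

static bool value_is_pointer(value_t v)
{
    return (v & 1u) != 0;
}

/* Encode a byte offset into the heap as a tagged 15-bit word index. */
static value_t make_pointer(size_t byte_offset)
{
    assert(byte_offset % 2 == 0);            /* must be word-aligned */
    assert(byte_offset < ADDRESSABLE_BYTES); /* must fit in 15 bits of words */
    return (value_t)(((byte_offset / 2) << 1) | 1u);
}

/* Recover the byte offset: drop the tag bit, scale the word index by 2. */
static size_t pointer_byte_offset(value_t v)
{
    assert(value_is_pointer(v));
    return (size_t)(v >> 1) * sizeof(uint16_t);
}
```

Because the index counts 16-bit words rather than bytes, the 15 available bits reach twice as far as a byte index would.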
| 29 | +We didn't want to increase the memory size for the CHERI port. |
| 30 | +Going from 16-bit values to 64-bit ones would have quadrupled the memory consumption. |
| 31 | +Fortunately, there was no need to. |
| 32 | +Using 16-bit values on a platform with 64-bit capabilities to a 32-bit address space worked fine. |
| 33 | + |
| 34 | +Microvium has two modes for memory management. |
| 35 | +The first assumes that you are targeting a *really* tiny device and running bare metal. |
| 36 | +Here, you reserve a chunk of memory for the JavaScript heap and pointers are just added to that base address. |
| 37 | + |
| 38 | +Alternatively, for hosted environments, it allocates memory from some system-provided allocator. |
| 39 | +In this mode, pointers are offsets into a linear address space formed by concatenating the chunks, so resolving a pointer means walking the list of chunks. |
| 40 | +This seems slow, but remember that Smalltalk-80 ran a complete interactive GUI with a similar amount of memory to a modern CHERIoT system but a processor around a thousandth the speed of a CHERIoT Ibex, so we can afford to waste a few cycles. |
| 41 | + |
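The two translation schemes can be sketched as follows. This is a simplified model rather than Microvium's code: `struct chunk`, `resolve_flat`, and `resolve_chunked` are hypothetical names, and the offsets here are byte offsets to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bare-metal mode: one reserved heap, a pointer is an offset from its base. */
static uint8_t *resolve_flat(uint8_t *heap_base, uint16_t offset)
{
    assert(heap_base != NULL);
    return heap_base + offset;
}

/* Hosted mode: the heap is a list of separately allocated chunks. */
struct chunk {
    struct chunk *next;
    uint16_t      size; /* bytes of payload in this chunk */
    uint8_t      *data;
};

/*
 * Treat the chunks as if they were laid end to end and walk the list
 * until the offset falls inside one of them.
 * Returns NULL if the offset is past the end of the last chunk.
 */
static uint8_t *resolve_chunked(struct chunk *first, uint16_t offset)
{
    for (struct chunk *c = first; c != NULL; c = c->next) {
        if (offset < c->size) {
            return c->data + offset;
        }
        offset = (uint16_t)(offset - c->size);
    }
    return NULL;
}
```

The walk costs time proportional to the number of chunks, which is the overhead the paragraph above argues is affordable.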
| 42 | +# Making Microvium a library |
| 43 | + |
| 44 | +Microvium is designed for embedded targets and, in particular, for being able to instantiate multiple JavaScript VMs on a device. |
| 45 | +Each one needs a couple of hundred bytes of stack and global context, a similar amount of bytecode memory, and usually a KiB or so of heap (more for complex programs). |
| 46 | +On most systems, the code for the interpreter is shared between them and we wanted to be able to use Microvium in the same way. |
| 47 | + |
| 48 | +This required building Microvium as a *shared library*. |
| 49 | +Doing this at all required one code change in Microvium: adding a `MVM_EXPORT` macro to the functions exposed in the header file so that we could mark them with the `__cheriot_libcall` macro. |
| 50 | +This let us build the VM as a library. |
| 51 | +It wasn't quite enough to make it *work* as a library. |
| 52 | +The VM also needed to be able to allocate memory. |
| 53 | + |
| 54 | +On CHERIoT, the C `malloc` function is a wrapper around `heap_allocate`, which takes an explicit capability that authorises allocating against a quota. |
| 55 | +We needed a mechanism to pass this quota from the calling compartment down to the malloc functions. |
| 56 | +Microvium added a hook that allowed callers to pass a context parameter into the VM-creation function. |
| 57 | +This context value was then passed to the allocate and deallocate functions each time Microvium called them. |
| 58 | +Both of these changes [landed in the same PR upstream](https://github.com/coder-mike/microvium/pull/52). |
| 59 | + |
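The context-passing pattern can be sketched like this. It is a minimal model in portable C, not the Microvium or CHERIoT API: `struct quota`, `context_malloc`, `context_free`, `vm_create`, and `vm_destroy` are hypothetical names, and the byte counter stands in for the allocation capability that a real CHERIoT compartment would pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for an allocation quota (a capability on CHERIoT). */
struct quota {
    size_t remaining;
};

/* Allocation hook: the opaque context says whose quota is charged. */
static void *context_malloc(void *context, size_t size)
{
    struct quota *q = context;
    if (q == NULL || size > q->remaining) {
        return NULL;
    }
    void *p = malloc(size);
    if (p != NULL) {
        q->remaining -= size;
    }
    return p;
}

/* Deallocation hook: returns the bytes to the same quota. */
static void context_free(void *context, void *ptr, size_t size)
{
    struct quota *q = context;
    assert(q != NULL);
    if (ptr != NULL) {
        free(ptr);
        q->remaining += size;
    }
}

/* Hypothetical VM state: it remembers the context it was created with. */
struct vm {
    void  *alloc_context;
    void  *heap_chunk;
    size_t heap_chunk_size;
};

/* The caller supplies the context once; the VM forwards it on every call. */
static int vm_create(struct vm *vm, void *alloc_context, size_t chunk_size)
{
    vm->alloc_context   = alloc_context;
    vm->heap_chunk_size = chunk_size;
    vm->heap_chunk      = context_malloc(alloc_context, chunk_size);
    return vm->heap_chunk != NULL ? 0 : -1;
}

static void vm_destroy(struct vm *vm)
{
    context_free(vm->alloc_context, vm->heap_chunk, vm->heap_chunk_size);
    vm->heap_chunk = NULL;
}
```

Because the shared library code holds no quota of its own, each caller's allocations are charged to whatever context that caller passed in.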
| 60 | +With this, we could build a single copy of the Microvium VM and share the code between multiple compartments. |
| 61 | + |
| 62 | +# Bounding pointers passed to C |
| 63 | + |
| 64 | +A few of the Microvium APIs expose pointers to C code. |
| 65 | +These originally spanned an entire Microvium heap slab and were read-write. |
| 66 | +We [added two hooks to allow ports to provide bounds and make the regions immutable](https://github.com/coder-mike/microvium/pull/80). |
| 67 | + |
| 68 | +With these two changes, if you pass a string from JavaScript to C (for example), the C code receives a read-only capability with the correct bounds. |
| 69 | +This gives you greater confidence that bugs in your FFI layer can't break type safety in the JavaScript code. |
| 70 | + |
| 71 | +# Temporal safety for C and JavaScript |
| 72 | + |
| 73 | +Microvium uses a copying garbage collector. |
| 74 | +Its implementation has one very nice property that makes it integrate with the CHERIoT temporal safety mechanism trivially: it does not move objects within a chunk. |
| 75 | + |
| 76 | +Microvium allocates memory from the system in chunks (ports can configure the size). |
| 77 | +The garbage collector finds live objects and copies them to *new* chunks and frees the old ones. |
| 78 | + |
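A toy version of that collection step is sketched below. It is not Microvium's collector: `struct object`, `struct chunk`, `chunk_new`, and `collect` are hypothetical, liveness is a precomputed flag rather than the result of tracing, and there is only one chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct object {
    bool live;    /* assumed to be set by an earlier marking pass */
    int  payload;
};

struct chunk {
    size_t        count;
    struct object objects[]; /* objects stored inline in the chunk */
};

static struct chunk *chunk_new(size_t count)
{
    struct chunk *c = calloc(1, sizeof(*c) + count * sizeof(struct object));
    if (c != NULL) {
        c->count = count;
    }
    return c;
}

/*
 * Copy every live object into a freshly allocated chunk, then free the
 * old chunk as a whole. Nothing is ever moved inside the old chunk, so
 * a pointer into it stays meaningful right up until the chunk is freed.
 * On allocation failure the old chunk is left untouched.
 */
static struct chunk *collect(struct chunk *old)
{
    assert(old != NULL);
    size_t live = 0;
    for (size_t i = 0; i < old->count; i++) {
        if (old->objects[i].live) {
            live++;
        }
    }
    struct chunk *fresh = chunk_new(live);
    if (fresh == NULL) {
        return NULL;
    }
    size_t next = 0;
    for (size_t i = 0; i < old->count; i++) {
        if (old->objects[i].live) {
            fresh->objects[next++] = old->objects[i];
        }
    }
    free(old);
    return fresh;
}
```

On CHERIoT, the `free` of the old chunk is the point at which outstanding C pointers into it stop being usable.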
| 79 | +This means that a pointer from C to JavaScript is always in one of three states: |
| 80 | + |
| 81 | + - It points to a live JavaScript object. |
| 82 | + - It points to a garbage (but not collected yet) JavaScript object. |
| 83 | + - It points to a deallocated chunk. |
| 84 | + |
| 85 | +In the first two cases, the pointer continues to point to a valid object and will work. |
| 86 | +In the third case, the chunk is gone and so the pointer's tag bit will be cleared (by the CHERIoT load filter and/or revoker), so attempts to access it from C/C++ will trap. |
| 87 | + |
| 88 | +# Lessons for other managed languages on CHERI platforms |
| 89 | + |
| 90 | +Microvium happened to be exactly the right shape to make a CHERIoT port easy. |
| 91 | +It's optimised for low memory consumption at the expense of performance (the right trade for embedded devices, where CPU performance has increased at a rate far greater than memory size) and these choices avoided a lot of tricks that don't directly translate to CHERI platforms. |
| 92 | +The compressed-pointer representation meant that Microvium already had a notion of internal and host pointers as distinct things (something it shares with a lot of managed-language VMs), which is a convenient place to apply CHERI bounds and restrict permissions. |
| 93 | + |
| 94 | +Importantly, if you trust the implementation of your type-safe language, you don't need to make every pointer a capability *internally*. |
| 95 | +We kept 16-bit (15-bit + tag) pointers within the JavaScript interpreter, but we extended them to full capabilities at the boundary. |
| 96 | +This lets the VM provide type safety internally and the hardware provide it for FFI code. |
| 97 | +This is often the right approach for managed languages on CHERI, unless the VM is so complex that you want additional defence in depth from memory-safety bugs. |
| 98 | + |
| 99 | +CHERI systems can provide temporal safety for C, and GC implementations that avoid memory reuse can simply be layered on top. |
| 100 | +On larger CHERI systems, GCs may be able to use the same underlying mechanisms as the C allocators, but they'll have the same constraint: you can't reuse memory until you're sure that C code no longer holds a pointer to it. |
| 101 | +This means that things like semispace compacting collectors (which eagerly reuse memory) are a problem, but mark-and-compact approaches that copy objects to new chunks are fine. |
| 102 | + |
| 103 | +This kind of integration was why I started working on CHERI 13 years ago: to be able to write code in safe languages, reuse the enormous amount of code available in C/C++, and not lose the safety properties of the safe language. |
| 104 | +It always makes me happy to see evidence that we've achieved this goal. |