Commit 8cdaa16

Add long-overdue post about Microvium.
1 parent e4d71e4 commit 8cdaa16

1 file changed

+104
-0
lines changed

@@ -0,0 +1,104 @@
1+
---
2+
layout: post
3+
title: "CHERIoT and Microvium"
4+
date: 2025-04-10
5+
categories: javascript vm
6+
author: David Chisnall
7+
---
8+
9+
CHERIoT RTOS includes a [port](https://github.com/CHERIoT-Platform/cheriot-rtos/tree/main/sdk/include/microvium) of the [Microvium](https://microvium.com) embedded JavaScript runtime.
10+
We originally did this port even before we open-sourced the CHERIoT project.
11+
We haven't talked about it much, and that's something of an omission, since it is quite a nice case study in supporting a managed language on a CHERI platform.
12+
13+
# It Just Worked™
14+
15+
The first thing to note is that the initial 'port' didn't require any code changes.
16+
We were able to take the Microvium codebase, compile it, and run it in a compartment, unmodified.
17+
18+
This is a nice result because language runtimes are traditionally some of the most difficult things to port to CHERI platforms.
19+
We don't get to take credit for that: it came from the fact that Microvium was written as portable C code.
20+
Language runtimes written for larger systems often do a lot of things that are tailored for specific operating systems or architectures.
21+
22+
# Pointers are 15 bits!
23+
24+
In some ways, Microvium is very similar to a classic Smalltalk-80 [Blue Book](http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf) implementation.
25+
Values are 16 bits and are either numbers or pointers, differentiated by a tag bit.
26+
This means that a pointer in Microvium is a 15-bit value.
27+
A 15-bit pointer can address up to 64 KiB of RAM because pointers refer to 16-bit words, not bytes: 2^15 words of two bytes each is 64 KiB.
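
As a rough illustration of that arithmetic, here is a minimal sketch of a tagged 16-bit value. The type name, the helper names, and the choice of the low bit as the tag are assumptions made for this example; Microvium's real encoding is more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified encoding: the low bit is the tag. */
typedef uint16_t value_t;

enum { VALUE_TAG_POINTER = 1 };

/* True if the tag bit marks this value as a pointer rather than a number. */
static inline bool value_is_pointer(value_t v)
{
    return (v & VALUE_TAG_POINTER) != 0;
}

/* The 15 payload bits, shared by numbers and pointers. */
static inline uint16_t value_payload(value_t v)
{
    return (uint16_t)(v >> 1);
}

/* Pointers count 16-bit words, so the byte offset is twice the payload. */
static inline uint32_t pointer_byte_offset(value_t v)
{
    return (uint32_t)value_payload(v) * 2u;
}
```

The largest payload is 32767, which names the word starting at byte 65534, so the last addressable byte is 65535 and the total reach is 64 KiB.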
28+
29+
We didn't want to increase the memory size for the CHERI port.
30+
Going from 16-bit values to 64-bit ones would have quadrupled the memory consumption.
31+
Fortunately, there was no need to.
32+
Using 16-bit values on a platform with 64-bit capabilities to a 32-bit address space worked fine.
33+
34+
Microvium has two modes for memory management.
35+
The first assumes that you are targeting a *really* tiny device and running bare metal.
36+
Here, you reserve a chunk of memory for the JavaScript heap and pointers are just added to that base address.
37+
38+
Alternatively, for hosted environments, it allocates memory from some system-provided allocator.
39+
Pointers are now offsets within a linear address space formed by concatenating the chunks, and resolving one means walking the list of chunks.
40+
This seems slow, but remember that Smalltalk-80 ran a complete interactive GUI with a similar amount of memory to a modern CHERIoT system, on a processor around a thousandth the speed of a CHERIoT Ibex. We can afford to waste a few cycles.
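
The two modes can be sketched roughly as follows. This is an illustration of the idea rather than Microvium's code: `struct chunk`, `resolve_flat`, and `resolve_chunked` are hypothetical names, and offsets are in bytes to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk header for the hosted mode. */
struct chunk {
    struct chunk *next; /* next chunk in the list, or NULL */
    size_t        size; /* bytes of payload in this chunk */
    uint8_t      *data; /* start of the payload */
};

/* Bare-metal mode: one reserved region, so a pointer is base + offset. */
static uint8_t *resolve_flat(uint8_t *heap_base, uint16_t offset)
{
    return heap_base + offset;
}

/*
 * Hosted mode: the chunks form one logical address space, so walk the
 * list and subtract each chunk's size until the offset falls inside one.
 * Returns NULL if the offset is past the end of the last chunk.
 */
static uint8_t *resolve_chunked(const struct chunk *head, size_t offset)
{
    for (const struct chunk *c = head; c != NULL; c = c->next) {
        if (offset < c->size) {
            return c->data + offset;
        }
        offset -= c->size;
    }
    return NULL;
}
```

The walk is linear in the number of chunks, which is the cost the paragraph above accepts in exchange for not needing one contiguous region.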
41+
42+
# Making Microvium a library
43+
44+
Microvium is designed for embedded targets and, in particular, for being able to instantiate multiple JavaScript VMs on a device.
45+
Each one needs a couple of hundred bytes of stack and global context, a similar amount of bytecode memory, and usually a KiB or so of heap (more for complex programs).
46+
On most systems, the code for the interpreter is shared between them and we wanted to be able to use Microvium in the same way.
47+
48+
This required building Microvium as a *shared library*.
49+
Doing this at all required one code change in Microvium: adding a `MVM_EXPORT` macro to the functions exposed in the header file so that we could mark them with the `__cheriot_libcall` macro.
50+
This let us build the VM as a library.
51+
It wasn't quite enough to make it *work* as a library.
52+
The VM also needed to be able to allocate memory.
53+
54+
On CHERIoT, the C `malloc` function is a wrapper around `heap_allocate`, which takes an explicit capability that authorises allocating against a quota.
55+
We needed a mechanism to pass this quota from the calling compartment down to the malloc functions.
56+
Microvium added a hook that allowed callers to pass a context parameter into the VM-creation function.
57+
This context value was then passed to the allocate and deallocate functions each time Microvium called them.
58+
Both of these changes [landed in the same PR upstream](https://github.com/coder-mike/microvium/pull/52).
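
The shape of that hook is roughly the following. This is a host-side sketch under stated assumptions: `struct quota`, `struct vm`, `vm_create`, `quota_alloc`, and `quota_free` are invented for the example, and plain `malloc` with a byte counter stands in for `heap_allocate` and its allocation capability.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for an allocation capability: a simple byte budget. */
struct quota {
    size_t remaining;
};

/* Hypothetical VM state: the caller's context travels with the VM. */
struct vm {
    void *context;
    void *(*alloc)(void *context, size_t size);
    void (*dealloc)(void *context, void *ptr, size_t size);
};

/* Allocate against the quota carried in the context. */
static void *quota_alloc(void *context, size_t size)
{
    struct quota *q = context;
    if (size > q->remaining) {
        return NULL; /* quota exhausted */
    }
    void *p = malloc(size);
    if (p != NULL) {
        q->remaining -= size;
    }
    return p;
}

/* Return memory and credit the same quota. */
static void quota_free(void *context, void *ptr, size_t size)
{
    struct quota *q = context;
    if (ptr == NULL) {
        return;
    }
    free(ptr);
    q->remaining += size;
}

/* The creation function stores the context for every later call. */
static struct vm vm_create(void *context)
{
    struct vm vm = { context, quota_alloc, quota_free };
    return vm;
}
```

Because the context is opaque to the VM, each compartment can hand in its own quota while sharing one copy of the interpreter code.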
59+
60+
With this, we could build a single copy of the Microvium VM and share the code between multiple compartments.
61+
62+
# Bounding pointers passed to C
63+
64+
A few of the Microvium APIs expose pointers to C code.
65+
These originally spanned an entire Microvium heap slab and were read-write.
66+
We [added two hooks to allow ports to provide bounds and make the regions immutable](https://github.com/coder-mike/microvium/pull/80).
67+
68+
With these two changes, if you pass a string from JavaScript to C (for example), the C code receives a read-only capability with the correct bounds.
69+
This gives you greater confidence that bugs in your FFI layer can't break type safety in the JavaScript code.
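
To make the effect concrete, here is a software model of what the C side ends up holding. On CHERIoT the bounds and permissions live in the capability and are checked by hardware; `struct bounded_ref` and the helpers below are hypothetical and only mimic that behaviour so the example runs on any host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a capability: base, length, and a write permission. */
struct bounded_ref {
    uint8_t *base;
    size_t   length;
    bool     writable;
};

/* Narrow a whole-slab pointer to one object and drop write permission. */
static struct bounded_ref export_read_only(uint8_t *slab, size_t offset,
                                           size_t length)
{
    struct bounded_ref r = { slab + offset, length, false };
    return r;
}

/* Loads succeed only inside the bounds (hardware would trap otherwise). */
static bool ref_load(struct bounded_ref r, size_t index, uint8_t *out)
{
    if (index >= r.length) {
        return false;
    }
    *out = r.base[index];
    return true;
}

/* Stores additionally need write permission. */
static bool ref_store(struct bounded_ref r, size_t index, uint8_t value)
{
    if (!r.writable || index >= r.length) {
        return false;
    }
    r.base[index] = value;
    return true;
}
```

An off-by-one read or an accidental write in FFI code fails at the boundary instead of corrupting a neighbouring JavaScript object.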
70+
71+
# Temporal safety for C and JavaScript
72+
73+
Microvium uses a copying garbage collector.
74+
Its implementation has one very nice property that makes integration with the CHERIoT temporal safety mechanism trivial: it does not move objects within a chunk.
75+
76+
Microvium allocates memory from the system in chunks (ports can configure the size).
77+
The garbage collector finds live objects and copies them to *new* chunks and frees the old ones.
78+
79+
This means that a pointer from C to JavaScript is always in one of three states:
80+
81+
- It points to a live JavaScript object.
82+
- It points to a garbage (but not collected yet) JavaScript object.
83+
- It points to a deallocated chunk.
84+
85+
In the first two cases, the pointer continues to point to a valid object and will work.
86+
In the third case, the chunk is gone, so the pointer's CHERI tag bit will be cleared (by the CHERIoT load filter and/or revoker) and attempts to access it from C/C++ will trap.
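
The three states can be modelled in a few lines. This is a host-side model, not the real mechanism: `struct chunk`, `struct c_ref`, `collect`, and `c_ref_load` are hypothetical, and a `freed` flag stands in for the tag being cleared by the load filter or revoker.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK_BYTES = 32 };

struct chunk {
    bool    freed;             /* models revocation of the chunk */
    uint8_t data[CHUNK_BYTES];
};

/* A pointer held by C code into the JavaScript heap. */
struct c_ref {
    struct chunk *chunk;
    size_t        offset;
};

/*
 * Copy one live object into a fresh chunk, then free the old chunk.
 * Nothing inside the old chunk is moved or overwritten before the free.
 */
static void collect(struct chunk *old_chunk, struct chunk *new_chunk,
                    size_t live_offset, size_t live_size)
{
    memcpy(new_chunk->data, old_chunk->data + live_offset, live_size);
    old_chunk->freed = true;
}

/* Returns false where real hardware would trap on an untagged pointer. */
static bool c_ref_load(struct c_ref ref, uint8_t *out)
{
    if (ref.chunk->freed) {
        return false;
    }
    *out = ref.chunk->data[ref.offset];
    return true;
}
```

Before `collect` runs, a stale reference still reads a valid (live or garbage) object; afterwards it fails cleanly, and it never observes a different object in the same memory.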
87+
88+
# Lessons for other managed languages on CHERI platforms
89+
90+
Microvium happened to be exactly the right shape to make a CHERIoT port easy.
91+
It's optimised for low memory consumption at the expense of performance, which is the right trade-off for embedded devices, where CPU performance has increased at a rate far greater than memory size. These choices avoided a lot of tricks that don't directly translate to CHERI platforms.
92+
The compressed-pointer representation meant that Microvium already had a notion of internal and host pointers as distinct things (something it shares with a lot of managed-language VMs), which is a convenient place to apply CHERI bounds and restrict permissions.
93+
94+
Importantly, if you trust the implementation of your type-safe language, you don't need to make every pointer a capability *internally*.
95+
We kept 16-bit (15-bit + tag) pointers within the JavaScript interpreter, but we extended them to full capabilities at the boundary.
96+
This lets the VM provide type safety internally and the hardware provide it for FFI code.
97+
This is often the right approach for managed languages on CHERI, unless the VM is so complex that you want additional defence in depth from memory-safety bugs.
98+
99+
CHERI systems can provide temporal safety for C, and GC implementations that avoid memory reuse can simply be layered on top.
100+
On larger CHERI systems, GCs may be able to use the same underlying mechanisms as the C allocators, but they'll have the same issue: you can't reuse memory until you're sure that C code no longer holds a reference to it.
101+
This means that things like semispace compacting collectors (which eagerly reuse memory) are a problem, but mark-and-compact approaches that copy objects to new chunks are fine.
102+
103+
This kind of integration was why I started working on CHERI 13 years ago: to be able to write code in safe languages, reuse the enormous amount of code available in C/C++, and not lose the safety properties of the safe language.
104+
It always makes me happy to see evidence that we've achieved this goal.
