|
| 1 | +# `VALUE`s in C extensions |
| 2 | + |
| 3 | +## Semantics on MRI |
| 4 | + |
| 5 | +Before we discuss the mechanisms used to represent MRI's `VALUE` |
| 6 | +semantics, we should outline what those are. A `VALUE` in a local |
| 7 | +variable (i.e. on the stack) will keep the associated object alive as |
| 8 | +long as that stack entry lasts (so either until the function exits, or |
| 9 | +until that variable is no longer live). We can also wrap C structures |
| 10 | +in Ruby objects, and when we do this we're able to specify a marking |
| 11 | +function. This marking function is used by MRI's garbage collector to |
| 12 | +find all the objects reachable from the structure, and allows it to |
| 13 | +mark them in the same way it would with normal instance |
| 14 | +variables. There are also a couple of utility functions and macros for |
| 15 | +keeping a value alive for the duration of a function call even if it |
| 16 | +is no longer being held in a variable (such as `RB_GC_GUARD`), and for |
| 17 | +globally preserving a value held in a static variable (such as `rb_gc_register_address`). |
| 18 | + |
| 19 | +Because `VALUE`s are essentially tagged pointers on MRI there are also |
| 20 | +some semantics that may be obvious but are worth stating anyway: |
| 21 | + |
| 22 | +* Any two `VALUE`s associated with the same object will be |
| 23 | + identical. In other words, as long as an object is alive, its `VALUE` |
| 24 | + will remain constant. |
| 25 | +* A `VALUE` for a live object can reuse the same tagged pointer that |
| 26 | + was previously used for a now dead object. |
| 27 | + |
| 28 | +## Emulating the semantics in TruffleRuby |
| 29 | + |
| 30 | +Emulating these semantics on TruffleRuby is non-trivial. Although we |
| 31 | +are running under a garbage collector, it doesn't know that a `VALUE` |
| 32 | +maps to an object, nor does it have any mechanism for |
| 33 | +specifying a custom mark function to be used with particular |
| 34 | +objects. As long as `VALUE`s can remain as `ValueWrapper` objects then |
| 35 | +we don't need to do much. Ruby objects maintain a strong reference to |
| 36 | +their associated `ValueWrapper`, and vice versa, so we only really |
| 37 | +need to consider situations where `VALUE`s are converted into native |
| 38 | +handles. |
| 39 | + |
| 40 | +### Keeping objects alive on the stack |
| 41 | + |
| 42 | +We implement an `ExtensionCallStack` object to keep track of various |
| 43 | +bits of useful information during a call to a C extension. Each stack |
| 44 | +entry contains a `preservedObject` and, if needed, an additional |
| 45 | +`preservedObjects` list, which together contain all the |
| 46 | +`ValueWrapper`s converted to native handles during a |
| 47 | +call. When a new call is made, a new `ExtensionCallStackEntry` is added |
| 48 | +to the stack, and when the call exits that entry is popped off again. |
| 49 | + |
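The bookkeeping described above can be modelled with a small linked stack. This is only an illustrative sketch in C: TruffleRuby's real `ExtensionCallStack` lives in the managed runtime, where holding a reference is what keeps an object alive. All names here (`call_stack`, `call_entry`, `push_call`, `keep_alive`, `preserved_count`, `pop_call`) are hypothetical, and the pointers are plain stand-ins for `ValueWrapper` references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One entry per C extension call (hypothetical model of an
 * ExtensionCallStackEntry). */
typedef struct call_entry {
    struct call_entry *previous;
    const void *preserved_object;    /* first wrapper converted to native */
    const void **preserved_objects;  /* overflow list, allocated lazily */
    size_t count;
    size_t capacity;
} call_entry;

typedef struct {
    call_entry *current;
} call_stack;

/* Entering a C call pushes a fresh, empty entry. */
static void push_call(call_stack *stack) {
    call_entry *entry = calloc(1, sizeof *entry);
    if (entry == NULL) abort();
    entry->previous = stack->current;
    stack->current = entry;
}

/* Record a wrapper that was converted to a native handle during this call. */
static void keep_alive(call_stack *stack, const void *wrapper) {
    call_entry *entry = stack->current;
    assert(entry != NULL);
    if (entry->preserved_object == NULL) {
        entry->preserved_object = wrapper;  /* common case: single slot */
        return;
    }
    if (entry->count == entry->capacity) {
        size_t capacity = entry->capacity ? entry->capacity * 2 : 4;
        const void **grown =
            realloc(entry->preserved_objects, capacity * sizeof *grown);
        if (grown == NULL) abort();
        entry->preserved_objects = grown;
        entry->capacity = capacity;
    }
    entry->preserved_objects[entry->count++] = wrapper;
}

/* Number of wrappers the current call is keeping alive. */
static size_t preserved_count(const call_stack *stack) {
    const call_entry *entry = stack->current;
    if (entry == NULL) return 0;
    return (entry->preserved_object != NULL ? 1 : 0) + entry->count;
}

/* Leaving the call pops the entry, releasing everything it preserved. */
static void pop_call(call_stack *stack) {
    call_entry *entry = stack->current;
    assert(entry != NULL);
    stack->current = entry->previous;
    free(entry->preserved_objects);
    free(entry);
}
```

Keeping a single slot alongside a lazily allocated list avoids an allocation for calls that convert at most one wrapper, which is presumably why the entry has both fields.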
| 50 | +### Keeping objects alive in structures |
| 51 | + |
| 52 | +We don't have a way to run markers when doing garbage collection, but |
| 53 | +we know we're keeping objects alive during the lifetime of a C call, |
| 54 | +and we can record when the structure is accessed via `DATA_PTR` (which |
| 55 | +should be required for the internal state of that structure to be |
| 56 | +mutated). To do this, we keep a list of objects to be marked, in a |
| 57 | +similar manner to the objects that should be kept alive, and when we |
| 58 | +exit the C call we run the markers for those objects. |
| 59 | + |
| 60 | +### Running mark functions |
| 61 | + |
| 62 | +We run markers by recording the object being marked on the extension |
| 63 | +stack, and then calling the marker which will in turn call |
| 64 | +`rb_gc_mark` for the individual `VALUE`s which are held by the |
| 65 | +structure. We'll record those marked objects in a temporary array also |
| 66 | +held on the extension stack, and then attach that to the object |
| 67 | +wrapping the struct when the mark function has finished. |
| 68 | + |
| 69 | + |
| 70 | +## Managing the conversion of `VALUE`s to and from native handles |
| 71 | + |
| 72 | +When converted to native, the `ValueWrapper` takes one of the following `long` values. |
| 73 | + |
| 74 | +| Represented Value | Handle Bits | Comments | |
| 75 | +|-------------------|-------------------------------------|----------| |
| 76 | +| false | 00000000 00000000 00000000 00000000 | | |
| 77 | +| true | 00000000 00000000 00000000 00000010 | | |
| 78 | +| nil | 00000000 00000000 00000000 00000100 | | |
| 79 | +| undefined | 00000000 00000000 00000000 00000110 | | |
| 80 | +| Integer | xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxx1 | Lowest mask bit set, small longs only, convert to long using >> 1 | |
| 81 | +| Object | xxxxxxxx xxxxxxxx xxxxxxxx xxxxx000 | No mask bits set and does not equal 0, value is index into handle map | |
| 82 | + |
| 83 | +The built-in objects, `true`, `false`, `nil`, and `undefined`, are |
| 84 | +handled specially, and integers are relatively easy because there is a |
| 85 | +well defined mapping from the native representation to the integer and |
| 86 | +vice versa, but to manage objects we need to do a little more work. |
| 87 | + |
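The table above can be read as a small decoding routine. The following is a minimal sketch in C, assuming a 64-bit handle whose low bits follow the table; the enum and function names are hypothetical and are not TruffleRuby's own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of a native handle, following the table. */
enum handle_kind {
    HANDLE_FALSE,
    HANDLE_TRUE,
    HANDLE_NIL,
    HANDLE_UNDEFINED,
    HANDLE_INTEGER,
    HANDLE_OBJECT,
    HANDLE_INVALID  /* bit pattern not listed in the table */
};

#define INTEGER_TAG 0x1  /* lowest mask bit */
#define TAG_MASK    0x7  /* the three mask bits */

static enum handle_kind classify_handle(int64_t handle) {
    if (handle & INTEGER_TAG) return HANDLE_INTEGER;
    switch (handle) {
        case 0x0: return HANDLE_FALSE;
        case 0x2: return HANDLE_TRUE;
        case 0x4: return HANDLE_NIL;
        case 0x6: return HANDLE_UNDEFINED;
        default:  break;
    }
    /* No mask bits set and not zero: an index into the handle map. */
    if ((handle & TAG_MASK) == 0) return HANDLE_OBJECT;
    return HANDLE_INVALID;
}

/* Tag a small integer: shift left by one and set the lowest bit. */
static int64_t handle_from_integer(int64_t value) {
    return (int64_t)(((uint64_t)value << 1) | INTEGER_TAG);
}

/* Untag with >> 1. Right-shifting a negative signed value is
 * implementation-defined in C; mainstream compilers shift arithmetically,
 * which is what this sketch relies on. */
static int64_t integer_from_handle(int64_t handle) {
    assert(handle & INTEGER_TAG);
    return handle >> 1;
}
```

Checking the integer tag first matters, because every odd value is an integer regardless of its other bits.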
| 88 | +When we convert an object `VALUE` to its native representation we need |
| 89 | +to keep the corresponding `ValueWrapper` object alive, and we need to |
| 90 | +record that mapping from handle to `ValueWrapper` somewhere. The |
| 91 | +mapping from `ValueWrapper` to handle must also be stable, so a symbol |
| 92 | +or other immutable object that can outlive a context will need to |
| 93 | +store that mapping somewhere on the `RubyLanguage` object. |
| 94 | + |
| 95 | +We achieve all this through a combination of handle block maps and |
| 96 | +allocators. We deal with handles in blocks of 4096, and the current |
| 97 | +`RubyFiber` holds onto a `HandleBlockHolder` which in turn holds the |
| 98 | +current block for mutable objects (which cannot outlive the |
| 99 | +`RubyContext`) and immutable objects (which can outlive the |
| 100 | +context). Each fiber takes handles from those blocks until they |
| 101 | +become exhausted. When a block is exhausted, a replacement is obtained |
| 102 | +from the `HandleBlockAllocator` held by `RubyLanguage`, which is responsible for allocating new |
| 103 | +blocks and recycling old ones. These blocks of handles, however, only |
| 104 | +hold weak references, because we don't want a conversion to native to |
| 105 | +keep the `ValueWrapper` alive longer than it should. |
| 106 | + |
| 107 | +Conversely, the `HandleBlock` _must_ live for as long as there are any |
| 108 | +reachable `ValueWrapper`s in that block, so a `ValueWrapper` keeps a |
| 109 | +strong reference to the `HandleBlock` it is in. |
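To make the block arithmetic concrete, here is a rough sketch in C of handing out handles from blocks of 4096. It rests on several assumptions that the text does not confirm: that a handle is simply the map index shifted left by the three mask bits, that block numbering starts at 1 so no object handle equals 0 (which is `false`), and that all names are hypothetical. Weak references and block recycling are left out.

```c
#include <stdint.h>

#define HANDLE_TAG_BITS   3u     /* the three mask bits stay clear */
#define HANDLE_BLOCK_SIZE 4096u  /* handles per block */

typedef struct {
    uint64_t block;   /* which block the handle belongs to */
    uint32_t offset;  /* slot within that block */
} handle_location;

/* Stand-in for the allocator: hands out block numbers in order. */
typedef struct {
    uint64_t next_block;
} block_allocator;

/* Stand-in for a per-fiber holder: the current block and next free slot. */
typedef struct {
    uint64_t block;
    uint32_t next_offset;
} block_holder;

/* Assumed encoding: (block * 4096 + offset) << 3. */
static uint64_t handle_for(uint64_t block, uint32_t offset) {
    return (block * HANDLE_BLOCK_SIZE + offset) << HANDLE_TAG_BITS;
}

/* Recover the block and slot from a handle under the same assumption. */
static handle_location locate_handle(uint64_t handle) {
    uint64_t index = handle >> HANDLE_TAG_BITS;
    handle_location location = {
        index / HANDLE_BLOCK_SIZE,
        (uint32_t)(index % HANDLE_BLOCK_SIZE)
    };
    return location;
}

/* Take the next handle, fetching a new block once the current one is full. */
static uint64_t next_handle(block_holder *holder, block_allocator *allocator) {
    if (holder->next_offset == HANDLE_BLOCK_SIZE) {
        holder->block = allocator->next_block++;
        holder->next_offset = 0;
    }
    return handle_for(holder->block, holder->next_offset++);
}
```

A holder initialised with `next_offset` equal to `HANDLE_BLOCK_SIZE` fetches its first block on first use, so most allocations touch only per-fiber state and reach the shared allocator once every 4096 handles.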