Commit 76cfbe8

[GR-36892] [GR-38140] Move marking to exit of C calls and make native handles weak refs.
PullRequest: truffleruby/3195
2 parents e24dc0c + b83e569 commit 76cfbe8

29 files changed (+513 / -364 lines)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -56,6 +56,7 @@ Changes:
 * Removed `Truffle::Interop.members_without_conversion` (use `Truffle::Interop.members` instead).
 * Refactored internals of `rb_sprintf` to simplify handling of `VALUE`s in common cases (@aardvark179).
 * Refactored sharing of array objects between threads using new `SharedArrayStorage` (@aardvark179).
+* Marking of native structures wrapped in objects is now done on C call exit to reduce memory overhead (@aardvark179).
 
 # 22.2.0
 
```
doc/contributor/cext-values.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@

# `VALUE`s in C extensions

## Semantics on MRI

Before we discuss the mechanisms used to represent MRI's `VALUE`
semantics, we should outline what those semantics are. A `VALUE` in a
local variable (i.e. on the stack) will keep the associated object
alive as long as that stack entry lasts (so either until the function
exits, or until that variable is no longer live). We can also wrap C
structures in Ruby objects, and when we do this we can specify a
marking function. MRI's garbage collector uses this marking function
to find all the objects reachable from the structure, and marks them
in the same way it would mark objects referenced from normal instance
variables. There are also utility macros and functions for keeping a
value alive for the duration of a function call even if it is no
longer held in a variable (`RB_GC_GUARD`), and for globally preserving
a value held in a static variable (`rb_gc_register_address` and
`rb_global_variable`).
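
To make the role of a marking function concrete, here is a toy mark phase in Ruby. It is a hypothetical model, not MRI's collector: `ToyObject`, `ToyData` and `ToyGC` are invented names, `gc_mark` stands in for `rb_gc_mark`, and the roots passed to `live_set` stand in for `VALUE`s on the C stack or registered globals.

```ruby
# Toy model of MRI-style marking (hypothetical names, not MRI code).

# An ordinary heap object whose references are its instance variables.
class ToyObject
  attr_reader :name, :ivars

  def initialize(name, ivars = [])
    @name = name
    @ivars = ivars
  end
end

# Stands in for a T_DATA object: `data` is the wrapped C struct and
# `mark` is the extension's mark function (or nil if none was given).
class ToyData
  attr_reader :name, :data, :mark

  def initialize(name, data, mark)
    @name = name
    @data = data
    @mark = mark
  end
end

class ToyGC
  def initialize
    @marked = {}.compare_by_identity
  end

  # Plays the role of rb_gc_mark: mark obj and everything reachable from it.
  def gc_mark(obj)
    return if obj.nil? || @marked.key?(obj)

    @marked[obj] = true
    case obj
    when ToyObject then obj.ivars.each { |ivar| gc_mark(ivar) }
    when ToyData then obj.mark&.call(self, obj.data) # only the marker knows the struct layout
    end
  end

  # Returns the names of everything reachable from the roots, in marking order.
  def live_set(roots)
    @marked.clear
    roots.each { |root| gc_mark(root) }
    @marked.keys.map(&:name)
  end
end
```

The collector cannot see inside `data`, so a wrapped structure created with a `nil` marker keeps only itself alive; with a marker that calls `gc_mark` on the struct's fields, the referenced objects are found too.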

Because `VALUE`s are essentially tagged pointers on MRI, there are also
some semantics that may be obvious but are worth stating anyway:

* Any two `VALUE`s associated with the same object will be
  identical. In other words, as long as an object is alive its `VALUE`
  will remain constant.
* A `VALUE` for a live object can reuse the same tagged pointer that
  was previously used for a now-dead object.
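
Both rules can be modelled with a small handle table. This is a hypothetical sketch (`ToyHandleTable` and its methods are invented for illustration, and `release` stands in for the collector discovering that an object is dead): a live object always gets the same handle back, and a released handle may be reissued to a different object.

```ruby
# Hypothetical model of MRI's VALUE identity rules (not MRI code):
# a live object always maps to the same handle, and a handle released
# by a dead object may later be handed out to a different object.
class ToyHandleTable
  def initialize
    @handle_of = {}.compare_by_identity # object => handle
    @object_at = {}                     # handle => object
    @free = []                          # handles released by dead objects
    @next = 1
  end

  def to_handle(obj)
    existing = @handle_of[obj]
    return existing if existing

    handle = @free.pop
    if handle.nil?
      handle = @next
      @next += 1
    end
    @handle_of[obj] = handle
    @object_at[handle] = obj
    handle
  end

  def to_object(handle)
    @object_at.fetch(handle)
  end

  # Called when the (modelled) collector finds obj unreachable.
  def release(obj)
    handle = @handle_of.delete(obj)
    return if handle.nil?

    @object_at.delete(handle)
    @free.push(handle)
  end
end
```

Reuse is what makes a stale handle dangerous: after `release`, looking up the old handle silently yields whatever object now owns it.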

## Emulating the semantics in TruffleRuby

Emulating these semantics on TruffleRuby is non-trivial. Although we
are running under a garbage collector, it doesn't know that a `VALUE`
maps to an object, and it has no mechanism for specifying a custom
mark function to be used with particular objects. As long as `VALUE`s
can remain as `ValueWrapper` objects, we don't need to do much: Ruby
objects maintain a strong reference to their associated
`ValueWrapper`, and vice versa, so we only really need to consider
situations where `VALUE`s are converted into native handles.

### Keeping objects alive on the stack

We implement an `ExtensionCallStack` object to keep track of various
bits of useful information during a call to a C extension. Each stack
entry contains a `preservedObject` field and an additional, optional
`preservedObjects` list, which together contain all the
`ValueWrapper`s converted to native handles during the call. When a
new call is made, a new `ExtensionCallStackEntry` is pushed onto the
stack, and when the call exits that entry is popped off again, so the
preserved wrappers remain strongly reachable for the duration of the
call.
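
A minimal Ruby sketch of that push/preserve/pop cycle follows. The class names only echo the text; this is not TruffleRuby's Java implementation, and the assumption that the `preservedObjects` list is created lazily (only once a second wrapper needs preserving) is ours.

```ruby
# Hypothetical model of an extension call stack. The names echo the text
# (ExtensionCallStack, preservedObject, preservedObjects) but this is not
# TruffleRuby's Java implementation.
class ToyCallStackEntry
  def initialize
    @preserved_object = nil  # common case: one wrapper converted during the call
    @preserved_objects = nil # assumption: only created when a second one is needed
  end

  # Record a wrapper converted to a native handle during this call.
  def preserve(wrapper)
    if @preserved_object.nil?
      @preserved_object = wrapper
    else
      (@preserved_objects ||= []) << wrapper
    end
  end

  def all_preserved
    [@preserved_object, *@preserved_objects].compact
  end
end

class ToyExtensionCallStack
  def initialize
    @entries = []
  end

  def current
    @entries.last
  end

  # Model one C call: push a fresh entry, run the call, pop the entry on exit.
  # Anything preserved in the entry is strongly reachable until the pop.
  def with_call
    @entries.push(ToyCallStackEntry.new)
    yield current
  ensure
    @entries.pop
  end
end
```

Keeping the first wrapper in a plain field avoids allocating a list for the many calls that convert at most one `VALUE`.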

### Keeping objects alive in structures

We don't have a way to run markers during garbage collection, but we
know we're keeping objects alive for the lifetime of a C call, and we
can record when a structure is accessed via `DATA_PTR` (which an
extension must do before it can mutate the structure's internal
state). To do this we keep a list of objects to be marked, in a
similar manner to the objects that should be kept alive, and when we
exit the C call we run the markers for those objects.
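
The deferral can be sketched in Ruby as below. It is a simplified, hypothetical model: `ToyDataObject`, `ToyMarkingStack` and `data_ptr` are invented names, and each marker simply returns the `VALUE`s it would mark. As in the commit's `RDATA_PTR`, only objects that actually have a mark function are recorded.

```ruby
# Hypothetical model of deferring mark functions to C call exit.
# Simplification: a marker here just returns the VALUEs it would mark.
class ToyDataObject
  attr_reader :data, :marker
  attr_accessor :mark_list

  def initialize(data, marker)
    @data = data
    @marker = marker # nil when the extension supplied no mark function
    @mark_list = []  # objects this wrapper keeps strongly reachable
  end
end

class ToyMarkingStack
  def initialize
    @to_mark = [] # one list per active C call
  end

  def with_call
    @to_mark.push([])
    yield
  ensure
    # On exit, run each recorded object's marker once and keep its result.
    @to_mark.pop.uniq.each { |obj| obj.mark_list = obj.marker.call(obj.data) }
  end

  # Stand-in for DATA_PTR: record the access now, mark later.
  def data_ptr(obj)
    @to_mark.last << obj unless obj.marker.nil? || @to_mark.empty?
    obj.data
  end
end
```

Recording is cheap and repeated accesses within one call collapse to a single marker run at exit.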

### Running mark functions

We run markers by recording the object being marked on the extension
stack and then calling the marker, which in turn calls `rb_gc_mark`
for each of the individual `VALUE`s held by the structure. We record
those marked objects in a temporary array, also held on the extension
stack, and attach that array to the object wrapping the struct when
the mark function has finished.
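
The capture step can be sketched as follows. This is a hypothetical model (`ToyWrapper`, `ToyMarker` and `gc_mark` are invented; `gc_mark` stands in for `rb_gc_mark`), loosely following the `create_mark_list`, marker call, `set_mark_list_on_object` sequence used by `run_marker` in this commit.

```ruby
# Hypothetical model of running one mark function and capturing its
# rb_gc_mark calls; names are invented for illustration.
ToyWrapper = Struct.new(:data, :marker, :mark_list)

class ToyMarker
  def initialize
    @marking = nil # temporary list, playing the role of the one on the extension stack
  end

  # Stand-in for rb_gc_mark(VALUE): appends to the list being built.
  def gc_mark(value)
    raise "gc_mark called outside a mark function" if @marking.nil?

    @marking << value unless value.nil? # nil needs no keeping alive
  end

  def run_marker(wrapper)
    @marking = []                           # like create_mark_list
    wrapper.marker.call(self, wrapper.data) # the extension's mark function
    wrapper.mark_list = @marking            # like set_mark_list_on_object
  ensure
    @marking = nil
  end
end
```

Because each run replaces the previous list rather than appending to it, an object the structure no longer references stops being kept alive after the next marking round.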

## Managing the conversion of `VALUE`s to and from native handles

When converted to native, a `ValueWrapper` is represented by one of the following `long` values.

| Represented Value | Handle Bits | Comments |
|-------------------|-------------------------------------|----------|
| false | 00000000 00000000 00000000 00000000 | |
| true | 00000000 00000000 00000000 00000010 | |
| nil | 00000000 00000000 00000000 00000100 | |
| undefined | 00000000 00000000 00000000 00000110 | |
| Integer | xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxx1 | Lowest mask bit set; small longs only; convert to long using `>> 1` |
| Object | xxxxxxxx xxxxxxxx xxxxxxxx xxxxx000 | No mask bits set and not equal to 0; value is an index into the handle map |
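
The layout can be exercised with a small encoder/decoder. This Ruby sketch is hypothetical: `ToyHandleBits` is an invented name, `UNDEF` stands in for `Qundef`, and it assumes an object's handle is its handle-map index shifted left by 3 with index 0 never used, which satisfies the "low bits 000, not equal to 0" rule but may not match TruffleRuby's actual numbering.

```ruby
# Hypothetical encoder/decoder for the handle layout in the table above.
# Assumptions (not from the source): an object's handle is its handle-map
# index shifted left by 3, and index 0 is never used, so no object handle is 0.
module ToyHandleBits
  FALSE_HANDLE = 0b000
  TRUE_HANDLE  = 0b010
  NIL_HANDLE   = 0b100
  UNDEF_HANDLE = 0b110
  UNDEF = Object.new.freeze # stands in for MRI's Qundef

  # (n << 1) | 1 must still fit in a signed 64-bit long.
  MIN_TAGGED = -(2**62)
  MAX_TAGGED = 2**62 - 1

  def self.encode(value, index_of)
    case value
    when false then FALSE_HANDLE
    when true then TRUE_HANDLE
    when nil then NIL_HANDLE
    when UNDEF then UNDEF_HANDLE
    when Integer
      if value.between?(MIN_TAGGED, MAX_TAGGED)
        (value << 1) | 1          # small integer: lowest bit set
      else
        index_of.call(value) << 3 # big integers are ordinary objects
      end
    else
      index_of.call(value) << 3   # object: low three bits are 000
    end
  end

  def self.decode(handle, object_at)
    return handle >> 1 if handle.odd? # tagged small integer (arithmetic shift)

    case handle
    when FALSE_HANDLE then false
    when TRUE_HANDLE then true
    when NIL_HANDLE then nil
    when UNDEF_HANDLE then UNDEF
    else
      raise ArgumentError, "malformed handle #{handle}" unless (handle & 0b111).zero?

      object_at.call(handle >> 3)
    end
  end
end
```

For example, `21` encodes to `(21 << 1) | 1 == 43`, and decoding `43` with `>> 1` gives `21` back; negative integers round-trip too because Ruby's `>>` is an arithmetic shift.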

The built-in values `true`, `false`, `nil`, and `undefined` are
handled specially, and integers are relatively easy because there is a
well-defined mapping from the native representation to the integer and
vice versa, but managing objects takes a little more work.

When we convert an object `VALUE` to its native representation, we
need to keep the corresponding `ValueWrapper` object alive, and we need
to record the mapping from handle to `ValueWrapper` somewhere. The
mapping from `ValueWrapper` to handle must also be stable, so a symbol
or other immutable object that can outlive a context will need to
store that mapping somewhere on the `RubyLanguage` object.

We achieve all this through a combination of handle block maps and
allocators. We deal with handles in blocks of 4096, and the current
`RubyFiber` holds onto a `HandleBlockHolder`, which in turn holds the
current block for mutable objects (which cannot outlive the
`RubyContext`) and the current block for immutable objects (which can
outlive the context). Each fiber takes handles from those blocks until
they are exhausted; new blocks then come from the `HandleBlockAllocator`
held by `RubyLanguage`, which is responsible for allocating new blocks
and recycling old ones. However, these blocks of handles hold only weak
references, because we don't want a conversion to native to keep the
`ValueWrapper` alive longer than it should.

Conversely, the `HandleBlock` _must_ live for as long as there are any
reachable `ValueWrapper`s in that block, so a `ValueWrapper` keeps a
strong reference to the `HandleBlock` it is in. Otherwise the block
could be recycled and its handles reassigned while a live wrapper
still uses one of them.
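
The ownership pattern (weak references from block to wrapper, a strong reference from wrapper to block) is sketched below in Ruby using the standard `weakref` library. The names only echo the text and this is a simplified, hypothetical model: it has a single current block rather than a per-fiber `HandleBlockHolder` with separate mutable and immutable blocks, it never recycles blocks, and the handle numbering (`(base + index) << 3`, starting at 1) is an assumption.

```ruby
require "weakref"

# Hypothetical model of block-based handle allocation. Names echo the text,
# but this is not TruffleRuby's Java code: there is a single current block
# (no per-fiber HandleBlockHolder, no mutable/immutable split) and no recycling.
class ToyHandleBlock
  SIZE = 4096
  attr_reader :base

  def initialize(base)
    @base = base             # global index of this block's first handle
    @next = 0
    @slots = Array.new(SIZE) # one WeakRef per allocated handle
  end

  def full?
    @next == SIZE
  end

  # Weak reference: converting to native must not extend the wrapper's lifetime.
  def allocate(wrapper)
    raise "block exhausted" if full?

    index = @next
    @next += 1
    @slots[index] = WeakRef.new(wrapper)
    (base + index) << 3 # assumption: handle = global index << 3
  end

  # Returns the wrapper for handle, or nil if it is not ours or was collected.
  def lookup(handle)
    index = (handle >> 3) - base
    return nil unless index.between?(0, SIZE - 1)

    ref = @slots[index]
    return nil unless ref

    ref.__getobj__
  rescue WeakRef::RefError
    nil
  end
end

class ToyValueWrapper
  attr_reader :object, :block

  def initialize(object)
    @object = object
    @handle = nil
    @block = nil
  end

  # Allocates a handle on first conversion; the strong @block reference keeps
  # the block alive for as long as this wrapper is reachable.
  def to_native(allocator)
    @handle ||= begin
      @block = allocator.current_block
      @block.allocate(self)
    end
  end
end

class ToyHandleBlockAllocator
  def initialize
    @next_base = 1 # skip index 0 so no object handle equals 0 (false)
    @current = nil
  end

  def current_block
    if @current.nil? || @current.full?
      @current = ToyHandleBlock.new(@next_base)
      @next_base += ToyHandleBlock::SIZE
    end
    @current
  end
end
```

Once the allocator moves on to a fresh block, the old one is referenced only by the wrappers allocated from it, so it becomes collectable exactly when the last of them does.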

doc/contributor/cexts.md

Lines changed: 3 additions & 11 deletions
```diff
@@ -125,18 +125,10 @@ not a `VALUE`.
 
 See [polyglot.h](https://github.com/oracle/graal/blob/master/sulong/projects/com.oracle.truffle.llvm.libraries.graalvm.llvm/include/graalvm/llvm/polyglot.h) for documentation regarding the `polyglot_*` methods.
 
+##### Native conversion
 
-##### ValueWrapper Long Representation
-When converted to native, the `ValueWrapper` takes the following long values.
-
-| Represented Value | Handle Bits | Comments |
-|-------------------|-------------------------------------|----------|
-| false | 00000000 00000000 00000000 00000000 | |
-| true | 00000000 00000000 00000000 00000010 | |
-| nil | 00000000 00000000 00000000 00000100 | |
-| undefined | 00000000 00000000 00000000 00000110 | |
-| Integer | xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxx1 | Lowest mask bit set, small longs only, convert to long using >> 1 |
-| Object | xxxxxxxx xxxxxxxx xxxxxxxx xxxxx000 | No mask bits set and does not equal 0, value is index into handle map |
+See [cext-values.md](cext-values.md) for documentation of the
+conversion and management of native handles.
 
 ### String pointers
 
```
lib/cext/ABI_check.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-9
+10
```

lib/truffle/truffle/cext.rb

Lines changed: 21 additions & 8 deletions
```diff
@@ -722,6 +722,10 @@ def rb_thread_alone
     Thread.list.count == 1 ? 1 : 0
   end
 
+  def rb_intern(str)
+    Primitive.string_to_symbol(str, true)
+  end
+
   def rb_int_positive_pow(a, b)
     a ** b
   end
@@ -1456,6 +1460,12 @@ def rb_set_end_proc(func, data)
     at_exit { Primitive.call_with_c_mutex_and_frame(func, [data], Primitive.caller_special_variables_if_available, nil) }
   end
 
+  def define_marker(object, marker)
+    data_holder = Primitive.object_hidden_var_get object, DATA_HOLDER
+    Primitive.data_holder_set_marker(data_holder, marker)
+    Primitive.cext_mark_object_on_call_exit(object) unless Truffle::Interop.null?(marker)
+  end
+
   def rb_data_object_wrap(ruby_class, data, mark, free)
     ruby_class = Object unless ruby_class
     object = ruby_class.__send__(:__layout_allocate__)
@@ -1464,7 +1474,7 @@ def rb_data_object_wrap(ruby_class, data, mark, free)
 
     Primitive.object_space_define_data_finalizer object, free, data_holder unless Truffle::Interop.null?(free)
 
-    define_marker object, data_marker(mark, data_holder) unless Truffle::Interop.null?(mark)
+    define_marker object, mark
 
     object
   end
@@ -1479,19 +1489,22 @@ def rb_data_typed_object_wrap(ruby_class, data, data_type, mark, free, size)
 
     Primitive.object_space_define_data_finalizer object, free, data_holder unless Truffle::Interop.null?(free)
 
-    define_marker object, data_marker(mark, data_holder) unless Truffle::Interop.null?(mark)
+    define_marker object, mark
+
     object
   end
 
-  def data_marker(mark, data_holder)
-    raise unless mark.respond_to?(:call)
-    proc { |obj|
+  def run_marker(obj)
+    Primitive.array_mark_store(obj) if Primitive.array_store_native?(obj)
+
+    data_holder = Primitive.object_hidden_var_get obj, DATA_HOLDER
+    mark = Primitive.data_holder_get_marker(data_holder)
+    unless Truffle::Interop.null?(mark)
       create_mark_list(obj)
       data = Primitive.data_holder_get_data(data_holder)
-      # This call is done without pushing a new frame as the marking service manages frames itself.
-      Primitive.call_with_c_mutex(mark, [data]) unless Truffle::Interop.null?(data)
+      mark.call(data) unless Truffle::Interop.null?(data)
       set_mark_list_on_object(obj)
-    }
+    end
   end
 
   def data_sizer(sizer, data_holder)
```

lib/truffle/truffle/cext_structs.rb

Lines changed: 3 additions & 0 deletions
```diff
@@ -29,6 +29,7 @@ def RDATA_PTR(object)
       raise TypeError, "wrong argument type #{object.class} (expected T_DATA)"
     end
 
+    Primitive.cext_mark_object_on_call_exit(object) unless Truffle::Interop.null?(Primitive.data_holder_get_marker(data_holder))
     Primitive.data_holder_get_data(data_holder)
   end
 
@@ -68,6 +69,7 @@ def polyglot_members(internal)
   def polyglot_read_member(name)
     case name
     when 'data'
+      Primitive.cext_mark_object_on_call_exit(@object) unless Truffle::Interop.null?(Primitive.data_holder_get_marker(@data_holder))
       Primitive.data_holder_get_data(@data_holder)
     when 'type'
       type
@@ -294,6 +296,7 @@ def polyglot_pointer?
   end
 
   def polyglot_as_pointer
+    Primitive.cext_mark_object_on_call_exit(@array)
     Primitive.array_store_address(@array)
   end
 
```
src/main/c/cext/string.c

Lines changed: 1 addition & 1 deletion
```diff
@@ -116,7 +116,7 @@ VALUE rb_str_inspect(VALUE string) {
 }
 
 ID rb_intern_str(VALUE string) {
-  return SYM2ID(RUBY_INVOKE(string, "intern"));
+  return SYM2ID(RUBY_CEXT_INVOKE("rb_intern", string));
 }
 
 VALUE rb_str_cat(VALUE string, const char *to_concat, long length) {
```

src/main/c/cext/symbol.c

Lines changed: 2 additions & 2 deletions
```diff
@@ -24,11 +24,11 @@ ID rb_intern(const char *string) {
 }
 
 ID rb_intern2(const char *string, long length) {
-  return SYM2ID(RUBY_INVOKE(rb_tr_temporary_native_string(string, length, rb_ascii8bit_encoding()), "intern"));
+  return SYM2ID(RUBY_CEXT_INVOKE("rb_intern", rb_tr_temporary_native_string(string, length, rb_ascii8bit_encoding())));
 }
 
 ID rb_intern3(const char *name, long len, rb_encoding *enc) {
-  return SYM2ID(RUBY_INVOKE(rb_tr_temporary_native_string(name, len, enc), "intern"));
+  return SYM2ID(RUBY_CEXT_INVOKE("rb_intern", rb_tr_temporary_native_string(name, len, enc)));
 }
 
 VALUE rb_sym2str(VALUE string) {
```

src/main/java/org/truffleruby/RubyContext.java

Lines changed: 1 addition & 1 deletion
```diff
@@ -189,7 +189,7 @@ public RubyContext(RubyLanguage language, TruffleLanguage.Env env) {
         featureLoader = new FeatureLoader(this, language);
         referenceProcessor = new ReferenceProcessor(this);
         finalizationService = new FinalizationService(referenceProcessor);
-        markingService = new MarkingService(referenceProcessor);
+        markingService = new MarkingService();
         dataObjectFinalizationService = new DataObjectFinalizationService(language, referenceProcessor);
 
         // We need to construct this at runtime
```

src/main/java/org/truffleruby/RubyLanguage.java

Lines changed: 6 additions & 1 deletion
```diff
@@ -383,7 +383,12 @@ public RubySymbol getSymbol(String string) {
 
     @TruffleBoundary
     public RubySymbol getSymbol(AbstractTruffleString name, RubyEncoding encoding) {
-        return symbolTable.getSymbol(name, encoding);
+        return symbolTable.getSymbol(name, encoding, false);
+    }
+
+    @TruffleBoundary
+    public RubySymbol getSymbol(AbstractTruffleString name, RubyEncoding encoding, boolean preserveSymbol) {
+        return symbolTable.getSymbol(name, encoding, preserveSymbol);
     }
 
     public Assumption getTracingAssumption() {
```
