Commit bedc897

📖 Document RLE message index implementation
Provide details of how the RLE encoding is built and used to select callbacks to run for a given message.
1 parent 5308593 commit bedc897

File tree

3 files changed

+164
-5
lines changed

docs/implementation_details.adoc

+153
@@ -0,0 +1,153 @@
1+
2+
== Implementation Details
3+
4+
This section details some of the internal implementation details to assist contributors.
5+
The details here are not required to use the `cib` library.
6+
7+
=== Run Length Encoded Message Indices
8+
9+
Switching to the RLE indices is as simple as converting your `msg::indexed_service` to a
10+
`msg::rle_indexed_service`.
11+
12+
The initial building of the mapping indices proceeds the same as
13+
the normal ones, where a series of entries in an index is generated
14+
and the callbacks that match are encoded into a `stdx::bitset`.
15+
16+
However, once this initial representation is built, we then take this and
17+
perform additional work (at compile time) to encode the bitsets as RLE
18+
data, and store in the index just an offset into the blob of RLE data
19+
rather than the bitset itself.
20+
21+
This suits message maps that contain a large number of handlers, as
22+
we accept some decoding overhead in exchange for reduced storage space.
23+
24+
Once encoded, the normal operation of the lookup process at run time
25+
proceeds and a set of candidate matches is collected. These are then
26+
_intersected_ from the RLE data and the final set of callbacks invoked
27+
without needing to materialise any of the underlying bitsets.
28+
29+
==== RLE Data Encoding
30+
31+
There are several options for encoding the bitset into an RLE pattern, many of which will result
32+
in smaller size, but a lot of bit-shifting to extract data. We have chosen to trade off encoded
33+
size for faster decoding, as it is likely the handling of the RLE data and index lookup will be
34+
in the critical path for system state changes.
35+
36+
The encoding chosen is simply the number of consecutive bits of `0`​s or `1`​s.
37+
38+
Specifics:
39+
40+
- The encoding runs from the least significant bit to most significant bit
41+
- The number of consecutive bits is stored as a `std::byte` and ranges `0...255`
42+
- The first byte of the encoding counts the number of `0` bits
43+
- If there are more than 255 consecutive identical bits, they are encoded in
44+
blocks of at most 255, separated by a `0` byte that encodes a zero-length run of the opposite bit.
45+
46+
[ditaa, format="svg", scale=1.5]
47+
----
48+
Bitset RLE Data
49+
/-------------+ +---+
50+
| 0b0000`0000 |--->| 8 |
51+
+-------------/ +---+
52+
53+
/-------------+ +---+---+---+
54+
| 0b0000`0001 |--->| 0 | 1 | 7 |
55+
+-------------/ +---+---+---+
56+
57+
/-------------+ +---+---+---+---+
58+
| 0b1000`0011 |--->| 0 | 2 | 5 | 1 |
59+
+-------------/ +---+---+---+---+
60+
61+
/-------------+ +---+---+---+---+
62+
| 0b1100`1110 |--->| 1 | 3 | 2 | 2 |
63+
+-------------/ +---+---+---+---+
64+
65+
66+
/------------------------------+ +---+---+-----+---+-----+---+-----+---+-----+
67+
| 1000 `0`s and one `1` in LSB |--->| 0 | 1 | 255 | 0 | 255 | 0 | 255 | 0 | 235 |
68+
+------------------------------/ +---+---+-----+---+-----+---+-----+---+-----+
69+
----
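
The encoding rules above can be sketched as a small run-time function. This is only an illustration under stated assumptions: `rle_encode` is a hypothetical name, it uses `std::uint8_t` and `std::vector` for simplicity where the library works at compile time with `std::byte`, and it is not the `cib` encoder. Because the first byte always counts `0` bits, a bitset whose least significant bit is `1` starts with a `0` byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of cib: encode bits (index 0 is the least
// significant bit) as alternating run lengths, starting with a run of 0s.
inline std::vector<std::uint8_t> rle_encode(std::vector<bool> const &bits) {
    std::vector<std::uint8_t> out;
    bool current = false; // the first byte always counts 0 bits
    std::size_t run = 0;

    auto flush = [&] {
        // Runs longer than 255 are split into blocks of 255, each followed
        // by a zero-length run of the opposite bit value.
        while (run > 255) {
            out.push_back(255);
            out.push_back(0);
            run -= 255;
        }
        assert(run <= 255);
        out.push_back(static_cast<std::uint8_t>(run));
    };

    for (bool b : bits) {
        if (b != current) {
            flush();
            current = b;
            run = 0;
        }
        ++run;
    }
    flush();
    return out;
}
```

With this sketch, eight `0` bits encode to the single byte `8`, and ``0b1100`1110`` encodes to `1, 3, 2, 2`.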
70+
71+
The `msg::rle_indexed_builder` will go through a process to take the indices and
72+
their bitset data and build a single blob of RLE encoded data for all indices, stored in
73+
an instance of a `msg::detail::rle_storage`. It also generates a set of
74+
`msg::detail::rle_index` entries for each of the index entries that maps the original bitmap
75+
to a location in the shared storage blob.
76+
77+
The `rle_storage` object contains a simple array of all RLE data bytes. The `rle_index`
78+
contains a simple offset into that array. We compute the smallest size that can contain the
79+
offset to avoid wasted storage and use that.
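
One way to pick the smallest offset type is a chain of `std::conditional_t`. The alias name `smallest_offset_t` is hypothetical and this is a sketch of the idea, not the type selection that `cib` actually performs.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical alias, not the cib implementation: the narrowest unsigned
// integer type that can hold every offset up to and including MaxOffset.
template <std::size_t MaxOffset>
using smallest_offset_t = std::conditional_t<
    (MaxOffset <= 0xFFu), std::uint8_t,
    std::conditional_t<
        (MaxOffset <= 0xFFFFu), std::uint16_t,
        std::conditional_t<(MaxOffset <= 0xFFFFFFFFu), std::uint32_t,
                           std::uint64_t>>>;

static_assert(std::is_same_v<smallest_offset_t<200>, std::uint8_t>);
static_assert(std::is_same_v<smallest_offset_t<256>, std::uint16_t>);
static_assert(std::is_same_v<smallest_offset_t<70000>, std::uint32_t>);
```

A blob of fewer than 256 bytes would then need only one byte per index entry.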
80+
81+
NOTE: The specific `rle_storage` and `rle_index`​s are locked together using a unique type
82+
so that the `rle_index` can not be used with the wrong `rle_storage` object.
83+
84+
When building the shared blob, the encoder will attempt to reduce the storage size by finding
85+
and reusing repeated patterns in the RLE data.
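
A plausible way to reuse repeated patterns is to search the blob for an existing copy of the encoded bytes before appending. The helper below is a run-time sketch with a hypothetical name (`add_to_blob`). It assumes the decoder knows the total bit count and therefore stops on its own, so any identical byte sequence already in the blob can be shared. The strategy the library actually uses may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper, not the cib encoder: return the offset at which
// `encoded` can be read from `blob`, appending it only if it is not already
// present as a contiguous byte sequence.
inline std::size_t add_to_blob(std::vector<std::uint8_t> &blob,
                               std::vector<std::uint8_t> const &encoded) {
    assert(!encoded.empty()); // every encoding has at least one run byte
    auto const found = std::search(blob.cbegin(), blob.cend(),
                                   encoded.cbegin(), encoded.cend());
    if (found != blob.cend()) {
        return static_cast<std::size_t>(std::distance(blob.cbegin(), found));
    }
    auto const offset = blob.size();
    blob.insert(blob.end(), encoded.begin(), encoded.end());
    return offset;
}
```

Adding the same encoding twice returns the same offset and leaves the blob unchanged the second time.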
86+
87+
The final `msg::indexed_handler` contains an instance of the `msg::rle_indices` which contains
88+
both the storage and the maps referring to all the `rle_index` objects.
89+
90+
This means that the final compile time data generated consists of:
91+
92+
- The Message Map lookups as per the normal implementation, however they store a simple offset
93+
rather than a bitset.
94+
- The blob of all RLE bitset data for all indices in the message handling map
95+
96+
==== Runtime Handling
97+
98+
The `msg::indexed_handler` implementation will delegate the mapping call for an incoming
99+
message down to the `msg::rle_indices` implementation. It will further call into its
100+
storage indices and match to the set of `rle_index` values for each mapping index.
101+
102+
This set of `rle_index` values (which are just offsets) is then converted to instances of
103+
a `msg::detail::rle_decoder` by the `rle_storage`. This converts the offset into a
104+
pointer to the sequence of `std::byte`​s for the RLE encoding.
105+
106+
All the collected `rle_decoders` from the various maps in the set of indices are then passed
107+
to an instance of the `msg::detail::rle_intersect` object and returned from the `rle_indices`
108+
call operator.
109+
110+
The `rle_decoder` provides a single-use enumerator that will step over the groups of
111+
`0`​s or `1`​s, providing a way to advance through them by arbitrary increments.
112+
113+
The `rle_intersect` implementation wraps the variadic set of `rle_decoder`​s so that
114+
the caller can iterate through all `1`​s, calling the appropriate callback as it goes.
115+
116+
===== Efficient Iteration of Bits
117+
118+
The `msg::detail::rle_decoder::chunk_enumerator` provides a way to step through the RLE
119+
data for the encoded bitset an arbitrary number of bits at a time. It does this by exposing
120+
the current number of bits of consecutive value.
121+
122+
This is presented so that it is possible to efficiently find:
123+
124+
- the longest run of `0`​s
125+
- or, if none, the shortest run of `1`​s.
126+
127+
Remember that we are trying to compute the intersection of all the encoded bitsets, so
128+
where all bitsets have a `1`, we call the associated callback; where any of the bitsets
129+
has a `0`, we skip that callback.
130+
131+
So the `chunk_enumerator` will return a signed 16 bit (at least) value indicating:
132+
133+
- *negative* value - the number of `0`​s
134+
- *positive* value - the number of `1`​s
135+
- *zero* when past the end (special case)
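
The signed-value convention can be illustrated with a minimal enumerator. The class below borrows the name `chunk_enumerator` for readability, but its members (`bits_remaining`, `advance`), its constructor taking a bit count, and its use of `std::uint8_t` are all assumptions made for this sketch. It is not the `msg::detail::rle_decoder::chunk_enumerator` interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch; the real chunk_enumerator may differ in names,
// types and layout.
class chunk_enumerator {
    std::uint8_t const *data_; // next unread run-length byte
    std::size_t remaining_;    // bits of the bitset not yet consumed
    std::size_t run_{};        // bits left in the current run
    bool ones_{true};          // flipped on each load, so the first run is 0s

    // Load the next non-empty run, skipping the zero-length runs that
    // separate blocks of 255 identical bits.
    void load() {
        while (run_ == 0 && remaining_ > 0) {
            run_ = *data_++;
            ones_ = !ones_;
        }
    }

  public:
    chunk_enumerator(std::uint8_t const *data, std::size_t num_bits)
        : data_{data}, remaining_{num_bits} {
        assert(data != nullptr);
        load();
    }

    // Negative: that many 0s; positive: that many 1s; zero: past the end.
    std::int16_t bits_remaining() const {
        if (remaining_ == 0) { return 0; }
        auto const n = static_cast<std::int16_t>(run_);
        return ones_ ? n : static_cast<std::int16_t>(-n);
    }

    // Consume n bits, crossing run boundaries where necessary.
    void advance(std::size_t n) {
        while (n > 0 && remaining_ > 0) {
            auto const step = std::min(n, run_);
            run_ -= step;
            remaining_ -= step;
            n -= step;
            load();
        }
    }
};
```

For the RLE data `1, 3, 2, 2` over 8 bits, this sketch first reports `-1` (one `0`), then `3` after advancing by one bit, and `0` once all 8 bits are consumed.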
136+
137+
The `rle_intersect` will initialise an array of `rle_decoder::chunk_enumerators`
138+
when it is asked to run a lambda for each `1` bit using the `for_each()` method.
139+
140+
This list is then searched for the _minimum_ value of chunk size. This will either
141+
be the most negative value, and so the longest run of `0`​s, or the smallest
142+
number of `1`​s, representing the next set of bits that are set in all bitsets.
143+
144+
The `for_each()` method will then advance past all the `0`​s, or execute the lambda
145+
for that many set bits, until it has consumed all bits in the encoded bitsets.
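
The loop described above can be sketched end to end as follows. Everything here is illustrative: `run_cursor` and `for_each_common_bit` are hypothetical names, the cursor is a condensed stand-in for the library's enumerator, and the sketch assumes every encoded bitset has the same number of bits.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, condensed cursor over one RLE sequence (not the cib types).
struct run_cursor {
    std::uint8_t const *data; // next unread run-length byte
    std::size_t remaining;    // bits of the bitset not yet consumed
    std::size_t run{};        // bits left in the current run
    bool ones{true};          // flipped on each load, so the first run is 0s

    void load() {
        while (run == 0 && remaining > 0) {
            run = *data++;
            ones = !ones;
        }
    }
    // Negative: run of 0s; positive: run of 1s; zero: past the end.
    int chunk() const {
        if (remaining == 0) { return 0; }
        return ones ? static_cast<int>(run) : -static_cast<int>(run);
    }
    void advance(std::size_t n) {
        while (n > 0 && remaining > 0) {
            auto const step = std::min(n, run);
            run -= step;
            remaining -= step;
            n -= step;
            load();
        }
    }
};

// Call f(i) for every bit index i that is set in all of the encoded bitsets.
template <std::size_t N, typename F>
void for_each_common_bit(std::array<run_cursor, N> cursors, F &&f) {
    static_assert(N > 0, "need at least one encoded bitset");
    for (auto &c : cursors) {
        assert(c.remaining == cursors[0].remaining); // same bit count assumed
        c.load();
    }
    std::size_t bit = 0;
    for (;;) {
        // The minimum is the longest run of 0s if any cursor is in one,
        // otherwise the shortest run of 1s.
        int step = cursors[0].chunk();
        for (auto const &c : cursors) { step = std::min(step, c.chunk()); }
        if (step == 0) { break; }
        auto const n = static_cast<std::size_t>(step < 0 ? -step : step);
        if (step > 0) {
            for (std::size_t i = 0; i < n; ++i) { f(bit + i); }
        }
        for (auto &c : cursors) { c.advance(n); }
        bit += n;
    }
}
```

Intersecting the encodings `1, 3, 2, 2` (``0b1100`1110``) and `0, 2, 5, 1` (``0b1000`0011``) over 8 bits calls `f` for bits 1 and 7 only, without ever building either bitset.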
146+
147+
This means that the memory cost of intersecting `N` indices is `N` pointers and
148+
a small amount of state for tracking the current run of bits and their type for each index.
149+
150+
There is no need to materialise a full bitset at all. This can be quite a memory saving if
151+
there are a large number of callbacks. The trade-off, of course, is more complex iteration
152+
of bits to discover the callbacks to run.
153+

docs/index.adoc

+1
@@ -11,3 +11,4 @@ include::flows.adoc[]
1111
include::interrupts.adoc[]
1212
include::match.adoc[]
1313
include::message.adoc[]
14+
include::implementation_details.adoc[]

docs/message.adoc

+10-5
@@ -181,7 +181,7 @@ cib::service<my_service>->handle(my_message{"my field"_field = 0x80});
181181

182182
Notice in this case that our callback is defined with a matcher that always
183183
matches, but also that the field in `my_message` has a matcher that requires it
184-
to equal `0x80`. Therefore handling the following message will not call the
184+
to equal `0x80`. Therefore, handling the following message will not call the
185185
callback:
186186
[source,cpp]
187187
----
@@ -190,7 +190,7 @@ callback:
190190
cib::service<my_service>->handle(my_message{"my_field"_field = 0x81});
191191
----
192192

193-
NOTE: Because message view types are implicitly constructible from an owning
193+
NOTE: Because message view types are implicitly constructable from an owning
194194
message type, or from an appropriate `std::array`, it is possible to set up a
195195
service and handler that works with "raw data" in the form of a `std::array`,
196196
but whose callbacks and matchers take the appropriate message view types.
@@ -242,7 +242,12 @@ minimal effort at runtime.
242242
For each field in the `msg::index_spec`, we build a map from field values to
243243
bitsets, where the values in the bitsets represent callback indices.
244244

245-
NOTE: The bitsets may be run-length encoded: this is a work in progress.
245+
NOTE: The bitsets may be run-length encoded by using the `rle_indexed_service`
246+
in place of the `indexed_service`. This may be useful if you have limited space
247+
and/or a large set of possible callbacks.
248+
See xref:implementation_details.adoc#_run_length_encoded_message_indices[Run Length
249+
Encoding Implementation Details].
250+
246251

247252
Each `indexed_callback` has a matcher that may be an
248253
xref:match.adoc#_boolean_algebra_with_matchers[arbitrary Boolean matcher
@@ -433,7 +438,7 @@ as follows:
433438
- `and` together all the resulting bitsets (i.e. perform their set intersection).
434439

435440
This gives us the callbacks to be called. Each callback still has an associated
436-
matcher that may include field constraints that were already handled by the
441+
matcher that may include field constraints that were already handled by the
437442
indexing, but may also include constraints on fields that were not indexed. With
438443
a little xref:match.adoc#_boolean_algebra_with_matchers[Boolean matcher
439444
manipulation], we can remove the fields that were indexed by setting them to
@@ -442,4 +447,4 @@ compile time.
442447

443448
For each callback, we now run the remaining matcher expression to deal with any
444449
unindexed but constrained fields, and call the callback if it passes. Bob's your
445-
uncle.
450+
uncle.
