Commit 0fb8992

Rough draft MIR dataflow
1 parent 39dd586 commit 0fb8992

2 files changed

+160
-0
lines changed

src/SUMMARY.md

+1
@@ -82,6 +82,7 @@
8282
- [MIR visitor and traversal](./mir/visitor.md)
8383
- [MIR passes: getting the MIR for a function](./mir/passes.md)
8484
- [MIR optimizations](./mir/optimizations.md)
85+
- [Data-flow Analysis](./mir/dataflow.md)
8586
- [Debugging](./mir/debugging.md)
8687
- [The borrow checker](./borrow_check.md)
8788
- [Tracking moves and initialization](./borrow_check/moves_and_initialization.md)

src/mir/dataflow.md

+159
@@ -0,0 +1,159 @@
1+
# Data-flow Analysis
2+
3+
If you work on the MIR, you will frequently come across various flavors of
4+
[data-flow analysis][wiki]. For example, `rustc` uses data-flow to find
5+
uninitialized variables, determine what variables are live across a generator
6+
`yield` statement, and compute which `Place`s are borrowed at a given point in
7+
the control-flow graph.
8+
9+
Since data-flow analysis is such a fundamental concept in modern compilers, there
10+
are ample resources for those who are not yet familiar. [*Static
11+
Program Analysis*] by Anders Møller and Michael I. Schwartzbach is an
12+
incredible, freely available textbook. For those who prefer audiovisual
13+
learning, the Goethe University Frankfurt has published a series of short
14+
[YouTube lectures][goethe] that are very approachable.
15+
16+
The following sections will discuss the framework used to define and inspect
17+
data-flow analyses in `rustc`. They assume that the reader is familiar with
18+
common data-flow ideas such as [lattices], fixpoint, and transfer functions.
19+
Any of the resources listed above should give you enough background to
20+
understand what comes next.
21+
22+
[wiki]: https://en.wikipedia.org/wiki/Data-flow_analysis#Basic_principles
23+
[goethe]: https://www.youtube.com/watch?v=NVBQSR_HdL0&list=PL_sGR8T76Y58l3Gck3ZwIIHLWEmXrOLV_&index=2
24+
[lattices]: https://en.wikipedia.org/wiki/Lattice_(order)
25+
[*Static Program Analysis*]: https://cs.au.dk/~amoeller/spa/
26+
27+
## Inspecting the Results of a Data-flow Analysis
28+
29+
Before we describe how to define a new data-flow analysis, let's inspect the
30+
results of an existing one. Once you have constructed an analysis, you must
31+
pass it to an `Engine`, which is capable of finding the fixpoint of
32+
your data-flow problem. Calling `iterate_to_fixpoint` will return a `Results`,
33+
which contains the fixpoint upon entry of each block.
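To make "finding the fixpoint" concrete, here is a minimal, self-contained sketch of a worklist iteration over a toy CFG. It is not the `Engine` from `rustc`: `ToyCfg`, `apply_block`, and `toy_iterate_to_fixpoint` are hypothetical names invented for illustration, the state is a plain `u64` bitmask, and the analysis is assumed to be a forward analysis whose bottom value is the empty set and whose join is union.

```rust
/// Hypothetical toy CFG (not a rustc type). `successors[b]` lists the blocks
/// reachable from block `b`; each block's transfer function is summarized by
/// a gen bitmask and a kill bitmask.
pub struct ToyCfg {
    pub successors: Vec<Vec<usize>>,
    pub gen_bits: Vec<u64>,
    pub kill_bits: Vec<u64>,
}

/// Applies one block's transfer function to the state on entry to that block.
fn apply_block(cfg: &ToyCfg, block: usize, entry: u64) -> u64 {
    (entry & !cfg.kill_bits[block]) | cfg.gen_bits[block]
}

/// Iterates a forward analysis (bottom = empty set, join = union) to fixpoint
/// and returns the state on entry to each block.
pub fn toy_iterate_to_fixpoint(cfg: &ToyCfg, start_state: u64) -> Vec<u64> {
    let block_count = cfg.successors.len();
    // Every block starts at the bottom value (the empty set).
    let mut entry = vec![0u64; block_count];
    if block_count == 0 {
        return entry;
    }
    // Block 0 plays the role of the start block in this sketch.
    entry[0] = start_state;

    // Visit every block at least once, then revisit a block only when the
    // state on entry to it has changed.
    let mut worklist: Vec<usize> = (0..block_count).collect();
    while let Some(block) = worklist.pop() {
        let exit = apply_block(cfg, block, entry[block]);
        for &succ in &cfg.successors[block] {
            let joined = entry[succ] | exit;
            if joined != entry[succ] {
                entry[succ] = joined;
                if !worklist.contains(&succ) {
                    worklist.push(succ);
                }
            }
        }
    }
    entry
}
```

The loop terminates because each entry state can only grow and there are finitely many bits. The returned vector corresponds to what the text above calls the fixpoint upon entry of each block.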
34+
35+
Once you have a `Results`, you can inspect the data-flow state at fixpoint
36+
at any point in the CFG. If you only need the state at a few locations (e.g.,
37+
each `Drop` terminator), use a [`ResultsCursor`]. If you need the state at *all*
38+
locations, a [`ResultsVisitor`] will be more efficient.
39+
40+
```
41+
Analysis
42+
|
43+
| into_engine(…)
44+
|
45+
Engine
46+
|
47+
| iterate_to_fixpoint()
48+
|
49+
Results
50+
/ \
51+
into_results_cursor(…) / \ visit(…)
52+
/ \
53+
ResultsCursor ResultsVisitor
54+
```
55+
56+
The following code example uses the `ResultsVisitor` method...
57+
58+
59+
```rust,ignore
60+
// Assuming `MyVisitor` implements `ResultsVisitor<FlowState = BitSet<MyAnalysis::Idx>>`...
61+
let my_visitor = MyVisitor::new();
62+
63+
// Inspect the fixpoint state for every location within every block in RPO.
64+
let results = MyAnalysis()
65+
.into_engine(tcx, body, def_id)
66+
.iterate_to_fixpoint()
67+
.visit(body, traversal::reverse_postorder(body), my_visitor);
68+
```
69+
70+
and this code uses `ResultsCursor`.
71+
72+
```rust,ignore
73+
let mut results = MyAnalysis()
74+
.into_engine(tcx, body, def_id)
75+
.iterate_to_fixpoint()
76+
.into_results_cursor(body);
77+
78+
// Inspect the fixpoint state immediately before each `Drop` terminator.
79+
for (bb, block) in body.basic_blocks().iter_enumerated() {
80+
if let TerminatorKind::Drop { .. } = block.terminator().kind {
81+
results.seek_before(body.terminator_loc(bb));
82+
let state = results.get();
83+
84+
println!("state before drop: {:#?}", state);
85+
}
86+
}
87+
```
88+
89+
[`ResultsCursor`]: #
90+
[`ResultsVisitor`]: #
91+
92+
## Defining a New Data-flow Analysis
93+
94+
### Domain
95+
96+
A data-flow analysis has two defining characteristics. First is the domain upon
97+
which the analysis is defined, also known as the data-flow lattice. For
98+
example, the domain of the [`MaybeInitializedPlaces`] analysis is the set–or,
99+
more formally, the powerset lattice–of all move paths that are used in a
100+
function. For now, the MIR data-flow framework only supports analyses whose
101+
domain is the powerset lattice of some monotonic index, such as a `MovePathIndex`
102+
or a `Local`.
103+
104+
The [`AnalysisDomain`] and [`BottomValue`] traits define the domain of a data-flow
105+
analysis. `BottomValue` determines the initial value of the data-flow state for
106+
each basic block, either the empty set (if `BOTTOM_VALUE = false`) or the full
107+
set (if `BOTTOM_VALUE = true`). This also specifies the default lattice join
108+
operator, union (if `BOTTOM_VALUE = false`) or intersection (if `BOTTOM_VALUE =
109+
true`). This is because the initial value of the entry state of each block is
110+
joined with the exit state of its predecessors. For example, if the initial
111+
value of the data-flow state is the empty set but intersection is used as the
112+
join operator, the entry state will never change since ∅ ∩ A = ∅ for all A.
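The pairing of bottom value and join operator can be checked with a small standalone sketch. It does not use the `BottomValue` trait itself; `State`, `join_union`, `join_intersection`, and `entry_state` are hypothetical helpers built on `std::collections::BTreeSet`, and the universe of elements is assumed to be `{0, 1, 2}`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the data-flow state: a set of indices.
pub type State = BTreeSet<u32>;

/// Join for analyses whose bottom value is the empty set.
pub fn join_union(a: &State, b: &State) -> State {
    a.union(b).cloned().collect()
}

/// Join for analyses whose bottom value is the full set.
pub fn join_intersection(a: &State, b: &State) -> State {
    a.intersection(b).cloned().collect()
}

/// Joins the exit state of every predecessor into the initial entry state.
pub fn entry_state(
    initial: State,
    predecessor_exits: &[State],
    join: fn(&State, &State) -> State,
) -> State {
    predecessor_exits
        .iter()
        .fold(initial, |acc, exit| join(&acc, exit))
}
```

With predecessor exit states `{0, 1}` and `{1, 2}`, starting from the empty set and joining with union gives `{0, 1, 2}`, and starting from the full set and joining with intersection gives `{1}`. Starting from the empty set and joining with intersection gives the empty set regardless of the predecessors, which is the mismatch described above.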
113+
114+
`AnalysisDomain` defines the index type that serves as the element of the
115+
data-flow state. It is also responsible for initializing the data-flow state for
116+
the `START_BLOCK`. For example,
117+
`MaybeInitializedPlaces::initialize_start_block` marks move paths
118+
corresponding to the parameters of a function as initialized.
119+
120+
[`MaybeInitializedPlaces`]: #
121+
[`BottomValue`]: #
122+
[`AnalysisDomain`]: #
123+
124+
### Transfer Function
125+
126+
The second characteristic of a data-flow analysis is its transfer function.
127+
This describes how the data-flow state changes as a program is executed. For
128+
the MIR, the transfer function of each basic block is composed of the
129+
effects of each individual statement followed by the effect of the terminator.
130+
For example, in `MaybeInitializedPlaces`, the statement effect for an
131+
assignment marks its destination as initialized.
132+
133+
A transfer function is defined for each statement and terminator via the
134+
`Analysis::effect` methods. When called in sequence, these make up the
135+
transfer function for the entire basic block. Try to avoid using the
136+
`before` variants of the effect methods. Unlike the unprefixed variants, their
137+
effect on a given statement will be applied when `seek_before` is called with
138+
that statement as the target location. Instead, use `seek_after` or
139+
`visit_statement_exit` when inspecting the results.
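The difference between the two kinds of effect can be modeled with a toy sketch. Nothing here is the `rustc` API: `ToyEffect`, `ToyStatement`, `toy_seek_before`, and `toy_seek_after` are hypothetical names, and the sketch assumes, following the description above, that seeking before a statement applies its "before" effect but not its primary effect.

```rust
use std::collections::BTreeSet;

/// Hypothetical effect on a set of initialized locals.
#[derive(Clone, Copy)]
pub enum ToyEffect {
    Init(usize),
    Uninit(usize),
    Nothing,
}

/// Hypothetical statement with a "before" effect and a primary effect.
pub struct ToyStatement {
    pub before: ToyEffect,
    pub primary: ToyEffect,
}

fn apply(state: &mut BTreeSet<usize>, effect: ToyEffect) {
    match effect {
        ToyEffect::Init(local) => {
            state.insert(local);
        }
        ToyEffect::Uninit(local) => {
            state.remove(&local);
        }
        ToyEffect::Nothing => {}
    }
}

/// State seen when seeking before `block[index]`: all effects of the earlier
/// statements, plus the "before" effect of the target statement.
/// Panics if `index` is out of bounds.
pub fn toy_seek_before(
    entry: &BTreeSet<usize>,
    block: &[ToyStatement],
    index: usize,
) -> BTreeSet<usize> {
    let mut state = entry.clone();
    for stmt in &block[..index] {
        apply(&mut state, stmt.before);
        apply(&mut state, stmt.primary);
    }
    apply(&mut state, block[index].before);
    state
}

/// State seen when seeking after `block[index]`: the state from
/// `toy_seek_before` with the primary effect of the target applied as well.
pub fn toy_seek_after(
    entry: &BTreeSet<usize>,
    block: &[ToyStatement],
    index: usize,
) -> BTreeSet<usize> {
    let mut state = toy_seek_before(entry, block, index);
    apply(&mut state, block[index].primary);
    state
}
```

In this model, a statement whose "before" effect is `Init(7)` already shows local `7` in the state returned by `toy_seek_before` for that statement, which is the behavior the paragraph above warns about.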
140+
141+
#### Gen-kill data-flow problems
142+
143+
[Gen-kill] problems (also known as bit vector problems) are a certain class of
144+
data-flow analyses whose domain is a powerset lattice and whose transfer
145+
function only inserts or removes specific elements from the state vector. This
146+
class of analyses is guaranteed to converge quickly, since we can use a more
147+
efficient approach when iterating to fixpoint. If your analysis can be defined
148+
using only `gen` and `kill` operations, it probably should be.
149+
150+
[`GenKillAnalysis`] defines the transfer function for such analyses. Unlike the
151+
[`Analysis`] trait, which can mutate the state vector directly, a
152+
`GenKillAnalysis` only has access to a generic type that implements the
153+
[`GenKill`] interface.
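The sketch below illustrates one reason restricting a transfer function to gen and kill operations can pay off: the same function can either update a state directly or be recorded once as a pair of gen and kill sets and replayed later. All names (`ToyGenKill`, `ToyGenKillSet`, `toy_block_effect`) are hypothetical and do not mirror the actual `GenKill` trait. The method is spelled `gen_` here because `gen` is a reserved keyword in the Rust 2024 edition.

```rust
use std::collections::BTreeSet;

/// Hypothetical gen-kill interface: the only operations a transfer function
/// may perform on the state.
pub trait ToyGenKill {
    fn gen_(&mut self, elem: usize);
    fn kill(&mut self, elem: usize);
}

/// Applying the operations directly to a data-flow state.
impl ToyGenKill for BTreeSet<usize> {
    fn gen_(&mut self, elem: usize) {
        self.insert(elem);
    }
    fn kill(&mut self, elem: usize) {
        self.remove(&elem);
    }
}

/// Summary of a sequence of gen and kill operations. The two sets are kept
/// disjoint, so the last operation on an element wins.
#[derive(Default, Debug, PartialEq)]
pub struct ToyGenKillSet {
    pub gen_set: BTreeSet<usize>,
    pub kill_set: BTreeSet<usize>,
}

impl ToyGenKill for ToyGenKillSet {
    fn gen_(&mut self, elem: usize) {
        self.gen_set.insert(elem);
        self.kill_set.remove(&elem);
    }
    fn kill(&mut self, elem: usize) {
        self.kill_set.insert(elem);
        self.gen_set.remove(&elem);
    }
}

impl ToyGenKillSet {
    /// Replays the recorded summary on a state.
    pub fn apply(&self, state: &mut BTreeSet<usize>) {
        for elem in &self.kill_set {
            state.remove(elem);
        }
        for elem in &self.gen_set {
            state.insert(*elem);
        }
    }
}

/// A toy block transfer function written only against the interface.
pub fn toy_block_effect<T: ToyGenKill>(trans: &mut T) {
    trans.gen_(1);
    trans.kill(2);
    trans.gen_(3);
    trans.kill(1);
}
```

Running `toy_block_effect` directly on the state `{1, 2, 5}` and replaying its recorded summary on a copy of that state both yield `{3, 5}`. Because a summary can be computed once per block and reused on every visit, an engine does not need to re-run each statement effect while iterating.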
154+
155+
156+
[`GenKillAnalysis`]: #
157+
[`Analysis`]: #
158+
[`GenKill`]: #
159+
[Gen-kill]: https://en.wikipedia.org/wiki/Data-flow_analysis#Bit_vector_problems
