|
| 1 | +# Data-flow Analysis |
| 2 | + |
| 3 | +If you work on the MIR, you will frequently come across various flavors of |
| 4 | +[data-flow analysis][wiki]. For example, `rustc` uses data-flow to find |
| 5 | +uninitialized variables, determine what variables are live across a generator |
| 6 | +`yield` statement, and compute which `Place`s are borrowed at a given point in |
| 7 | +the control-flow graph. |
| 8 | + |
| 9 | +Since data-flow analysis is such a fundamental concept in modern compilers, there |
| 10 | +are ample resources for those who are not yet familiar. [*Static |
| 11 | +Program Analysis*] by Anders Møller and Michael I. Schwartzbach is an |
| 12 | +incredible, freely available textbook. For those who prefer audiovisual |
| 13 | +learning, the Goethe University Frankfurt has published a series of short |
| 14 | +[YouTube lectures][goethe] that are very approachable. |
| 15 | + |
| 16 | +The following sections will discuss the framework used to define and inspect |
| 17 | +data-flow analyses in `rustc`. They assume that the reader is familiar with |
| 18 | +common data-flow ideas such as [lattices], fixpoint, and transfer functions. |
| 19 | +Any of the resources listed above should give you enough background to |
| 20 | +understand what comes next. |
| 21 | + |
| 22 | +[wiki]: https://en.wikipedia.org/wiki/Data-flow_analysis#Basic_principles |
| 23 | +[goethe]: https://www.youtube.com/watch?v=NVBQSR_HdL0&list=PL_sGR8T76Y58l3Gck3ZwIIHLWEmXrOLV_&index=2 |
| 24 | +[lattices]: https://en.wikipedia.org/wiki/Lattice_(order) |
| 25 | +[*Static Program Analysis*]: https://cs.au.dk/~amoeller/spa/ |
| 26 | + |
| 27 | +## Inspecting the Results of a Data-flow Analysis |
| 28 | + |
| 29 | +Before we describe how to define a new data-flow analysis, let's inspect the |
| 30 | +results of an existing one. Once you have constructed an analysis, you must |
| 31 | +pass it to an `Engine`, which finds the fixpoint of your data-flow problem. |
| 32 | +Calling `iterate_to_fixpoint` returns a `Results`, which contains the |
| 33 | +fixpoint state on entry to each block. |
| 34 | + |
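To make "finding the fixpoint" concrete, here is a minimal, self-contained sketch of a forward worklist iteration over a toy control-flow graph. It is not how `rustc`'s `Engine` is implemented; `ToyCfg` and `toy_fixpoint` are hypothetical names, the state is a plain `u64` bit mask, and union is assumed as the join. Like `Results`, the returned vector holds the fixpoint state on entry to each block.

```rust
use std::collections::VecDeque;

/// Hypothetical toy CFG, not a rustc type.
/// `successors[b]` lists the blocks that can follow block `b`;
/// `gen_bits[b]` and `kill_bits[b]` summarize block `b`'s transfer function.
struct ToyCfg {
    successors: Vec<Vec<usize>>,
    gen_bits: Vec<u64>,
    kill_bits: Vec<u64>,
}

/// Forward analysis with union as the join operator.
/// Block 0 is treated as the start block. Returns the entry state of
/// every block once no state changes any more.
fn toy_fixpoint(cfg: &ToyCfg, start_state: u64) -> Vec<u64> {
    let n = cfg.successors.len();
    // Every entry state starts at bottom, which is the empty set for union.
    let mut entry = vec![0u64; n];
    if n == 0 {
        return entry;
    }
    entry[0] = start_state;

    let mut worklist: VecDeque<usize> = (0..n).collect();
    while let Some(block) = worklist.pop_front() {
        // Apply the block's transfer function to its entry state.
        let exit = (entry[block] | cfg.gen_bits[block]) & !cfg.kill_bits[block];
        for &succ in &cfg.successors[block] {
            // Join the exit state into each successor's entry state.
            let joined = entry[succ] | exit;
            if joined != entry[succ] {
                entry[succ] = joined;
                // The successor changed, so it must be processed again.
                if !worklist.contains(&succ) {
                    worklist.push_back(succ);
                }
            }
        }
    }
    entry
}
```

The loop terminates because each entry state can only grow and the bit mask is finite.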
| 35 | +Once you have a `Results`, you can inspect the data-flow state at fixpoint |
| 36 | +at any point in the CFG. If you only need the state at a few locations (e.g., |
| 37 | +each `Drop` terminator) use a [`ResultsCursor`]. If you need the state at *all* |
| 38 | +locations, a [`ResultsVisitor`] will be more efficient. |
| 39 | + |
| 40 | +``` |
| 41 | + Analysis |
| 42 | + | |
| 43 | + | into_engine(…) |
| 44 | + | |
| 45 | + Engine |
| 46 | + | |
| 47 | + | iterate_to_fixpoint() |
| 48 | + | |
| 49 | + Results |
| 50 | + / \ |
| 51 | + into_results_cursor(…) / \ visit(…) |
| 52 | + / \ |
| 53 | + ResultsCursor ResultsVisitor |
| 54 | +``` |
| 55 | + |
| 56 | +The following code example uses the `ResultsVisitor` method... |
| 57 | + |
| 58 | + |
| 59 | +```rust,ignore |
| 60 | +// Assuming `MyVisitor` implements `ResultsVisitor<FlowState = BitSet<MyAnalysis::Idx>>`... |
| 61 | +let my_visitor = MyVisitor::new(); |
| 62 | +
|
| 63 | +// Inspect the fixpoint state for every location within every block in RPO. |
| 64 | +let results = MyAnalysis() |
| 65 | + .into_engine(tcx, body, def_id) |
| 66 | + .iterate_to_fixpoint() |
| 67 | + .visit(body, traversal::reverse_postorder(body), my_visitor); |
| 68 | +``` |
| 69 | + |
| 70 | +and this code uses `ResultsCursor`. |
| 71 | + |
| 72 | +```rust,ignore |
| 73 | +let mut results = MyAnalysis() |
| 74 | + .into_engine(tcx, body, def_id) |
| 75 | + .iterate_to_fixpoint() |
| 76 | + .into_results_cursor(body); |
| 77 | +
|
| 78 | +// Inspect the fixpoint state immediately before each `Drop` terminator. |
| 79 | +for (bb, block) in body.basic_blocks().iter_enumerated() { |
| 80 | + if let TerminatorKind::Drop { .. } = block.terminator().kind { |
| 81 | + results.seek_before(body.terminator_loc(bb)); |
| 82 | + let state = results.get(); |
| 83 | +
|
| 84 | + println!("state before drop: {:#?}", state); |
| 85 | + } |
| 86 | +} |
| 87 | +``` |
| 88 | + |
| 89 | +[`ResultsCursor`]: # |
| 90 | +[`ResultsVisitor`]: # |
| 91 | + |
| 92 | +## Defining a New Data-flow Analysis |
| 93 | + |
| 94 | +### Domain |
| 95 | + |
| 96 | +A data-flow analysis has two defining characteristics. First is the domain upon |
| 97 | +which the analysis is defined, also known as the data-flow lattice. For |
| 98 | +example, the domain of the [`MaybeInitializedPlaces`] analysis is the set (or, |
| 99 | +more formally, the powerset lattice) of all move paths that are used in a |
| 100 | +function. For now, the MIR data-flow framework only supports analyses whose |
| 101 | +domain is the powerset lattice of some monotonic index, such as a `MovePathIndex` |
| 102 | +or a `Local`. |
| 103 | + |
| 104 | +The [`AnalysisDomain`] and [`BottomValue`] traits define the domain of a data-flow |
| 105 | +analysis. `BottomValue` determines the initial value of the data-flow state for |
| 106 | +each basic block, either the empty set (if `BOTTOM_VALUE = false`) or the full |
| 107 | +set (if `BOTTOM_VALUE = true`). It also determines the default lattice join |
| 108 | +operator: union (if `BOTTOM_VALUE = false`) or intersection (if `BOTTOM_VALUE = |
| 109 | +true`). The two are tied together because the entry state of each block starts |
| 110 | +at the initial value and is then joined with the exit state of each of its |
| 111 | +predecessors. If the initial value were the empty set but intersection were the |
| 112 | +join operator, the entry state would never change, since ∅ ∩ A = ∅ for all A. |
| 113 | + |
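The pairing of bottom value and join operator described above can be checked with a small, self-contained sketch. It models a powerset domain as a `u64` bit mask. Nothing here is `rustc` API: `BitMask`, `Join`, `bottom`, `apply`, and `join_all` are hypothetical names chosen for illustration.

```rust
/// Hypothetical stand-in for a powerset domain over at most 64 indices.
type BitMask = u64;

/// The two join operators a powerset analysis can use.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Join {
    Union,
    Intersection,
}

impl Join {
    /// The bottom value is the identity element of the join:
    /// the empty set for union, the full set for intersection.
    fn bottom(self) -> BitMask {
        match self {
            Join::Union => 0,
            Join::Intersection => !0,
        }
    }

    /// Joins two states.
    fn apply(self, a: BitMask, b: BitMask) -> BitMask {
        match self {
            Join::Union => a | b,
            Join::Intersection => a & b,
        }
    }
}

/// Computes a block's entry state by joining `initial` with the exit
/// state of every predecessor.
fn join_all(join: Join, initial: BitMask, pred_exits: &[BitMask]) -> BitMask {
    pred_exits
        .iter()
        .fold(initial, |acc, &exit| join.apply(acc, exit))
}
```

Starting from `Join::Intersection.bottom()`, the entry state ends up as the intersection of the predecessors' exit states. Starting from the empty set with intersection, it stays empty regardless of the predecessors.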
| 114 | +`AnalysisDomain` defines the index type that serves as the element of the |
| 115 | +data-flow state. It is also responsible for initializing the data-flow state for |
| 116 | +the `START_BLOCK`. For example, |
| 117 | +`MaybeInitializedPlaces::initialize_start_block` marks move paths |
| 118 | +corresponding to the parameters of a function as initialized. |
| 119 | + |
| 120 | +[`MaybeInitializedPlaces`]: # |
| 121 | +[`BottomValue`]: # |
| 122 | +[`AnalysisDomain`]: # |
| 123 | + |
| 124 | +### Transfer Function |
| 125 | + |
| 126 | +The second characteristic of a data-flow analysis is its transfer function. |
| 127 | +This describes how the data-flow state changes as a program is executed. For |
| 128 | +the MIR, the transfer function of each basic block is composed of the |
| 129 | +effects of each individual statement followed by the effect of the terminator. |
| 130 | +For example, in `MaybeInitializedPlaces`, the statement effect for an |
| 131 | +assignment marks its destination as initialized. |
| 132 | + |
| 133 | +A transfer function is defined for each statement and terminator via the |
| 134 | +`Analysis::effect` methods. When called in sequence, these comprise the |
| 135 | +transfer function for the entire basic block. Try to avoid using the |
| 136 | +`before` variants of the effect methods. Unlike the unprefixed variants, their |
| 137 | +effect on a given statement will be applied when `seek_before` is called with |
| 138 | +that statement as the target location. Instead, use `seek_after` or |
| 139 | +`visit_statement_exit` when inspecting the results. |
| 140 | + |
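As an illustration of how per-statement effects compose into a block's transfer function, here is a minimal, self-contained sketch. It is loosely modeled on the "assignment initializes its destination" example above, but it is not `rustc` code: `Stmt`, `State`, `statement_effect`, and `block_transfer` are hypothetical, and the terminator effect is left out for brevity.

```rust
/// Hypothetical, simplified statements; real MIR statements are richer.
#[derive(Clone, Copy)]
enum Stmt {
    /// Assigning to a local marks it as initialized.
    Assign(usize),
    /// Moving out of a local marks it as uninitialized.
    MoveOut(usize),
}

/// The set of maybe-initialized locals, one bit per local index (< 64).
type State = u64;

/// The effect of a single statement on the data-flow state.
fn statement_effect(state: &mut State, stmt: Stmt) {
    match stmt {
        Stmt::Assign(local) => *state |= 1u64 << local,
        Stmt::MoveOut(local) => *state &= !(1u64 << local),
    }
}

/// The transfer function of a block: apply each statement effect in
/// program order, starting from the block's entry state.
fn block_transfer(entry: State, stmts: &[Stmt]) -> State {
    let mut state = entry;
    for &stmt in stmts {
        statement_effect(&mut state, stmt);
    }
    state
}
```

For a block that assigns locals 0 and 2 and then moves out of local 0, only local 2 is marked as initialized on exit.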
| 141 | +#### Gen-kill data-flow problems |
| 142 | + |
| 143 | +[Gen-kill] problems (also known as bit vector problems) are a certain class of |
| 144 | +data-flow analyses whose domain is a powerset lattice and whose transfer |
| 145 | +function only inserts or removes specific elements from the state vector. This |
| 146 | +class of analyses is guaranteed to converge quickly, since we can use a more |
| 147 | +efficient approach when iterating to fixpoint. If your analysis can be defined |
| 148 | +using only `gen` and `kill` operations, it probably should be. |
| 149 | + |
| 150 | +[`GenKillAnalysis`] defines the transfer function for such analyses. Unlike the |
| 151 | +[`Analysis`] trait, which can mutate the state vector directly, a |
| 152 | +`GenKillAnalysis` only has access to a generic type that implements the |
| 153 | +[`GenKill`] interface. |
| 154 | + |
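One common way to exploit the gen-kill structure is to summarize a whole block as a pair of "gen" and "kill" sets, computed once and then applied in a single step on every iteration. The sketch below shows the idea on a `u64` bit mask. It does not reproduce `rustc`'s `GenKill` interface: `GenKillSet`, `gen_idx`, `kill_idx`, and `apply` are hypothetical names.

```rust
/// Hypothetical per-block summary: the indices a block adds to the
/// state and the indices it removes. The two sets are kept disjoint.
#[derive(Clone, Copy, Default, Debug, PartialEq, Eq)]
struct GenKillSet {
    gen_bits: u64,
    kill_bits: u64,
}

impl GenKillSet {
    /// Records that `idx` is added to the state. A later "gen"
    /// overrides an earlier "kill" of the same index.
    fn gen_idx(&mut self, idx: u32) {
        self.gen_bits |= 1u64 << idx;
        self.kill_bits &= !(1u64 << idx);
    }

    /// Records that `idx` is removed from the state. A later "kill"
    /// overrides an earlier "gen" of the same index.
    fn kill_idx(&mut self, idx: u32) {
        self.kill_bits |= 1u64 << idx;
        self.gen_bits &= !(1u64 << idx);
    }

    /// Applies the summarized transfer function to an entry state.
    fn apply(self, state: u64) -> u64 {
        (state | self.gen_bits) & !self.kill_bits
    }
}
```

Because the summary does not depend on the entry state, it can be built once per block by replaying the statement effects in order, then reused each time the block is revisited.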
| 155 | + |
| 156 | +[`GenKillAnalysis`]: # |
| 157 | +[`Analysis`]: # |
| 158 | +[`GenKill`]: # |
| 159 | +[Gen-kill]: https://en.wikipedia.org/wiki/Data-flow_analysis#Bit_vector_problems |