# HIR lowering

The HIR -- "High-level IR" -- is the primary IR used in most of
rustc. It is a desugared version of the "abstract syntax tree" (AST)
that is generated after parsing, macro expansion, and name resolution
have completed. Many parts of HIR resemble Rust surface syntax quite
closely, with the exception that some of Rust's expression forms have
been desugared away (as an example, `for` loops are converted into a
`loop` and do not appear in the HIR).

This chapter covers the main concepts of the HIR.

### Out-of-band storage and the `Crate` type

The top-level data structure in the HIR is the `Crate`, which stores
the contents of the crate currently being compiled (we only ever
construct HIR for the current crate). Whereas in the AST the crate
data structure basically just contains the root module, the HIR
`Crate` structure contains a number of maps and other things that
serve to organize the content of the crate for easier access.

For example, the contents of individual items (e.g., modules,
functions, traits, impls, etc.) in the HIR are not immediately
accessible in the parents. So, for example, if you had a module item
`foo` containing a function `bar()`:

```rust
mod foo {
    fn bar() { }
}
```

Then in the HIR the representation of module `foo` (the `Mod`
struct) would have only the **`ItemId`** `I` of `bar()`. To get the
details of the function `bar()`, we would look up `I` in the
`items` map.

One nice result of this representation is that one can iterate
over all items in the crate by iterating over the key-value pairs
in these maps (without needing to trawl through the entire IR).
There are similar maps for things like trait items and impl items,
as well as "bodies" (explained below).

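The lookup and the crate-wide iteration described above can be sketched with a toy model. Everything here (`ItemKind`, the shape of `Item`, and the helpers `build_example`, `child_names`, and `all_item_names`) is a hypothetical simplification for illustration, not rustc's actual definitions:

```rust
use std::collections::BTreeMap;

// Toy stand-ins for illustration only; rustc's real `Crate`, `Mod`, and
// `Item` types carry far more information.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(u32);

#[derive(Debug)]
enum ItemKind {
    // A module stores only the ids of its children, never the children.
    Mod { item_ids: Vec<ItemId> },
    Fn,
}

#[derive(Debug)]
struct Item {
    name: &'static str,
    kind: ItemKind,
}

#[derive(Default)]
struct Crate {
    // Out-of-band storage: every item in the crate lives in this one map.
    items: BTreeMap<ItemId, Item>,
}

impl Crate {
    // Resolve an id to the item's contents (panics on an unknown id).
    fn item(&self, id: ItemId) -> &Item {
        &self.items[&id]
    }
}

// Builds the toy representation of `mod foo { fn bar() { } }` and returns
// the crate together with the id of `foo`.
fn build_example() -> (Crate, ItemId) {
    let (foo, bar) = (ItemId(0), ItemId(1));
    let mut krate = Crate::default();
    let foo_item = Item { name: "foo", kind: ItemKind::Mod { item_ids: vec![bar] } };
    krate.items.insert(foo, foo_item);
    krate.items.insert(bar, Item { name: "bar", kind: ItemKind::Fn });
    (krate, foo)
}

// Names of a module's children, found by looking each id up in the map.
fn child_names(krate: &Crate, module: ItemId) -> Vec<&'static str> {
    match &krate.item(module).kind {
        ItemKind::Mod { item_ids } => item_ids.iter().map(|&id| krate.item(id).name).collect(),
        ItemKind::Fn => Vec::new(),
    }
}

// Visits every item in the crate without walking the module tree.
fn all_item_names(krate: &Crate) -> Vec<&'static str> {
    krate.items.values().map(|item| item.name).collect()
}
```

In this sketch, `child_names` has to go through `Crate::item` to learn anything about `bar`, while `all_item_names` never touches the module structure at all.
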
The other reason to set up the representation this way is for better
integration with incremental compilation. This way, if you gain access
to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately
gain access to the contents of the function `bar()`. Instead, you only
gain access to the **id** for `bar()`, and you must invoke some
function to look up the contents of `bar()` given its id; this gives us
a chance to observe that you accessed the data for `bar()` and record
the dependency.

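To make the "observe and record" idea concrete, here is a minimal sketch in which the only way to read an item is through an accessor that logs the access. The `TrackedItems` type and its methods are hypothetical; rustc's real dependency tracking is considerably more involved:

```rust
use std::cell::RefCell;
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical toy types, not rustc's dependency-tracking machinery.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(u32);

struct TrackedItems {
    items: BTreeMap<ItemId, String>,
    // Ids whose contents have been read so far.
    accessed: RefCell<BTreeSet<ItemId>>,
}

impl TrackedItems {
    fn new(items: BTreeMap<ItemId, String>) -> Self {
        TrackedItems { items, accessed: RefCell::new(BTreeSet::new()) }
    }

    // The only route to an item's contents; every call is recorded first.
    fn item(&self, id: ItemId) -> &str {
        self.accessed.borrow_mut().insert(id);
        &self.items[&id]
    }

    // The dependencies observed so far, in id order.
    fn dependencies(&self) -> Vec<ItemId> {
        self.accessed.borrow().iter().copied().collect()
    }
}
```

Holding an `ItemId` alone records nothing; a dependency appears only once `item` is actually called with it.
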
### Identifiers in the HIR

Most of the code that has to deal with things in HIR tends not to
carry around references into the HIR, but rather to carry around
*identifier numbers* (or just "ids"). Right now, you will find four
sorts of identifiers in active use:

- `DefId`, which primarily names "definitions" or top-level items.
  - You can think of a `DefId` as being shorthand for a very explicit
    and complete path, like `std::collections::HashMap`. However,
    these paths are able to name things that are not nameable in
    normal Rust (e.g., impls), and they also include extra information
    about the crate (such as its version number, as two versions of
    the same crate can co-exist).
  - A `DefId` really consists of two parts, a `CrateNum` (which
    identifies the crate) and a `DefIndex` (which indexes into a list
    of items that is maintained per crate).
- `HirId`, which combines the index of a particular item with an
  offset within that item.
  - The key point of a `HirId` is that it is *relative* to some item
    (which is named via a `DefId`).
- `BodyId`, which is an absolute identifier that refers to a specific
  body (definition of a function or constant) in the crate. It is currently
  effectively a "newtype'd" `NodeId`.
- `NodeId`, which is an absolute id that identifies a single node in the HIR tree.
  - While these are still in common use, **they are being slowly phased out**.
  - Since they are absolute within the crate, adding a new node
    anywhere in the tree causes the node-ids of all subsequent code in
    the crate to change. This is terrible for incremental compilation,
    as you can perhaps imagine.

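The difference between absolute and item-relative ids can be illustrated with a small sketch. The struct layouts, the field names (`krate`, `index`, `owner`, `local_id`), and the numbering helpers below are assumptions made for the example, not rustc's definitions:

```rust
// Toy versions of the id types, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CrateNum(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DefIndex(u32);

// A `DefId` is a crate plus an index into that crate's list of definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

// A `HirId` is an owning item plus an offset local to that item.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct HirId {
    owner: DefIndex,
    local_id: u32,
}

// Absolute numbering: one counter runs across every item in the crate.
// `nodes_per_item[i]` is the number of nodes in item `i`.
fn absolute_ids(nodes_per_item: &[u32]) -> Vec<Vec<u32>> {
    let mut next = 0;
    nodes_per_item
        .iter()
        .map(|&n| {
            let ids: Vec<u32> = (next..next + n).collect();
            next += n;
            ids
        })
        .collect()
}

// Relative numbering: the counter restarts at zero inside each item.
fn relative_ids(nodes_per_item: &[u32]) -> Vec<Vec<HirId>> {
    nodes_per_item
        .iter()
        .enumerate()
        .map(|(item, &n)| {
            (0..n)
                .map(|local_id| HirId { owner: DefIndex(item as u32), local_id })
                .collect()
        })
        .collect()
}
```

If the first item grows from two nodes to three, `absolute_ids` renumbers every node in the second item, whereas `relative_ids` leaves the second item's ids untouched. That stability is what makes relative ids friendlier to incremental compilation.
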
### HIR Map

Most of the time when you are working with the HIR, you will do so via
the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in
the `hir::map` module). The HIR map contains a number of methods to
convert between ids of various kinds and to look up data associated
with a HIR node.

For example, if you have a `DefId`, and you would like to convert it
to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This
returns an `Option<NodeId>` -- this will be `None` if the def-id
refers to something outside of the current crate (since then it has no
HIR node), but otherwise returns `Some(n)` where `n` is the node-id of
the definition.

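The `Some`/`None` behaviour can be mimicked in a few lines. This is only a toy model: the method name mirrors the text, but `LOCAL_CRATE`, the struct layouts, and the `def_index_to_node` table are assumptions for the example rather than rustc's implementation:

```rust
use std::collections::HashMap;

// Assumption for this sketch: crate number 0 is the crate being compiled.
const LOCAL_CRATE: u32 = 0;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId {
    krate: u32,
    index: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(u32);

struct HirMap {
    // Only definitions from the local crate have HIR nodes.
    def_index_to_node: HashMap<u32, NodeId>,
}

impl HirMap {
    fn as_local_node_id(&self, def_id: DefId) -> Option<NodeId> {
        if def_id.krate != LOCAL_CRATE {
            // Defined in another crate: there is no HIR to point at.
            return None;
        }
        self.def_index_to_node.get(&def_id.index).copied()
    }
}
```

Callers are thereby forced to handle the "not in this crate" case explicitly instead of assuming every definition has a HIR node.
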
Similarly, you can use `tcx.hir.find(n)` to look up the node for a
`NodeId`. This returns an `Option<Node<'tcx>>`, where `Node` is an enum
defined in the map; by matching on this you can find out what sort of
node the node-id referred to and also get a pointer to the data
itself. Often, you know what sort of node `n` is -- e.g., if you know
that `n` must be some HIR expression, you can do
`tcx.hir.expect_expr(n)`, which will extract and return the
`&hir::Expr`, panicking if `n` is not in fact an expression.

Finally, you can use the HIR map to find the parents of nodes, via
calls like `tcx.hir.get_parent_node(n)`.

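The three kinds of query described in this section (a fallible lookup, a panicking "expect" accessor, and a parent lookup) can be sketched as follows. The method names follow the text, but the two-variant `Node` enum, the storage layout, and the rule that a parentless node is returned unchanged are assumptions made for this toy model:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct NodeId(u32);

// Toy enum; the real `Node` has many more variants and holds references
// into the HIR rather than strings.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Item(&'static str),
    Expr(&'static str),
}

struct HirMap {
    // Each entry stores the node's parent (if any) alongside the node.
    nodes: HashMap<NodeId, (Option<NodeId>, Node)>,
}

impl HirMap {
    // Returns `None` when the id is unknown.
    fn find(&self, id: NodeId) -> Option<&Node> {
        self.nodes.get(&id).map(|(_, node)| node)
    }

    // For callers that already know the node kind; panics otherwise.
    fn expect_expr(&self, id: NodeId) -> &'static str {
        match self.find(id) {
            Some(Node::Expr(text)) => *text,
            other => panic!("expected expr, found {:?}", other),
        }
    }

    // Toy rule: a node with no recorded parent is returned unchanged.
    fn get_parent_node(&self, id: NodeId) -> NodeId {
        match self.nodes.get(&id) {
            Some((Some(parent), _)) => *parent,
            _ => id,
        }
    }
}
```

Matching on the result of `find` is the general-purpose route; `expect_expr` trades that flexibility for brevity when the caller already knows what kind of node it holds.
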
### HIR Bodies

A **body** represents some kind of executable code, such as the body
of a function/closure or the definition of a constant. Bodies are
associated with an **owner**, which is typically some kind of item
(e.g., a `fn()` or `const`), but could also be a closure expression
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
associated with a given def-id (`maybe_body_owned_by()`) or to find
the owner of a body (`body_owner_def_id()`).
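
The owner/body relationship is a two-way association, which a toy model can capture with a pair of maps. The two query names come from the text above; their signatures here, the `record_body` helper, and the storage layout are assumptions for illustration and may not match the real methods:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct BodyId(u32);

#[derive(Default)]
struct HirMap {
    body_of_owner: HashMap<DefId, BodyId>,
    owner_of_body: HashMap<BodyId, DefId>,
}

impl HirMap {
    // Hypothetical helper: registers `body` as belonging to `owner`.
    fn record_body(&mut self, owner: DefId, body: BodyId) {
        self.body_of_owner.insert(owner, body);
        self.owner_of_body.insert(body, owner);
    }

    // Not every definition has a body (a struct does not), hence `Option`.
    fn maybe_body_owned_by(&self, owner: DefId) -> Option<BodyId> {
        self.body_of_owner.get(&owner).copied()
    }

    // Every body has exactly one owner, so this lookup is infallible for
    // bodies that were recorded (it panics on an unknown `BodyId`).
    fn body_owner_def_id(&self, body: BodyId) -> DefId {
        self.owner_of_body[&body]
    }
}
```

The asymmetry in return types reflects the text: an owner may or may not have a body, but a body always has an owner.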