|
| 1 | +# Serialization in Rustc |
| 2 | + |
| 3 | +Rustc has to [serialize] and deserialize various data during compilation. |
| 4 | +Specifically: |
| 5 | + |
| 6 | +- "Crate metadata", mainly query outputs, is serialized in a binary |
| 7 | + format into `rlib` and `rmeta` files that are output when compiling a library |
| 8 | + crate. These are then deserialized by crates that depend on that library. |
| 9 | +- Certain query outputs are serialized in a binary format to |
| 10 | + [persist incremental compilation results]. |
| 11 | +- The `-Z ast-json` and `-Z ast-json-noexpand` flags serialize the [AST] to JSON |
| 12 | + and output the result to stdout. |
| 13 | +- [`CrateInfo`] is serialized to JSON when the `-Z no-link` flag is used, and |
| 14 | + deserialized from JSON when the `-Z link-only` flag is used. |
| 15 | + |
| 16 | +## The `Encodable` and `Decodable` traits |
| 17 | + |
| 18 | +The [`rustc_serialize`] crate defines two traits for types which can be serialized: |
| 19 | + |
| 20 | +```rust |
| 21 | +pub trait Encodable<S: Encoder> { |
| 22 | + fn encode(&self, s: &mut S) -> Result<(), S::Error>; |
| 23 | +} |
| 24 | + |
| 25 | +pub trait Decodable<D: Decoder>: Sized { |
| 26 | + fn decode(d: &mut D) -> Result<Self, D::Error>; |
| 27 | +} |
| 28 | +``` |
| 29 | + |
| 30 | +It also defines implementations of these for integer types, floating point |
| 31 | +types, `bool`, `char`, `str` and various common standard library types. |
| 32 | + |
| 33 | +For types that are constructed from those types, `Encodable` and `Decodable` are |
| 34 | +usually implemented by [derives]. These generate implementations that forward |
| 35 | +serialization and deserialization to the fields of the struct or enum. For a |
| 36 | +struct, those impls look something like this: |
| 37 | + |
| 38 | +```rust |
| 39 | +# #![feature(rustc_private)] |
| 40 | +# extern crate rustc_serialize; |
| 41 | +# use rustc_serialize::{Decodable, Decoder, Encodable, Encoder}; |
| 42 | + |
| 43 | +struct MyStruct { |
| 44 | + int: u32, |
| 45 | + float: f32, |
| 46 | +} |
| 47 | + |
| 48 | +impl<E: Encoder> Encodable<E> for MyStruct { |
| 49 | + fn encode(&self, s: &mut E) -> Result<(), E::Error> { |
| 50 | + s.emit_struct("MyStruct", 2, |s| { |
| 51 | + s.emit_struct_field("int", 0, |s| self.int.encode(s))?; |
| 52 | + s.emit_struct_field("float", 1, |s| self.float.encode(s)) |
| 53 | + }) |
| 54 | + } |
| 55 | +} |
| 56 | +impl<D: Decoder> Decodable<D> for MyStruct { |
| 57 | + fn decode(d: &mut D) -> Result<MyStruct, D::Error> { |
| 58 | + d.read_struct("MyStruct", 2, |d| { |
| 59 | + let int = d.read_struct_field("int", 0, Decodable::decode)?; |
| 60 | + let float = d.read_struct_field("float", 1, Decodable::decode)?; |
| 61 | + |
| 62 | + Ok(MyStruct { int, float }) |
| 63 | + }) |
| 64 | + } |
| 65 | +} |
| 66 | +``` |
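
To make the forwarding pattern easier to experiment with outside the compiler, here is a minimal, self-contained sketch. The `Encoder`, `Decoder`, `Encodable` and `Decodable` traits below are simplified toy stand-ins written for this example (infallible, with only two primitive methods); they are not the `rustc_serialize` API. `ByteEncoder`, `ByteDecoder` and `round_trip` are likewise hypothetical names.

```rust
// Toy stand-ins for the real traits: infallible and limited to two primitives.
pub trait Encoder {
    fn emit_u32(&mut self, v: u32);
    fn emit_f32(&mut self, v: f32);
}

pub trait Decoder {
    fn read_u32(&mut self) -> u32;
    fn read_f32(&mut self) -> f32;
}

pub trait Encodable<E: Encoder> {
    fn encode(&self, e: &mut E);
}

pub trait Decodable<D: Decoder>: Sized {
    fn decode(d: &mut D) -> Self;
}

// Primitive impls: these play the role of the impls the library provides.
impl<E: Encoder> Encodable<E> for u32 {
    fn encode(&self, e: &mut E) {
        e.emit_u32(*self)
    }
}
impl<E: Encoder> Encodable<E> for f32 {
    fn encode(&self, e: &mut E) {
        e.emit_f32(*self)
    }
}
impl<D: Decoder> Decodable<D> for u32 {
    fn decode(d: &mut D) -> Self {
        d.read_u32()
    }
}
impl<D: Decoder> Decodable<D> for f32 {
    fn decode(d: &mut D) -> Self {
        d.read_f32()
    }
}

/// Hypothetical encoder that appends little-endian bytes to a buffer.
pub struct ByteEncoder {
    pub data: Vec<u8>,
}

impl Encoder for ByteEncoder {
    fn emit_u32(&mut self, v: u32) {
        self.data.extend_from_slice(&v.to_le_bytes());
    }
    fn emit_f32(&mut self, v: f32) {
        self.emit_u32(v.to_bits());
    }
}

/// Hypothetical decoder that reads back what `ByteEncoder` wrote.
pub struct ByteDecoder<'a> {
    pub data: &'a [u8],
    pub pos: usize,
}

impl<'a> Decoder for ByteDecoder<'a> {
    fn read_u32(&mut self) -> u32 {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.data[self.pos..self.pos + 4]);
        self.pos += 4;
        u32::from_le_bytes(buf)
    }
    fn read_f32(&mut self) -> f32 {
        f32::from_bits(self.read_u32())
    }
}

#[derive(Debug, PartialEq)]
pub struct MyStruct {
    pub int: u32,
    pub float: f32,
}

// What a derive would roughly produce: forward to each field in order.
impl<E: Encoder> Encodable<E> for MyStruct {
    fn encode(&self, e: &mut E) {
        self.int.encode(e);
        self.float.encode(e);
    }
}

impl<D: Decoder> Decodable<D> for MyStruct {
    fn decode(d: &mut D) -> Self {
        // Fields must be decoded in the same order they were encoded.
        let int = u32::decode(d);
        let float = f32::decode(d);
        MyStruct { int, float }
    }
}

/// Encodes `value` into a fresh buffer and decodes it again.
pub fn round_trip(value: &MyStruct) -> MyStruct {
    let mut enc = ByteEncoder { data: Vec::new() };
    value.encode(&mut enc);
    let mut dec = ByteDecoder { data: &enc.data, pos: 0 };
    MyStruct::decode(&mut dec)
}
```

Because the impls are generic over the encoder and decoder, the same `MyStruct` code would work with any other type implementing the toy traits, which mirrors why the derived impls in the compiler are generic.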
| 67 | + |
| 68 | +## Encoding and Decoding arena allocated types |
| 69 | + |
| 70 | +Rustc has a lot of [arena allocated types]. Deserializing these types isn't |
| 71 | +possible without access to the arena that they need to be allocated on. The |
| 72 | +[`TyDecoder`] and [`TyEncoder`] traits are subtraits of `Decoder` and |
| 73 | +`Encoder` that allow access to a `TyCtxt`. |
| 74 | + |
| 75 | +Types which contain arena allocated types can then bound the type parameter of |
| 76 | +their `Encodable` and `Decodable` implementations with these traits. For |
| 77 | +example: |
| 78 | + |
| 79 | +```rust,ignore |
| 80 | +impl<'tcx, D: TyDecoder<'tcx>> Decodable<D> for MyStruct<'tcx> { |
| 81 | + /* ... */ |
| 82 | +} |
| 83 | +``` |
| 84 | + |
| 85 | +The `TyEncodable` and `TyDecodable` [derive macros][derives] will expand to such |
| 86 | +an implementation. |
| 87 | + |
| 88 | +Decoding the actual arena allocated type is harder, because some of the |
| 89 | +implementations can't be written due to the orphan rules. To work around this, |
| 90 | +the [`RefDecodable`] trait is defined in `rustc_middle`. This can then be |
| 91 | +implemented for any type. The `TyDecodable` macro will call `RefDecodable` to |
| 92 | +decode references, but various generic code needs types to actually be |
| 93 | +`Decodable` with a specific decoder. |
| 94 | + |
| 95 | +For interned types, using a newtype wrapper such as `ty::Predicate` and manually |
| 96 | +implementing `Encodable` and `Decodable` may be simpler than manually |
| 97 | +implementing `RefDecodable`. |
| 98 | + |
| 99 | +## Derive macros |
| 100 | + |
| 101 | +The `rustc_macros` crate defines various derives to help implement `Decodable` |
| 102 | +and `Encodable`. |
| 103 | + |
| 104 | +- The `Encodable` and `Decodable` macros generate implementations that apply to |
| 105 | + all `Encoder`s and `Decoder`s. These should be used in crates that don't |
| 106 | + depend on `rustc_middle`, or that have to be serialized by a type that does |
| 107 | + not implement `TyEncoder`. |
| 108 | +- `MetadataEncodable` and `MetadataDecodable` generate implementations that |
| 109 | + only allow encoding by [`rustc_metadata::rmeta::encoder::EncodeContext`] and |
| 110 | + decoding by [`rustc_metadata::rmeta::decoder::DecodeContext`]. These are used for types |
| 111 | + that contain `rustc_metadata::rmeta::Lazy`. |
| 112 | +- `TyEncodable` and `TyDecodable` generate implementations that apply to any |
| 113 | + `TyEncoder` or `TyDecoder`. These should be used for types that are only |
| 114 | + serialized in crate metadata and/or the incremental cache, which is most |
| 115 | + serializable types in `rustc_middle`. |
| 116 | + |
| 117 | +## Shorthands |
| 118 | + |
| 119 | +`Ty` can be deeply recursive; if each `Ty` were encoded naively, crate |
| 120 | +metadata would be very large. To handle this, each `TyEncoder` has a cache of |
| 121 | +locations in its output where it has serialized types. If a type being encoded |
| 122 | +is in the cache, then instead of serializing the type as usual, the byte offset |
| 123 | +within the file being written is encoded instead. A similar scheme is used for |
| 124 | +`ty::Predicate`. |
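
The shorthand idea can be illustrated with a small, self-contained sketch. Everything here is hypothetical and simplified for the example: `ToyTy` stands in for `Ty`, the tag bytes and the fixed 4-byte offset are an invented format, and `ShorthandEncoder` is not the compiler's encoder. The sketch only records a shorthand when the full encoding is longer than the shorthand itself, which is an assumption made so that the cache never increases the output size.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a type; real `Ty` values are far richer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum ToyTy {
    Int,
    Ref(Box<ToyTy>),
}

// Invented tag bytes for this toy format.
const TAG_INT: u8 = 0;
const TAG_REF: u8 = 1;
const TAG_SHORTHAND: u8 = 2;
// A shorthand is one tag byte plus a 4-byte little-endian offset.
const SHORTHAND_LEN: usize = 5;

/// Toy encoder with a cache from already-encoded types to their byte offset.
#[derive(Default)]
pub struct ShorthandEncoder {
    pub data: Vec<u8>,
    shorthands: HashMap<ToyTy, u32>,
}

impl ShorthandEncoder {
    pub fn encode_ty(&mut self, ty: &ToyTy) {
        // Cache hit: write the offset of the earlier encoding instead.
        if let Some(&offset) = self.shorthands.get(ty) {
            self.data.push(TAG_SHORTHAND);
            self.data.extend_from_slice(&offset.to_le_bytes());
            return;
        }

        // Cache miss: encode the type in full, remembering where it starts.
        let start = self.data.len();
        match ty {
            ToyTy::Int => self.data.push(TAG_INT),
            ToyTy::Ref(inner) => {
                self.data.push(TAG_REF);
                self.encode_ty(inner);
            }
        }

        // Only remember encodings that a shorthand would actually shrink.
        if self.data.len() - start > SHORTHAND_LEN {
            self.shorthands.insert(ty.clone(), start as u32);
        }
    }
}

/// Decodes one type starting at `*pos`, following shorthands when present.
pub fn decode_ty(data: &[u8], pos: &mut usize) -> ToyTy {
    let tag = data[*pos];
    *pos += 1;
    match tag {
        TAG_INT => ToyTy::Int,
        TAG_REF => ToyTy::Ref(Box::new(decode_ty(data, pos))),
        TAG_SHORTHAND => {
            let mut buf = [0u8; 4];
            buf.copy_from_slice(&data[*pos..*pos + 4]);
            *pos += 4;
            // Jump to the earlier encoding without disturbing `pos`.
            let mut target = u32::from_le_bytes(buf) as usize;
            decode_ty(data, &mut target)
        }
        _ => panic!("unknown tag {}", tag),
    }
}

/// Builds `depth` nested references around `Int`.
pub fn nested_ref(depth: usize) -> ToyTy {
    (0..depth).fold(ToyTy::Int, |ty, _| ToyTy::Ref(Box::new(ty)))
}
```

In this toy format, encoding `nested_ref(5)` twice writes the full six bytes once and a five-byte shorthand the second time; the saving grows with the size of the repeated type.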
| 125 | + |
| 126 | +## `Lazy<T>` |
| 127 | + |
| 128 | +Crate metadata is initially loaded before the `TyCtxt<'tcx>` is created, so |
| 129 | +some deserialization needs to be deferred from the initial loading of metadata. |
| 130 | +The [`Lazy<T>`] type wraps the (relative) offset in the crate metadata where a |
| 131 | +`T` has been serialized. |
| 132 | + |
| 133 | +The `Lazy<[T]>` and `Lazy<Table<I, T>>` types provide some extra functionality over |
| 134 | +`Lazy<Vec<T>>` and `Lazy<HashMap<I, T>>`: |
| 135 | + |
| 136 | +- It's possible to encode a `Lazy<[T]>` directly from an iterator, without |
| 137 | + first collecting into a `Vec<T>`. |
| 138 | +- Indexing into a `Lazy<Table<I, T>>` does not require decoding entries other |
| 139 | + than the one being read. |
| 140 | + |
| 141 | +**Note**: `Lazy<T>` does not cache its value after being deserialized the first |
| 142 | +time. Instead the query system is the main way of caching these results. |
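
The following sketch shows the general shape of a lazily decoded value under heavy simplification. `ToyLazy`, `ToyTable`, `DecodeAt` and `encode_lazy_u32` are hypothetical names invented for this example, the blob is a plain byte vector, and table entries are assumed to be fixed-size `u32`s; the real `Lazy<T>` and `Table<I, T>` in `rustc_metadata` are more involved.

```rust
use std::marker::PhantomData;

/// Hypothetical trait for values that can be decoded from a blob at a position.
pub trait DecodeAt: Sized {
    fn decode_at(blob: &[u8], pos: usize) -> Self;
}

impl DecodeAt for u32 {
    fn decode_at(blob: &[u8], pos: usize) -> Self {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&blob[pos..pos + 4]);
        u32::from_le_bytes(buf)
    }
}

/// Toy counterpart of `Lazy<T>`: only a position and a type marker are stored.
pub struct ToyLazy<T> {
    position: usize,
    _marker: PhantomData<T>,
}

impl<T: DecodeAt> ToyLazy<T> {
    pub fn from_position(position: usize) -> Self {
        ToyLazy { position, _marker: PhantomData }
    }

    /// Decodes the value afresh on every call; nothing is cached here.
    pub fn decode(&self, blob: &[u8]) -> T {
        T::decode_at(blob, self.position)
    }
}

/// Appends `value` to `blob` and returns a handle pointing at it.
pub fn encode_lazy_u32(blob: &mut Vec<u8>, value: u32) -> ToyLazy<u32> {
    let position = blob.len();
    blob.extend_from_slice(&value.to_le_bytes());
    ToyLazy::from_position(position)
}

/// Toy table of fixed-size entries, so one entry can be read in isolation.
pub struct ToyTable {
    start: usize,
    len: usize,
}

impl ToyTable {
    /// Encodes straight from an iterator, without collecting into a `Vec` first.
    pub fn encode(blob: &mut Vec<u8>, values: impl Iterator<Item = u32>) -> Self {
        let start = blob.len();
        let mut len = 0;
        for value in values {
            blob.extend_from_slice(&value.to_le_bytes());
            len += 1;
        }
        ToyTable { start, len }
    }

    /// Reads only the requested entry; the other entries are never decoded.
    pub fn get(&self, blob: &[u8], index: usize) -> Option<u32> {
        if index >= self.len {
            return None;
        }
        Some(u32::decode_at(blob, self.start + index * 4))
    }
}
```

Since `ToyLazy::decode` redoes the work on each call, any caching has to happen in the caller, which is the role the query system plays in the compiler.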
| 143 | + |
| 144 | +## Specialization |
| 145 | + |
| 146 | +A few types, most notably `DefId`, need to have different implementations for |
| 147 | +different `Encoder`s. This is currently handled by ad-hoc specializations: |
| 148 | +`DefId` has a `default` implementation of `Encodable<E>` and a specialized one |
| 149 | +for `Encodable<CacheEncoder>`. |
| 150 | + |
| 151 | +[arena allocated types]: memory.md |
| 152 | +[AST]: the-parser.md |
| 153 | +[derives]: #derive-macros |
| 154 | +[persist incremental compilation results]: queries/incremental-compilation-in-detail.md#the-real-world-how-persistence-makes-everything-complicated |
| 155 | +[serialize]: https://en.wikipedia.org/wiki/Serialization |
| 156 | + |
| 157 | +[`CrateInfo`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/struct.CrateInfo.html |
| 158 | +[`Lazy<T>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.Lazy.html |
| 159 | +[`RefDecodable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.RefDecodable.html |
| 160 | +[`rustc_metadata::rmeta::decoder::DecodeContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.DecodeContext.html |
| 161 | +[`rustc_metadata::rmeta::encoder::EncodeContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/encoder/struct.EncodeContext.html |
| 162 | +[`rustc_serialize`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_serialize/index.html |
| 163 | +[`TyDecoder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.TyDecoder.html |
| 164 | +[`TyEncoder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.TyEncoder.html |