
Commit 39d51fb

Document serialization in rustc
1 parent 88f57a5 commit 39d51fb


2 files changed: +165, -0 lines

src/SUMMARY.md

+1
@@ -56,6 +56,7 @@
 - [Profiling Queries](./queries/profiling.md)
 - [Salsa](./salsa.md)
 - [Memory Management in Rustc](./memory.md)
+- [Serialization in Rustc](./serialization.md)
 - [Parallel Compilation](./parallel-rustc.md)
 - [Rustdoc](./rustdoc-internals.md)

src/serialization.md

+164
@@ -0,0 +1,164 @@
# Serialization in Rustc

Rustc has to [serialize] and deserialize various data during compilation.
Specifically:

- "Crate metadata", mainly query outputs, is serialized in a binary
  format into `rlib` and `rmeta` files that are output when compiling a library
  crate. These files are then deserialized by crates that depend on that library.
- Certain query outputs are serialized in a binary format to
  [persist incremental compilation results].
- The `-Z ast-json` and `-Z ast-json-noexpand` flags serialize the [AST] to JSON
  and output the result to stdout.
- [`CrateInfo`] is serialized to JSON when the `-Z no-link` flag is used, and
  deserialized from JSON when the `-Z link-only` flag is used.

## The `Encodable` and `Decodable` traits

The [`rustc_serialize`] crate defines two traits for types which can be serialized:

```rust,ignore
pub trait Encodable<S: Encoder> {
    fn encode(&self, s: &mut S) -> Result<(), S::Error>;
}

pub trait Decodable<D: Decoder>: Sized {
    fn decode(d: &mut D) -> Result<Self, D::Error>;
}
```

It also defines implementations of these for integer types, floating point
types, `bool`, `char`, `str` and various common standard library types.

For types that are constructed from those types, `Encodable` and `Decodable` are
usually implemented by [derives]. These generate implementations that forward
encoding and decoding to the fields of the struct or enum. For a struct, those
impls look something like this:

```rust
# #![feature(rustc_private)]
# extern crate rustc_serialize;
# use rustc_serialize::{Decodable, Decoder, Encodable, Encoder};

struct MyStruct {
    int: u32,
    float: f32,
}

impl<E: Encoder> Encodable<E> for MyStruct {
    fn encode(&self, s: &mut E) -> Result<(), E::Error> {
        s.emit_struct("MyStruct", 2, |s| {
            s.emit_struct_field("int", 0, |s| self.int.encode(s))?;
            s.emit_struct_field("float", 1, |s| self.float.encode(s))
        })
    }
}

impl<D: Decoder> Decodable<D> for MyStruct {
    fn decode(d: &mut D) -> Result<MyStruct, D::Error> {
        d.read_struct("MyStruct", 2, |d| {
            let int = d.read_struct_field("int", 0, Decodable::decode)?;
            let float = d.read_struct_field("float", 1, Decodable::decode)?;

            Ok(MyStruct { int, float })
        })
    }
}
```
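
The block above needs the nightly-only `rustc_private` crates. The following is a self-contained stable-Rust sketch of the same forwarding pattern. The traits here are simplified, hypothetical stand-ins for the `rustc_serialize` ones: `u32` is the only primitive, and there are no `emit_struct`/`read_struct` helpers. `Pair`, `ByteEncoder` and `ByteDecoder` are illustrative names only.

```rust
use std::convert::Infallible;

// Hypothetical, simplified stand-ins for the rustc_serialize traits:
// `u32` is the only primitive these encoders and decoders know about.
trait Encoder {
    type Error;
    fn emit_u32(&mut self, v: u32) -> Result<(), Self::Error>;
}

trait Decoder {
    type Error;
    fn read_u32(&mut self) -> Result<u32, Self::Error>;
}

trait Encodable<S: Encoder> {
    fn encode(&self, s: &mut S) -> Result<(), S::Error>;
}

trait Decodable<D: Decoder>: Sized {
    fn decode(d: &mut D) -> Result<Self, D::Error>;
}

// The primitive impls forward to the matching emit_/read_ method.
impl<S: Encoder> Encodable<S> for u32 {
    fn encode(&self, s: &mut S) -> Result<(), S::Error> {
        s.emit_u32(*self)
    }
}

impl<D: Decoder> Decodable<D> for u32 {
    fn decode(d: &mut D) -> Result<Self, D::Error> {
        d.read_u32()
    }
}

// Roughly what a derive produces: each field is encoded and decoded in order.
#[derive(Debug, PartialEq)]
struct Pair {
    lo: u32,
    hi: u32,
}

impl<S: Encoder> Encodable<S> for Pair {
    fn encode(&self, s: &mut S) -> Result<(), S::Error> {
        self.lo.encode(s)?;
        self.hi.encode(s)
    }
}

impl<D: Decoder> Decodable<D> for Pair {
    fn decode(d: &mut D) -> Result<Self, D::Error> {
        let lo = u32::decode(d)?;
        let hi = u32::decode(d)?;
        Ok(Pair { lo, hi })
    }
}

// A hypothetical little-endian byte format. Writing to a `Vec` cannot fail.
struct ByteEncoder {
    out: Vec<u8>,
}

impl Encoder for ByteEncoder {
    type Error = Infallible;
    fn emit_u32(&mut self, v: u32) -> Result<(), Infallible> {
        self.out.extend_from_slice(&v.to_le_bytes());
        Ok(())
    }
}

struct ByteDecoder<'a> {
    data: &'a [u8],
    pos: usize,
}

impl Decoder for ByteDecoder<'_> {
    type Error = String;
    fn read_u32(&mut self) -> Result<u32, String> {
        let data = self.data;
        // Fails instead of panicking when fewer than 4 bytes remain.
        let b = data
            .get(self.pos..self.pos + 4)
            .ok_or("unexpected end of input")?;
        self.pos += 4;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}
```

The impls for `Pair` are generic over the encoder and decoder. Any other format that implements these stand-in traits could therefore reuse them unchanged.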

## Encoding and Decoding arena allocated types

Rustc has a lot of [arena allocated types]. Deserializing these types isn't
possible without access to the arena that they need to be allocated on. The
[`TyDecoder`] and [`TyEncoder`] traits are subtraits of `Decoder` and
`Encoder` that also allow access to a `TyCtxt`.

Types which contain arena allocated types can then bound the type parameter of
their `Encodable` and `Decodable` implementations with these traits. For
example:

```rust,ignore
impl<'tcx, D: TyDecoder<'tcx>> Decodable<D> for MyStruct<'tcx> {
    /* ... */
}
```

The `TyEncodable` and `TyDecodable` [derive macros][derives] will expand to such
an implementation.
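
The following is a minimal stable-Rust sketch of why that bound matters. All the names in it are hypothetical. `CtxtDecoder` plays the role of `TyDecoder`, a string `Interner` stands in for the arenas reachable from a `TyCtxt`, and `Rc<str>` stands in for an arena reference. None of these are rustc APIs, and the sketch omits error handling by panicking on malformed input. Because `Name` is only `Decodable` for decoders that implement `CtxtDecoder`, a plain `Decoder` cannot produce one.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for the interners/arenas reachable from a `TyCtxt`.
#[derive(Default)]
struct Interner {
    map: HashMap<String, Rc<str>>,
}

impl Interner {
    // Returns the shared copy of `s`, allocating it on first use.
    fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.map.get(s) {
            return Rc::clone(existing);
        }
        let shared: Rc<str> = Rc::from(s);
        self.map.insert(s.to_owned(), Rc::clone(&shared));
        shared
    }
}

trait Decoder {
    fn read_u8(&mut self) -> u8;
    fn read_str(&mut self) -> String;
}

// Like `TyDecoder`: a subtrait of `Decoder` that also exposes the context.
trait CtxtDecoder: Decoder {
    fn interner(&mut self) -> &mut Interner;
}

trait Decodable<D: Decoder>: Sized {
    fn decode(d: &mut D) -> Self;
}

// A type holding an interned value: decodable only when a context is available.
struct Name(Rc<str>);

impl<D: CtxtDecoder> Decodable<D> for Name {
    fn decode(d: &mut D) -> Self {
        let s = d.read_str();
        Name(d.interner().intern(&s))
    }
}

// Hypothetical format: a string is a one-byte length followed by UTF-8 bytes.
struct BytesDecoder<'a> {
    data: &'a [u8],
    pos: usize,
    interner: Interner,
}

impl Decoder for BytesDecoder<'_> {
    fn read_u8(&mut self) -> u8 {
        let b = self.data[self.pos];
        self.pos += 1;
        b
    }

    fn read_str(&mut self) -> String {
        let len = self.read_u8() as usize;
        let bytes = &self.data[self.pos..self.pos + len];
        self.pos += len;
        String::from_utf8(bytes.to_vec()).expect("invalid UTF-8")
    }
}

impl CtxtDecoder for BytesDecoder<'_> {
    fn interner(&mut self) -> &mut Interner {
        &mut self.interner
    }
}
```

Decoding the same string twice through this decoder yields two handles to one shared allocation, which is the property an arena-backed context provides.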

Decoding the actual arena allocated type is harder, because some of the
implementations can't be written due to the orphan rules. To work around this,
the [`RefDecodable`] trait is defined in `rustc_middle`. Because the trait is
local to `rustc_middle`, it can be implemented there for any type. The
`TyDecodable` macro calls `RefDecodable` to decode references, but various
generic code still needs types to actually be `Decodable` with a specific
decoder.

For interned types, instead of manually implementing `RefDecodable`, it may be
simpler to use a newtype wrapper, like `ty::Predicate`, and manually implement
`Encodable` and `Decodable` for it.

## Derive macros

The `rustc_macros` crate defines various derives to help implement `Decodable`
and `Encodable`.

- The `Encodable` and `Decodable` macros generate implementations that apply to
  all `Encoder`s and `Decoder`s. These should be used in crates that don't
  depend on `rustc_middle`, or for types that have to be serialized by an
  encoder that does not implement `TyEncoder`.
- `MetadataEncodable` and `MetadataDecodable` generate implementations that
  only allow encoding by [`rustc_metadata::rmeta::encoder::EncodeContext`] and
  decoding by [`rustc_metadata::rmeta::decoder::DecodeContext`]. These are used
  for types that contain `rustc_metadata::rmeta::Lazy`.
- `TyEncodable` and `TyDecodable` generate implementations that apply to any
  `TyEncoder` or `TyDecoder`. These should be used for types that are only
  serialized in crate metadata and/or the incremental cache, which is most
  serializable types in `rustc_middle`.

## Shorthands

`Ty` can be deeply recursive, so if each `Ty` were encoded naively, crate
metadata would be very large. To handle this, each `TyEncoder` has a cache of
locations in its output where it has serialized types. If a type being encoded
is in the cache, then instead of serializing the type as usual, the byte offset
within the file being written is encoded instead. A similar scheme is used for
`ty::Predicate`.
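
The following is a minimal sketch of the shorthand idea under simplifying assumptions. `Ty` here is a hypothetical two-variant enum, and `ShorthandEncoder` is an illustrative name. The cache maps each already-encoded type to the offset where it starts, and a dedicated tag byte marks a shorthand. A type is only cached when its encoding is longer than a shorthand would be. Rustc's actual byte layout differs in detail.

```rust
use std::collections::HashMap;

// A hypothetical, recursive stand-in for `Ty`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    Bool,
    Tuple(Vec<Ty>),
}

const TAG_BOOL: u8 = 0;
const TAG_TUPLE: u8 = 1;
const TAG_SHORTHAND: u8 = 2;
// Size of a shorthand in this sketch: one tag byte plus a 4-byte offset.
const SHORTHAND_LEN: usize = 5;

#[derive(Default)]
struct ShorthandEncoder {
    out: Vec<u8>,
    // Offset in `out` where each cached type was first written.
    shorthands: HashMap<Ty, usize>,
}

impl ShorthandEncoder {
    fn encode_ty(&mut self, ty: &Ty) {
        if let Some(&offset) = self.shorthands.get(ty) {
            self.out.push(TAG_SHORTHAND);
            self.out.extend_from_slice(&(offset as u32).to_le_bytes());
            return;
        }
        let start = self.out.len();
        match ty {
            Ty::Bool => self.out.push(TAG_BOOL),
            Ty::Tuple(elems) => {
                self.out.push(TAG_TUPLE);
                self.out.push(elems.len() as u8);
                for elem in elems {
                    self.encode_ty(elem);
                }
            }
        }
        // Only remember types whose encoding is longer than a shorthand.
        if self.out.len() - start > SHORTHAND_LEN {
            self.shorthands.insert(ty.clone(), start);
        }
    }
}

fn decode_ty(data: &[u8], pos: &mut usize) -> Ty {
    let tag = data[*pos];
    *pos += 1;
    match tag {
        TAG_BOOL => Ty::Bool,
        TAG_TUPLE => {
            let len = data[*pos] as usize;
            *pos += 1;
            Ty::Tuple((0..len).map(|_| decode_ty(data, pos)).collect())
        }
        TAG_SHORTHAND => {
            let b = &data[*pos..*pos + 4];
            *pos += 4;
            // Decode the earlier copy without moving the main cursor there.
            let mut at = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
            decode_ty(data, &mut at)
        }
        _ => panic!("unknown tag {}", tag),
    }
}
```

Consider a tuple of three identical ten-element tuples. With this layout it encodes to 24 bytes (2 header bytes, 12 for the first inner tuple, then two 5-byte shorthands). Writing every copy out in full would take 38 bytes.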

## `Lazy<T>`

Crate metadata is initially loaded before the `TyCtxt<'tcx>` is created, so
some deserialization needs to be deferred from the initial loading of metadata.
The [`Lazy<T>`] type wraps the (relative) offset in the crate metadata where a
`T` has been serialized.
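
The following is a minimal sketch of a deferred-decoding handle. It assumes hypothetical names (`LazyValue`, `DecodeAt`) and uses a raw byte slice in place of the loaded metadata blob, so it does not reproduce the real `Lazy<T>` API. The handle stores only a position. Decoding happens when `decode` is called, and again on every call.

```rust
use std::marker::PhantomData;

// Hypothetical: types that can be decoded starting at a position in a blob.
trait DecodeAt: Sized {
    fn decode_at(blob: &[u8], pos: usize) -> Self;
}

impl DecodeAt for u32 {
    fn decode_at(blob: &[u8], pos: usize) -> Self {
        let b = &blob[pos..pos + 4];
        u32::from_le_bytes([b[0], b[1], b[2], b[3]])
    }
}

// Records where a `T` was written, without decoding it yet.
struct LazyValue<T> {
    position: usize,
    _marker: PhantomData<T>,
}

impl<T: DecodeAt> LazyValue<T> {
    fn new(position: usize) -> Self {
        LazyValue { position, _marker: PhantomData }
    }

    // Decodes on every call; nothing is cached in the handle.
    fn decode(&self, blob: &[u8]) -> T {
        T::decode_at(blob, self.position)
    }
}
```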

The `Lazy<[T]>` and `Lazy<Table<I, T>>` types provide some functionality over
`Lazy<Vec<T>>` and `Lazy<HashMap<I, T>>`:

- It's possible to encode a `Lazy<[T]>` directly from an iterator, without
  first collecting into a `Vec<T>`.
- Indexing into a `Lazy<Table<I, T>>` does not require decoding entries other
  than the one being read.
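
Both properties can be sketched with fixed-width `u32` entries. Everything here is hypothetical and simplified. `LazySeq` and `LazyTable` are illustrative names, and using `0` to mark an absent table entry is an assumption of this sketch, not a description of rustc's encoding. Fixed-width slots are what let `get` compute an entry's position directly instead of decoding its neighbours.

```rust
fn read_u32(blob: &[u8], pos: usize) -> u32 {
    let b = &blob[pos..pos + 4];
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

// Position and length of a sequence of `u32`s written back to back.
struct LazySeq {
    position: usize,
    len: usize,
}

// Writes items straight from the iterator; no intermediate `Vec` is built.
fn encode_seq(out: &mut Vec<u8>, items: impl IntoIterator<Item = u32>) -> LazySeq {
    let position = out.len();
    let mut len = 0;
    for item in items {
        out.extend_from_slice(&item.to_le_bytes());
        len += 1;
    }
    LazySeq { position, len }
}

impl LazySeq {
    // Decodes elements one at a time as the iterator is consumed.
    fn decode<'a>(&self, blob: &'a [u8]) -> impl Iterator<Item = u32> + 'a {
        let start = self.position;
        (0..self.len).map(move |i| read_u32(blob, start + 4 * i))
    }
}

// One fixed-width slot per index; in this sketch, 0 marks an empty slot.
struct LazyTable {
    position: usize,
    len: usize,
}

fn encode_table(out: &mut Vec<u8>, entries: &[(usize, u32)], len: usize) -> LazyTable {
    let mut slots = vec![0u32; len];
    for &(index, value) in entries {
        slots[index] = value;
    }
    LazyTable { position: encode_seq(out, slots).position, len }
}

impl LazyTable {
    // Reads only the slot for `index`; other entries are never decoded.
    fn get(&self, blob: &[u8], index: usize) -> Option<u32> {
        if index >= self.len {
            return None;
        }
        match read_u32(blob, self.position + 4 * index) {
            0 => None,
            value => Some(value),
        }
    }
}
```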

**note**: `Lazy<T>` does not cache its value after being deserialized the first
time. Instead, the query system is the main way of caching these results.

## Specialization

A few types, most notably `DefId`, need to have different implementations for
different `Encoder`s. This is currently handled by ad-hoc specializations:
`DefId` has a `default` implementation of `Encodable<E>` and a specialized one
for `Encodable<CacheEncoder>`.
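
Specialization is an unstable feature, so the sketch below swaps in a different technique that compiles on stable Rust. Instead of specializing an impl, the encoder trait has a hook method with a default body that one encoder overrides. `DefId` here is a two-field stand-in. `MetadataEncoder` and `CacheEncoder` are illustrative types, not the rustc ones. `stable_id` is a hypothetical placeholder for the kind of session-independent identifier an incremental cache would want to store.

```rust
// A hypothetical two-field stand-in for rustc's `DefId`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DefId {
    krate: u32,
    index: u32,
}

trait Encoder {
    fn emit_u32(&mut self, v: u32);
    fn emit_u64(&mut self, v: u64);

    // Hook with a default body: write the raw crate number and index.
    fn encode_def_id(&mut self, id: DefId) {
        self.emit_u32(id.krate);
        self.emit_u32(id.index);
    }
}

trait Encodable<E: Encoder> {
    fn encode(&self, e: &mut E);
}

// One generic impl; the encoder decides how a `DefId` is actually written.
impl<E: Encoder> Encodable<E> for DefId {
    fn encode(&self, e: &mut E) {
        e.encode_def_id(*self);
    }
}

// Hypothetical placeholder for a stable hash: just packs both fields.
fn stable_id(id: DefId) -> u64 {
    ((id.krate as u64) << 32) | id.index as u64
}

#[derive(Default)]
struct MetadataEncoder {
    out: Vec<u8>,
}

impl Encoder for MetadataEncoder {
    fn emit_u32(&mut self, v: u32) {
        self.out.extend_from_slice(&v.to_le_bytes());
    }
    fn emit_u64(&mut self, v: u64) {
        self.out.extend_from_slice(&v.to_le_bytes());
    }
}

#[derive(Default)]
struct CacheEncoder {
    out: Vec<u8>,
}

impl Encoder for CacheEncoder {
    fn emit_u32(&mut self, v: u32) {
        self.out.extend_from_slice(&v.to_le_bytes());
    }
    fn emit_u64(&mut self, v: u64) {
        self.out.extend_from_slice(&v.to_le_bytes());
    }
    // Overrides the default, playing the role of the specialized impl.
    fn encode_def_id(&mut self, id: DefId) {
        self.emit_u64(stable_id(id));
    }
}
```

The trade-off is that the set of specially handled types must be known to the encoder trait. Specialization, by contrast, keeps that knowledge in the type's own impls.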

[arena allocated types]: memory.md
[AST]: the-parser.md
[derives]: #derive-macros
[persist incremental compilation results]: queries/incremental-compilation-in-detail.md#the-real-world-how-persistence-makes-everything-complicated
[serialize]: https://en.wikipedia.org/wiki/Serialization

[`CrateInfo`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/struct.CrateInfo.html
[`Lazy<T>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.Lazy.html
[`RefDecodable`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.RefDecodable.html
[`rustc_metadata::rmeta::decoder::DecodeContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.DecodeContext.html
[`rustc_metadata::rmeta::encoder::EncodeContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/encoder/struct.EncodeContext.html
[`rustc_serialize`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_serialize/index.html
[`TyDecoder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.TyDecoder.html
[`TyEncoder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/codec/trait.TyEncoder.html
