Skip to content

Commit 08dfba1

Browse files
RalfJungavadacatavra
authored andcommitted
[edit for readability] initial version of a Stacked Borrows spec (#64)
* initial version of a Stacked Borrows spec
1 parent 4d188f1 commit 08dfba1

File tree

1 file changed

+202
-0
lines changed

1 file changed

+202
-0
lines changed

wip/stacked-borrows.md

Lines changed: 202 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,202 @@
1+
# Stacked Borrows
2+
3+
**Note:** This document is not normative nor endorsed by the UCG WG. It is maintained by @RalfJung to reflect what is currently implemented in [Miri].
4+
5+
For more background on the how and why of Stacked Borrows, see this [blog post on the first implementation][stacked-borrows-1] and this [follow-up][stacked-borrows-2].
6+
7+
[Miri]: https://github.com/solson/miri/
8+
[stacked-borrows-1]: https://www.ralfj.de/blog/2018/11/16/stacked-borrows-implementation.html
9+
[stacked-borrows-2]: https://www.ralfj.de/blog/2018/12/26/stacked-borrows-barriers.html
10+
11+
## Extra state
12+
13+
Stacked Borrows adds some extra state to the Rust abstract machine.
14+
Every pointer value has a *tag* (in addition to the location in memory that the pointer points to), and every memory location carries a *stack* (in addition to the byte of data stored at that location).
15+
Moreover, there is a per-call-frame `CallId` as well as some global tracking state.
16+
17+
```rust
18+
// `nat` is the type of mathematical natural numbers, meaning we don't want to think about overflow.
19+
// NOTE: Miri just uses `u64` which, realistically, will not overflow because we only ever increment by 1.
20+
type Timestamp = nat;
21+
22+
// Extra per-pointer state (the "tag")
23+
pub enum Borrow {
24+
Uniq(Timestamp),
25+
Shr(Option<Timestamp>),
26+
}
27+
28+
pub enum BorStackItem {
29+
Uniq(Timestamp),
30+
Shr,
31+
FnBarrier(CallId),
32+
}
33+
// Extra per-location state
34+
pub struct Stack {
35+
borrows: Vec<BorStackItem>, // used as a stack; never empty
36+
frozen_since: Option<Timestamp>, // virtual frozen "item" on top of the stack
37+
}
38+
39+
// Extra per-call-frame state
40+
type CallId = nat;
41+
42+
// Extra global state
43+
pub struct Tracking {
44+
clock: Timestamp,
45+
next_call: CallId,
46+
}
47+
```
48+
49+
These exist separately, i.e., when a pointer is stored in memory, then we both have a tag stored as part of this pointer value (remember, [bytes are more than `u8`](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html)), and every byte occupied by the pointer has a stack regulating access to this location.
50+
Also these two do not interact, i.e., when loading a pointer from memory, we just load the tag that was stored as part of this pointer.
51+
The stack of a location, and the tag of a pointer stored at some location, do not have any effect on each other.
52+
53+
## Retag statement
54+
55+
Stacked Borrows introduces a new operation (a new MIR statement) on the Rust abstract machine.
56+
*Retagging* operates on a place, and it also carries a flag indicating the kind of retag that is being performed:
57+
58+
```rust
59+
pub enum RetagKind {
60+
/// The initial retag when entering a function
61+
FnEntry,
62+
/// Retag preparing for a two-phase borrow
63+
TwoPhase,
64+
/// Retagging raw pointers
65+
Raw,
66+
/// A "normal" retag
67+
Default,
68+
}
69+
```
70+
71+
`Retag` is inserted into the MIR for the following situations:
72+
73+
* A retag happens after every assignment MIR statement where the assigned type may contain a reference type (checking recursively below compound types but not below references).
74+
This is usually a `Default` retag. However, if the RHS of this assignment is a `Ref` which allows two-phase borrows, then this is a `TwoPhase` retag.
75+
76+
Currently, if the LHS of the assignment involves a `Deref`, no `Retag` is inserted.
77+
That's just a limitation of the current implementation: after executing this assignment, evaluating the place (the LHS) again could yield a different location in memory, which means we would retag the wrong thing.
78+
Proper retagging here requires either a copy through a temporary, or making retagging integral part of the semantics of assignment.
79+
80+
* A `Raw` retag happens after every assignment where the RHS is a cast from a reference to a raw pointer.
81+
82+
* A `FnEntry` retag happens in the first basic block of every function, retagging each argument.
83+
84+
* A `Default` retag happens on the return value of every function that gets called (i.e., this is the first statement in the basic block that the call will return to).
85+
86+
* The automatically generated drop shims perform a `Raw` retag of their argument because they use it as a raw pointer.
87+
88+
## Operational semantics
89+
90+
### Generating timestampts
91+
92+
Whenever we need to generate a new timestamp, that means we effectively call the following method:
93+
94+
```rust
95+
impl Tracking {
96+
fn increment_clock(&mut self) -> Timestamp {
97+
let val = self.clock;
98+
self.clock += 1;
99+
val
100+
}
101+
}
102+
```
103+
104+
This method will never return the same value twice.
105+
106+
### Tracking function calls
107+
108+
To attach metadata to a particular function call, we assign a fresh ID to every call stack (so this distinguishes multiple calls to the same function).
109+
In other words, the per-stack-frame `CallId` is initialized by the following method:
110+
111+
```rust
112+
impl Tracking {
113+
fn new_call(&mut self) -> CallId {
114+
let id = self.next_call;
115+
self.next_call += 1;
116+
id
117+
}
118+
}
119+
```
120+
121+
This method will never return the same value twice.
122+
We say that a `CallId` is *active* if the call stack contains a stack frame with that ID.
123+
124+
**Note**: Miri uses a slightly more complex system with a `HashSet<CallId>` tracking the set of active `CallId`; that is just an optimization to avoid having to scan the call stack all the time.
125+
126+
### Allocating memory
127+
128+
When allocating memory, we have to initialize the `Stack` associated with the new locations, and we have to choose a `Borrow` (a tag) for the initial pointer to this memory.
129+
130+
For most memory, the stack of each freshly allocated memory location is `Stack { borrows: vec![Shr], frozen_since: None }`, and the initial pointer to that memory has tag `Shr(None)`.
131+
132+
The only exception is stack memory.
133+
Stack memory is handled by an environment (which is part of the information carried in a stack frame of the Rust abstract machine) that maps each local variable to a place.
134+
A place is a pointer together with some other data that is not relevant here -- the key point is that a place, just like every other pointer, carries a tag.
135+
When the local variable becomes live and its backing memory gets allocated, we generate a new timestamp `t` and use `Uniq(t)` as timestamp for the place of this local variable.
136+
We also initialize the stack of all the memory locations in this new memory allocation with `Stack { borrows: vec![Uniq(t)], frozen_since: None }`.
137+
138+
### Accessing memory
139+
140+
On every memory access, we perform the following extra check for every location that gets accessed (i.e., for a 4-byte access, this happens for each of the 4 bytes):
141+
142+
1. If the location is frozen (`frozen_since.is_some()`) and this is a read access, nothing happens (even if the tag is `Uniq`).
143+
2. Otherwise, if this is a write access, unfreeze the location (set `frozen_since` to `None`).
144+
(If this is a read access and we come here, the location is already unfrozen.)
145+
3. Pop the stack until the top item matches the tag of the pointer.
146+
- A `Uniq` item matches a `Uniq` tag with the same ID.
147+
- A `Shr` item matches any `Shr` tag (with or without timestamp).
148+
- When we are reading, a `Shr` item matches a `Uniq` tag.
149+
- If we pop a `FnBarrier(c)` where `c` is active, we have undefined behavior.
150+
151+
If we pop the entire stack without finding a match, then we have undefined behavior.
152+
153+
### Dereferencing a pointer
154+
155+
Every time a pointer gets dereferenced (evaluating the `Deref` place projection), we determine the extent of memory that this pointer covers using `size_of_val` and then we perform the following check on every location covered by the reference:
156+
157+
1. The location must exist, i.e., the pointer must actually be dereferencable for this entire memory range it covers.
158+
2. If this is a raw pointer, stop here. Raw accesses are checked as little as possible.
159+
3. If this is a unique reference and the tag is `Shr(Some(_))`, that's an error.
160+
4. If the tag is `Uniq`, make sure there is a matching `Uniq` item with the same ID on the stack.
161+
5. If the tag is `Shr(None)`, make sure that either the location is frozen or else there is a `Shr` item on the stack.
162+
6. If the tag is `Shr(Some(t))`, then the check depends on whether the location is inside an `UnsafeCell` or not, according to the type of the reference.
163+
- Locations outside `UnsafeCell` must have `frozen_since` set to `t` or an older timestamp.
164+
- `UnsafeCell` locations must either be frozen or else have a `Shr` item in their stack (same check as if the tag had no timestamp).
165+
166+
Whenever we are checking whether an item is in the stack, we ignore barriers.
167+
Failing any of these checks means we have undefined behavior.
168+
169+
### Reborrowing
170+
171+
We define the notion of "reborrowing", which will be used below to define the semantics of `Retag`.
172+
Reborrowing takes a (typed) place, whether to push a barrier, and the new borrow that this place is to be reborrowed for.
173+
we determine the extent of memory that this place covers using `size_of_val` and then we perform the following actions on every location covered by the place:
174+
175+
1. Perform the checks that would also happen on a dereference.
176+
Remember the position of the item matching the tag in the stack.
177+
2. Redundancy check, only happens if we will not push a barrier: if the new tag passes the checks performed on a dereference, and if the item that makes this check succeed is *above* the one we remembered in step 1 (where the "frozen" state is considered above every item in the stack), then stop.
178+
We are done for this location.
179+
This can only happen for shared references (i.e., when the borrow is `Shr(_)`).
180+
3. Perform the actions that would also happen when an actual access happens through this reference (for shared references with borrow `Shr(_)` this is a read access, for mutable references with borrow `Uniq(_)` it is a write access).
181+
Now the location cannot be frozen any more: if the new borrow is `Uniq(_)`, we just unfroze; if it is `Shr(_)` and the location was already frozen, then the redundancy check (step 3) would have kicked in.
182+
4. If we want to push a barrier, push `FnBarrier(c)` to the location stack where `c` is the `CallId` if the current function call (i.e., of the topmost frame in the call stack).
183+
5. Check if the new tag is `Shr(Some(t))` and the location is inside an `UnsafeCell`.
184+
- If both conditions are satisfied, freeze the location with timestamp `t`. If it is already frozen, do nothing.
185+
- Otherwise, push a new item onto the stack: `Shr` if the tag is a `Shr(_)`, `Uniq(id)` if the tag is `Uniq(id)`.
186+
187+
### Retagging
188+
189+
When executing `Retag(kind, place)`, we recursively visit all fields of this place, descending into compound types (`struct`, `enum`, arrays and so on) but not below any pointers.
190+
For each reference (`&[mut] _`) and box (`Box<_>`) we encounter, and if `kind == Raw` also for each raw pointer (`*[const,mut] _`), we perform the following steps:
191+
192+
1. We compute a fresh tag: `Uniq(_)` for mutable references, `Box`, `Shr(Some(_))` for shared references, and `Shr(None)` for raw pointers.
193+
2. We determine if we will want to push a barrier.
194+
This is the case only if all of the following conditions are satisfied: `kind == FnBarrier`, the type of this pointer is a reference (not a box), and if this is a shared reference then we are not inside an `UnsafeCell`.
195+
3. We perform reborrowing with the new tag and indicating whether we ant a barrier pushed or not.
196+
4. If `kind == TwoPhase`, we perform *another* reborrow with the tag being `Shr(Some(t))` for some fresh timestamp `t`, and not pushing new barriers.
197+
198+
### Deallocating memory
199+
200+
Memory deallocation first acts like a write access through the pointer used for deallocation.
201+
After that is done, we additionally check all `FnBarrier(c)` occurring in any stack on any of the deallocated locations.
202+
If any of the `c` is still active, we have undefined behavior.

0 commit comments

Comments
 (0)