From 43c10b0389115ed2bd60969e32ef34d3fad181d3 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Tue, 15 Oct 2019 06:11:06 +0200 Subject: [PATCH 01/42] First RFC draft for atomic volatile --- rfcs/0000-atomic-volatile.md | 416 +++++++++++++++++++++++++++++++++++ 1 file changed, 416 insertions(+) create mode 100644 rfcs/0000-atomic-volatile.md diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md new file mode 100644 index 00000000..cb84f7c9 --- /dev/null +++ b/rfcs/0000-atomic-volatile.md @@ -0,0 +1,416 @@ +- Feature Name: `atomic_volatile` +- Start Date: 2019-10-15 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Introduce a set of `core::volatile::VolatileXyz` structs, modeled after the +existing `core::sync::atomic::AtomicXyz` API, which expose volatile loads and +stores of natively supported width with both atomic and non-atomic semantics. +Deprecate `ptr::[read|write]_volatile` and the corresponding methods of pointer +types, and recommend that they be replaced with `Relaxed` atomic volatile +operations on every platform that has support for them. + + +# Motivation +[motivation]: #motivation + +Volatile operations are meant an escape hatch that allows a Rust programmer to +invoke hardware-level memory access semantics that are not properly accounted +for by the Rust Abstract Machine. They work by triggering the (mostly) +unoptimized generation of a matching stream of hardware load and store +instructions in the output machine code. + +Unfortunately, the volatile operations that are currently exposed by Rust, which +map into LLVM's non-atomic volatile operations, unnecessarily differ from the +memory access semantics of mainstream hardware in two major ways: + +1. 
Concurrent use of volatile memory operations on a given memory location is + considered to be Undefined Behavior, and may therefore result in unintended + compilation output if detected by the optimizer. +2. Using an overly wide volatile load or store operation which cannot be carried + out by a single hardware load and store instruction will not result in a + compilation error, but in the silent emission of multiple hardware load or + store instructions. + +By implementing support for LLVM's atomic volatile operations, and encouraging +their use on every hardware that supports them, we eliminate these divergences +from hardware behavior and therefore bring volatile operations closer to the +"defer to hardware memory model" semantics that programmers expect them to have. +This reduces the odd of mistake in programs operating outside of the regular +Rust Abstract Machine semantics, which are notoriously hard to get right. + +As an attractive side-effect, atomic volatile memory operations also enable +higher-performance interprocess communication between mutually trusting Rust +programs through lock-free synchronization of shared memory objects. + + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +The Rust compiler generally assumes that the program that it is building is +living in a fully isolated memory space. It leverages this knowledge to +transform said program's memory access patterns for performance optimization +purposes, under the assumption that said transformations will not have any +externally observable effect other than the program running faster. + +Examples of such transformations include: + +- Caching data from RAM into CPU registers. +- Eliminating unused loads and unobservable stores. +- Merging neighboring stores with each other. +- Only updating the part of an over-written struct that has actually changed. 
+ +Although these optimizations are most of the time correct and useful, there are +some situations where they are inappropriate, including but not limited to: + +- [Memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O), a common + low-level communication pattern between CPUs and peripherals where hardware + registers that masquerade as memory can be used to program the target hardware + by accessing them in very specific patterns. +- [Shared memory](https://en.wikipedia.org/wiki/Shared_memory), a form of + inter-process communication where two programs can communicate via a common + memory block, and it is therefore not appropriate for Rust to assume that it + is aware of all memory accesses occurring inside said memory block. +- [Cryptography](https://en.wikipedia.org/wiki/Cryptography), where it is + extremely important to ensure that sensitive information is erased after use, + and is not leaked via indirect means such as recognizable scaling patterns in + the time taken by a system to process attacker-crafted inputs. + +In such circumstances, it may be necessary to assert precise manual control on +the memory accesses that are carried out by a Rust program in a certain memory +region. This is the purpose of _volatile memory operations_, which allow a Rust +programmers to generate a carefully controlled stream of hardware memory load +and store instructions, which is guaranteed to be left untouched by the Rust +compiler's optimizer even though surrounding Rust code will continue to be +optimized as usual. + +--- + +Volatile memory operations are exposed in the `std::volatile` module of the Rust +standard library, or alternatively in the `core::volatile` module for the +purpose of writing `#![no_std]` applications. 
They are interfaced through
fixed-size data wrappers that are somewhat reminiscent of the API used for
atomic operations in `std::sync::atomic`:

```rust
use std::sync::atomic::Ordering;
use std::ptr::NonNull;
use std::volatile::VolatileU8;

unsafe fn do_volatile_things(target: NonNull<VolatileU8>) -> u8 {
    target.store(42, Ordering::Relaxed);
    target.load_not_atomic()
}
```

Several specificities, however, should be apparent from the above usage example.

First of all, volatile types must be manipulated via pointers, instead of Rust
references. These unusual and unpleasant ergonomics are necessary in order to
achieve the desired semantics of manually controlling every access to the target
memory location, because the mere existence of a Rust reference pointing to a
memory region allows the Rust compiler to generate memory operations targeting
this region (be they prefetches, register spills...).

Because a Rust pointer is not subjected to borrow checking and has no obligation
of pointing towards a valid memory location, this means that using a Volatile
wrapper in any way is unsafe.

Second, in addition to familiar atomic memory operations, volatile types expose
the `load_not_atomic()` and `store_not_atomic()` methods. As their names suggest,
these memory operations are not considered to be atomic by the compiler, and
are therefore not safe to concurrently invoke in multiple threads.

On the vast majority of hardware platforms supported by Rust, using these
methods will generate exactly the same code as using the `load()` and `store()`
methods with `Relaxed` memory ordering, with the only difference being that
data races are Undefined Behavior. When that is the case, safer `Relaxed` atomic
volatile operations should be preferred to their non-atomic equivalents.
+ +But unfortunately, Rust supports a couple of platforms, such as the `nvptx` +assembly used by NVidia GPUs, where `Relaxed` atomic ordering is either +unsupported or emulated via a stream of hardware instructions that is more +complex than plain loads and stores. Supporting not-atomic volatile loads and +stores is necessary to get minimal `Volatile` support on those platforms. + +Finally, unlike with atomics, the compiler is not allowed to optimize the above +function into the following... + +```rust +unsafe fn do_volatile_things(target: NonNull) -> u8 { + target.store(42, Ordering::Relaxed); + 42 +} +``` + +...or even the following, if it can prove that no other thread has access to +the underlying `VolatileU8` variable: + +```rust +unsafe fn do_volatile_things(_target: NonNull) -> u8 { + 42 +} +``` + +This is the definining characteristic of volatile operations, which makes them +suitable for sensitive memory manipulations such as cryptographic secret erasure +or memory-mapped user. + +--- + +Experienced Rust users may be familiar with the previously existing +`std::ptr::read_volatile()` and `std::ptr::write_volatile()` functions, or +equivalent methods of raw pointer objects, and wonder how the new facilities +provided by `std::volatile` compare to those methods. + +The answer is that this new volatile data access API supersedes its predecessor, +which is now _deprecated_, by improving upon it in many different ways: + +- The data race semantics of `Relaxed` volatile data accesses more closely + matches the data race semantics of most hardware, and therefore eliminates an + unnecessary mismatch between volatile semantics and low-level hardware load + and store semantics when it comes to concurrency. +- `VolatileXyz` wrappers are only supported for data types which are supported + at the machine level. 
Therefore, one no longer needs to worry about the + possibility of the compiler silently turning a Rust-level volatile data access + into multiple hardware-level memory operations. In the same manner as with + atomics, if a `Volatile` wrapper type is provided by Rust, the underlying + hardware is guaranteed to support memory operations of that width. +- The ability to specify stronger-than-`Relaxed` memory orderings on volatile + memory operations enables new use cases which were not achievable before + without exploiting Undefined Behavior, such as high-performance + synchronization of mutually trusting Rust processes communicating via + lock-free data structures in shared memory. + + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +The fundamental purpose of volatile operations, in a system programming language +like Rust, is to allow a developer to locally escape the Rust Abstract Machine's +weak memory semantics and defer to the hardware's memory semantics instead, +without resorting to the full complexity, overhead, and non-portability of +assembly (inline or otherwise). + +Rust's current volatile operations, which defer to LLVM's non-atomic volatile +operations, do not achieve this goal very well because... + +1. Their data race semantics do not match the data race semantics of the + hardware which volatile is supposed to defer to, and as a result are + unnecessarily surprising and unsafe. They prevent some synchronization + patterns which are legal at the hardware level but not at the abstract + machine level, such as the "seqlock", to be expressed using volatile + operations. No useful optimization opportunities are opened by this undefined + behavior, since volatile operations cannot be meaningfully optimized. +2. 
The absence of a hard guarantee that each volatile load or store will
   translate into exactly one load or store at the hardware level is needlessly
   treacherous in scenarios where memory access patterns must be very precisely
   controlled, such as memory-mapped I/O.

Using LLVM's `Relaxed` atomic volatile operations instead resolves both problems
on cache-coherent hardware where native loads and stores have `Relaxed` or
stronger semantics. The vast majority of hardware which Rust supports today and
is expected to support in the future is cache-coherent, and therefore making
volatile feel more at home on such hardware is a major win.

However, we obviously must still have something for those exotic platforms whose
basic memory loads and stores are not cache-coherent, such as `nvptx`. Hence the
compromise of `load_not_atomic()` and `store_not_atomic()` is still kept around,
only discouraging its use.

---

It should be noted that switching to LLVM's atomic volatile accesses does not
resolve the second problem very well per se, because although oversized volatile
accesses will not compile anymore, they will only be "reported" to the user via
LLVM crashes. This is not a nice user experience, which is why volatile wrapper
types are proposed.

Their design largely mirrors that of Rust's existing atomic types, which is only
appropriate since they do expose atomic operations. One goal of this design is
that it should be possible to re-use the architecture-specific code that already
exists to selectively expose Rust atomic types and operations depending on what
the hardware supports under the hood.

---

This RFC currently proposes to expose the full power of LLVM's atomic volatile
operations, including e.g. read-modify-write operations like compare-and-swap,
because there are legitimately useful use cases for these operations in
interprocess communication scenarios.
However, the fact that these operations do not necessarily compile into a single
hardware instruction is arguably a footgun for volatile's use cases, and it
could be argued that initially only stabilizing loads, stores and `Relaxed`
atomic ordering would be more prudent. I'll go back to this in the alternatives
section of this RFC.

---

As currently designed, this RFC uses `arbitrary_self_types` to give method-like
semantics to a `NonNull` raw pointer. I believe that this is necessary to get
reasonable ergonomics with an atomics-like wrapper type approach. However, it
could also be argued that a `VolatileXyz::store(ptr, data, ordering)` style of
API could reasonably work, and avoid coupling with unstable features. Similarly,
the use of `NonNull` itself could be debated. I'll come back to these points in
the alternatives section as well.


# Drawbacks
[drawbacks]: #drawbacks

Deprecating the existing volatile operations will cause language churn. Not
deprecating them will keep around two different and subtly incompatible ways to
do the same thing, which is equally bad. It could be argued that the issues with
the existing volatile operations, while real, do not warrant the full complexity
of this RFC.

Atomic volatile operations are also a somewhat LLVM-specific concept, and
requiring them may make the life of our other compiler backends harder.

As mentioned above, exposing more than loads and stores, and non-`Relaxed`
atomic memory orderings, also muddies the "a volatile op should compile into one
hardware memory instruction" that is so convenient for loads and stores.
Further, compatibility of non-load/store atomics in IPC scenario may require
some ABI agreement between the interacting programs.
+ + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +I think we want to do this because it simultaneously eliminates two well-known +and unnecessary volatile footguns (data races and tearing) and opens interesting +new possibilities. + +But although this feature may seem simple, its design space is actually +remarquably large, and a great many alternatives were considered before reaching +the current design proposal. Here is a map of some design knobs that I explored: + +## Extending `AtomicXyz` +[extending-atomicxyz]: #extending-atomicxyz + +Atomic volatile operations could be made part of the Atomic wrapper types. +However, they cannot be just another atomic memory ordering, due to the fact +that atomic ops take an `&self` (which is against the spirit of volatile as it +is tagged with `dereferenceable` at the LLVM layer). + +Instead, one would need to add more methods to atomic wrapper types, which take +a pointer as a self-parameter. I thought that this API inconsistency would be +jarring to users, not to mention that potentially having a copy of each atomic +memory operation with a `_volatile` prefix would get annoying quickly when +reading through the API docs. + +## Self-type or not self-type +[self-type]: #self-type + +Instead of using `arbitrary_self_types` to get `a.load(Ordering::Relaxed)` +method syntax on pointer-like objects, one could instead provide the volatile +operations as inherent methods, i.e. `VolatileU8::load(a, Ordering::Relaxed)`. + +This has the advantage of avoiding coupling this feature to another unstable +feature. But it has the drawback of being incredibly verbose. Typing the same +volatile type name over and over again in a complex shared memory transaction +would certainly get old and turn annoying quickly, and we don't want anger to +distract low-level developers from the delicate task of implementing the kind +of subtle algorithm that requires volatile operations. 
+ +## `NonNull` vs `*mut T` vs `*const T` vs other +[pointers]: #pointers + +Honestly, I don't have a very strong opinion there. I have a positive _a priori_ +towards `NonNull` because it encodes an important invariant in the API, and +I think that arbitrary self types make its usually poor ergonomics bearable. + +But what I would really want here is something like a non-`dereferenceable` +Rust reference. I don't like the fact that I have to give up on the borrow +checker and make everything unsafe just to do volatile loads and stores, which +are not unsafe _per se_ as long as they occur via a `Volatile` wrapper. It just +feels like we should be able to propose something better than full references +vs full raw pointers here... + +## Full atomics vocabulary vs hardware semantics +[atomics-vs-hardware]: #atomics-vs-hardware + +Currently, this RFC basically proposes exposing a volatile version of every +atomic operation supported by Rust for maximal expressive power, and I could +definitely find uses for the new possibilities that this opens in IPC scenarios. + +But it could also be argued that this distracts us from volatile's main purpose +of generating a stream of simple hardware instructions without using inline +assembly: + +- Non-`Relaxed` atomics will entail memory barriers +- Compare-and-swap may be implemented as a load-linked/store-conditional loop +- Some types like `VolatileBool` are dangerous when interacting with untrusted + code because they come with data validity invariants. + +From this perspective, there would be an argument in favor of only supporting +`Relaxed` load/stores and machine data types, at least initially. And I could +get behind that. But since the rest can be useful in IPC with trusted Rust code, +I thought it might be worth at least considering. + + +# Prior art +[prior-art]: #prior-art + +As far as I know, LLVM's notion of atomic volatile, which is being exposed here, +is rather original. 
The closest thing that it makes me think about is how Java +uses the `volatile` keyword for what most other languages call atomics. But +I'm not sure if Java also retains the other semantics of `volatile` in the C +family (e.g. "cannot be optimized out ever"). + +There is a lot more prior art behind C's notion of volatile, which is closer to +Rust's current notion of volatile. That being said, most discussions of C/++ +volatile semantics end up in complaints about how ill-specified, how easy it is +to get it wrong, how "contagious" it can get... so I'm not sure if it's +something we want to emulate. Besides, Rust's notion of volatile differs +fundamentally from C's notion of volatile in that it is based on volatile +_operations_, not volatile _types_. + +In the C++ community, there have been +[a series](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1152r0.html) +[of papers](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1152r2.html) +by JF Bastien to greatly reduce the scope of `volatile`, ideally until it +basically covers just loads and stores. + +In the Rust community, we have a long history of discussing use cases that +require volatile semantics, from +[memory-mapped I/O](https://github.com/rust-lang/unsafe-code-guidelines/issues/33) +to [weird virtual memory](https://github.com/rust-lang/unsafe-code-guidelines/issues/28) +and [interaction with untrusted system processes](https://github.com/rust-lang/unsafe-code-guidelines/issues/152). + +This last case, in particular, could be served through a combination of atomic +volatile with a clarification of LLVM's volatile semantics which would tune down +the amount of situations in which a volatile read from memory can be undefined +behavior. 
+ + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +I expect the RFC process to be an occasion to revisit some points of the +rationale and alternative sections better than I can do on my own, bring new +arguments and edge cases on the table, and more generally refine the API. + +Implementation should be generally straightforward as most of the groundwork +has already been done when implementing `ptr::read_volatile()`, +`ptr::write_volatile()` and `std::sync::atomic` as far as I know. + +This RFC will no fully resolve the "untrusted shared memory" use case, because +doing so also requires work on clarifying LLVM semantics so that it is +absolutely clear that a malicious process cannot cause UB in another process +by writing data in memory that's shared between the two, no matter if said +writes are non-atomic, non-volatile, etc. + + +# Future possibilities +[future-possibilities]: #future-possibilities + +If we decide to drop advanced atomic orderings and operations from this RFC, +then they will fall out in this section. + +This RFC would also benefit from a safer way to interact with volatile memory +regions than raw pointers. From e84c586e1accb75d4a5371e6dec388bee2ab31e2 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Tue, 15 Oct 2019 06:26:03 +0200 Subject: [PATCH 02/42] Some cross-checking --- rfcs/0000-atomic-volatile.md | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index cb84f7c9..1861b446 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -17,10 +17,10 @@ operations on every platform that has support for them. # Motivation [motivation]: #motivation -Volatile operations are meant an escape hatch that allows a Rust programmer to -invoke hardware-level memory access semantics that are not properly accounted -for by the Rust Abstract Machine. 
They work by triggering the (mostly) -unoptimized generation of a matching stream of hardware load and store +Volatile operations are meant to be an escape hatch that allows a Rust +programmer to invoke hardware-level memory access semantics that are not +properly accounted for by the Rust Abstract Machine. They work by triggering the +(mostly) unoptimized generation of a matching stream of hardware load and store instructions in the output machine code. Unfortunately, the volatile operations that are currently exposed by Rust, which @@ -67,9 +67,9 @@ Although these optimizations are most of the time correct and useful, there are some situations where they are inappropriate, including but not limited to: - [Memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O), a common - low-level communication pattern between CPUs and peripherals where hardware - registers that masquerade as memory can be used to program the target hardware - by accessing them in very specific patterns. + low-level communication protocol between CPUs and peripherals, where hardware + registers masquerading as memory can be used to program said hardware by + accessing the registers in very specific load/store patterns. - [Shared memory](https://en.wikipedia.org/wiki/Shared_memory), a form of inter-process communication where two programs can communicate via a common memory block, and it is therefore not appropriate for Rust to assume that it @@ -157,7 +157,7 @@ unsafe fn do_volatile_things(_target: NonNull) -> u8 { This is the definining characteristic of volatile operations, which makes them suitable for sensitive memory manipulations such as cryptographic secret erasure -or memory-mapped user. +or memory-mapped I/O. --- @@ -179,11 +179,11 @@ which is now _deprecated_, by improving upon it in many different ways: into multiple hardware-level memory operations. 
In the same manner as with atomics, if a `Volatile` wrapper type is provided by Rust, the underlying hardware is guaranteed to support memory operations of that width. -- The ability to specify stronger-than-`Relaxed` memory orderings on volatile - memory operations enables new use cases which were not achievable before - without exploiting Undefined Behavior, such as high-performance - synchronization of mutually trusting Rust processes communicating via - lock-free data structures in shared memory. +- The ability to specify stronger-than-`Relaxed` memory orderings and to use + memory operations other than loads and stores enables new use cases which were + not achievable before without exploiting Undefined Behavior, such as + high-performance synchronization of mutually trusting Rust processes + via lock-free data structures in shared memory. # Reference-level explanation @@ -387,6 +387,10 @@ volatile with a clarification of LLVM's volatile semantics which would tune down the amount of situations in which a volatile read from memory can be undefined behavior. +There are plenty of crates trying to abstract volatile operations. Many are +believed to be unsound due as they expose an `&self` to the sensitive memory +region, but `voladdress` is believed not to be affected by this problem. 
+ # Unresolved questions [unresolved-questions]: #unresolved-questions From 3638bf9cdba2477eed554c9b4f0d5cf88835fba2 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Tue, 15 Oct 2019 06:26:21 +0200 Subject: [PATCH 03/42] More cross-checking --- rfcs/0000-atomic-volatile.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 1861b446..6c69d74f 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -399,9 +399,10 @@ I expect the RFC process to be an occasion to revisit some points of the rationale and alternative sections better than I can do on my own, bring new arguments and edge cases on the table, and more generally refine the API. -Implementation should be generally straightforward as most of the groundwork -has already been done when implementing `ptr::read_volatile()`, -`ptr::write_volatile()` and `std::sync::atomic` as far as I know. +If we decide to implement this, implementation should be reasonably +straightforward and uneventful, as most of the groundwork has already been done +over the course of implementing `ptr::read_volatile()`, `ptr::write_volatile()` +and `std::sync::atomic` (as far as I know at least). 
This RFC will no fully resolve the "untrusted shared memory" use case, because doing so also requires work on clarifying LLVM semantics so that it is From c46335eb931b78ebd24f2a0ca9432ee99d4b91c6 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Tue, 15 Oct 2019 10:56:17 +0200 Subject: [PATCH 04/42] One more argument against volatile ops on atomic --- rfcs/0000-atomic-volatile.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 6c69d74f..19029c2e 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -303,6 +303,10 @@ jarring to users, not to mention that potentially having a copy of each atomic memory operation with a `_volatile` prefix would get annoying quickly when reading through the API docs. +Finally, it is extremely common to want either all operations on a memory +location to be volatile, or none of them. Providing separate wrapper types +helps enforce this very common usage pattern at the API level. + ## Self-type or not self-type [self-type]: #self-type From 9cfd45e4a817389e697fe0113b53c87c5d28ef95 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Tue, 15 Oct 2019 11:12:23 +0200 Subject: [PATCH 05/42] Fine-tuning --- rfcs/0000-atomic-volatile.md | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 19029c2e..51c3cfb0 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -42,9 +42,9 @@ from hardware behavior and therefore bring volatile operations closer to the This reduces the odd of mistake in programs operating outside of the regular Rust Abstract Machine semantics, which are notoriously hard to get right. -As an attractive side-effect, atomic volatile memory operations also enable -higher-performance interprocess communication between mutually trusting Rust -programs through lock-free synchronization of shared memory objects. 
+As an unexpected but attractive side-effect, atomic volatile memory operations +also enable higher-performance interprocess communication between mutually +trusting Rust programs through lock-free synchronization of shared memory. # Guide-level explanation @@ -68,8 +68,8 @@ some situations where they are inappropriate, including but not limited to: - [Memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O), a common low-level communication protocol between CPUs and peripherals, where hardware - registers masquerading as memory can be used to program said hardware by - accessing the registers in very specific load/store patterns. + registers masquerading as memory can be used to program peripherals by + accessing said registers in very specific load/store patterns. - [Shared memory](https://en.wikipedia.org/wiki/Shared_memory), a form of inter-process communication where two programs can communicate via a common memory block, and it is therefore not appropriate for Rust to assume that it @@ -146,8 +146,8 @@ unsafe fn do_volatile_things(target: NonNull) -> u8 { } ``` -...or even the following, if it can prove that no other thread has access to -the underlying `VolatileU8` variable: +...or even the following, which it could normally do if it could prove that no +other thread has access to the underlying `VolatileU8` variable: ```rust unsafe fn do_volatile_things(_target: NonNull) -> u8 { @@ -273,9 +273,10 @@ requiring them may make the life of our other compiler backends harder. As mentioned above, exposing more than loads and stores, and non-`Relaxed` atomic memory orderings, also muddies the "a volatile op should compile into one -hardware memory instruction" that is so convenient for loads and stores. +hardware memory instruction" narrative that is so convenient for loads and stores. Further, compatibility of non-load/store atomics in IPC scenario may require -some ABI agreement between the interacting programs. 
+some ABI agreement on how atomics should be implemented between the interacting +programs. # Rationale and alternatives @@ -283,7 +284,7 @@ some ABI agreement between the interacting programs. I think we want to do this because it simultaneously eliminates two well-known and unnecessary volatile footguns (data races and tearing) and opens interesting -new possibilities. +new possibilities in interprocess communication. But although this feature may seem simple, its design space is actually remarquably large, and a great many alternatives were considered before reaching @@ -303,7 +304,7 @@ jarring to users, not to mention that potentially having a copy of each atomic memory operation with a `_volatile` prefix would get annoying quickly when reading through the API docs. -Finally, it is extremely common to want either all operations on a memory +Also, it is extremely common to want either all operations on a memory location to be volatile, or none of them. Providing separate wrapper types helps enforce this very common usage pattern at the API level. @@ -392,7 +393,7 @@ the amount of situations in which a volatile read from memory can be undefined behavior. There are plenty of crates trying to abstract volatile operations. Many are -believed to be unsound due as they expose an `&self` to the sensitive memory +believed to be unsound as they expose an `&self` to the sensitive memory region, but `voladdress` is believed not to be affected by this problem. @@ -418,6 +419,10 @@ writes are non-atomic, non-volatile, etc. # Future possibilities [future-possibilities]: #future-possibilities +As mentioned above, this RFC is a step forward in addressing the untrusted +shared memory use case that is of interest to many "supervisor" programs, but +not the end of that story. Finishing it will likely require LLVM assistance. + If we decide to drop advanced atomic orderings and operations from this RFC, then they will fall out in this section. 
From 3b2984f3c3f456c1f09d0d1ff6327235acdba939 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Thu, 17 Oct 2019 13:41:03 +0200 Subject: [PATCH 06/42] Update rfcs/0000-atomic-volatile.md Emphasize why load/store tearing is bad. Co-Authored-By: gnzlbg --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 51c3cfb0..378f3097 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -33,7 +33,7 @@ memory access semantics of mainstream hardware in two major ways: 2. Using an overly wide volatile load or store operation which cannot be carried out by a single hardware load and store instruction will not result in a compilation error, but in the silent emission of multiple hardware load or - store instructions. + store instructions which might be a logic error in the users' program. By implementing support for LLVM's atomic volatile operations, and encouraging their use on every hardware that supports them, we eliminate these divergences From e920abd7aa451bbdda8cca39b326742829edbfdd Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Thu, 17 Oct 2019 13:45:37 +0200 Subject: [PATCH 07/42] Update rfcs/0000-atomic-volatile.md Typographical fix Co-Authored-By: gnzlbg --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 378f3097..d77d2913 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -196,7 +196,7 @@ without resorting to the full complexity, overhead, and non-portability of assembly (inline or otherwise). Rust's current volatile operations, which defer to LLVM's non-atomic volatile -operations, do not achieve this goal very well because... +operations, do not achieve this goal very well because: 1. 
Their data race semantics do not match the data race semantics of the hardware which volatile is supposed to defer to, and as a result are From c423313ff80bc67a4fa7f9280dcaae5f94260e67 Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 17:34:24 +0200 Subject: [PATCH 08/42] Follow suggestions from the PR --- rfcs/0000-atomic-volatile.md | 239 ++++++++++++++++++++++++++++++----- 1 file changed, 207 insertions(+), 32 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index d77d2913..4e68f6be 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -33,7 +33,7 @@ memory access semantics of mainstream hardware in two major ways: 2. Using an overly wide volatile load or store operation which cannot be carried out by a single hardware load and store instruction will not result in a compilation error, but in the silent emission of multiple hardware load or - store instructions which might be a logic error in the users' program. + store instructions, which might be a logic error in the users' program. By implementing support for LLVM's atomic volatile operations, and encouraging their use on every hardware that supports them, we eliminate these divergences @@ -44,7 +44,7 @@ Rust Abstract Machine semantics, which are notoriously hard to get right. As an unexpected but attractive side-effect, atomic volatile memory operations also enable higher-performance interprocess communication between mutually -trusting Rust programs through lock-free synchronization of shared memory. +trusting Rust programs, through lock-free synchronization of shared memory. # Guide-level explanation @@ -130,11 +130,12 @@ methods with `Relaxed` memory ordering, with the only difference being that data races are Undefined Behavior. When that is the case, safer `Relaxed` atomic volatile operations should be preferred to their non-atomic equivalents. 
-But unfortunately, Rust supports a couple of platforms, such as the `nvptx` -assembly used by NVidia GPUs, where `Relaxed` atomic ordering is either -unsupported or emulated via a stream of hardware instructions that is more -complex than plain loads and stores. Supporting not-atomic volatile loads and -stores is necessary to get minimal `Volatile` support on those platforms. +But unfortunately, Rust supports a couple of targets, including the `nvptx` +assembly used by NVidia GPUs and abstract machines like WASM, where `Relaxed` +atomic ordering is either unsupported or emulated via a stream of +instructions that is more complex than plain loads and stores. Supporting +not-atomic volatile loads and stores is necessary to get minimal `Volatile` +support on those platforms. Finally, unlike with atomics, the compiler is not allowed to optimize the above function into the following... @@ -146,8 +147,8 @@ unsafe fn do_volatile_things(target: NonNull) -> u8 { } ``` -...or even the following, which it could normally do if it could prove that no -other thread has access to the underlying `VolatileU8` variable: +...or even the following, which it could normally do if it the optimizer managed +to prove that no other thread has access to the `VolatileU8` variable: ```rust unsafe fn do_volatile_things(_target: NonNull) -> u8 { @@ -211,10 +212,10 @@ operations, do not achieve this goal very well because: controlled, such as memory-mapped I/O. Using LLVM's `Relaxed` atomic volatile operations instead resolves both problems -on cache-coherent hardware where native loads and stores have `Relaxed` or -stronger semantics. The vast majority of hardware which Rust supports today and -is expected to support in the future is cache-coherent, and therefore making -volatile feel more at home on such hardware is a major win. +on globally cache-coherent hardware where native loads and stores have `Relaxed` +or stronger semantics. 
The vast majority of hardware which Rust supports today +and is expected to support in the future exhibits global cache coherence, +so making volatile feel more at home on such hardware is a sizeable achievement. However, we obviously must still have something for those exotic platforms whose basic memory loads and stores are not cache-coherent, such as `nvptx`. Hence the @@ -223,17 +224,176 @@ only discouraging its use. --- -It should be noted that switching to LLVM's atomic volatile accesses does not -resolve the second problem very well per se, because although oversized volatile -accesses will not compile anymore, they will only be "reported" to the user via -LLVM crashes. This is not a nice user experience, which is why volatile wrapper -types are proposed. +Switching to LLVM's atomic volatile accesses without also changing the API of +`ptr::[read|write]_volatile` would unfortunately not resolve the memory access +tearing problem very well, because although oversized volatile accesses would +not compile anymore, this fact would only be "reported" to the user via an LLVM +crash. This is not a nice user experience, which is why volatile wrapper types +are proposed instead. Their design largely mirrors that of Rust's existing atomic types, which is only -appropriate since they do expose atomic operations. One goal of this design is -that it should be possible to re-use the architecture-specific code that already -exists to selectively expose Rust atomic types and operations depending on what -the hardware supports under the hood. +appropriate since they do expose atomic operations. 
The current proposal would +be to fill the newly built `std::volatile` module with the following entities +(some of which may not be available on a given platform, we will come back to +this point in the moment): + +- `VolatileBool` +- `VolatileI8` +- `VolatileI16` +- `VolatileI32` +- `VolatileI64` +- `VolatileIsize` +- `VolatilePtr` +- `VolatileU8` +- `VolatileU16` +- `VolatileU32` +- `VolatileU64` +- `VolatileUsize` + +The API of these volatile types would then be very much like that of existing +`AtomicXyz` types, except for the fact that it would be based on raw pointers +instead of references because the existence of a Rust reference to a memory +location is fundamentally at odds with the precise control of hardware load and +store generation that is required by volatile use case. + +To give a more concrete example, here is what the API of `VolatileBool` would +look like on a platform with full support for this type. + +```rust +#![feature(arbitrary_self_types)] +use std::sync::atomic::Ordering; + + +#[repr(transparent)] +struct VolatileBool(bool); + +impl VolatileBool { + /// Creates a new VolatileBool pointer + /// + /// This is safe as creating a pointer is considered safe in Rust and + /// volatile adds no safety invariant to the input pointer. + /// + pub const fn new(v: NonNull) -> NonNull { /* ... */ } + + // NOTE: Unlike with `AtomicBool`, `get_mut()` and `into_inner()` operations + // are not provided, because it is never safe to assume that no one + // is concurrently accessing the atomic data. Alternatively, these + // operations could be provided in an unsafe way, if someone can find + // a use case for them. + + /// Load a value from the bool + /// + /// `load` takes an `Ordering` argument which describes the memory ordering + /// of this operation. Possible values are SeqCst, Acquire and Relaxed. + /// + /// # Panics + /// + /// Panics if order is Release or AcqRel. 
+ /// + /// # Safety + /// + /// The `self` pointer must be well-aligned and point to a valid memory + /// location containing a valid `bool` value. + /// + pub unsafe fn load(self: NonNull, order: Ordering) -> bool { /* ... */ } + + // ... and then a similar transformation is carried out on all other atomic + // operation APIs from `AtomicBool`: + + pub unsafe fn store(self: NonNull, val: bool, order: Ordering) { /* ... */ } + + pub unsafe fn swap(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } + + pub unsafe fn compare_and_swap( + self: NonNull, + current: bool, + new: bool, + order: Ordering + ) -> bool { /* ... */ } + + pub unsafe fn compare_exchange( + self: NonNull, + current: bool, + new: bool, + success: Ordering, + failure: Ordering + ) -> Result { /* ... */ } + + pub unsafe fn compare_exchange_weak( + self: NonNull, + current: bool, + new: bool, + success: Ordering, + failure: Ordering + ) -> Result { /* ... */ } + + pub unsafe fn fetch_and(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } + + pub unsafe fn fetch_nand(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } + + pub unsafe fn fetch_or(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } + + pub unsafe fn fetch_xor(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } + + // Finally, non-atomic load and store operations are provided: + + /// Load a value from the bool in a non-atomic way + /// + /// This method is provided for the sake of supporting platforms where + /// `load(Relaxed)` either is unsupported or compiles down to more than a + /// single hardware load instruction. As a counterpart, it is UB to use it + /// in a concurrent setting. Use of `load(Relaxed)` should be preferred + /// whenever possible. + /// + /// # Safety + /// + /// The `self` pointer must be well-aligned, and point to a valid memory + /// location containing a valid `bool` value. 
+    ///
+    /// Using this operation to access memory which is concurrently written by
+    /// another thread is a data race, and therefore Undefined Behavior.
+    ///
+    pub unsafe fn load_not_atomic(self: NonNull) -> bool { /* ... */ }
+
+    // ...and we have the same idea for stores:
+    pub unsafe fn store_not_atomic(self: NonNull, val: bool) { /* ... */ }
+}
+```
+
+---
+
+Like `std::sync::atomic::AtomicXyz` APIs, the `std::volatile::VolatileXyz` APIs
+are not guaranteed to be fully supported on every platform. Cases of partial
+platform support which are shared with `AtomicXyz` APIs include:
+
+- Not supporting atomic accesses to certain types like `u64`.
+- Not supporting atomic read-modify-write operations like `swap()`.
+
+In addition, a concern which is specific to `Volatile` types is the case where
+atomic operations are not supported at all, but non-atomic volatile operations
+are supported. In this case, the `VolatileBool` type above would only expose
+the `load_not_atomic()` and `store_not_atomic()` methods.
+
+Platform support for volatile operations can be queried in much the
+same way as platform support for atomic operations:
+
+- `#[cfg(target_has_atomic = N)]`, which can be used today to test full support
+  for atomic operations of a certain width N, may now also be used to test
+  full support for volatile atomic operations of the same width.
+- `#[cfg(target_has_atomic_load_store = N)]` may similarly be used to test
+  support for volatile atomic load and store operations.
+- A new cfg directive, `#[cfg(target_has_volatile = N)]`, may be used to test
+  support for non-atomic loads and stores of a certain width (i.e. `_not_atomic`
+  operations).
+
+This latter directive can be initially pessimistically implemented as a synonym
+of `#[cfg(target_has_atomic_load_store = N)]`, then gradually extended to
+support targets which have no or non-native support of `Relaxed` atomics
+but do have native load/store instructions of a certain width, such as `nvptx`.
+ +In this way, the proposed volatile atomic operation API can largely re-use the +already existing atomic operation support infrastructure, which will greatly +reduce effort duplication between these two closely related functionalities. --- @@ -278,6 +438,11 @@ Further, compatibility of non-load/store atomics in IPC scenario may require some ABI agreement on how atomics should be implemented between the interacting programs. +If this is perceived to be a problem, we could decide to do away with some of +the complexity by initially focusing on a restricted subset of this proposal +which only supports `Relaxed` loads and stores, and saving more complex atomic +operations as a future extension. + # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -325,16 +490,26 @@ of subtle algorithm that requires volatile operations. ## `NonNull` vs `*mut T` vs `*const T` vs other [pointers]: #pointers -Honestly, I don't have a very strong opinion there. I have a positive _a priori_ -towards `NonNull` because it encodes an important invariant in the API, and -I think that arbitrary self types make its usually poor ergonomics bearable. - -But what I would really want here is something like a non-`dereferenceable` -Rust reference. I don't like the fact that I have to give up on the borrow -checker and make everything unsafe just to do volatile loads and stores, which -are not unsafe _per se_ as long as they occur via a `Volatile` wrapper. It just -feels like we should be able to propose something better than full references -vs full raw pointers here... +It is pretty clear that volatile operations cannot be expressed through `&self` +Rust references, as these provide the compiler with a licence to inject +arbitrary loads from the memory region (or even stores, for register spills, if +the region is determined to be unobservable to other threads). This would be +incompatible with the stated goal of precisely controlling memory accesses. 
+ +Currently, the only alternative to references in Rust is to use raw pointer +types. Rust has a number of these, here it is proposes to use `NonNull` +pointer type because it encodes the non-nullness invariant of the API in code. +Although `NonNull` is often less ergonomic to manipulate than `*mut T`, the +use of arbitrary self types make its ergonomics comparable in this case. + +But it should be noted that overall, using raw pointers for this API feels +decidedly unsatisfactory overall. It feels like this use case could benefit from +a different kind of data accessor which encodes the fact that the data must be +live and valid, by correctly upholding normal borrow-checking rules. + +In this way, most of the proposed API could become safe, and the only thing that +would remain unsafe would be the `_not_atomic()` operations, which more closely +reflects the reality of the situation. ## Full atomics vocabulary vs hardware semantics [atomics-vs-hardware]: #atomics-vs-hardware From f0c23815acdc3eed7cfcb5c863bbb5588e0f74fc Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 17:37:09 +0200 Subject: [PATCH 09/42] Rewording --- rfcs/0000-atomic-volatile.md | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 4e68f6be..dca89541 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -106,25 +106,24 @@ unsafe fn do_volatile_things(target: NonNull) -> u8 { } ``` -Several specificities, however, should be apparent from the above usage example. - -First of all, volatile types must be manipulated via pointers, instad of Rust -references. 
These unusual and unpleasant ergonomics are necessary in order to -achieve the desired semantics of manually controlling every access to the target -memory location, because the mere existence of a Rust reference pointing to a -memory region allows the Rust compiler to generate memory operations targeting -this region (be they prefetches, register spills...). +However, notice that volatile types must be manipulated via pointers, instad of +Rust references. These unusual and unpleasant ergonomics are necessary in order +to achieve the desired semantics of manually controlling every access to the +target memory location, because the mere existence of a Rust reference pointing +to a memory region allows the Rust compiler to generate memory operations +targeting this region (be they prefetches, register spills...). Because a Rust pointer is not subjected to borrow checking and has no obligation of pointing towards a valid memory location, this means that using a Volatile wrapper in any way is unsafe. -Second, in addition to familiar atomic memory operations, volatile types expose -the `load_not_atomic()` and `store_not_atomic()` methods. As their name suggest, -these memory operations are not considered to be atomic by the compiler, and -are therefore not safe to concurrently invoke in multiple threads. +As a second difference, in addition to familiar atomic memory operations, +volatile types expose the `load_not_atomic()` and `store_not_atomic()` methods. +As their name suggest, these memory operations are not considered to be atomic +by the compiler, and are therefore not safe to concurrently invoke in multiple +threads. -On the vast majority of hardware platforms supported by Rust, using these +On the vast majority of targets supported by Rust, however, use of these methods will generate exactly the same code as using the `load()` and `store()` methods with `Relaxed` memory ordering, with the only difference being that data races are Undefined Behavior. 
When that is the case, safer `Relaxed` atomic From ca0c36a2e5bf6984769582615abe83301eede4d3 Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 17:47:21 +0200 Subject: [PATCH 10/42] Typo fix --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index dca89541..103cc3c0 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -106,7 +106,7 @@ unsafe fn do_volatile_things(target: NonNull) -> u8 { } ``` -However, notice that volatile types must be manipulated via pointers, instad of +However, notice that volatile types must be manipulated via pointers, instead of Rust references. These unusual and unpleasant ergonomics are necessary in order to achieve the desired semantics of manually controlling every access to the target memory location, because the mere existence of a Rust reference pointing From e6bee95716e7dc2419d0955fd70789a542824aed Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 17:49:28 +0200 Subject: [PATCH 11/42] Typo --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 103cc3c0..e8fdbb16 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -496,7 +496,7 @@ the region is determined to be unobservable to other threads). This would be incompatible with the stated goal of precisely controlling memory accesses. Currently, the only alternative to references in Rust is to use raw pointer -types. Rust has a number of these, here it is proposes to use `NonNull` +types. Rust has a number of these, here it is proposed to use `NonNull` pointer type because it encodes the non-nullness invariant of the API in code. Although `NonNull` is often less ergonomic to manipulate than `*mut T`, the use of arbitrary self types make its ergonomics comparable in this case. 
From dc559247276e242d641b7ef59df6079a6cdb0cfa Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 18:24:05 +0200 Subject: [PATCH 12/42] Clarify the exotic hardware situation --- rfcs/0000-atomic-volatile.md | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index e8fdbb16..3d50d84c 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -123,18 +123,26 @@ As their name suggest, these memory operations are not considered to be atomic by the compiler, and are therefore not safe to concurrently invoke in multiple threads. -On the vast majority of targets supported by Rust, however, use of these -methods will generate exactly the same code as using the `load()` and `store()` +On hardware with global cache coherence, a category which encompasses the vast +majority of Rust's supported compilation targets, use of these methods will +generate exactly the same code as using the `load()` and `store()` atomic access methods with `Relaxed` memory ordering, with the only difference being that -data races are Undefined Behavior. When that is the case, safer `Relaxed` atomic -volatile operations should be preferred to their non-atomic equivalents. - -But unfortunately, Rust supports a couple of targets, including the `nvptx` -assembly used by NVidia GPUs and abstract machines like WASM, where `Relaxed` -atomic ordering is either unsupported or emulated via a stream of -instructions that is more complex than plain loads and stores. Supporting -not-atomic volatile loads and stores is necessary to get minimal `Volatile` -support on those platforms. +data races are Undefined Behavior from the compiler's point of view. When that +is the case, safer `Relaxed` atomic volatile operations should be preferred to +their non-atomic equivalents. + +Unfortunately, however, not all of Rust's compilation targets exhibit global +cache coherence. 
GPU hardware, such as the `nvptx` target, may only exhibit +cache coherence among local "blocks" of threads, and abstract machines like WASM +may not guarantee cache coherence at all without specific precautions. On those +compilation targets, `Relaxed` loads and stores may either be unavailable, or +lead to the generation of multiple machine instructions, which may not be wanted +where maximal hardware control or performance is desired. + +It is only for the sake of providing an alternative to the current +`ptr::[read|write]_volatile` mechanism on such platforms that the +`[load|store]_not_atomic()` functions are being proposed, and they should not +be used where better alternative exists. Finally, unlike with atomics, the compiler is not allowed to optimize the above function into the following... From 9b9783e72301f8339862b2a8f1ba777e944c1a34 Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 18:24:48 +0200 Subject: [PATCH 13/42] Typo fix --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 3d50d84c..d6594cb8 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -133,7 +133,7 @@ their non-atomic equivalents. Unfortunately, however, not all of Rust's compilation targets exhibit global cache coherence. GPU hardware, such as the `nvptx` target, may only exhibit -cache coherence among local "blocks" of threads, and abstract machines like WASM +cache coherence among local "blocks" of threads. And abstract machines like WASM may not guarantee cache coherence at all without specific precautions. 
On those compilation targets, `Relaxed` loads and stores may either be unavailable, or lead to the generation of multiple machine instructions, which may not be wanted From effdd4bbed3a49aeda6e279c6f71e801e6ff876d Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 18:37:02 +0200 Subject: [PATCH 14/42] =?UTF-8?q?Propose=20using=20a=20different=20feature?= =?UTF-8?q?=20gate=20for=20not=5Fatomic=20stuff=C3=A9?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- rfcs/0000-atomic-volatile.md | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index d6594cb8..43051142 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -224,10 +224,18 @@ or stronger semantics. The vast majority of hardware which Rust supports today and is expected to support in the future exhibits global cache coherence, so making volatile feel more at home on such hardware is a sizeable achievement. -However, we obviously must still have something for those exotic platforms whose -basic memory loads and stores are not cache-coherent, such as `nvptx`. Hence the -compromise of `load_not_atomic()` and `store_not_atomic()` is still kept around, -only discouraging its use. +For exotic platforms whose basic memory loads and stores do not guarantee global +cache coherence, such as `nvptx`, this RFC adds `load_not_atomic()` and +`store_not_atomic()` operations. It is unclear at this point in time whether +these two methods should be stabilized, or an alternative solution such as +extending Rust's atomic operation model with synchronization guarantees weaker +than `Relaxed` should be researched. 
+ +As this feels like a complex and niche edge case that should not block the most +generally useful subset of volatile atomic operations, this RFC proposes to +implement these operations behind a different feature gate, and postpone their +stabilization until supplementary research has determined whether they are +truly a necessary evil or not. --- @@ -597,6 +605,10 @@ absolutely clear that a malicious process cannot cause UB in another process by writing data in memory that's shared between the two, no matter if said writes are non-atomic, non-volatile, etc. +The necessity of having `load_not_atomic()` and `store_not_atomic()` methods, +as opposed to alternatives such as weaker-than-`Relaxed` atomics, should be +researched before stabilizing that subset of this RFC. + # Future possibilities [future-possibilities]: #future-possibilities From a195f48531e2572df9b29e2f60eb79436f481bbc Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 18:43:40 +0200 Subject: [PATCH 15/42] Clarification --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 43051142..4e74ab3e 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -292,7 +292,7 @@ impl VolatileBool { // NOTE: Unlike with `AtomicBool`, `get_mut()` and `into_inner()` operations // are not provided, because it is never safe to assume that no one - // is concurrently accessing the atomic data. Alternatively, these + // is concurrently accessing volatile data. As an alternative, these // operations could be provided in an unsafe way, if someone can find // a use case for them. 
From da02234130a06aa90c1c4f860cbcb05acc47f672 Mon Sep 17 00:00:00 2001 From: Hadrien Grasland Date: Thu, 17 Oct 2019 18:55:57 +0200 Subject: [PATCH 16/42] Avoid first person --- rfcs/0000-atomic-volatile.md | 70 +++++++++++++++++++----------------- 1 file changed, 38 insertions(+), 32 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 4e74ab3e..2353c0a7 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -420,18 +420,18 @@ interprocess communication scenarios. However, the fact that these operations do not necessarily compile into a single hardware instruction is arguably a footgun for volatile's use cases, and it could be argued that initially only stabilizing loads, stores and `Relaxed` -atomic ordering would be more prudent. I'll go back to this in the alternatives -section of this RFC. +atomic ordering would be more prudent. This will be revisited in the +alternatives section of this RFC. --- As currently designed, this RFC uses `arbitrary_safe_types` to give method-like -semantics to a `NonNull` raw pointer. I believe that this is necessary to get -reasonable ergonomics with an atomics-like wrapper type approach. However, it -could also be argued that a `VolatileXyz::store(ptr, data, ordering)` style of -API could reasonably work, and avoid coupling with unstable features. Similarly, -the use of `NonNull` itself could be debated. I'll come back to these points in -the alternatives section as well. +semantics to a `NonNull` raw pointer. This seems necessary to get reasonable +ergonomics with an atomics-like wrapper type approach. However, it could also be +argued that a `VolatileXyz::store(ptr, data, ordering)` style of API would work +well enough, and avoid coupling with unstable features. Similarly, the use of +`NonNull` itself could be debated. This will be revisited in the alternatives +section of the RFC as well. # Drawbacks @@ -462,13 +462,13 @@ operations as a future extension. 
 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives
 
-I think we want to do this because it simultaneously eliminates two well-known
+This effort seems worthwhile because it simultaneously eliminates two well-known
 and unnecessary volatile footguns (data races and tearing) and opens interesting
 new possibilities in interprocess communication.
 
 But although this feature may seem simple, its design space is actually
 remarquably large, and a great many alternatives were considered before reaching
-the current design proposal. Here is a map of some design knobs that I explored:
+the current design proposal. Here are some design knobs that were explored:
 
 ## Extending `AtomicXyz`
 [extending-atomicxyz]: #extending-atomicxyz
@@ -479,12 +479,12 @@ that atomic ops take an `&self` (which is against the spirit of volatile as it
 is tagged with `dereferenceable` at the LLVM layer).
 
 Instead, one would need to add more methods to atomic wrapper types, which take
-a pointer as a self-parameter. I thought that this API inconsistency would be
-jarring to users, not to mention that potentially having a copy of each atomic
-memory operation with a `_volatile` prefix would get annoying quickly when
-reading through the API docs.
+a pointer as a self-parameter. This API inconsistency was felt to be
+unnecessarily jarring to users, not to mention that potentially having a copy of
+each atomic memory operation with a `_volatile` prefix would reduce the
+readability of the `AtomicXyz` types' API documentation.
 
-Also, it is extremely common to want either all operations on a memory
+In addition, it is extremely common to want either all operations on a memory
 location to be volatile, or none of them. Providing separate wrapper types helps
 enforce this very common usage pattern at the API level.
@@ -526,12 +526,12 @@ In this way, most of the proposed API could become safe, and the only thing that would remain unsafe would be the `_not_atomic()` operations, which more closely reflects the reality of the situation. -## Full atomics vocabulary vs hardware semantics +## Full atomics vocabulary vs sticking with hardware semantics [atomics-vs-hardware]: #atomics-vs-hardware Currently, this RFC basically proposes exposing a volatile version of every -atomic operation supported by Rust for maximal expressive power, and I could -definitely find uses for the new possibilities that this opens in IPC scenarios. +atomic operation supported by Rust for maximal expressive power, which opens +new possibilities for shared-memory interprocess communication. But it could also be argued that this distracts us from volatile's main purpose of generating a stream of simple hardware instructions without using inline @@ -543,25 +543,31 @@ assembly: code because they come with data validity invariants. From this perspective, there would be an argument in favor of only supporting -`Relaxed` load/stores and machine data types, at least initially. And I could -get behind that. But since the rest can be useful in IPC with trusted Rust code, -I thought it might be worth at least considering. +`Relaxed` load/stores and machine data types, at least initially. In this case, +one could split this feature into three feature gates: + +- `Relaxed` volatile atomic loads and stores, which are most urgently needed. +- Non-`Relaxed` orderings and read-modify-write atomics, which open new + possibilities for shared-memory IPC. +- `_not_atomic()` operations, where it is not yet clear whether the proposed API + is even the right solution to the problem being solved, and more research is + needed before reaching a definite conclusion. # Prior art [prior-art]: #prior-art -As far as I know, LLVM's notion of atomic volatile, which is being exposed here, -is rather original. 
The closest thing that it makes me think about is how Java -uses the `volatile` keyword for what most other languages call atomics. But -I'm not sure if Java also retains the other semantics of `volatile` in the C -family (e.g. "cannot be optimized out ever"). +To the RFC author's knowledge, LLVM's notion of atomic volatile, which is being +exposed here, is rather original. The closest thing that it reminds of is how +Java uses the `volatile` keyword for what most other languages call atomics. But +it does Java's `volatile` also retains the other semantics of `volatile` in the +C family, such as lack of optimizability? There is a lot more prior art behind C's notion of volatile, which is closer to Rust's current notion of volatile. That being said, most discussions of C/++ volatile semantics end up in complaints about how ill-specified, how easy it is -to get it wrong, how "contagious" it can get... so I'm not sure if it's -something we want to emulate. Besides, Rust's notion of volatile differs +to get it wrong, how "contagious" it can get... so it isn't clear if it is a +very good role model to follow. Furthermore, Rust's notion of volatile differs fundamentally from C's notion of volatile in that it is based on volatile _operations_, not volatile _types_. @@ -590,14 +596,14 @@ region, but `voladdress` is believed not to be affected by this problem. # Unresolved questions [unresolved-questions]: #unresolved-questions -I expect the RFC process to be an occasion to revisit some points of the -rationale and alternative sections better than I can do on my own, bring new -arguments and edge cases on the table, and more generally refine the API. +The RFC process will be an occasion to revisit some points of the rationale and +alternative sections better than the author can do on his own, bring new +arguments and edge cases on the table, and more generally refining the API. 
If we decide to implement this, implementation should be reasonably straightforward and uneventful, as most of the groundwork has already been done over the course of implementing `ptr::read_volatile()`, `ptr::write_volatile()` -and `std::sync::atomic` (as far as I know at least). +and `std::sync::atomic` (to the best of the author's knowledge at least). This RFC will no fully resolve the "untrusted shared memory" use case, because doing so also requires work on clarifying LLVM semantics so that it is From d32a410d095841a52b7303d530c2eeef47234e73 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Thu, 17 Oct 2019 22:08:02 +0200 Subject: [PATCH 17/42] Clarify the assumption of isolated memory --- rfcs/0000-atomic-volatile.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 2353c0a7..01f09561 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -51,10 +51,13 @@ trusting Rust programs, through lock-free synchronization of shared memory. [guide-level-explanation]: #guide-level-explanation The Rust compiler generally assumes that the program that it is building is -living in a fully isolated memory space. It leverages this knowledge to -transform said program's memory access patterns for performance optimization -purposes, under the assumption that said transformations will not have any -externally observable effect other than the program running faster. +living in a fully isolated memory space, where the contents of memory can only +change if some direct action from the program allows it to change. + +It leverages this knowledge to transform said program's memory access patterns +for performance optimization purposes, under the assumption that said +transformations will not have any externally observable effect other than +speeding up the program. 
Examples of such transformations include: From 10fe05e841e9ccbe9b1c6454ccabe88ba41c9115 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Thu, 17 Oct 2019 22:21:42 +0200 Subject: [PATCH 18/42] Correct inaccurate section about atomic volatile being an LLVM thing --- rfcs/0000-atomic-volatile.md | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 01f09561..1fdac302 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -560,14 +560,12 @@ one could split this feature into three feature gates: # Prior art [prior-art]: #prior-art -To the RFC author's knowledge, LLVM's notion of atomic volatile, which is being -exposed here, is rather original. The closest thing that it reminds of is how -Java uses the `volatile` keyword for what most other languages call atomics. But -it does Java's `volatile` also retains the other semantics of `volatile` in the -C family, such as lack of optimizability? - -There is a lot more prior art behind C's notion of volatile, which is closer to -Rust's current notion of volatile. That being said, most discussions of C/++ +Atomic volatile accesses exist in C++11 and C11. They are respectively exposed +in those languages as [volatile overloads of `std::atomic` operations](https://en.cppreference.com/w/cpp/atomic/atomic/exchange) and [just making all atomic operations operate on +volatile objects](https://en.cppreference.com/w/c/atomic/atomic_load). + +There is also a lot of prior art behind C's notion of volatile, which is closer +to Rust's current notion of volatile. That being said, most discussions of C/++ volatile semantics end up in complaints about how ill-specified, how easy it is to get it wrong, how "contagious" it can get... so it isn't clear if it is a very good role model to follow. 
Furthermore, Rust's notion of volatile differs From 24b77e0337343ad6f44decb2186297e5b0f18255 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Fri, 18 Oct 2019 16:17:37 +0200 Subject: [PATCH 19/42] Kill another incorrect "atomic volatile is LLVM-specific" + typo-hunting --- rfcs/0000-atomic-volatile.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 1fdac302..cf4eb835 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -446,9 +446,6 @@ do the same thing, which is equally bad. It could be argued that the issues with the existing volatile operations, while real, do not warrant the full complexity of this RFC. -Atomic volatile operations are also a somewhat LLVM-specific concept, and -requiring them may make the life of our other compiler backends harder. - As mentioned above, exposing more than loads and stores, and non-`Relaxed` atomic memory orderings, also muddies the "a volatile op should compile into one hardware memory instruction" narrative that is so convenient for loads and stores. @@ -465,7 +462,7 @@ operations as a future extension. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -This effprt seems worthwhile because it simultaneously eliminates two well-known +This effort seems worthwhile because it simultaneously eliminates two well-known and unnecessary volatile footguns (data races and tearing) and opens interesting new possibilities in interprocess communication. 
From 636f5f7512f1dce6b5e2328db21d477e237ac984 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Fri, 8 Nov 2019 11:01:24 +0100 Subject: [PATCH 20/42] Typo nazi --- rfcs/0000-atomic-volatile.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index cf4eb835..22cf3830 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -61,10 +61,10 @@ speeding up the program. Examples of such transformations include: -- Caching data from RAM into CPU registers. -- Eliminating unused loads and unobservable stores. -- Merging neighboring stores with each other. -- Only updating the part of an over-written struct that has actually changed. +- caching data from RAM into CPU registers, +- eliminating unused loads and unobservable stores, +- merging neighboring stores with each other, +- only updating the part of an over-written struct that has actually changed. Although these optimizations are most of the time correct and useful, there are some situations where they are inappropriate, including but not limited to: @@ -109,7 +109,7 @@ unsafe fn do_volatile_things(target: NonNull) -> u8 { } ``` -However, notice that volatile types must be manipulated via pointers, instead of +Notice that volatile types must be manipulated via pointers, instead of Rust references. 
These unusual and unpleasant ergonomics are necessary in order to achieve the desired semantics of manually controlling every access to the target memory location, because the mere existence of a Rust reference pointing From 375f279e156e1afa92bcc3bc31b4ddd4184b4cf4 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Fri, 8 Nov 2019 11:02:01 +0100 Subject: [PATCH 21/42] Remove empty sentence --- rfcs/0000-atomic-volatile.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 22cf3830..f0a6e327 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -594,10 +594,6 @@ region, but `voladdress` is believed not to be affected by this problem. # Unresolved questions [unresolved-questions]: #unresolved-questions -The RFC process will be an occasion to revisit some points of the rationale and -alternative sections better than the author can do on his own, bring new -arguments and edge cases on the table, and more generally refining the API. 
- If we decide to implement this, implementation should be reasonably straightforward and uneventful, as most of the groundwork has already been done over the course of implementing `ptr::read_volatile()`, `ptr::write_volatile()` From b272333d1fe2126e9ab9dd0087af652bbf6b262d Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Fri, 8 Nov 2019 11:02:39 +0100 Subject: [PATCH 22/42] Add unresolved discussion on shared memory, expand on untrusted shared memory --- rfcs/0000-atomic-volatile.md | 36 +++++++++++++++++++++++++++++++++--- 1 file changed, 33 insertions(+), 3 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index f0a6e327..d81cb082 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -599,12 +599,42 @@ straightforward and uneventful, as most of the groundwork has already been done over the course of implementing `ptr::read_volatile()`, `ptr::write_volatile()` and `std::sync::atomic` (to the best of the author's knowledge at least). -This RFC will no fully resolve the "untrusted shared memory" use case, because -doing so also requires work on clarifying LLVM semantics so that it is -absolutely clear that a malicious process cannot cause UB in another process +## Should shared-memory IPC always be volatile? + +There is [some ongoing discussion](https://github.com/rust-lang/unsafe-code-guidelines/issues/215) +in the Unsafe Code Guidelines group concerning whether a Rust implementation +should assume the existence of unknown threads of execution, even when no +concrete action has been taken to spawn such threads. This choice is a tradeoff +between Rust code performance and FFI ergonomics. + +Depending on the outcome of this discussion, use cases such as shared-memory +interprocess communication, which do involve external threads which the Rust +implementation has no knowledge of, may or may not need to be volatile. 
+ +- If we go in the "maximal Rust performance" direction, then every access to + shared memory must be marked volatile because the Rust compiler is allowed to + optimize it out if it is not subsequently used by Rust code (or used by Rust + code in a sufficiently restricted way). +- If we go in the "maximal FFI ergonomics" direction, then volatile accesses are + only needed when they are not coupled with atomics-based synchronization, as + the mere presence of atomics acts as a trigger that disables the above + optimizations. + +## Untrusted shared memory + +Although it performs a step in the right direction by strengthening the +definition of volatile accesses to result the amount of possible avenues for +undefined behavior, this RFC will no fully resolve the "untrusted shared memory" +use case, where Rust code is interacting with untrusted arbitrary code via a +shared memory region. + +Doing so would also require work on clarifying LLVM semantics so that it is +absolutely clear that a malicious process cannot cause UB in another process by by writing data in memory that's shared between the two, no matter if said writes are non-atomic, non-volatile, etc. +## Necessity of non-atomic operations + The necessity of having `load_not_atomic()` and `store_not_atomic()` methods, as opposed to alternatives such as weaker-than-`Relaxed` atomics, should be researched before stabilizing that subset of this RFC. 
From cf87a6420420f7b713a7f8708567ed8ed46c2ad4 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Fri, 8 Nov 2019 11:51:47 +0100 Subject: [PATCH 23/42] Review discussion of the IPC use case --- rfcs/0000-atomic-volatile.md | 80 +++++++++++++++++++++--------------- 1 file changed, 47 insertions(+), 33 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index d81cb082..b1a3d9e9 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -42,17 +42,14 @@ from hardware behavior and therefore bring volatile operations closer to the This reduces the odd of mistake in programs operating outside of the regular Rust Abstract Machine semantics, which are notoriously hard to get right. -As an unexpected but attractive side-effect, atomic volatile memory operations -also enable higher-performance interprocess communication between mutually -trusting Rust programs, through lock-free synchronization of shared memory. - # Guide-level explanation [guide-level-explanation]: #guide-level-explanation The Rust compiler generally assumes that the program that it is building is living in a fully isolated memory space, where the contents of memory can only -change if some direct action from the program allows it to change. +change if some direct action from the program (including FFI or atomic memory +operations) allows it to change. It leverages this knowledge to transform said program's memory access patterns for performance optimization purposes, under the assumption that said @@ -67,7 +64,7 @@ Examples of such transformations include: - only updating the part of an over-written struct that has actually changed. 
Although these optimizations are most of the time correct and useful, there are -some situations where they are inappropriate, including but not limited to: +some situations where they are inappropriate, in areas such as: - [Memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O), a common low-level communication protocol between CPUs and peripherals, where hardware @@ -75,15 +72,23 @@ some situations where they are inappropriate, including but not limited to: accessing said registers in very specific load/store patterns. - [Shared memory](https://en.wikipedia.org/wiki/Shared_memory), a form of inter-process communication where two programs can communicate via a common - memory block, and it is therefore not appropriate for Rust to assume that it - is aware of all memory accesses occurring inside said memory block. + memory block, which means that stores are externally observable and loads are + not guaranteed to return consistent results. +- Advanced uses of [virtual memory](https://en.wikipedia.org/wiki/Virtual_memory) + where the mere action of reading from or writing to memory may trigger + execution of arbitrary code by the operating system. - [Cryptography](https://en.wikipedia.org/wiki/Cryptography), where it is extremely important to ensure that sensitive information is erased after use, and is not leaked via indirect means such as recognizable scaling patterns in the time taken by a system to process attacker-crafted inputs. -In such circumstances, it may be necessary to assert precise manual control on -the memory accesses that are carried out by a Rust program in a certain memory +In all those circumstances, though for different reasons, it may be important to +guarantee that memory loads and stores do occur, because they have externally +observable side-effects outside of the Rust program being optimized, and may be +subjected to unpredictable side-effects from the outside world. 
+ +And in that cas, it is useful to be able to assert precise manual control on the +memory accesses that are carried out by a Rust program in a certain memory region. This is the purpose of _volatile memory operations_, which allow a Rust programmers to generate a carefully controlled stream of hardware memory load and store instructions, which is guaranteed to be left untouched by the Rust @@ -139,8 +144,9 @@ cache coherence. GPU hardware, such as the `nvptx` target, may only exhibit cache coherence among local "blocks" of threads. And abstract machines like WASM may not guarantee cache coherence at all without specific precautions. On those compilation targets, `Relaxed` loads and stores may either be unavailable, or -lead to the generation of multiple machine instructions, which may not be wanted -where maximal hardware control or performance is desired. +lead to the generation of machine instructions more complex than native loads +and stores, which may not be wanted where maximal hardware control or CPU +performance is desired. It is only for the sake of providing an alternative to the current `ptr::[read|write]_volatile` mechanism on such platforms that the @@ -178,7 +184,7 @@ equivalent methods of raw pointer objects, and wonder how the new facilities provided by `std::volatile` compare to those methods. The answer is that this new volatile data access API supersedes its predecessor, -which is now _deprecated_, by improving upon it in many different ways: +which is now _deprecated_, by improving upon it in several ways: - The data race semantics of `Relaxed` volatile data accesses more closely matches the data race semantics of most hardware, and therefore eliminates an @@ -191,10 +197,12 @@ which is now _deprecated_, by improving upon it in many different ways: atomics, if a `Volatile` wrapper type is provided by Rust, the underlying hardware is guaranteed to support memory operations of that width. 
- The ability to specify stronger-than-`Relaxed` memory orderings and to use - memory operations other than loads and stores enables new use cases which were - not achievable before without exploiting Undefined Behavior, such as - high-performance synchronization of mutually trusting Rust processes - via lock-free data structures in shared memory. + memory operations other than loads and stores enables Rust to draw a clear + distinction between atomic operations which are meant to synchronize normal + Rust code and atomic operations which are meant to synchronize with arbitrary + FFI edge cases (such as threads spawned by LD_PRELOAD unbeknownst to the Rust + compiler), which in turn would enable better optimization of atomic operations + in the vast majority of Rust programs. # Reference-level explanation @@ -211,11 +219,9 @@ operations, do not achieve this goal very well because: 1. Their data race semantics do not match the data race semantics of the hardware which volatile is supposed to defer to, and as a result are - unnecessarily surprising and unsafe. They prevent some synchronization - patterns which are legal at the hardware level but not at the abstract - machine level, such as the "seqlock", to be expressed using volatile - operations. No useful optimization opportunities are opened by this undefined - behavior, since volatile operations cannot be meaningfully optimized. + unnecessarily surprising and unsafe. No useful optimization opportunities are + opened by this undefined behavior, since volatile operations cannot be + meaningfully optimized. 2. The absence of a hard guarantee that each volatile load or store will translate into exactly one load or store at the hardware level is needlessly tracherous in scenarios where memory access patterns must be very precisely @@ -417,8 +423,10 @@ reduce effort duplication between these two closely related functionalities. 
This RFC currently proposes to expose the full power of LLVM's atomic volatile operations, including e.g. read-modify-write operations like compare-and-swap, -because there are legitimately useful use cases for these operations in -interprocess communication scenarios. +because it is consistent with the atomics operation API and could have +legitimate uses in interprocess communication scenarios, as a marker of the +nuance between well-optimized program-local synchronization and worst-case FFI +synchronization. See Unresolved Questions section for more details. However, the fact that these operations do not necessarily compile into a single hardware instruction is arguably a footgun for volatile's use cases, and it @@ -448,10 +456,10 @@ of this RFC. As mentioned above, exposing more than loads and stores, and non-`Relaxed` atomic memory orderings, also muddies the "a volatile op should compile into one -hardware memory instruction" narrative that is so convenient for loads and stores. -Further, compatibility of non-load/store atomics in IPC scenario may require -some ABI agreement on how atomics should be implemented between the interacting -programs. +hardware memory instruction" narrative that is so convenient for loads and +stores. Further, compatibility of non-load/store atomics in IPC scenario may +require some ABI agreement on how atomics should be implemented between the +interacting programs. If this is perceived to be a problem, we could decide to do away with some of the complexity by initially focusing on a restricted subset of this proposal @@ -526,12 +534,17 @@ In this way, most of the proposed API could become safe, and the only thing that would remain unsafe would be the `_not_atomic()` operations, which more closely reflects the reality of the situation. +One possible way to achieve this result would be to introduce a way to disable +the licence to insert arbitrary loads from Rust references that the compiler +normally has. 
For example, it has been proposed before that `&UnsafeCell` +should not exhibit this behavior. This could be enough for `VolatileXyz`'s need +if it were extended to transparent newtypes of `UnsafeCell`. + ## Full atomics vocabulary vs sticking with hardware semantics [atomics-vs-hardware]: #atomics-vs-hardware Currently, this RFC basically proposes exposing a volatile version of every -atomic operation supported by Rust for maximal expressive power, which opens -new possibilities for shared-memory interprocess communication. +atomic operation supported by Rust for maximal expressive power. But it could also be argued that this distracts us from volatile's main purpose of generating a stream of simple hardware instructions without using inline @@ -609,12 +622,13 @@ between Rust code performance and FFI ergonomics. Depending on the outcome of this discussion, use cases such as shared-memory interprocess communication, which do involve external threads which the Rust -implementation has no knowledge of, may or may not need to be volatile. +implementation has no knowledge of, may or may not require systematic use of +volatile memory accesses. - If we go in the "maximal Rust performance" direction, then every access to shared memory must be marked volatile because the Rust compiler is allowed to - optimize it out if it is not subsequently used by Rust code (or used by Rust - code in a sufficiently restricted way). + optimize it out if it is not subsequently used by Rust code (or if it can + transform the Rust code to eliminate that use). 
- If we go in the "maximal FFI ergonomics" direction, then volatile accesses are only needed when they are not coupled with atomics-based synchronization, as the mere presence of atomics acts as a trigger that disables the above From 1c83d8aeb73d1ca01522c8f09599f01e8a153650 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Fri, 8 Nov 2019 12:16:26 +0100 Subject: [PATCH 24/42] Forgotten detail --- rfcs/0000-atomic-volatile.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index b1a3d9e9..abcf8eb6 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -202,7 +202,8 @@ which is now _deprecated_, by improving upon it in several ways: Rust code and atomic operations which are meant to synchronize with arbitrary FFI edge cases (such as threads spawned by LD_PRELOAD unbeknownst to the Rust compiler), which in turn would enable better optimization of atomic operations - in the vast majority of Rust programs. + in the vast majority of Rust programs, as will be further discussed in the + Unresolved Questions section. # Reference-level explanation From 8f6b20bacd407f5962d3077c624d5989bfecfffa Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Thu, 21 Nov 2019 08:06:42 +0100 Subject: [PATCH 25/42] Update rfcs/0000-atomic-volatile.md Co-Authored-By: Ralf Jung --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index abcf8eb6..7be68130 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -437,7 +437,7 @@ alternatives section of this RFC. --- -As currently designed, this RFC uses `arbitrary_safe_types` to give method-like +As currently designed, this RFC uses `arbitrary_self_types` to give method-like semantics to a `NonNull` raw pointer. This seems necessary to get reasonable ergonomics with an atomics-like wrapper type approach. 
However, it could also be argued that a `VolatileXyz::store(ptr, data, ordering)` style of API would work From e25834796ceb4be4890ce9dbeb2da9a1fc4b0037 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Thu, 21 Nov 2019 08:07:11 +0100 Subject: [PATCH 26/42] Update rfcs/0000-atomic-volatile.md Co-Authored-By: Ralf Jung --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 7be68130..9914c9ac 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -119,7 +119,7 @@ Rust references. These unusual and unpleasant ergonomics are necessary in order to achieve the desired semantics of manually controlling every access to the target memory location, because the mere existence of a Rust reference pointing to a memory region allows the Rust compiler to generate memory operations -targeting this region (be they prefetches, register spills...). +targeting this region (be they prefetches, register spills, ...). Because a Rust pointer is not subjected to borrow checking and has no obligation of pointing towards a valid memory location, this means that using a Volatile From 7b5af337625c90f3a708de45848bed163d70d8ce Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Thu, 21 Nov 2019 08:07:39 +0100 Subject: [PATCH 27/42] Update rfcs/0000-atomic-volatile.md Co-Authored-By: Ralf Jung --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 9914c9ac..5b0fce4e 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -87,7 +87,7 @@ guarantee that memory loads and stores do occur, because they have externally observable side-effects outside of the Rust program being optimized, and may be subjected to unpredictable side-effects from the outside world. 
-And in that cas, it is useful to be able to assert precise manual control on the +And in that case, it is useful to be able to assert precise manual control on the memory accesses that are carried out by a Rust program in a certain memory region. This is the purpose of _volatile memory operations_, which allow a Rust programmers to generate a carefully controlled stream of hardware memory load From ff9d1f19aa3b641b88cfd12be7a258b33a4a2d59 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sat, 23 Nov 2019 13:35:16 +0100 Subject: [PATCH 28/42] Disambiguate shared-memory IPC from shared-memory concurrency --- rfcs/0000-atomic-volatile.md | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 5b0fce4e..d6dedd18 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -20,7 +20,7 @@ operations on every platform that has support for them. Volatile operations are meant to be an escape hatch that allows a Rust programmer to invoke hardware-level memory access semantics that are not properly accounted for by the Rust Abstract Machine. They work by triggering the -(mostly) unoptimized generation of a matching stream of hardware load and store +largely unoptimized generation of a matching stream of hardware load and store instructions in the output machine code. Unfortunately, the volatile operations that are currently exposed by Rust, which @@ -70,10 +70,12 @@ some situations where they are inappropriate, in areas such as: low-level communication protocol between CPUs and peripherals, where hardware registers masquerading as memory can be used to program peripherals by accessing said registers in very specific load/store patterns. 
-- [Shared memory](https://en.wikipedia.org/wiki/Shared_memory), a form of +- [Shared-memory IPC](https://en.wikipedia.org/wiki/Shared_memory), a form of inter-process communication where two programs can communicate via a common memory block, which means that stores are externally observable and loads are - not guaranteed to return consistent results. + not guaranteed to return consistent results. This is not to be confused with + shared-memory concurrency, which refers to sharing of memory between multiple + threads running within the same process. - Advanced uses of [virtual memory](https://en.wikipedia.org/wiki/Virtual_memory) where the mere action of reading from or writing to memory may trigger execution of arbitrary code by the operating system. @@ -506,10 +508,10 @@ operations as inherent methods, i.e. `VolatileU8::load(a, Ordering::Relaxed)`. This has the advantage of avoiding coupling this feature to another unstable feature. But it has the drawback of being incredibly verbose. Typing the same -volatile type name over and over again in a complex shared memory transaction -would certainly get old and turn annoying quickly, and we don't want anger to -distract low-level developers from the delicate task of implementing the kind -of subtle algorithm that requires volatile operations. +volatile type name over and over again in a complex transaction would certainly +get old and turn annoying quickly, and we don't want anger to distract low-level +developers from the delicate task of implementing the kind of subtle algorithm +that requires volatile operations. ## `NonNull` vs `*mut T` vs `*const T` vs other [pointers]: #pointers @@ -635,7 +637,7 @@ volatile memory accesses. the mere presence of atomics acts as a trigger that disables the above optimizations. 
-## Untrusted shared memory +## Untrusted shared-memory IPC Although it performs a step in the right direction by strengthening the definition of volatile accesses to result the amount of possible avenues for @@ -659,8 +661,8 @@ researched before stabilizing that subset of this RFC. [future-possibilities]: #future-possibilities As mentioned above, this RFC is a step forward in addressing the untrusted -shared memory use case that is of interest to many "supervisor" programs, but -not the end of that story. Finishing it will likely require LLVM assistance. +shared-memory IPC use case that is of interest to many "supervisor" programs, +but not the end of that story. Finishing it will likely require LLVM assistance. If we decide to drop advanced atomic orderings and operations from this RFC, then they will fall out in this section. From 693d6c1b3ba04c8ad85006c4fb115f9c8fea7c44 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sat, 23 Nov 2019 13:44:32 +0100 Subject: [PATCH 29/42] Talk about spills --- rfcs/0000-atomic-volatile.md | 1 + 1 file changed, 1 insertion(+) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index d6dedd18..c5f74f30 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -59,6 +59,7 @@ speeding up the program. Examples of such transformations include: - caching data from RAM into CPU registers, +- spilling CPU registers into accessible RAM locations, - eliminating unused loads and unobservable stores, - merging neighboring stores with each other, - only updating the part of an over-written struct that has actually changed. 
From 9cb9328ed4aa4b42f652a9214911b69a12907d26 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sat, 23 Nov 2019 13:44:43 +0100 Subject: [PATCH 30/42] Talk about barrier issues --- rfcs/0000-atomic-volatile.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index c5f74f30..2eb330b5 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -70,7 +70,8 @@ some situations where they are inappropriate, in areas such as: - [Memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O), a common low-level communication protocol between CPUs and peripherals, where hardware registers masquerading as memory can be used to program peripherals by - accessing said registers in very specific load/store patterns. + accessing said registers in very specific load/store patterns (possibly + coupled with hardware-specific CPU cache configurations and memory barriers). - [Shared-memory IPC](https://en.wikipedia.org/wiki/Shared_memory), a form of inter-process communication where two programs can communicate via a common memory block, which means that stores are externally observable and loads are From bfcb98b60dd3d7958e48c488f764fcee5fec46d8 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sat, 23 Nov 2019 13:44:58 +0100 Subject: [PATCH 31/42] Talk about limitations of volatile for virtual memory --- rfcs/0000-atomic-volatile.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 2eb330b5..bbf516ff 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -80,7 +80,9 @@ some situations where they are inappropriate, in areas such as: threads running within the same process. - Advanced uses of [virtual memory](https://en.wikipedia.org/wiki/Virtual_memory) where the mere action of reading from or writing to memory may trigger - execution of arbitrary code by the operating system. 
+ execution of arbitrary code by the operating system. Note that even when using + volatile accesses, [some sanity restrictions](https://llvm.org/docs/LangRef.html#volatile-memory-accesses) + are imposed by LLVM here to allow optimization of surrouding code. - [Cryptography](https://en.wikipedia.org/wiki/Cryptography), where it is extremely important to ensure that sensitive information is erased after use, and is not leaked via indirect means such as recognizable scaling patterns in From b573dd4e63df3c1acc95f5e6d5a3ec312f3e0b74 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sat, 23 Nov 2019 14:44:49 +0100 Subject: [PATCH 32/42] Weaken wording of how optimizations interact with volatile --- rfcs/0000-atomic-volatile.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index bbf516ff..5b3a5a9e 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -93,8 +93,8 @@ guarantee that memory loads and stores do occur, because they have externally observable side-effects outside of the Rust program being optimized, and may be subjected to unpredictable side-effects from the outside world. -And in that case, it is useful to be able to assert precise manual control on the -memory accesses that are carried out by a Rust program in a certain memory +And in that case, it is useful to be able to assert precise manual control on +the memory accesses that are carried out by a Rust program in a certain memory region. 
This is the purpose of _volatile memory operations_, which allow a Rust programmers to generate a carefully controlled stream of hardware memory load and store instructions, which is guaranteed to be left untouched by the Rust @@ -178,9 +178,11 @@ unsafe fn do_volatile_things(_target: NonNull) -> u8 { } ``` -This is the definining characteristic of volatile operations, which makes them -suitable for sensitive memory manipulations such as cryptographic secret erasure -or memory-mapped I/O. +The fact that hardware loads and stores must be emitted even when the compiler's +optimizer can predict the results of loads or assert that stores will have no +effect on program execution is one of the most central characteristics of +volatile operations, it is what makes these operations suitable for sensitive +memory manipulations such as cryptographic secret erasure or memory-mapped I/O. --- @@ -644,7 +646,7 @@ volatile memory accesses. ## Untrusted shared-memory IPC Although it performs a step in the right direction by strengthening the -definition of volatile accesses to result the amount of possible avenues for +definition of volatile accesses to reduce the amount of possible avenues for undefined behavior, this RFC will no fully resolve the "untrusted shared memory" use case, where Rust code is interacting with untrusted arbitrary code via a shared memory region. 
From b81738e05567fa6fd50e8fdcba0e99ce5d10199c Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sat, 23 Nov 2019 14:45:20 +0100 Subject: [PATCH 33/42] Shorten sentence --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 5b3a5a9e..72991bf2 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -181,7 +181,7 @@ unsafe fn do_volatile_things(_target: NonNull) -> u8 { The fact that hardware loads and stores must be emitted even when the compiler's optimizer can predict the results of loads or assert that stores will have no effect on program execution is one of the most central characteristics of -volatile operations, it is what makes these operations suitable for sensitive +volatile operations. It is what makes these operations suitable for sensitive memory manipulations such as cryptographic secret erasure or memory-mapped I/O. --- From bab46454fdeaeff045af3f31b68e5a8229be380c Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Mon, 25 Nov 2019 08:40:14 +0100 Subject: [PATCH 34/42] Do not deprecate old read/write_volatile methods --- rfcs/0000-atomic-volatile.md | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 72991bf2..dfe6be2e 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -9,9 +9,9 @@ Introduce a set of `core::volatile::VolatileXyz` structs, modeled after the existing `core::sync::atomic::AtomicXyz` API, which expose volatile loads and stores of natively supported width with both atomic and non-atomic semantics. -Deprecate `ptr::[read|write]_volatile` and the corresponding methods of pointer -types, and recommend that they be replaced with `Relaxed` atomic volatile -operations on every platform that has support for them. 
+Recommend that `ptr::[read|write]_volatile` and the corresponding methods of +pointer types be replaced with atomic volatile operations on every platform that +has support for them. # Motivation @@ -191,8 +191,8 @@ Experienced Rust users may be familiar with the previously existing equivalent methods of raw pointer objects, and wonder how the new facilities provided by `std::volatile` compare to those methods. -The answer is that this new volatile data access API supersedes its predecessor, -which is now _deprecated_, by improving upon it in several ways: +The answer is that this new volatile data access API should be preferred to its +predecessor in almost every use case, as it improves upon it in several ways: - The data race semantics of `Relaxed` volatile data accesses more closely matches the data race semantics of most hardware, and therefore eliminates an @@ -204,6 +204,8 @@ which is now _deprecated_, by improving upon it in several ways: into multiple hardware-level memory operations. In the same manner as with atomics, if a `Volatile` wrapper type is provided by Rust, the underlying hardware is guaranteed to support memory operations of that width. +- `VolatileXyz` wrappers more strongly encourage developers to refrain from + mixing volatile and non-volatile memory accesses, which is usually a mistake. - The ability to specify stronger-than-`Relaxed` memory orderings and to use memory operations other than loads and stores enables Rust to draw a clear distinction between atomic operations which are meant to synchronize normal @@ -457,11 +459,11 @@ section of the RFC as well. # Drawbacks [drawbacks]: #drawbacks -Deprecating the existing volatile operations will cause language churn. Not -deprecating them will keep around two different and subtly incompatible ways to -do the same thing, which is equally bad. It could be argued that the issues with -the existing volatile operations, while real, do not warrant the full complexity -of this RFC. 
+Keeping around two different and subtly incompatible ways to do almost the same +thing, in the form of these new wrappers and the old volatile read/write methods +of pointer types, is unsatisfying. It could be argued that the issues with the +existing volatile operations, while real, do not warrant the full complexity of +this RFC. As mentioned above, exposing more than loads and stores, and non-`Relaxed` atomic memory orderings, also muddies the "a volatile op should compile into one @@ -675,3 +677,10 @@ then they will fall out in this section. This RFC would also benefit from a safer way to interact with volatile memory regions than raw pointers. + +Finally, one could consider `ptr::[read|write]_volatile` and the corresponding +methods of pointer types as candidates for future deprecation, as they provide +less clear semantics than atomic volatile accesses (e.g. no guarantee of being +exempted from the definition of data races) for no clear benefit. As a more +backward-compatible alternative, one could also reimplement those methods using +a loop of atomic volatile operations of unspecified width. From 10de5daa83d601ca5130191a676eb31fee435904 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sun, 8 Dec 2019 08:07:43 +0100 Subject: [PATCH 35/42] Reorganize unresolved questions & future possibilities --- rfcs/0000-atomic-volatile.md | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index dfe6be2e..40edab17 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -645,6 +645,23 @@ volatile memory accesses. the mere presence of atomics acts as a trigger that disables the above optimizations. 
+## Necessity of non-atomic operations + +The necessity of having `load_not_atomic()` and `store_not_atomic()` methods, +as opposed to alternatives such as weaker-than-`Relaxed` atomics, should be +researched before stabilizing that subset of this RFC. + +## Safer self types + +This RFC would also benefit from a safer way to interact with volatile memory +regions than raw pointers, by providing a way to opt out of LLVM's +"dereferenceable" semantics without also having to opt out from all the +memory-safety guarantees provided by the Rust borrow checker. + + +# Future possibilities +[future-possibilities]: #future-possibilities + ## Untrusted shared-memory IPC Although it performs a step in the right direction by strengthening the @@ -658,25 +675,12 @@ absolutely clear that a malicious process cannot cause UB in another process by by writing data in memory that's shared between the two, no matter if said writes are non-atomic, non-volatile, etc. -## Necessity of non-atomic operations - -The necessity of having `load_not_atomic()` and `store_not_atomic()` methods, -as opposed to alternatives such as weaker-than-`Relaxed` atomics, should be -researched before stabilizing that subset of this RFC. - - -# Future possibilities -[future-possibilities]: #future-possibilities - -As mentioned above, this RFC is a step forward in addressing the untrusted -shared-memory IPC use case that is of interest to many "supervisor" programs, -but not the end of that story. Finishing it will likely require LLVM assistance. +## Volatile atomic operations If we decide to drop advanced atomic orderings and operations from this RFC, then they will fall out in this section. -This RFC would also benefit from a safer way to interact with volatile memory -regions than raw pointers. 
+## Deprecating `ptr::[read|write]_volatile` Finally, one could consider `ptr::[read|write]_volatile` and the corresponding methods of pointer types as candidates for future deprecation, as they provide From e57bcf80396f6cf272044e5580c12016097c97ff Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sun, 8 Dec 2019 15:34:44 +0100 Subject: [PATCH 36/42] Remove ordering, rmw, make unordered, and restructure --- rfcs/0000-atomic-volatile.md | 608 ++++++++++++++++++----------------- 1 file changed, 318 insertions(+), 290 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 40edab17..a7371b72 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -6,12 +6,12 @@ # Summary [summary]: #summary -Introduce a set of `core::volatile::VolatileXyz` structs, modeled after the -existing `core::sync::atomic::AtomicXyz` API, which expose volatile loads and -stores of natively supported width with both atomic and non-atomic semantics. -Recommend that `ptr::[read|write]_volatile` and the corresponding methods of -pointer types be replaced with atomic volatile operations on every platform that -has support for them. +Introduce a set of `core::volatile::VolatileXyz` structs, roughly modeled after +the existing `core::sync::atomic::AtomicXyz` APIs, which expose volatile loads +and stores of machine-native width with atomic semantics. Recommend that these +operations be preferred to `ptr::[read|write]_volatile` and the corresponding +methods of pointer types, as they have stronger semantics that are almost always +closer to user intent. # Motivation @@ -33,7 +33,7 @@ memory access semantics of mainstream hardware in two major ways: 2. Using an overly wide volatile load or store operation which cannot be carried out by a single hardware load and store instruction will not result in a compilation error, but in the silent emission of multiple hardware load or - store instructions, which might be a logic error in the users' program. 
+ store instructions, which might be a logic error in some programs. By implementing support for LLVM's atomic volatile operations, and encouraging their use on every hardware that supports them, we eliminate these divergences @@ -46,6 +46,9 @@ Rust Abstract Machine semantics, which are notoriously hard to get right. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation +## Volatile primer +[volatile-primer]: #volatile-primer + The Rust compiler generally assumes that the program that it is building is living in a fully isolated memory space, where the contents of memory can only change if some direct action from the program (including FFI or atomic memory @@ -70,8 +73,7 @@ some situations where they are inappropriate, in areas such as: - [Memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O), a common low-level communication protocol between CPUs and peripherals, where hardware registers masquerading as memory can be used to program peripherals by - accessing said registers in very specific load/store patterns (possibly - coupled with hardware-specific CPU cache configurations and memory barriers). + accessing said registers in very specific load/store patterns. - [Shared-memory IPC](https://en.wikipedia.org/wiki/Shared_memory), a form of inter-process communication where two programs can communicate via a common memory block, which means that stores are externally observable and loads are @@ -82,18 +84,19 @@ some situations where they are inappropriate, in areas such as: where the mere action of reading from or writing to memory may trigger execution of arbitrary code by the operating system. Note that even when using volatile accesses, [some sanity restrictions](https://llvm.org/docs/LangRef.html#volatile-memory-accesses) - are imposed by LLVM here to allow optimization of surrouding code. + are imposed by LLVM here to allow optimization of surrounding code. 
- [Cryptography](https://en.wikipedia.org/wiki/Cryptography), where it is extremely important to ensure that sensitive information is erased after use, and is not leaked via indirect means such as recognizable scaling patterns in the time taken by a system to process attacker-crafted inputs. In all those circumstances, though for different reasons, it may be important to -guarantee that memory loads and stores do occur, because they have externally -observable side-effects outside of the Rust program being optimized, and may be -subjected to unpredictable side-effects from the outside world. +guarantee that memory loads and stores specified in the program do occur at the +hardware level, because they have externally observable side-effects outside of +the Rust program being optimized, and may be subjected to unpredictable +side-effects from the outside world. -And in that case, it is useful to be able to assert precise manual control on +When that happens, it is useful to be able to assert precise manual control on the memory accesses that are carried out by a Rust program in a certain memory region. This is the purpose of _volatile memory operations_, which allow a Rust programmers to generate a carefully controlled stream of hardware memory load @@ -101,7 +104,11 @@ and store instructions, which is guaranteed to be left untouched by the Rust compiler's optimizer even though surrounding Rust code will continue to be optimized as usual. 
---- +## API guide +[api-guide]: #api-guide + +### Basic usage +[basic-usage]: #basic-usage Volatile memory operations are exposed in the `std::volatile` module of the Rust standard library, or alternatively in the `core::volatile` module for the @@ -110,13 +117,12 @@ fixed-size data wrappers that are somewhat reminiscent of the API used for atomic operations in `std::sync::atomic`: ```rust -use std::sync::atomic::Ordering; use std::ptr::NonNull; use std::volatile::VolatileU8; unsafe fn do_volatile_things(target: NonNull) -> u8 { - target.store(42, Ordering::Relaxed); - target.load_not_atomic() + target.store(42); + target.load() } ``` @@ -124,53 +130,72 @@ Notice that volatile types must be manipulated via pointers, instead of Rust references. These unusual and unpleasant ergonomics are necessary in order to achieve the desired semantics of manually controlling every access to the target memory location, because the mere existence of a Rust reference pointing -to a memory region allows the Rust compiler to generate memory operations +to a memory region allows the Rust compiler to generate extra memory operations targeting this region (be they prefetches, register spills, ...). Because a Rust pointer is not subjected to borrow checking and has no obligation of pointing towards a valid memory location, this means that using a Volatile wrapper in any way is unsafe. -As a second difference, in addition to familiar atomic memory operations, -volatile types expose the `load_not_atomic()` and `store_not_atomic()` methods. -As their name suggest, these memory operations are not considered to be atomic -by the compiler, and are therefore not safe to concurrently invoke in multiple -threads. 
- -On hardware with global cache coherence, a category which encompasses the vast -majority of Rust's supported compilation targets, use of these methods will -generate exactly the same code as using the `load()` and `store()` atomic access -methods with `Relaxed` memory ordering, with the only difference being that -data races are Undefined Behavior from the compiler's point of view. When that -is the case, safer `Relaxed` atomic volatile operations should be preferred to -their non-atomic equivalents. - -Unfortunately, however, not all of Rust's compilation targets exhibit global -cache coherence. GPU hardware, such as the `nvptx` target, may only exhibit -cache coherence among local "blocks" of threads. And abstract machines like WASM -may not guarantee cache coherence at all without specific precautions. On those -compilation targets, `Relaxed` loads and stores may either be unavailable, or -lead to the generation of machine instructions more complex than native loads -and stores, which may not be wanted where maximal hardware control or CPU -performance is desired. - -It is only for the sake of providing an alternative to the current -`ptr::[read|write]_volatile` mechanism on such platforms that the -`[load|store]_not_atomic()` functions are being proposed, and they should not -be used where better alternative exists. - -Finally, unlike with atomics, the compiler is not allowed to optimize the above -function into the following... +### Concurrency and ordering +[concurrency-and-ordering]: #concurrency-and-ordering + +A second difference with familiar atomic operations is that volatile type +operations do not have an `Ordering` parameter. 
These operations are atomic in
+the sense that it is safe to perform them concurrently from multiple threads on
+a single target memory location, but their ordering guarantees in multi-threaded
+environments are unlike that of any atomic operation ordering:
+
+- Within a single thread, these operations are guaranteed to be _executed_ by
+  the underlying CPU in program order...
+- ...but the order in which these operations are subsequently _observed_ to
+  occur by another thread performing volatile operations on the same memory
+  location is fully unspecified, and may defy even `Ordering::Relaxed` logic.
+
+For example, assuming that a thread A successively writes the values 24 and 42
+to a memory location using volatile operations, other threads B and C repeatedly
+reading the value of said memory location are allowed to observe these writes
+without any commonly agreed upon coherence ordering:
+
+    Thread B:    Thread C:
+    24           42
+    42           24
+
+As a matter of fact, other threads are not even guaranteed to eventually observe
+either of the writes carried out by thread A if no further synchronization action
+is taken to guarantee their visibility.
+
+Individual hardware targets supported by Rust may and often will provide
+stronger ordering and visibility guarantees under the above intra-thread
+execution ordering constraint. But a program relying on these hardware
+guarantees will not be portable to all hardware targets supported by Rust.
+
+Readers who would still feel the perverted temptation to use volatile operations
+for synchronization in spite of the above warning should keep in mind that the
+presence of volatile operations does not affect the optimization of surrounding
+non-volatile operations in any significant ways. Said operations may still be
+added, elided, split, narrowed, extended, or reordered as deemed useful by the
+compiler's optimizer, including by moving them across neighboring volatile
+operations. 
+ +### No-elision guarantee +[no-elision-guarantee]: #no-elision-guarantee + +Finally, a third difference with atomic operations is that the Rust compiler is +not allowed to remove volatile operations as part of the code optimization +process. For example, a compiler may not leverage knowledge of hardware memory +load/store semantics to elide the volatile load of the above function... ```rust unsafe fn do_volatile_things(target: NonNull) -> u8 { - target.store(42, Ordering::Relaxed); + target.store(42); 42 } ``` -...or even the following, which it could normally do if it the optimizer managed -to prove that no other thread has access to the `VolatileU8` variable: +...nor is it allowed to elide memory accesses to a given location entirely, even +if the code optimizer somehow managed to prove that no other thread seems to +have access to the `VolatileU8` variable under consideration: ```rust unsafe fn do_volatile_things(_target: NonNull) -> u8 { @@ -181,10 +206,37 @@ unsafe fn do_volatile_things(_target: NonNull) -> u8 { The fact that hardware loads and stores must be emitted even when the compiler's optimizer can predict the results of loads or assert that stores will have no effect on program execution is one of the most central characteristics of -volatile operations. It is what makes these operations suitable for sensitive +volatile operations. It is what makes these operations suitable for "sensitive" memory manipulations such as cryptographic secret erasure or memory-mapped I/O. ---- +### Further guarantees +[further-guarantees]: #further-guarantees + +Volatile operations also provide a few more guarantees that are easier to +explain, which complete their feature set for the intended purpose of providing +tight control on memory accesses: + +- _As long as a memory location is only accessed via `NonNull`_, it + is guaranteed that the compiler will not insert any extra accesses to it + beyond the loads and stores that are present in the program. 
+- Although their ordering guarantees are weaker than `Ordering::Relaxed`,
+  volatile operations are still atomic, which means that they...
+  * ...cannot be split (e.g. a `u64` load/store cannot be transformed into
+    two consecutive `u32` loads/stores).
+  * ...cannot be narrowed (e.g. a `u64` load/store cannot be transformed into
+    a `u8` load/store, even if the optimizer believes that only a subset of
+    the target data has changed).
+  * ...cannot be extended to store data which would not be stored otherwise.
+- Beyond atomic guarantees, volatile operations also guarantee that they cannot
+  be extended to load data which would not be loaded otherwise.
+
+Combined with the above, these guarantees mean that volatile memory operations
+happen as specified in the program, in the order specified by the program, and
+that with a bit of work it is also possible to guarantee that they are the only
+memory operations affecting the target memory location.
+
+## Comparison with `ptr::[read|write]_volatile()`
+[volatilexyz-vs-ptrvolatile]: #volatilexyz-vs-ptrvolatile
 
 Experienced Rust users may be familiar with the previously existing
 `std::ptr::read_volatile()` and `std::ptr::write_volatile()` functions, or
@@ -194,31 +246,33 @@ provided by `std::volatile` compare to those methods.
 
 The answer is that this new volatile data access API should be preferred to its
 predecessor in almost every use case, as it improves upon it in several ways:
 
-- The data race semantics of `Relaxed` volatile data accesses more closely
-  matches the data race semantics of most hardware, and therefore eliminates an
-  unnecessary mismatch between volatile semantics and low-level hardware load
-  and store semantics when it comes to concurrency.
-- `VolatileXyz` wrappers are only supported for data types which are supported
Therefore, one no longer needs to worry about the - possibility of the compiler silently turning a Rust-level volatile data access - into multiple hardware-level memory operations. In the same manner as with - atomics, if a `Volatile` wrapper type is provided by Rust, the underlying - hardware is guaranteed to support memory operations of that width. -- `VolatileXyz` wrappers more strongly encourage developers to refrain from - mixing volatile and non-volatile memory accesses, which is usually a mistake. -- The ability to specify stronger-than-`Relaxed` memory orderings and to use - memory operations other than loads and stores enables Rust to draw a clear - distinction between atomic operations which are meant to synchronize normal - Rust code and atomic operations which are meant to synchronize with arbitrary - FFI edge cases (such as threads spawned by LD_PRELOAD unbeknownst to the Rust - compiler), which in turn would enable better optimization of atomic operations - in the vast majority of Rust programs, as will be further discussed in the - Unresolved Questions section. +- Unlike its predecessor, this API does not specify concurrent volatile accesses + to a region of memory to be undefined behavior. This undefined behavior made + no sense to begin with, because... + * The purpose of volatile memory accesses is to go as close to target + hardware semantics as possible in a high-level programming language, and + all targets supported by Rust have well-defined data race semantics. + * No performance optimization potential is gained by having this undefined + behavior, since volatile operations cannot be meaningfully optimized under + the assumption that target memory is not concurrently read or modified. +- Unlike its predecessor, this API is only available for data types which have + a supported machine load/store operation at the hardware level. 
Therefore, one + no longer needs to worry about the possibility of the compiler silently + splitting a Rust-level volatile data access into multiple hardware-level + memory operations. In the same manner as with atomics, if a `Volatile` wrapper + type is provided by Rust, the underlying hardware is guaranteed to support + memory operations of that width. +- Unlike its predecessor, this API sets a clear strong typing barrier to prevent + developers from mixing volatile and non-volatile memory accesses, which is + almost always a mistake. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation +## Semantics +[semantics]: #semantics + The fundamental purpose of volatile operations, in a system programming language like Rust, is to allow a developer to locally escape the Rust Abstract Machine's weak memory semantics and defer to the hardware's memory semantics instead, @@ -238,26 +292,36 @@ operations, do not achieve this goal very well because: tracherous in scenarios where memory access patterns must be very precisely controlled, such as memory-mapped I/O. -Using LLVM's `Relaxed` atomic volatile operations instead resolves both problems -on globally cache-coherent hardware where native loads and stores have `Relaxed` -or stronger semantics. The vast majority of hardware which Rust supports today -and is expected to support in the future exhibits global cache coherence, -so making volatile feel more at home on such hardware is a sizeable achievement. - -For exotic platforms whose basic memory loads and stores do not guarantee global -cache coherence, such as `nvptx`, this RFC adds `load_not_atomic()` and -`store_not_atomic()` operations. It is unclear at this point in time whether -these two methods should be stabilized, or an alternative solution such as -extending Rust's atomic operation model with synchronization guarantees weaker -than `Relaxed` should be researched. 
- -As this feels like a complex and niche edge case that should not block the most -generally useful subset of volatile atomic operations, this RFC proposes to -implement these operations behind a different feature gate, and postpone their -stabilization until supplementary research has determined whether they are -truly a necessary evil or not. - ---- +Both of these problems can be resolved by using atomic volatile operations +instead of non-atomic volatile operations. However, in doing so, we must be very +careful to stick with the subset of atomic operations that is supported by all +Rust hardware targets. + +This means that we should not expose anything more than loads or stores in our +standard subset, as some hardware architectures which are either very old or +lack multiprocessing support do not support more than that. + +More surprisingly, maximizing hardware support also means that we cannot rely on +`Ordering::Relaxed` as a minimal level of atomic sanity, because this ordering +assumes a global order of operations targeting a single memory location across +all threads in the program, and this property is not provided by native +load/store operations on some targets without global cache coherence such as GPU +architectures (NVPTX, AMD GCN, ...). + +As a result, we need to go deeper and expose an even weaker form of atomicity, +which is more akin to LLVM's `unordered` atomic ordering semantics, but better +specified in the context of atomic volatile operations because we provide very +strong control on the sequence of emitted hardware memory accesses, and thus +developers can refer to their hardware's memory model for precise semantics. 
+ +One design goal of the proposed semantics is that it should be possible to +implement `VolatileXyz::[load|store]` by compiling it down to LLVM unordered +atomic volatile load/store operations of the same width, without specifying it +in terms of "whatever LLVM unordered does" as that would make the life of +alternate rustc backends like cranelift harder. + +## API design +[api-design]: #api-design Switching to LLVM's atomic volatile accesses without also changing the API of `ptr::[read|write]_volatile` would unfortunately not resolve the memory access @@ -266,11 +330,11 @@ not compile anymore, this fact would only be "reported" to the user via an LLVM crash. This is not a nice user experience, which is why volatile wrapper types are proposed instead. -Their design largely mirrors that of Rust's existing atomic types, which is only +Their design borrows from that of Rust's existing atomic types, which is only appropriate since they do expose atomic operations. The current proposal would be to fill the newly built `std::volatile` module with the following entities (some of which may not be available on a given platform, we will come back to -this point in the moment): +this point in a moment): - `VolatileBool` - `VolatileI8` @@ -285,14 +349,16 @@ this point in the moment): - `VolatileU64` - `VolatileUsize` -The API of these volatile types would then be very much like that of existing -`AtomicXyz` types, except for the fact that it would be based on raw pointers -instead of references because the existence of a Rust reference to a memory -location is fundamentally at odds with the precise control of hardware load and -store generation that is required by volatile use case. +Unlike `AtomicXyz` types, these volatile types would be restricted to load and +store operations with sub-`Relaxed` concurrent ordering guarantees (see above), +and their API would be based on raw pointers instead of references. 
That is +because the existence of a Rust reference to a memory location currently implies +a `dereferenceable` annotation on the LLVM side, which in turn is fundamentally +at odds with the precise control of hardware load and store generation that is +required by volatile access use case. To give a more concrete example, here is what the API of `VolatileBool` would -look like on a platform with full support for this type. +look like on a platform with support for this type: ```rust #![feature(arbitrary_self_types)] @@ -310,180 +376,106 @@ impl VolatileBool { /// pub const fn new(v: NonNull) -> NonNull { /* ... */ } - // NOTE: Unlike with `AtomicBool`, `get_mut()` and `into_inner()` operations - // are not provided, because it is never safe to assume that no one - // is concurrently accessing volatile data. As an alternative, these - // operations could be provided in an unsafe way, if someone can find - // a use case for them. - - /// Load a value from the bool - /// - /// `load` takes an `Ordering` argument which describes the memory ordering - /// of this operation. Possible values are SeqCst, Acquire and Relaxed. - /// - /// # Panics - /// - /// Panics if order is Release or AcqRel. + /// Load the target bool value in an atomic and volatile way /// /// # Safety /// /// The `self` pointer must be well-aligned and point to a valid memory /// location containing a valid `bool` value. /// - pub unsafe fn load(self: NonNull, order: Ordering) -> bool { /* ... */ } - - // ... and then a similar transformation is carried out on all other atomic - // operation APIs from `AtomicBool`: - - pub unsafe fn store(self: NonNull, val: bool, order: Ordering) { /* ... */ } - - pub unsafe fn swap(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } - - pub unsafe fn compare_and_swap( - self: NonNull, - current: bool, - new: bool, - order: Ordering - ) -> bool { /* ... 
*/ } - - pub unsafe fn compare_exchange( - self: NonNull, - current: bool, - new: bool, - success: Ordering, - failure: Ordering - ) -> Result { /* ... */ } - - pub unsafe fn compare_exchange_weak( - self: NonNull, - current: bool, - new: bool, - success: Ordering, - failure: Ordering - ) -> Result { /* ... */ } - - pub unsafe fn fetch_and(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } - - pub unsafe fn fetch_nand(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } - - pub unsafe fn fetch_or(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } - - pub unsafe fn fetch_xor(self: NonNull, val: bool, order: Ordering) -> bool { /* ... */ } + pub unsafe fn load(self: NonNull) -> bool { /* ... */ } - // Finally, non-atomic load and store operations are provided: - - /// Load a value from the bool in a non-atomic way - /// - /// This method is provided for the sake of supporting platforms where - /// `load(Relaxed)` either is unsupported or compiles down to more than a - /// single hardware load instruction. As a counterpart, it is UB to use it - /// in a concurrent setting. Use of `load(Relaxed)` should be preferred - /// whenever possible. + /// Store a bool into the target in an atomic and volatile way /// /// # Safety /// - /// The `self` pointer must be well-aligned, and point to a valid memory + /// The `self` pointer must be well-aligned and point to a valid memory /// location containing a valid `bool` value. /// - /// Using this operation to access memory which is concurrently written by - /// another thread is a data race, and therefore Undefined Behavior. - /// - pub unsafe fn load_not_atomic(self: NonNull) -> bool { /* ... */ } - - // ...and we have the same idea for stores: - pub unsafe fn store_not_atomic(self: NonNull, val: bool) { /* ... */ } + pub unsafe fn store(self: NonNull, val: bool) { /* ... 
*/ } } ``` ---- +## Availability +[availability]: #availability Like `std::sync::atomic::AtomicXyz` APIs, the `std::volatile::VolatileXyz` APIs -are not guaranteed to be fully supported on every platform. Cases of partial -platform support which are shared with `AtomicXyz` APIs include: - -- Not supporting atomic accesses to certain types like `u64`. -- Not supporting atomic read-modify-write operations like `swap()`. - -In addition, a concern which is specific to `Volatile` types is the case where -atomic operations are not supported at all, but non-atomic volatile operations -are supported. In this case, the `AtomicBool` type above would only expose -the `load_not_atomic()` and `store_not_atomic()` methods. - -Platform support for volatile operations can be queried in much of the -same way as platform support for atomic operations: - -- `#[cfg(target_has_atomic = N)]`, which can be used today to test full support - for atomic operations of a certain width N, may now also be used to test - full support for volatile atomic operations of the same width. -- `#[cfg(target_has_atomic_load_store = N)]` may similarly be used to test - support for volatile atomic load and store operations. -- A new cfg directive, `#[cfg(target_has_volatile = N)]`, may be used to test - support for non-atomic loads and stores of a certain width (i.e. `_not_atomic` - operations). - -This latter directive can be initially pessimistically implemented as a synonym -of `#[cfg(target_has_atomic_load_store = N)]`, then gradually extended to -support targets which have no or non-native support of `Relaxed` atomics -but do have native load/store instructions of a certain width, such as `nvptx`. - -In this way, the proposed volatile atomic operation API can largely re-use the -already existing atomic operation support infrastructure, which will greatly -reduce effort duplication between these two closely related functionalities. 
- ---- - -This RFC currently proposes to expose the full power of LLVM's atomic volatile -operations, including e.g. read-modify-write operations like compare-and-swap, -because it is consistent with the atomics operation API and could have -legitimate uses in interprocess communication scenarios, as a marker of the -nuance between well-optimized program-local synchronization and worst-case FFI -synchronization. See Unresolved Questions section for more details. - -However, the fact that these operations do not necessarily compile into a single -hardware instruction is arguably a footgun for volatile's use cases, and it -could be argued that initially only stabilizing loads, stores and `Relaxed` -atomic ordering would be more prudent. This will be revisited in the -alternatives section of this RFC. - ---- +are not guaranteed to be supported on every platform. As with atomic APIs, the +classic reason why a `VolatileXyz` wrapper would be missing is that the target +platform does not have load and store instructions of the corresponding width. -As currently designed, this RFC uses `arbitrary_self_types` to give method-like -semantics to a `NonNull` raw pointer. This seems necessary to get reasonable -ergonomics with an atomics-like wrapper type approach. However, it could also be -argued that a `VolatileXyz::store(ptr, data, ordering)` style of API would work -well enough, and avoid coupling with unstable features. Similarly, the use of -`NonNull` itself could be debated. This will be revisited in the alternatives -section of the RFC as well. +However, the `VolatileXyz` APIs could be made available on some platforms where +atomic operations cannot be, because they require much weaker concurrent +ordering semantics from hardware in their quest to be usable in every place +where native loads and store instructions can be used. 
+ +Therefore, we propose to provide a new platform support query, +`#[cfg(target_has_volatile = N)]`, which works in essentially the same fashion +as `#[cfg(target_has_atomic_load_store = N)]` and may initially be implemented +as a synonym of the latter, but can be eventually extended to provide volatile +load/store support on platforms where `Relaxed` atomic loads and stores are not +available or more complex than basic loads and stores. # Drawbacks [drawbacks]: #drawbacks +## API duplication +[api-duplication]: #api-duplication + Keeping around two different and subtly incompatible ways to do almost the same thing, in the form of these new wrappers and the old volatile read/write methods of pointer types, is unsatisfying. It could be argued that the issues with the existing volatile operations, while real, do not warrant the full complexity of this RFC. -As mentioned above, exposing more than loads and stores, and non-`Relaxed` -atomic memory orderings, also muddies the "a volatile op should compile into one -hardware memory instruction" narrative that is so convenient for loads and -stores. Further, compatibility of non-load/store atomics in IPC scenario may -require some ABI agreement on how atomics should be implemented between the -interacting programs. - -If this is perceived to be a problem, we could decide to do away with some of -the complexity by initially focusing on a restricted subset of this proposal -which only supports `Relaxed` loads and stores, and saving more complex atomic -operations as a future extension. +Note that duplication of implementation should not be necessary as +`ptr::[read|write]_volatile` should be implementable on top of +`VolatileXyz::[load|store]`, in the spirit of what LLVM's element-wise atomic +memcpy already does. + +## Unordered semantics +[unordered-semantics]: #unordered-semantics + +This RFC also introduces new semantics for atomic operations, morally equivalent +to LLVM's unordered atomic volatile loads and stores. 
This is risky because +LLVM's unordered atomic semantics have not been sufficiently vetted for +soundness by the language memory model community. In the RFC author's opinion, +the risk is worthwhile because... + +- The alternative of non-atomic volatile is worse. It makes data races undefined + behavior in violation of any known hardware memory model, and therefore it + takes volatile away from its "defer to hardware memory model" intent. +- The alternative of `Relaxed` atomic volatile is also worse, it makes the new + API for volatile operations unusable on hardware without global cache + coherence, such as GPUs. +- Although unordered atomics are a bit of a minefield, unordered volatile is + safer to the point of being reasonable, because volatile provides so many + guarantees on its own that adding unordered semantics on top of it should have + the sole effect of preventing tearing and making data races well-defined. + +## Pointer-based API +[pointer-based-api]: #pointer-based-api + +Because rustc treats references as `dereferenceable` at the LLVM layer, which is +incompatible with the degree of memory access control that volatile access +users typically demand, this RFC is forced to design an API around raw pointers. + +Using raw pointers means bypassing the borrow checker, which increases the risk +of memory unsafety. It also makes for a less familiar API design that users will +have a harder time getting to grips with. In this sense, maybe it would just be +better to resolve the `dereferenceable` issue first (allowing some sort of +non-`dereferenceable` reference which remains subjected to all other normal +guarantees of references), and then redesign this API in terms of whatever the +chosen solution ends up being. 
# Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -This effort seems worthwhile because it simultaneously eliminates two well-known -and unnecessary volatile footguns (data races and tearing) and opens interesting -new possibilities in interprocess communication. +This effort seems worthwhile because it eliminates two well-known and +unnecessary footguns of non-atomic volatile operations, namely data races being +undefined behavior and tearing not being a compilation error. But although this feature may seem simple, its design space is actually remarquably large, and a great many alternatives were considered before reaching @@ -494,32 +486,43 @@ the current design proposal. Here are some design knobs that were explored: Atomic volatile operations could be made part of the Atomic wrapper types. However, they cannot be just another atomic memory ordering, due to the fact -that atomic ops take an `&self` (which is against the spirit of volatile as it -is tagged with `dereferenceable` at the LLVM layer). +that atomic ops take an `&self`, which is against the spirit of volatile as it +is tagged with `dereferenceable` at the LLVM layer. Instead, one would need to add more methods to atomic wrapper types, which take a pointer as a self-parameter. This API inconsistency was felt to be -unnecessarily jarring to users, not to mention that potentially having a copy of -each atomic memory operation with a `_volatile` prefix would reduce the -readability of the `AtomicXyz` types' API documentation. +unnecessarily jarring to users. + +In addition, one intent of this API is to be usable in places where even +`Relaxed` ordering is not usable, while punting on the stabilization of weaker +atomic orderings in the non-volatile case. This would mean adding one third +"tier" of hardware atomics support (beyond `target_has_atomic_load_store` and +`target_has_atomic`), which starts to feel a bit much. 
-In addition, it is extremely common to want either all operations on a memory +Finally, it is extremely common to want either all operations on a memory location to be volatile, or none of them. Providing separate wrapper types -helps enforce this very common usage pattern at the API level. +helps enforce this very common usage pattern at the API level. If a need for +mixed volatile and non-volatile operations on a given memory location ever +emerges, we could envision providing a new `VolatileXyz` method that casts from +`NonNull` to `&AtomicXyz`, with a warning to the user that doing so +voids the warranty of no out-of-thin-air memory operations that normally comes +attached to `VolatileXyz`. ## Self-type or not self-type [self-type]: #self-type -Instead of using `arbitrary_self_types` to get `a.load(Ordering::Relaxed)` -method syntax on pointer-like objects, one could instead provide the volatile -operations as inherent methods, i.e. `VolatileU8::load(a, Ordering::Relaxed)`. +As currently designed, this RFC uses `arbitrary_self_types` to give method-like +semantics to a `NonNull` raw pointer. This seems necessary to get reasonable +ergonomics with an atomics-like wrapper type approach. However, it could also be +argued that a `VolatileXyz::store(ptr, data)` style of API would work +well enough, and avoid coupling with unstable features. -This has the advantage of avoiding coupling this feature to another unstable -feature. But it has the drawback of being incredibly verbose. Typing the same -volatile type name over and over again in a complex transaction would certainly -get old and turn annoying quickly, and we don't want anger to distract low-level -developers from the delicate task of implementing the kind of subtle algorithm -that requires volatile operations. +This would have the advantage of avoiding coupling this feature to another +unstable feature. But it would have the drawback of being very verbose. 
Typing +the same volatile type name over and over again in a complex transaction would +certainly obscure the intent of the code, whereas code that requires use of +volatile operations is often very subtle and benefits from being as readable as +it gets. ## `NonNull` vs `*mut T` vs `*const T` vs other [pointers]: #pointers @@ -551,31 +554,25 @@ normally has. For example, it has been proposed before that `&UnsafeCell` should not exhibit this behavior. This could be enough for `VolatileXyz`'s need if it were extended to transparent newtypes of `UnsafeCell`. -## Full atomics vocabulary vs sticking with hardware semantics +## Hardware loads/store vs full atomics vocabulary [atomics-vs-hardware]: #atomics-vs-hardware -Currently, this RFC basically proposes exposing a volatile version of every -atomic operation supported by Rust for maximal expressive power. +The current version of this RFC proposes to restrict the `VolatileXyz` API to +atomic loads and stores with no concurrent ordering guarantees. It could +alternatively expose the full functionality of LLVM's atomic volatile operations +in the same manner as `AtomicXyz` does for non-volatile atomic operations. -But it could also be argued that this distracts us from volatile's main purpose -of generating a stream of simple hardware instructions without using inline -assembly: +There are three reasons why it was chosen not to do so: -- Non-`Relaxed` atomics will entail memory barriers -- Compare-and-swap may be implemented as a load-linked/store-conditional loop -- Some types like `VolatileBool` are dangerous when interacting with untrusted - code because they come with data validity invariants. +- It goes against the typical intent of volatile, which is to exert tight + control on the hardware load/store instructions that are generated. +- It would duplicate the `AtomicXyz` API to a large extent. 
+ +- There is no known use case for nontrivial atomic volatile operations that + wouldn't be better served by adding volatile orderings to regular atomic + operations, at a reduced API duplication cost. -From this perspective, there would be an argument in favor of only supporting -`Relaxed` load/stores and machine data types, at least initially. In this case, -one could split this feature into three feature gates: - -- `Relaxed` volatile atomic loads and stores, which are most urgently needed. -- Non-`Relaxed` orderings and read-modify-write atomics, which open new - possibilities for shared-memory IPC. -- `_not_atomic()` operations, where it is not yet clear whether the proposed API - is even the right solution to the problem being solved, and more research is - needed before reaching a definite conclusion. +Full discussion of this question, and of the proposed alternative, has therefore +been moved to the "future possibilities" section. # Prior art @@ -624,6 +621,7 @@ over the course of implementing `ptr::read_volatile()`, `ptr::write_volatile()` and `std::sync::atomic` (to the best of the author's knowledge at least). ## Should shared-memory IPC always be volatile? +[should-shared-memory-be-volatile]: #should-shared-memory-be-volatile There is [some ongoing discussion](https://github.com/rust-lang/unsafe-code-guidelines/issues/215) in the Unsafe Code Guidelines group concerning whether a Rust implementation @@ -645,13 +643,8 @@ volatile memory accesses. the mere presence of atomics acts as a trigger that disables the above optimizations. -## Necessity of non-atomic operations - -The necessity of having `load_not_atomic()` and `store_not_atomic()` methods, -as opposed to alternatives such as weaker-than-`Relaxed` atomics, should be -researched before stabilizing that subset of this RFC.
- ## Safer self types +[safer-self-types]: #safer-self-types This RFC would also benefit from a safer way to interact with volatile memory regions than raw pointers, by providing a way to opt out of LLVM's @@ -663,6 +656,7 @@ memory-safety guarantees provided by the Rust borrow checker. [future-possibilities]: #future-possibilities ## Untrusted shared-memory IPC +[untrusted-shared-memory]: #untrusted-shared-memory Although it performs a step in the right direction by strengthening the definition of volatile accesses to reduce the amount of possible avenues for @@ -673,14 +667,48 @@ shared memory region. Doing so would also require work on clarifying LLVM semantics so that it is absolutely clear that a malicious process cannot cause UB in another process by by writing data in memory that's shared between the two, no matter if said -writes are non-atomic, non-volatile, etc. - -## Volatile atomic operations - -If we decide to drop advanced atomic orderings and operations from this RFC, -then they will fall out in this section. +writes are non-atomic, non-volatile, or contains uninitialized data. + +## Stronger orderings and RMW operations +[stronger-ordering-and-rmw]: #stronger-ordering-and-rmw + +This RFC proposes to initially stabilize an absolute minimum of atomic volatile +operations, morally equivalent to LLVM's `unordered` atomic volatile loads and +stores. But atomic volatile operations can, in principle, support stronger +semantics, such as `Relaxed` and stronger orderings and read-modify-write +operations like `compare_and_swap`. 
+ +The reason why these operations are not made part of the `VolatileXyz` API is +that they interfere with one common intent behind volatile operations, which is +to exert tight control on the stream of hardware instructions that is emitted: + +- `Relaxed` and stronger ordering may lead to the emission of different hardware + load and store instructions, or to the emission of hardware memory barriers, + in a target- and ABI-dependent fashion. +- `compare_and_swap` will lead to the generation a full loop (and thus break + wait-freedom progress guarantees, which may be unexpected) on hardware based + on the Load-Linked / Store-Conditional (LL/SC) synchronization formalism. +- Other atomic read-modify-write operations may or may not also compile into an + LL/SC loop, depending on whether they are natively supported by hardware, and + whether the compiler backend chose to use them if that is the case. + +Clearly, the mental model for the process through which these instructions are +translated into hardware semantics is much more difficult to reason about, and +may not be stable across compiler versions and architecture/ABI revisions. + +So far, the only use case that was found for these operations was to exert +stronger control over atomic operation emission, on a hypothetical backend +which would optimize atomics aggressively. This use case does not seem like it +would require the full constraints of the `VolatileXyz`, and would probably be +better implemented as an extension of `std::sync::atomic::Ordering`. + +One possibility would be to duplicate every ordering with a volatile variant +(e.g. `RelaxedVolatile`, `AcquireVolatile...`). Another possibility would be to +turn `std::sync::atomic::Ordering` into a bitfield, which would allow cleaner +syntax in the `SeqCst | Volatile` style. 
## Deprecating `ptr::[read|write]_volatile` +[deprecating-ptr-volatile]: #deprecating-ptr-volatile Finally, one could consider `ptr::[read|write]_volatile` and the corresponding methods of pointer types as candidates for future deprecation, as they provide From 17ab617dd51cb7e8ce399a2a194b13ef6b5d763c Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sun, 8 Dec 2019 15:41:29 +0100 Subject: [PATCH 37/42] More atomics codegen horror stories --- rfcs/0000-atomic-volatile.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index a7371b72..f1d06e5b 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -684,13 +684,17 @@ to exert tight control on the stream of hardware instructions that is emitted: - `Relaxed` and stronger ordering may lead to the emission of different hardware load and store instructions, or to the emission of hardware memory barriers, - in a target- and ABI-dependent fashion. + in a target- and ABI-dependent fashion. As memory barriers can be coalesced, + the details of which instructions will be used depend on surrounding code. - `compare_and_swap` will lead to the generation a full loop (and thus break wait-freedom progress guarantees, which may be unexpected) on hardware based on the Load-Linked / Store-Conditional (LL/SC) synchronization formalism. -- Other atomic read-modify-write operations may or may not also compile into an - LL/SC loop, depending on whether they are natively supported by hardware, and - whether the compiler backend chose to use them if that is the case. +- Some atomic read-modify-write operations will lead to the generation a loop + (and thus break wait-freedom progress guarantees, which may be unexpected) on + hardware based on the Load-Linked / Store-Conditional (LL/SC) synchronization + formalism. 
Which instructions will be subjected to this treatment depends on + target CPU instruction set and whether the compiler backend will choose to use + dedicated wait-free instructions over an LL/SC loop. Clearly, the mental model for the process through which these instructions are translated into hardware semantics is much more difficult to reason about, and @@ -699,8 +703,8 @@ may not be stable across compiler versions and architecture/ABI revisions. So far, the only use case that was found for these operations was to exert stronger control over atomic operation emission, on a hypothetical backend which would optimize atomics aggressively. This use case does not seem like it -would require the full constraints of the `VolatileXyz`, and would probably be -better implemented as an extension of `std::sync::atomic::Ordering`. +would require the full constraints of the `VolatileXyz` API, and it could +therefore probably be turned into an extension of `std::sync::atomic::Ordering`. One possibility would be to duplicate every ordering with a volatile variant (e.g. `RelaxedVolatile`, `AcquireVolatile...`). Another possibility would be to From 8aa979fc4370a2ef6c94380f5bcfb999d3023254 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sun, 8 Dec 2019 15:43:26 +0100 Subject: [PATCH 38/42] Air + wording cleanup --- rfcs/0000-atomic-volatile.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index f1d06e5b..32015ca8 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -501,12 +501,13 @@ atomic orderings in the non-volatile case. This would mean adding one third Finally, it is extremely common to want either all operations on a memory location to be volatile, or none of them. Providing separate wrapper types -helps enforce this very common usage pattern at the API level. 
If a need for -mixed volatile and non-volatile operations on a given memory location ever -emerges, we could envision providing a new `VolatileXyz` method that casts from -`NonNull` to `&AtomicXyz`, with a warning to the user that doing so -voids the warranty of no out-of-thin-air memory operations that normally comes -attached to `VolatileXyz`. +helps enforce this very common usage pattern at the API level. + +If a need for mixed volatile and non-volatile operations on a given memory +location ever emerges, we could envision providing a new `VolatileXyz` method +that casts from `NonNull` to `&AtomicXyz`, with a clear warning in +its documentation that doing so voids the warranty of no out-of-thin-air memory +operations that `VolatileXyz` tries so hard to provide. ## Self-type or not self-type [self-type]: #self-type From 91142516a1e32048c9f3fc6a4ef6d21df4b2af71 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sun, 8 Dec 2019 15:54:22 +0100 Subject: [PATCH 39/42] Link to P0062R1 --- rfcs/0000-atomic-volatile.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 32015ca8..7989330e 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -702,10 +702,11 @@ translated into hardware semantics is much more difficult to reason about, and may not be stable across compiler versions and architecture/ABI revisions. So far, the only use case that was found for these operations was to exert -stronger control over atomic operation emission, on a hypothetical backend -which would optimize atomics aggressively. This use case does not seem like it -would require the full constraints of the `VolatileXyz` API, and it could -therefore probably be turned into an extension of `std::sync::atomic::Ordering`.
+stronger control over atomic operation emission, on a hypothetical compiler +which would [optimize atomics too aggressively](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html). +This use case does not seem like it would require the full constraints of the +`VolatileXyz` API, and it could therefore probably be turned into an extension +of `std::sync::atomic::Ordering`. One possibility would be to duplicate every ordering with a volatile variant (e.g. `RelaxedVolatile`, `AcquireVolatile...`). Another possibility would be to From 79f9545351beaf51db3fa5e1d90e3cb70575e808 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sun, 8 Dec 2019 17:37:28 +0100 Subject: [PATCH 40/42] Last read before pinging --- rfcs/0000-atomic-volatile.md | 81 +++++++++++++++++++++++------------- 1 file changed, 51 insertions(+), 30 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 7989330e..d4d67867 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -8,10 +8,10 @@ Introduce a set of `core::volatile::VolatileXyz` structs, roughly modeled after the existing `core::sync::atomic::AtomicXyz` APIs, which expose volatile loads -and stores of machine-native width with atomic semantics. Recommend that these -operations be preferred to `ptr::[read|write]_volatile` and the corresponding -methods of pointer types, as they have stronger semantics that are almost always -closer to user intent. +and stores of machine-native width with atomic (but not ordered) semantics. +Recommend that these operations be preferred to `ptr::[read|write]_volatile` and +the corresponding methods of pointer types, as they have stronger semantics that +are almost always closer to user intent. # Motivation @@ -150,33 +150,33 @@ environments are unlike that of any atomic operation ordering: the underlying CPU in program order... 
- ...but the order in which these operations are subsequently _observed_ to occur by another thread performing volatile operations on the same memory - location is fully unspecified, and may defy even `Ordering::Relaxed` logic. + location is not subjected to any guarantee, to the point where it may defy + even seemingly basic `Ordering::Relaxed` logic. For example, assuming that a thread A successively writes the values 24 and 42 to a memory location using volatile operations, other threads B and C repeatedly reading the value of said memory location are allowed to observe these writes -without any commonly agreed upon coherence ordering: +without any commonly agreed upon coherence ordering... - Thread B: Thread C: - 24 42 - 42 24 + Thread B's view: Thread C's view: + 24 42 + 42 24 -As a matter of fact, other threads are not even guaranteed to eventually observe -either of the writes carried out by thread A if no further sychronization action -is taken to guarantee their visibility. +...and that is assuming that they do observe the writes from thread A, which may +not even happen until further hardware-specific action (such as an explicit +cache flush) is taken. -Individual hardware targets supported by Rust may and often will provide +Individual hardware targets supported by Rust can and often will provide stronger ordering and visibility guarantees under the above intra-thread execution ordering constraint. But a program relying on these hardware guarantees will not be portable to all hardware targets supported by Rust. Readers who would still feel the perverted temptation to use volatile operations -for synchronization in spite of the above warning should keep in mind that the +for synchronization in spite of the above warnings should keep in mind that the presence of volatile operations does not affect the optimization of surrounding non-volatile operations in any significant ways. 
Said operations may still be added, elided, split, narrowed, extended, or
 reordered as deemed useful by the
-compiler's optimizer, including by moving them across neighboring volatile
-operations.
+optimizer, including by moving them across neighboring volatile operations.
 
 ### No-elision guarantee
 [no-elision-guarantee]: #no-elision-guarantee
@@ -293,9 +293,9 @@ operations, do not achieve this goal very well because:
   controlled, such as memory-mapped I/O.
 
 Both of these problems can be resolved by using atomic volatile operations
-instead of non-atomic volatile operations. However, in doing so, we must be very
-careful to stick with the subset of atomic operations that is supported by all
-Rust hardware targets.
+instead of non-atomic volatile operations. However, for atomic volatile to act
+as a full replacement to non-atomic volatile, we must be very careful to stick
+with the subset of atomic operations that is supported by all Rust targets.
 
 This means that we should not expose anything more than loads or stores in our
 standard subset, as some hardware architectures which are either very old or
@@ -304,7 +304,7 @@ lack multiprocessing support do not support more than that.
 More surprisingly, maximizing hardware support also means that we cannot rely on
 `Ordering::Relaxed` as a minimal level of atomic sanity, because this ordering
 assumes a global order of operations targeting a single memory location across
-all threads in the program, and this property is not provided by native
+all threads in the program, and this property is actually not provided by basic
 load/store operations on some targets without global cache coherence such as
 GPU architectures (NVPTX, AMD GCN, ...).
One design goal of the proposed semantics is that it should be possible to implement `VolatileXyz::[load|store]` by compiling it down to LLVM unordered -atomic volatile load/store operations of the same width, without specifying it -in terms of "whatever LLVM unordered does" as that would make the life of -alternate rustc backends like cranelift harder. +atomic volatile load/store operations of the same width, but without specifying +it in terms of "whatever LLVM unordered does" as that would make the life of +alternate rustc backends like cranelift hard. ## API design [api-design]: #api-design @@ -466,8 +466,8 @@ of memory unsafety. It also makes for a less familiar API design that users will have a harder time getting to grips with. In this sense, maybe it would just be better to resolve the `dereferenceable` issue first (allowing some sort of non-`dereferenceable` reference which remains subjected to all other normal -guarantees of references), and then redesign this API in terms of whatever the -chosen solution ends up being. +guarantees of Rust references), and then redesign this API in terms of whatever +the chosen solution ends up being. # Rationale and alternatives @@ -523,9 +523,9 @@ unstable feature. But it would have the drawback of being very verbose. Typing the same volatile type name over and over again in a complex transaction would certainly obscure the intent of the code, whereas code that requires use of volatile operations is often very subtle and benefits from being as readable as -it gets. +it can possibly get. -## `NonNull` vs `*mut T` vs `*const T` vs other +## `NonNull` vs `*mut T`/`*const T` vs other [pointers]: #pointers It is pretty clear that volatile operations cannot be expressed through `&self` @@ -636,14 +636,17 @@ implementation has no knowledge of, may or may not require systematic use of volatile memory accesses. 
- If we go in the "maximal Rust performance" direction, then every access to - shared memory must be marked volatile because the Rust compiler is allowed to - optimize it out if it is not subsequently used by Rust code (or if it can - transform the Rust code to eliminate that use). + shared memory in IPC must be marked volatile because the Rust compiler is + allowed to optimize it out if it is not subsequently used by Rust code (or if + it can transform the Rust code to eliminate that use). - If we go in the "maximal FFI ergonomics" direction, then volatile accesses are only needed when they are not coupled with atomics-based synchronization, as the mere presence of atomics acts as a trigger that disables the above optimizations. +So far, the consensus has been that FFI ergonomics should be optimized in this +case, in which case nothing extra needs to be done here. + ## Safer self types [safer-self-types]: #safer-self-types @@ -652,6 +655,24 @@ regions than raw pointers, by providing a way to opt out of LLVM's "dereferenceable" semantics without also having to opt out from all the memory-safety guarantees provided by the Rust borrow checker. +As further fuel for this design direction, note that the fact that Rust +references are automatically marked as `dereferenceable` has caused a some of +pain recently in the Rust community: + +- [Unsoundness and poor ergonomics in embedded crates](https://github.com/rust-embedded/wg/pull/387) +- [Unsoundness in deallocate-on-drop smart pointers](https://github.com/rust-lang/rust/issues/55005) + +Therefore, in the author's opinion, it would be most prudent to punt on +stabilization of this language feature until we have a clearer picture of +whether or not we want to introduce a way to opt-out of `dereferenceable` +semantics for Rust references without needing to go for raw pointer based APIs. +If we do introduce such a way, then the Volatile API should definitely use it. 
+ +However, this concern does not seem to block introduction of `Volatile` wrappers +in an _unstable_ form, as a short-term answer to embedded MMIO concerns until a +better fix for `dereferenceable` is introduced (if ever). It should only be +considered as a stabilization blocker. + # Future possibilities [future-possibilities]: #future-possibilities From 71ebfcd2c64a45fa60b119073cbccf5fe8feb387 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sun, 8 Dec 2019 17:38:57 +0100 Subject: [PATCH 41/42] Also link to Box issue --- rfcs/0000-atomic-volatile.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index d4d67867..87e5cb33 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -660,7 +660,7 @@ references are automatically marked as `dereferenceable` has caused a some of pain recently in the Rust community: - [Unsoundness and poor ergonomics in embedded crates](https://github.com/rust-embedded/wg/pull/387) -- [Unsoundness in deallocate-on-drop smart pointers](https://github.com/rust-lang/rust/issues/55005) +- Unsoundness in deallocate-on-drop smart pointers like [Arc](https://github.com/rust-lang/rust/issues/55005) and [Box](https://github.com/rust-lang/rust/issues/66600) Therefore, in the author's opinion, it would be most prudent to punt on stabilization of this language feature until we have a clearer picture of From bb9cefa01c9ab96f2f840cab0c360be4085f9625 Mon Sep 17 00:00:00 2001 From: Hadrien G Date: Sat, 4 Jan 2020 09:46:02 +0100 Subject: [PATCH 42/42] Turn VolatileBool into a future extension --- rfcs/0000-atomic-volatile.md | 49 +++++++++++++++++++++++++++--------- 1 file changed, 37 insertions(+), 12 deletions(-) diff --git a/rfcs/0000-atomic-volatile.md b/rfcs/0000-atomic-volatile.md index 87e5cb33..eccfd302 100644 --- a/rfcs/0000-atomic-volatile.md +++ b/rfcs/0000-atomic-volatile.md @@ -336,7 +336,6 @@ be to fill the newly built `std::volatile` module 
with the following entities
(some of which may not be available on a given platform, we will come back to
this point in a moment):
 
-- `VolatileBool`
 - `VolatileI8`
 - `VolatileI16`
 - `VolatileI32`
@@ -349,6 +348,18 @@ this point in a moment):
 - `VolatileU64`
 - `VolatileUsize`
 
+Note that this list does not include a `VolatileBool`. Although this type would
+be easy to implement and providing it would be consistent with `AtomicBool`
+prior art, it is left as a possible future extension because...
+
+- Several key use cases for volatile accesses involve interaction with entities
+  which are not guaranteed to uphold `bool` data validity invariants. From this
+  perspective, `VolatileBool` would be somewhat treacherous.
+- It is not clear whether Rust is ready to commit to a stable ABI for `bool`,
+  which would seem to be a prerequisite for many use cases of `VolatileBool`.
+- No use case for `VolatileBool` which would avoid or counterbalance these
+  drawbacks has been submitted yet.
+
 Unlike `AtomicXyz` types, these volatile types would be restricted to load and
 store operations with sub-`Relaxed` concurrent ordering guarantees (see above),
 and their API would be based on raw pointers instead of references. That is
@@ -357,7 +368,7 @@ a `dereferenceable` annotation on the LLVM side, which in turn is fundamentally
 at odds with the precise control of hardware load and store generation that is
 required by volatile access use case.
-To give a more concrete example, here is what the API of `VolatileBool` would +To give a more concrete example, here is what the API of `VolatileU8` would look like on a platform with support for this type: ```rust @@ -366,33 +377,33 @@ use std::sync::atomic::Ordering; #[repr(transparent)] -struct VolatileBool(bool); +struct VolatileU8(u8); -impl VolatileBool { - /// Creates a new VolatileBool pointer +impl VolatileU8 { + /// Creates a new VolatileU8 pointer /// /// This is safe as creating a pointer is considered safe in Rust and /// volatile adds no safety invariant to the input pointer. /// - pub const fn new(v: NonNull) -> NonNull { /* ... */ } + pub const fn new(v: NonNull) -> NonNull { /* ... */ } - /// Load the target bool value in an atomic and volatile way + /// Load the target u8 value in an atomic and volatile way /// /// # Safety /// /// The `self` pointer must be well-aligned and point to a valid memory - /// location containing a valid `bool` value. + /// location containing an initialized `u8` value. /// - pub unsafe fn load(self: NonNull) -> bool { /* ... */ } + pub unsafe fn load(self: NonNull) -> u8 { /* ... */ } - /// Store a bool into the target in an atomic and volatile way + /// Store an u8 into the target in an atomic and volatile way /// /// # Safety /// /// The `self` pointer must be well-aligned and point to a valid memory - /// location containing a valid `bool` value. + /// location containing an initialized `u8` value. /// - pub unsafe fn store(self: NonNull, val: bool) { /* ... */ } + pub unsafe fn store(self: NonNull, val: u8) { /* ... */ } } ``` @@ -743,3 +754,17 @@ less clear semantics than atomic volatile accesses (e.g. no guarantee of being exempted from the definition of data races) for no clear benefit. As a more backward-compatible alternative, one could also reimplement those methods using a loop of atomic volatile operations of unspecified width. 
+
+## `VolatileBool`
+[volatile-bool]: #volatile-bool
+
+As mentioned previously, `VolatileBool` is not proposed for initial
+stabilization and instead left as a possible future extension. Such an extension
+would need to answer the following concerns:
+
+- Which use cases motivate the introduction of a `VolatileBool`, in spite of
+  `bool` being a dangerous type to use for MMIO or interaction with untrusted
+  code due to its data validity invariants?
+- Does the extension come together with a proposal to stabilize the ABI of
+  `repr(Rust)` booleans, and if not how does the proposed use case handle the
+  (so far theoretical) ABI instability of Rust's `bool`?