Commit 5c0cd2e

Merge pull request #1385 from alexcrichton/changes-to-wasm
Add a post about changes to WebAssembly targets
2 parents 5bfe85d + e3f61c9 commit 5c0cd2e

1 file changed

+265
-0
lines changed
@@ -0,0 +1,265 @@
1+
---
2+
layout: post
3+
title: "WebAssembly targets: change in default target-features"
4+
author: Alex Crichton
5+
team: The Compiler Team <https://www.rust-lang.org/governance/teams/compiler>
6+
---
7+
8+
The Rust compiler has [recently upgraded to using LLVM 19][llvm19] and this
9+
change comes with some updates to the default set of target features enabled
10+
for WebAssembly targets of the Rust compiler. Beta Rust today, which will
11+
become Rust 1.82 on 2024-10-17, reflects all of these changes and can be
12+
used for testing.
13+
14+
WebAssembly is an evolving standard where extensions are being added over
15+
time through a [proposals process][proposals]. WebAssembly proposals reach
16+
maturity, get merged into the specification itself, get implemented in engines,
17+
and remain this way for quite some time before producer toolchains (e.g. LLVM)
18+
update to **enable these sufficiently-mature proposals by default**. In LLVM 19
19+
this has happened with the [multi-value and reference-types
20+
proposals][llvmenable] for the LLVM/Rust target features `multivalue` and
21+
`reference-types`. These are now enabled by default in LLVM, which transitively
22+
means that they are enabled by default for Rust as well.
23+
24+
WebAssembly targets for Rust now [have improved
25+
documentation](https://github.com/rust-lang/rust/pull/128511) about WebAssembly
26+
proposals and their corresponding target features. This post is going to review
27+
these changes and go into depth about what's changing in LLVM.
28+
29+
## WebAssembly Proposals and Compiler Target Features
30+
31+
WebAssembly proposals are the formal means by which the WebAssembly standard
32+
itself is evolved over time. Most proposals need toolchain integration in one
33+
form or another, for example new flags in LLVM or the Rust compiler. The
34+
`-Ctarget-feature=...` mechanism is used to implement this today. This is a
35+
signal to LLVM and the Rust compiler which WebAssembly proposals are enabled or
36+
disabled.
37+
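As a minimal sketch of how these feature names surface in Rust source, the standard `cfg!(target_feature = "...")` check can report which features a crate was compiled with. The function name `enabled_wasm_features` is made up for this example. Whether a given WebAssembly feature name is visible to `cfg` depends on the compiler version, so a `false` here is not proof that LLVM has the feature disabled.

```rust
/// Report, at compile time, which of a few WebAssembly target features
/// this crate was built with. On non-WebAssembly targets, or on compilers
/// that do not expose these names to `cfg`, every entry is `false`.
fn enabled_wasm_features() -> Vec<(&'static str, bool)> {
    vec![
        ("multivalue", cfg!(target_feature = "multivalue")),
        ("reference-types", cfg!(target_feature = "reference-types")),
        ("sign-ext", cfg!(target_feature = "sign-ext")),
    ]
}
```

Printing the returned list from a small test program is one way to compare builds made with different `-Ctarget-feature` settings.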
38+
There is a loose coupling between the name of a proposal (often the name of the
39+
GitHub repository of the proposal) and the feature name LLVM/Rust use. For
40+
example there is the [multi-value
41+
proposal](https://github.com/webAssembly/multi-value) but a `multivalue`
42+
feature.
43+
44+
The lifecycle of the implementation of a feature in Rust/LLVM typically looks
45+
like:
46+
47+
1. A new WebAssembly proposal is created in a new repository, for example
48+
WebAssembly/foo.
49+
2. Eventually Rust/LLVM implement the proposal under `-Ctarget-feature=+foo`.
50+
3. Eventually the upstream proposal is merged into the specification, and
51+
WebAssembly/foo becomes an archived repository.
52+
4. Rust/LLVM enable the `-Ctarget-feature=+foo` feature by default but typically
53+
retain the ability to disable it as well.
54+
55+
The `reference-types` and `multivalue` target features in Rust are at step (4)
56+
here now and this post is explaining the consequences of doing so.
57+
58+
## Enabling Reference Types by Default
59+
60+
The [reference-types proposal to
61+
WebAssembly](https://github.com/webAssembly/reference-types) introduced a few
62+
new concepts to WebAssembly, notably the `externref` type which is a
63+
host-defined GC resource that WebAssembly cannot access but can pass around.
64+
Rust does not have support for the WebAssembly `externref` type and LLVM 19 does
65+
not change that. WebAssembly modules produced from Rust will continue to not use
66+
the `externref` type nor have a means of being able to do so. This may be
67+
enabled in the future (e.g. a hypothetical `core::arch::wasm32::Externref` type
68+
or similar), but it will most likely only be done on an opt-in basis
69+
and will not affect preexisting code by default.
70+
71+
Also included in the reference-types proposal, however, was the ability to have
72+
multiple WebAssembly tables in a single module. In the original version of the
73+
WebAssembly specification only a single table was allowed and this restriction
74+
was relaxed with the reference-types proposal. WebAssembly tables are used by
75+
LLVM and Rust to implement indirect function calls. For example function
76+
pointers in WebAssembly are actually table indices and indirect function calls
77+
are a WebAssembly `call_indirect` instruction with this table index.
78+
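To make the paragraph above concrete, here is a small sketch of Rust code that calls through a function pointer. The function names are invented for the example. On a WebAssembly target a call such as `f(x)` in `apply` is typically lowered to `call_indirect` when the optimizer cannot resolve the callee statically, with `f` represented as an index into the function table. With inlining the compiler may turn it into a direct call instead.

```rust
fn double(x: u32) -> u32 {
    x * 2
}

fn increment(x: u32) -> u32 {
    x + 1
}

/// Pick one of two functions at runtime, so the callee is not known statically.
fn pick(use_double: bool) -> fn(u32) -> u32 {
    if use_double { double } else { increment }
}

/// Call through a function pointer. On WebAssembly the pointer is a table
/// index rather than a memory address of machine code.
fn apply(f: fn(u32) -> u32, x: u32) -> u32 {
    f(x)
}
```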
79+
With the reference-types proposal the binary encoding of `call_indirect`
80+
instructions was updated. Prior to the reference-types proposal `call_indirect`
81+
was encoded with a fixed zero byte in its instruction (required to be exactly
82+
0x00). This fixed zero byte was relaxed to a 32-bit [LEB] to indicate which
83+
table the `call_indirect` instruction was using. For those unfamiliar [LEB] is a
84+
way of encoding multi-byte integers in a smaller number of bytes for smaller
85+
integers. For example the 32-bit integer 0 can be encoded as `0x00` with a
86+
[LEB]. [LEB]s are flexible to additionally allow "overlong" encodings so the
87+
integer 0 can additionally be encoded as `0x80 0x00`.
88+
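The following is a minimal sketch of unsigned LEB128 encoding and decoding, written for illustration and not taken from LLVM or any engine. It shows that `0x00` and the overlong `0x80 0x00` both decode to 0. The function names are assumptions of this example.

```rust
/// Encode `value` as an unsigned LEB128 using the fewest bytes possible.
fn encode_uleb128(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // high bit clear: this is the last byte
            return out;
        }
        out.push(byte | 0x80); // high bit set: more bytes follow
    }
}

/// Decode an unsigned 32-bit LEB128, accepting overlong encodings of up to
/// five bytes. Returns the value and the number of bytes consumed, or `None`
/// if the input is truncated or does not fit in 32 bits.
fn decode_uleb128(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        let payload = (byte & 0x7f) as u32;
        // The fifth byte may only contribute the top 4 bits of a u32.
        if i == 4 && payload > 0x0f {
            return None;
        }
        result |= payload << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```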
89+
LLVM's support of separate compilation of source code to a WebAssembly binary
90+
means that when an object file is emitted it does not know the final index of
91+
the table that is going to be used in the final binary. Before reference-types
92+
there was only one option, table 0, so `0x00` was always used when encoding
93+
`call_indirect` instructions. After reference-types, however, LLVM will emit an
94+
overlong [LEB] of the form `0x80 0x80 0x80 0x80 0x00` which is the maximal
95+
length of a 32-bit [LEB]. This [LEB] is then filled in by the linker with a
96+
relocation to the actual table index that is used by the final module.
97+
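The placeholder-then-patch idea can be sketched as follows. This is a simplified model and not LLD's actual relocation code: the helper names are hypothetical, and the byte layout around the placeholder is reduced to the essentials. The point is that a fixed 5-byte slot can hold any 32-bit index, so the linker can overwrite it without shifting the bytes around it.

```rust
/// Write `value` as a fixed-width, 5-byte unsigned LEB128. Small values are
/// padded with continuation bytes, so 0 becomes `80 80 80 80 00`.
fn encode_uleb128_padded(value: u32) -> [u8; 5] {
    let mut out = [0u8; 5];
    for (i, slot) in out.iter_mut().enumerate() {
        let byte = ((value >> (7 * i)) & 0x7f) as u8;
        // Every byte except the last sets the continuation bit.
        *slot = if i < 4 { byte | 0x80 } else { byte };
    }
    out
}

/// Overwrite the 5-byte placeholder at `offset` in `code` with `index`,
/// the way a linker would once the final index is known.
/// Panics if the placeholder does not fit inside `code`.
fn apply_relocation(code: &mut [u8], offset: usize, index: u32) {
    code[offset..offset + 5].copy_from_slice(&encode_uleb128_padded(index));
}
```

For example, patching the placeholder in a buffer holding `11 05 80 80 80 80 00` (a `call_indirect` opcode, a one-byte type index chosen for the example, then the table placeholder) with index 3 yields `11 05 83 80 80 80 00`, and the buffer length is unchanged.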
98+
When putting all of this together, it means that with LLVM 19, which has
99+
the `reference-types` feature enabled by default, any WebAssembly module with an
100+
indirect function call (which is almost always the case for Rust code) will
101+
produce a WebAssembly binary that cannot be decoded by engines and tooling that
102+
do not support the reference-types proposal. It is expected that this change
103+
will have a low impact due to the age of the reference-types proposal and
104+
breadth of implementation in engines. Given the multitude of WebAssembly
105+
engines, however, it's recommended that any WebAssembly users test out
106+
Rust 1.82 beta and see if the produced module still runs on their engine of
107+
choice.
108+
109+
### LLVM, Rust, and Multiple Tables
110+
111+
One interesting point worth mentioning is that despite the reference-types
112+
proposal enabling multiple tables in WebAssembly modules this is not actually
113+
taken advantage of at this time by either LLVM or Rust. WebAssembly modules
114+
emitted will still have at most one table of functions. This means that the
115+
overlong 5-byte encoding of index 0 as `0x80 0x80 0x80 0x80 0x00` is not
116+
actually necessary at this time. LLD, LLVM's linker for WebAssembly, wants to
117+
process all [LEB] relocations in a similar manner which currently forces this
118+
5-byte encoding of zero. For example when a function calls another function the
119+
`call` instruction encodes the target function index as a 5-byte [LEB] which is
120+
filled in by the linker. There is quite often more than one function so the
121+
5-byte encoding enables all possible function indices to be encoded.
122+
123+
In the future LLVM might start using multiple tables as well. For example LLVM
124+
may have a mode in the future where there's a table-per-function type instead of
125+
a single heterogeneous table. This can enable engines to implement
126+
`call_indirect` more efficiently. This is not implemented at this time, however.
127+
128+
For users who want a minimally-sized WebAssembly module (e.g. if you're in a web
129+
context and sending bytes over the wire) it's recommended to use an optimization
130+
tool such as [`wasm-opt`] to shrink the size of the output of LLVM. Even before
131+
this change with reference-types it's recommended to do this as [`wasm-opt`] can
132+
typically optimize LLVM's default output even further. When optimizing a module
133+
through [`wasm-opt`] these 5-byte encodings of index 0 are all shrunk to a
134+
single byte.
135+
136+
## Enabling Multi-Value by Default
137+
138+
The second feature enabled by default in LLVM 19 is `multivalue`. The
139+
[multi-value proposal to WebAssembly][multi-value] enables functions to have
140+
more than one return value, for example. WebAssembly instructions are
141+
allowed to have more than one return value as well. This proposal
142+
is one of the first to get merged into the WebAssembly specification after the
143+
original MVP and has been implemented in many engines for quite some time.
144+
145+
The consequences of enabling this feature by default in LLVM are more minor for
146+
Rust, however, than enabling the `reference-types` feature by default. LLVM's
147+
default C ABI for WebAssembly code is not changing even when `multivalue` is
148+
enabled. Additionally Rust's `extern "C"` ABI for WebAssembly is not changing
149+
either and continues to match LLVM's (or strives to, [differences to
150+
LLVM](https://github.com/rust-lang/rust/issues/115666) are considered bugs to
151+
fix). Despite this though the change has the possibility of still affecting
152+
Rust users.
153+
154+
Rust for some time has supported an `extern "wasm"` ABI on Nightly which was an
155+
experimental means of exposing the ability of defining a function in Rust which
156+
returned multiple values (i.e. used the multi-value proposal). Due to
157+
infrastructural changes and refactorings in LLVM itself this feature of Rust has
158+
[been removed](https://github.com/rust-lang/rust/pull/127605) and is no longer
159+
supported on Nightly at all. As a result there is no longer any possible method
160+
of writing a function in Rust that returns multiple values at the WebAssembly
161+
function type level.
162+
163+
In summary this change is expected to not affect any Rust code in the wild
164+
unless you were using the Nightly feature of `extern "wasm"` in which case
165+
you'll be forced to drop support for that and use `extern "C"` instead.
166+
Supporting WebAssembly multi-return functions in Rust is a broader topic than
167+
this post can cover, but at this time it's an area that's ripe for contribution
168+
from suitably motivated contributors.
169+
170+
### Aside: ABI Stability and WebAssembly
171+
172+
While on the topic of ABIs and the `multivalue` feature it's perhaps worth
173+
also going over a bit what ABIs mean for WebAssembly. The current definition of
174+
the `extern "C"` ABI for WebAssembly is documented in the [tool-conventions
175+
repository](https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md)
176+
and this is what Clang implements for C code as well. LLVM implements enough
177+
support for lowering to WebAssembly as well to support all of this. The `extern
178+
"Rust"` ABI is not stable on WebAssembly, as is the case for all Rust targets,
179+
and is subject to change over time. There is no reference documentation at this
180+
time for what `extern "Rust"` is on WebAssembly.
181+
182+
The `extern "C"` ABI, what C code uses by default as well, is difficult to
183+
change because stability is often required across different compiler versions.
184+
For example WebAssembly code compiled with LLVM 18 might be expected to work
185+
with code compiled by LLVM 20. This means that changing the ABI is a daunting
186+
task that requires version fields, explicit markers, etc, to help prevent
187+
mismatches.
188+
189+
The `extern "Rust"` ABI, however, is subject to change over time. A great
190+
example of this could be that when the `multivalue` feature is enabled the
191+
`extern "Rust"` ABI could be redefined to use the multiple-return-values that
192+
WebAssembly would then support. This would enable much more efficient returns
193+
of values larger than 64 bits. Implementing this would require support in LLVM
194+
though which is not currently present.
195+
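As a rough illustration of the kind of function this affects, the sketch below returns a 128-bit value in two ways. The type and function names are invented for the example. Under the C ABI described in the tool-conventions document, a multi-field struct like `Pair` is returned through memory rather than as multiple WebAssembly results. The `extern "Rust"` version is the kind of signature that could, hypothetically, be lowered to two results in the future. Nothing in this code requests multi-value lowering; it only shows the shape of the functions under discussion.

```rust
/// A 128-bit value split into two 64-bit halves.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct Pair {
    pub lo: u64,
    pub hi: u64,
}

/// `extern "C"`: the return ABI is fixed by the C ABI for the target and
/// does not change when `multivalue` is enabled.
pub extern "C" fn make_pair(lo: u64, hi: u64) -> Pair {
    Pair { lo, hi }
}

/// `extern "Rust"` (the default): the ABI is unstable, so the compiler is
/// free to change how this tuple is returned in a later release.
pub fn make_tuple(lo: u64, hi: u64) -> (u64, u64) {
    (lo, hi)
}
```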
196+
This all means that actually using multiple-returns in functions, or the
197+
WebAssembly feature that `multivalue` enables, is still out on the horizon
198+
and not implemented. First LLVM will need to implement complete lowering support
199+
to generate WebAssembly functions with multiple returns, and then `extern
200+
"Rust"` can be changed to use this when fully supported. In the yet-further-still
201+
future C code might be able to change, but that will take quite some time due to
202+
its cross-version-compatibility story.
203+
204+
## Enabling Future Proposals to WebAssembly
205+
206+
This is not the first time that a WebAssembly proposal has gone from
207+
off-by-default to on-by-default in LLVM, nor will it be the last. For example
208+
LLVM already enables the [sign-extension proposal][sign-ext] by default which
209+
MVP WebAssembly did not have. It's expected that in the not-too-distant future
210+
the
211+
[nontrapping-fp-to-int](https://github.com/WebAssembly/nontrapping-float-to-int-conversions)
212+
proposal will likely be enabled by default. These changes are currently not made
213+
with strict criteria in mind (e.g. N engines must have this implemented for M
214+
years), and there may be breakage that happens.
215+
216+
If you're using a WebAssembly engine that does not support the modules emitted
217+
by Rust 1.82 beta and LLVM 19 then your options are:
218+
219+
* Try seeing if the engine you're using has any updates available to it. You
220+
might be using an older version which didn't support a feature but a newer
221+
version supports the feature.
222+
* Open an issue to raise awareness that a change is causing breakage. This could
223+
either be done on your engine's repository, the Rust repository, or the
224+
WebAssembly
225+
[tool-conventions](https://github.com/WebAssembly/tool-conventions)
226+
repository. It's recommended to first search to confirm there isn't already an
227+
open issue though.
228+
* Recompile your code with features disabled, more on this in the next section.
229+
230+
The general assumption behind enabling new features by default is that it's a
231+
relatively hassle-free operation for end users while bringing performance
232+
benefits for everyone (e.g. nontrapping-fp-to-int will make float-to-int
233+
conversions more optimal). If updates end up causing hassle it's best to flag
234+
that early on so rollout plans can be adjusted if needed.
235+
236+
## Disabling On-by-Default WebAssembly Proposals
237+
238+
For a variety of reasons you might be motivated to disable on-by-default
239+
WebAssembly features: for example maybe your engine is difficult to update or
240+
doesn't support a new feature. Disabling on-by-default features is unfortunately
241+
not the easiest task. It is notably not sufficient to use
242+
`-Ctarget-feature=-sign-ext` to disable a feature for just your own project's
243+
compilation because the Rust standard library, shipped in precompiled form, is
244+
still compiled with the feature enabled.
245+
246+
To disable an on-by-default WebAssembly proposal it's required that you use Cargo's
247+
[`-Zbuild-std`](https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std)
248+
feature. For example:
249+
250+
```shell
251+
$ export RUSTFLAGS=-Ctarget-cpu=mvp
252+
$ cargo +nightly build -Zbuild-std=panic_abort,std --target wasm32-unknown-unknown
253+
```
254+
255+
This will recompile the Rust standard library in addition to your own code with
256+
the "MVP CPU" which is LLVM's placeholder for all WebAssembly proposals
257+
disabled. This will disable sign-ext, reference-types, multi-value, etc.
258+
259+
[llvm19]: https://github.com/rust-lang/rust/pull/127513
260+
[proposals]: https://github.com/WebAssembly/proposals
261+
[llvmenable]: https://github.com/llvm/llvm-project/pull/80923
262+
[LEB]: https://en.wikipedia.org/wiki/LEB128
263+
[`wasm-opt`]: https://github.com/WebAssembly/binaryen
264+
[multi-value]: https://github.com/webAssembly/multi-value
265+
[sign-ext]: https://github.com/webAssembly/sign-extension-ops
