|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "WebAssembly targets: change in default target-features" |
| 4 | +author: Alex Crichton |
| 5 | +team: The Compiler Team <https://www.rust-lang.org/governance/teams/compiler> |
| 6 | +--- |
| 7 | + |
| 8 | +The Rust compiler has [recently upgraded to using LLVM 19][llvm19] and this |
| 9 | +change accompanies some updates to the default set of target features enabled |
| 10 | +for WebAssembly targets of the Rust compiler. Beta Rust today, which will |
| 11 | +become Rust 1.82 on 2024-10-17, reflects all of these changes and can be |
| 12 | +used for testing. |
| 13 | + |
| 14 | +WebAssembly is an evolving standard where extensions are being added over |
| 15 | +time through a [proposals process][proposals]. WebAssembly proposals reach |
| 16 | +maturity, get merged into the specification itself, get implemented in engines, |
| 17 | +and remain this way for quite some time before producer toolchains (e.g. LLVM) |
| 18 | +update to **enable these sufficiently-mature proposals by default**. In LLVM 19 |
| 19 | +this has happened with the [multi-value and reference-types |
| 20 | +proposals][llvmenable] for the LLVM/Rust target features `multivalue` and |
| 21 | +`reference-types`. These are now enabled by default in LLVM, which transitively |
| 22 | +means that they are enabled by default for Rust as well. |
| 23 | + |
| 24 | +WebAssembly targets for Rust now [have improved |
| 25 | +documentation](https://github.com/rust-lang/rust/pull/128511) about WebAssembly |
| 26 | +proposals and their corresponding target features. This post is going to review |
| 27 | +these changes and go into depth about what's changing in LLVM. |
| 28 | + |
| 29 | +## WebAssembly Proposals and Compiler Target Features |
| 30 | + |
| 31 | +WebAssembly proposals are the formal means by which the WebAssembly standard |
| 32 | +itself is evolved over time. Most proposals need toolchain integration in one |
| 33 | +form or another, for example new flags in LLVM or the Rust compiler. The |
| 34 | +`-Ctarget-feature=...` mechanism is used to implement this today. This is a |
| 35 | +signal to LLVM and the Rust compiler which WebAssembly proposals are enabled or |
| 36 | +disabled. |
| 37 | + |
| 38 | +There is a loose coupling between the name of a proposal (often the name of the |
| 39 | +GitHub repository of the proposal) and the feature name LLVM/Rust use. For |
| 40 | +example there is the [multi-value |
| 41 | +proposal](https://github.com/webAssembly/multi-value) but a `multivalue` |
| 42 | +feature. |
| 43 | + |
| 44 | +The lifecycle of the implementation of a feature in Rust/LLVM typically looks |
| 45 | +like: |
| 46 | + |
| 47 | +1. A new WebAssembly proposal is created in a new repository, for example |
| 48 | + WebAssembly/foo. |
| 49 | +2. Eventually Rust/LLVM implement the proposal under `-Ctarget-feature=+foo`. |
| 50 | +3. Eventually the upstream proposal is merged into the specification, and |
| 51 | + WebAssembly/foo becomes an archived repository. |
| 52 | +4. Rust/LLVM enable the `-Ctarget-feature=+foo` feature by default but typically |
| 53 | + retain the ability to disable it as well. |
| 54 | + |
| 55 | +The `reference-types` and `multivalue` target features in Rust are at step (4) |
| 56 | +here now and this post is explaining the consequences of doing so. |
| 57 | + |
| 58 | +## Enabling Reference Types by Default |
| 59 | + |
| 60 | +The [reference-types proposal to |
| 61 | +WebAssembly](https://github.com/webAssembly/reference-types) introduced a few |
| 62 | +new concepts to WebAssembly, notably the `externref` type which is a |
| 63 | +host-defined GC resource that WebAssembly cannot access but can pass around. |
| 64 | +Rust does not have support for the WebAssembly `externref` type and LLVM 19 does |
| 65 | +not change that. WebAssembly modules produced from Rust will continue to not use |
| 66 | +the `externref` type nor have a means of being able to do so. This may be |
| 67 | +enabled in the future (e.g. a hypothetical `core::arch::wasm32::Externref` type |
| 68 | +or similar), but it will most likely only be done on an opt-in basis |
| 69 | +and will not affect preexisting code by default. |
| 70 | + |
| 71 | +Also included in the reference-types proposal, however, was the ability to have |
| 72 | +multiple WebAssembly tables in a single module. In the original version of the |
| 73 | +WebAssembly specification only a single table was allowed and this restriction |
| 74 | +was relaxed with the reference-types proposal. WebAssembly tables are used by |
| 75 | +LLVM and Rust to implement indirect function calls. For example function |
| 76 | +pointers in WebAssembly are actually table indices and indirect function calls |
| 77 | +are a WebAssembly `call_indirect` instruction with this table index. |
| 78 | + |
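As a small illustration of the paragraph above, the sketch below calls through a function pointer. The function names are made up for this example. When compiled for a WebAssembly target, a call like `f(x)` is the kind of call that would typically be lowered to `call_indirect` with a table index, assuming the optimizer does not inline or devirtualize it first.

```rust
fn double(x: i32) -> i32 {
    x * 2
}

fn negate(x: i32) -> i32 {
    -x
}

/// Calls `f` through a function pointer. On WebAssembly the pointer value
/// is an index into the module's function table rather than an address.
fn apply(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn main() {
    // Storing the pointers in an array makes the callee a runtime value.
    let ops: [fn(i32) -> i32; 2] = [double, negate];
    for op in ops.iter() {
        println!("{}", apply(*op, 21));
    }
}
```
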
| 79 | +With the reference-types proposal the binary encoding of `call_indirect` |
| 80 | +instructions was updated. Prior to the reference-types proposal `call_indirect` |
| 81 | +was encoded with a fixed zero byte in its instruction (required to be exactly |
| 82 | +0x00). This fixed zero byte was relaxed to a 32-bit [LEB] to indicate which |
| 83 | +table the `call_indirect` instruction was using. For those unfamiliar [LEB] is a |
| 84 | +way of encoding multi-byte integers in a smaller number of bytes for smaller |
| 85 | +integers. For example the 32-bit integer 0 can be encoded as `0x00` with a |
| 86 | +[LEB]. [LEB]s are flexible to additionally allow "overlong" encodings so the |
| 87 | +integer 0 can additionally be encoded as `0x80 0x00`. |
| 88 | + |
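To make the encoding concrete, here is a minimal sketch of an unsigned 32-bit LEB128 decoder. The function name `decode_u32_leb` is an assumption made for this example and is not taken from LLVM or any engine. It shows that the minimal and the "overlong" encodings of 0 decode to the same value.

```rust
/// Decode an unsigned 32-bit LEB128 value from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or does not fit in the 5 bytes a 32-bit LEB may use.
fn decode_u32_leb(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        let low = (byte & 0x7f) as u32;
        // The fifth byte may only contribute the top 4 bits of a u32.
        if i == 4 && low > 0x0f {
            return None;
        }
        result |= low << (7 * i);
        // A clear high bit marks the final byte of the integer.
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}

fn main() {
    // Minimal encoding of 0: a single byte.
    assert_eq!(decode_u32_leb(&[0x00]), Some((0, 1)));
    // "Overlong" encoding of 0: same value, one extra byte.
    assert_eq!(decode_u32_leb(&[0x80, 0x00]), Some((0, 2)));
    // A multi-byte value for comparison.
    assert_eq!(decode_u32_leb(&[0xe5, 0x8e, 0x26]), Some((624485, 3)));
}
```
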
| 89 | +LLVM's support of separate compilation of source code to a WebAssembly binary |
| 90 | +means that when an object file is emitted it does not know the final index of |
| 91 | +the table that is going to be used in the final binary. Before reference-types |
| 92 | +there was only one option, table 0, so `0x00` was always used when encoding |
| 93 | +`call_indirect` instructions. After reference-types, however, LLVM will emit an |
| 94 | +over-long [LEB] of the form `0x80 0x80 0x80 0x80 0x00` which is the maximal |
| 95 | +length of a 32-bit [LEB]. This [LEB] is then filled in by the linker with a |
| 96 | +relocation to the actual table index that is used by the final module. |
| 97 | + |
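The placeholder-then-patch scheme described above can be sketched as follows. This is an illustration only: `encode_u32_leb_padded` and `apply_relocation` are hypothetical helpers written for this post, not LLD's actual implementation, and the surrounding bytes in the buffer are arbitrary.

```rust
/// Encode `value` as a padded 5-byte unsigned LEB128. The width is fixed,
/// so the bytes can later be overwritten in place without moving any
/// neighboring bytes.
fn encode_u32_leb_padded(value: u32) -> [u8; 5] {
    let mut out = [0u8; 5];
    let mut rest = value;
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = (rest & 0x7f) as u8;
        rest >>= 7;
        // Every byte except the last carries the continuation bit.
        if i < 4 {
            *slot |= 0x80;
        }
    }
    out
}

/// Hypothetical stand-in for applying a relocation: overwrite the 5-byte
/// placeholder at `offset` with the final index.
fn apply_relocation(code: &mut [u8], offset: usize, index: u32) {
    code[offset..offset + 5].copy_from_slice(&encode_u32_leb_padded(index));
}

fn main() {
    // Index 0 in padded form is the over-long encoding from the text.
    assert_eq!(encode_u32_leb_padded(0), [0x80, 0x80, 0x80, 0x80, 0x00]);

    // Arbitrary bytes around a 5-byte placeholder for an unknown index.
    let mut code = vec![0xaa, 0x80, 0x80, 0x80, 0x80, 0x00, 0xbb];
    apply_relocation(&mut code, 1, 3);
    // The buffer keeps its length; only the placeholder bytes changed.
    assert_eq!(code, vec![0xaa, 0x83, 0x80, 0x80, 0x80, 0x00, 0xbb]);
}
```

Because the placeholder always occupies five bytes, any 32-bit index fits and no offsets elsewhere in the object file need to be recomputed after patching.
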
| 98 | +When putting all of this together, it means that with LLVM 19, which has |
| 99 | +the `reference-types` feature enabled by default, any WebAssembly module with an |
| 100 | +indirect function call (which is almost always the case for Rust code) will |
| 101 | +produce a WebAssembly binary that cannot be decoded by engines and tooling that |
| 102 | +do not support the reference-types proposal. It is expected that this change |
| 103 | +will have a low impact due to the age of the reference-types proposal and |
| 104 | +breadth of implementation in engines. Given the multitude of WebAssembly |
| 105 | +engines, however, it's recommended that any WebAssembly users test out |
| 106 | +Rust 1.82 beta and see if the produced module still runs on their engine of |
| 107 | +choice. |
| 108 | + |
| 109 | +### LLVM, Rust, and Multiple Tables |
| 110 | + |
| 111 | +One interesting point worth mentioning is that despite the reference-types |
| 112 | +proposal enabling multiple tables in WebAssembly modules this is not actually |
| 113 | +taken advantage of at this time by either LLVM or Rust. WebAssembly modules |
| 114 | +emitted will still have at most one table of functions. This means that the |
| 115 | +over-long 5-byte encoding of index 0 as `0x80 0x80 0x80 0x80 0x00` is not |
| 116 | +actually necessary at this time. LLD, LLVM's linker for WebAssembly, wants to |
| 117 | +process all [LEB] relocations in a similar manner which currently forces this |
| 118 | +5-byte encoding of zero. For example when a function calls another function the |
| 119 | +`call` instruction encodes the target function index as a 5-byte [LEB] which is |
| 120 | +filled in by the linker. There is quite often more than one function so the |
| 121 | +5-byte encoding enables all possible function indices to be encoded. |
| 122 | + |
| 123 | +In the future LLVM might start using multiple tables as well. For example LLVM |
| 124 | +may have a mode in the future where there's a table-per-function type instead of |
| 125 | +a single heterogeneous table. This can enable engines to implement |
| 126 | +`call_indirect` more efficiently. This is not implemented at this time, however. |
| 127 | + |
| 128 | +For users who want a minimally-sized WebAssembly module (e.g. if you're in a web |
| 129 | +context and sending bytes over the wire) it's recommended to use an optimization |
| 130 | +tool such as [`wasm-opt`] to shrink the size of the output of LLVM. Even before |
| 131 | +this change with reference-types it's recommended to do this as [`wasm-opt`] can |
| 132 | +typically optimize LLVM's default output even further. When optimizing a module |
| 133 | +through [`wasm-opt`] these 5-byte encodings of index 0 are all shrunk to a |
| 134 | +single byte. |
| 135 | + |
| 136 | +## Enabling Multi-Value by Default |
| 137 | + |
| 138 | +The second feature enabled by default in LLVM 19 is `multivalue`. The |
| 139 | +[multi-value proposal to WebAssembly][multi-value] enables, for example, |
| 140 | +functions to have more than one return value. WebAssembly instructions are |
| 141 | +also allowed to have more than one return value. This proposal |
| 142 | +is one of the first to get merged into the WebAssembly specification after the |
| 143 | +original MVP and has been implemented in many engines for quite some time. |
| 144 | + |
| 145 | +The consequences of enabling this feature by default in LLVM are more minor for |
| 146 | +Rust, however, than enabling the `reference-types` feature by default. LLVM's |
| 147 | +default C ABI for WebAssembly code is not changing even when `multivalue` is |
| 148 | +enabled. Additionally Rust's `extern "C"` ABI for WebAssembly is not changing |
| 149 | +either and continues to match LLVM's (or strives to, [differences to |
| 150 | +LLVM](https://github.com/rust-lang/rust/issues/115666) are considered bugs to |
| 151 | +fix). Despite this though the change has the possibility of still affecting |
| 152 | +Rust users. |
| 153 | + |
| 154 | +Rust for some time has supported an `extern "wasm"` ABI on Nightly which was an |
| 155 | +experimental means of exposing the ability of defining a function in Rust which |
| 156 | +returned multiple values (i.e. used the multi-value proposal). Due to |
| 157 | +infrastructural changes and refactorings in LLVM itself this feature of Rust has |
| 158 | +[been removed](https://github.com/rust-lang/rust/pull/127605) and is no longer |
| 159 | +supported on Nightly at all. As a result there is no longer any possible method |
| 160 | +of writing a function in Rust that returns multiple values at the WebAssembly |
| 161 | +function type level. |
| 162 | + |
| 163 | +In summary this change is expected to not affect any Rust code in the wild |
| 164 | +unless you were using the Nightly feature of `extern "wasm"` in which case |
| 165 | +you'll be forced to drop support for that and use `extern "C"` instead. |
| 166 | +Supporting WebAssembly multi-return functions in Rust is a broader topic than |
| 167 | +this post can cover, but at this time it's an area that's ripe for contribution |
| 168 | +from suitably motivated contributors. |
| 169 | + |
| 170 | +### Aside: ABI Stability and WebAssembly |
| 171 | + |
| 172 | +While on the topic of ABIs and the `multivalue` feature it's perhaps worth |
| 173 | +also going over a bit what ABIs mean for WebAssembly. The current definition of |
| 174 | +the `extern "C"` ABI for WebAssembly is documented in the [tool-conventions |
| 175 | +repository](https://github.com/WebAssembly/tool-conventions/blob/main/BasicCABI.md) |
| 176 | +and this is what Clang implements for C code as well. LLVM implements enough |
| 177 | +support for lowering to WebAssembly as well to support all of this. The `extern |
| 178 | +"Rust"` ABI is not stable on WebAssembly, as is the case for all Rust targets, |
| 179 | +and is subject to change over time. There is no reference documentation at this |
| 180 | +time for what `extern "Rust"` is on WebAssembly. |
| 181 | + |
| 182 | +The `extern "C"` ABI, what C code uses by default as well, is difficult to |
| 183 | +change because stability is often required across different compiler versions. |
| 184 | +For example WebAssembly code compiled with LLVM 18 might be expected to work |
| 185 | +with code compiled by LLVM 20. This means that changing the ABI is a daunting |
| 186 | +task that requires version fields, explicit markers, etc., to help prevent |
| 187 | +mismatches. |
| 188 | + |
| 189 | +The `extern "Rust"` ABI, however, is subject to change over time. A great |
| 190 | +example of this could be that when the `multivalue` feature is enabled the |
| 191 | +`extern "Rust"` ABI could be redefined to use the multiple-return-values that |
| 192 | +WebAssembly would then support. This would enable much more efficient returns |
| 193 | +of values larger than 64 bits. Implementing this would require support in LLVM |
| 194 | +though which is not currently present. |
| 195 | + |
| 196 | +This all means that actually using multiple-returns in functions, or the |
| 197 | +WebAssembly feature that `multivalue` enables, is still out on the horizon |
| 198 | +and not implemented. First LLVM will need to implement complete lowering support |
| 199 | +to generate WebAssembly functions with multiple returns, and then `extern |
| 200 | +"Rust"` can be changed to use this when fully supported. In the yet-further-still |
| 201 | +future C code might be able to change, but that will take quite some time due to |
| 202 | +its cross-version-compatibility story. |
| 203 | + |
| 204 | +## Enabling Future Proposals to WebAssembly |
| 205 | + |
| 206 | +This is not the first time that a WebAssembly proposal has gone from |
| 207 | +off-by-default to on-by-default in LLVM, nor will it be the last. For example |
| 208 | +LLVM already enables the [sign-extension proposal][sign-ext] by default which |
| 209 | +MVP WebAssembly did not have. It's expected that in the not-too-distant future |
| 210 | +the |
| 211 | +[nontrapping-fp-to-int](https://github.com/WebAssembly/nontrapping-float-to-int-conversions) |
| 212 | +proposal will likely be enabled by default. These changes are currently not made |
| 213 | +with strict criteria in mind (e.g. N engines must have this implemented for M |
| 214 | +years), and there may be breakage that happens. |
| 215 | + |
| 216 | +If you're using a WebAssembly engine that does not support the modules emitted |
| 217 | +by Rust 1.82 beta and LLVM 19 then your options are: |
| 218 | + |
| 219 | +* Try seeing if the engine you're using has any updates available to it. You |
| 220 | + might be using an older version which didn't support a feature but a newer |
| 221 | + version supports the feature. |
| 222 | +* Open an issue to raise awareness that a change is causing breakage. This could |
| 223 | + either be done on your engine's repository, the Rust repository, or the |
| 224 | + WebAssembly |
| 225 | + [tool-conventions](https://github.com/WebAssembly/tool-conventions) |
| 226 | + repository. It's recommended to first search to confirm there isn't already an |
| 227 | + open issue though. |
| 228 | +* Recompile your code with features disabled, more on this in the next section. |
| 229 | + |
| 230 | +The general assumption behind enabling new features by default is that it's a |
| 231 | +relatively hassle-free operation for end users while bringing performance |
| 232 | +benefits for everyone (e.g. nontrapping-fp-to-int will make float-to-int |
| 233 | +conversions more optimal). If updates end up causing hassle it's best to flag |
| 234 | +that early on so rollout plans can be adjusted if needed. |
| 235 | + |
| 236 | +## Disabling on-by-default WebAssembly proposals |
| 237 | + |
| 238 | +For a variety of reasons you might be motivated to disable on-by-default |
| 239 | +WebAssembly features: for example maybe your engine is difficult to update or |
| 240 | +doesn't support a new feature. Disabling on-by-default features is unfortunately |
| 241 | +not the easiest task. It is notably not sufficient to use |
| 242 | +`-Ctarget-feature=-sign-ext` to disable a feature for just your own project's |
| 243 | +compilation because the Rust standard library, shipped in precompiled form, is |
| 244 | +still compiled with the feature enabled. |
| 245 | + |
| 246 | +To disable an on-by-default WebAssembly proposal it's required that you use Cargo's |
| 247 | +[`-Zbuild-std`](https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std) |
| 248 | +feature. For example: |
| 249 | + |
| 250 | +```shell |
| 251 | +$ export RUSTFLAGS=-Ctarget-cpu=mvp |
| 252 | +$ cargo +nightly build -Zbuild-std=panic_abort,std --target wasm32-unknown-unknown |
| 253 | +``` |
| 254 | + |
| 255 | +This will recompile the Rust standard library in addition to your own code with |
| 256 | +the "MVP CPU" which is LLVM's placeholder for all WebAssembly proposals |
| 257 | +disabled. This will disable sign-ext, reference-types, multi-value, etc. |
| 258 | + |
| 259 | +[llvm19]: https://github.com/rust-lang/rust/pull/127513 |
| 260 | +[proposals]: https://github.com/WebAssembly/proposals |
| 261 | +[llvmenable]: https://github.com/llvm/llvm-project/pull/80923 |
| 262 | +[LEB]: https://en.wikipedia.org/wiki/LEB128 |
| 263 | +[`wasm-opt`]: https://github.com/WebAssembly/binaryen |
| 264 | +[multi-value]: https://github.com/webAssembly/multi-value |
| 265 | +[sign-ext]: https://github.com/webAssembly/sign-extension-ops |