Remove weak symbols, use abort for defaults. #254

Merged: 3 commits into rust-embedded:master on Apr 14, 2025

Conversation

rslawson
Contributor

@rslawson rslawson commented Jan 8, 2025

As per #247, using weak symbols for overridable functions has proven problematic with lto = true. This is a partial reversion of that, though there are still other functions this may yet be done with. There is also another solution to be done as a sort of nightly-only thing with naked functions and #[linkage = "weak"], however I have opted not to go with that seeing as this project doesn't currently require nightly and I don't think this is an issue worth changing that over.

@rslawson rslawson requested a review from a team as a code owner January 8, 2025 10:06
@rslawson
Contributor Author

rslawson commented Jan 8, 2025

I think there may be something else that I need to do to the linker script: as it stands, one of the projects I'm working on fails to build with the error

  = note: rust-lld: error: 
          BUG(riscv-rt): start of .heap is not 4-byte aligned
          
          rust-lld: error: 
          BUG(riscv-rt): start of .heap is not 4-byte aligned

@romancardenas
Contributor

Try to rebase from master. Build errors should disappear

@romancardenas
Contributor

@rslawson please, add a brief note in README.md describing your changes

@rslawson
Contributor Author

rslawson commented Jan 9, 2025

Yep, they did disappear. I'll make the change to README.md now. Are there any other symbols I should give the same treatment, or is it enough just to change DefaultHandler and ExceptionHandler?

Contributor

@romancardenas romancardenas left a comment

LGTM.

I would like to discuss this work with @rust-embedded/all in the next meeting in case there is a better approach to deal with weak symbols and LTO optimizations in Rust.

@rslawson
Contributor Author

In the meantime I'll try to keep this rebased with an up-to-date master branch from this repo.

@romancardenas
Contributor

We discussed this today and... we didn't come up with a better solution.

Let's leave this PR and #247 open to keep discussing what to do with weak symbols. We must:

  1. Enumerate all the weak symbols we are currently using in riscv-rt
  2. Propose an alternative methodology for them. Not all symbols are easily replaceable as abort or interrupt handlers (e.g., mp_hook).

@rslawson , would you be happy to keep working on this PR until we decide what to do with the remaining weak symbols?

@rslawson
Contributor Author

Sure thing, I can do that.

@rmsyn
Contributor

rmsyn commented Jan 15, 2025

As per #247, using weak symbols for overridable functions has proven problematic with lto = true.

Is this only a problem with rust-lld (read lld)? If so, does using another linker with LTO resolve the issue you're experiencing @rslawson?

Maybe until there is a stable solution for weak linkage in Rust, we could recommend using a different linker instead of introducing nightly-only features?

@rslawson
Contributor Author

Sorry for the late reply, got caught up in a work project for a while. I'm unsure if the problem is with the linker or in codegen, though bjorn3's answer suggests to me that it's the latter.

@ia0
Contributor

ia0 commented Mar 5, 2025

#254 (comment)

Let's leave this PR and #247 open to keep discussing what to do with weak symbols.

Has there been progress regarding the fate of other weak symbols? Or regarding whether that fate should continue blocking this PR?

@romancardenas
Contributor

There has not been progress as far as I know. I'm currently working on #238 and plan to work on this issue later. Of course, if someone investigates this and comes up with a nice solution, it is more than welcome!

@romancardenas
Contributor

Remaining weak symbols

  • _pre_init_trap: not sure if it can default to abort, as it must be 16-byte aligned to comply with the xtvec spec.
  • __pre_init: cannot default to abort. Currently, the default implementation does nothing.
  • _mp_hook: cannot default to abort. This function is only required in multi-processor chips. The current implementation leaves all HARTs other than 0 in a busy loop.
  • _setup_interrupts: cannot default to abort. It currently configures the xtvec CSR to work in either direct or vectored mode.
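
To make the _mp_hook item concrete, here is a minimal host-side sketch of the policy described above, where only HART 0 continues and every other HART is parked. The names `HartAction`, `default_mp_hook`, and `booting_harts` are hypothetical and exist only for this illustration; the real hook lives in riscv-rt and spins in place rather than returning a value.

```rust
/// What a HART does after the multi-processor hook runs.
/// Hypothetical type, used only to model the behaviour on a host machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HartAction {
    /// The HART carries on with start-up and eventually reaches user code.
    Boot,
    /// The HART is parked in a busy loop and never reaches user code.
    Spin,
}

/// Model of the default policy discussed in the thread: only HART 0 boots.
pub fn default_mp_hook(hart_id: usize) -> HartAction {
    if hart_id == 0 {
        HartAction::Boot
    } else {
        HartAction::Spin
    }
}

/// Counts how many of `hart_count` HARTs would reach user code.
pub fn booting_harts(hart_count: usize) -> usize {
    (0..hart_count)
        .filter(|&id| default_mp_hook(id) == HartAction::Boot)
        .count()
}
```

Under this model a four-HART chip ends up with a single running HART, which is why aborting is not a sensible default for this symbol.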

@romancardenas
Contributor

romancardenas commented Mar 18, 2025

  • I propose a new no-setup-interrupts feature to opt out of the definition of the _setup_interrupts symbol in riscv-rt. This change is non-breaking.
  • I propose a pre-init feature to opt in to the declaration of the __pre_init symbol in riscv-rt. Its implementation is always external to riscv-rt. This change is breaking. However, given that the current strategy is to define it with the riscv-rt::pre_init macro, and that it is unsound to run Rust code at this point, the change should be worth it.
  • I propose removing our _mp_hook implementation and forcing PACs or BSPs to implement theirs, as our current implementation leaves HARTs other than 0 spinning. This change is breaking for use cases of multi-core RISC-V boards that use our implementation. I suspect this is not a real scenario, as again our hook blocks all HARTs except one. If you find a scenario in which this feature is useful, we can add a feature to opt out of its implementation instead.

Please let me know what you think about this.

@rslawson
Contributor Author

I think that this is a fair compromise. No comment on the HARTs situation, since every use case I have is single-HART and I'm unsure what the pros and cons are; any other input is welcome.

Also, by opt-in and opt-out I assume you mean adding these features to the default features of the crate, yes?

@romancardenas
Contributor

romancardenas commented Mar 18, 2025

So, for the _setup_interrupts (but applies to all the other symbols). There are two options.

Option 1: Additive feature, part of default

In Cargo.toml, it would look like:

[features]
default = ["setup-interrupts"]
setup-interrupts = []

In the code, the _setup_interrupts assembly function would be feature-gated under this new setup-interrupts feature:

#[cfg(feature = "setup-interrupts")]
cfg_global_asm!(
    "
_setup_interrupts:",
    #[cfg(not(feature = "v-trap"))]
    "la t0, _start_trap", // _start_trap is 16-byte aligned, so it corresponds to the Direct trap mode
    #[cfg(feature = "v-trap")]
    "la t0, _vector_table
    ori t0, t0, 0x1", // _vector_table is at least 4-byte aligned, so we must set the bit 0 to activate the Vectored trap mode
    #[cfg(feature = "s-mode")]
    "csrw stvec, t0",
    #[cfg(not(feature = "s-mode"))]
    "csrw mtvec, t0",
    "ret",
);
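
The assembly above relies on the xtvec layout from the RISC-V privileged specification: the two low bits hold the trap mode (0 for Direct, 1 for Vectored) and the remaining bits hold a 4-byte-aligned base address. A small host-side sketch of that encoding follows; `TrapMode`, `xtvec_value`, and `trap_mode` are hypothetical helper names for this illustration and are not part of riscv-rt.

```rust
/// Trap modes encoded in the two low bits of mtvec/stvec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrapMode {
    Direct = 0,
    Vectored = 1,
}

/// Builds the value to write to xtvec, mirroring `la` followed by `ori t0, t0, 0x1`.
/// Returns `None` when `base` is not 4-byte aligned, because the low bits
/// would then collide with the mode field.
pub fn xtvec_value(base: usize, mode: TrapMode) -> Option<usize> {
    if base % 4 != 0 {
        return None;
    }
    Some(base | mode as usize)
}

/// Decodes the mode field; values 2 and 3 are reserved and yield `None`.
pub fn trap_mode(xtvec: usize) -> Option<TrapMode> {
    match xtvec & 0b11 {
        0 => Some(TrapMode::Direct),
        1 => Some(TrapMode::Vectored),
        _ => None,
    }
}
```

For an aligned base such as 0x8000_0100, Direct mode writes the address unchanged and Vectored mode writes 0x8000_0101.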

Option 2: Subtractive feature, no default features

In Cargo.toml, it would look like:

[features]
no-setup-interrupts = []

In the code, the _setup_interrupts assembly function would be feature-gated under this new no-setup-interrupts feature:

#[cfg(not(feature = "no-setup-interrupts"))]
cfg_global_asm!(
    ...
);
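
As a rough illustration of how the Option 2 gate behaves, the sketch below swaps the assembly for plain Rust functions so it can be compiled on a host. `setup_interrupts_provider` is a hypothetical name, and the string results are only labels for "who defines the symbol"; Option 1 would use `#[cfg(feature = "setup-interrupts")]` with that feature listed in `default` instead.

```rust
/// Compiled only when the opt-out feature is NOT enabled,
/// so a plain build gets the riscv-rt definition.
#[cfg(not(feature = "no-setup-interrupts"))]
pub fn setup_interrupts_provider() -> &'static str {
    "riscv-rt default"
}

/// Compiled only when the opt-out feature IS enabled. In the real crate
/// nothing would be emitted here, and another crate would have to define
/// `_setup_interrupts` or linking would fail.
#[cfg(feature = "no-setup-interrupts")]
pub fn setup_interrupts_provider() -> &'static str {
    "external crate"
}
```

Built without any features, the first definition is selected, which matches the claim that Option 2 does not change existing builds.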

@romancardenas
Contributor

I am not sure which approach is the best, or which approach is more aligned with the Rust ecosystem. Maybe option 1? In any case, I'm sure option 2 is non-breaking, while I'm not that sure about option 1.

If we decide to go for option 1, then we might need to review the no-interrupts and no-exceptions features to follow the same scheme always.

Again, just thinking out loud. I'm far from an expert in these topics. Let me know what you think.

Contributor

@romancardenas romancardenas left a comment

This alternative uses a no-abort feature to opt out of the definition of the abort symbol.

I think I prefer this approach of no-symbol features instead of using default features. With the no-symbol scheme, crates can explicitly drop parts of the riscv-rt crate. I'm not sure how default features work when one dependent crate uses no-default-features but other dependent crate does not explicitly disable default features.

@ia0
Contributor

ia0 commented Mar 20, 2025

I'm not sure how default features work when one dependent crate uses no-default-features but other dependent crate does not explicitly disable default features.

If a crate depends on a non-additive feature (even if it's through the default feature) of one of its dependencies, then that feature is part of its public API. If it happens that the crate doesn't need this feature (and doesn't want it to be part of its public API, i.e. it doesn't need it at the moment), then it's a defect of that crate to depend on that feature, and that crate should be fixed. So in your example, that "other dependent crate [that] does not explicitly disable default features" is the problem, not the riscv-rt crate, even though riscv-rt is the root of the problem by having non-additive features; there are really no other choices here.

So I wouldn't worry too much about libraries misusing non-additive features, we can't improve the situation other than having no default features (or at least avoiding adding more). And I agree with your conclusion that option 2 is better because it doesn't add a default feature.

Note that when option 1 says "additive feature" this is technically wrong. The name sounds additive, but the feature is not, because symbols are not covariant, they are invariant. Code may require a symbol to be present or absent. This is different from adding a method to a type, because code cannot depend on this method being absent (only present).

@rmsyn
Contributor

rmsyn commented Mar 20, 2025

I think I prefer this approach of no-symbol features instead of using default features. With the no-symbol scheme, crates can explicitly drop parts of the riscv-rt crate. I'm not sure how default features work when one dependent crate uses no-default-features but other dependent crate does not explicitly disable default features.

I'd recommend against a no-* anything feature. From my experience, and reading through style guides, subtractive features are generally discouraged.

As you mention, anything we want in the default builds should be additive features that get added to the default feature. Then, users who want an a-la-carte build can use default-features = false, and select the features they want to include.

@BartMassey
Member

The default state of Cargo features, in which the only way to turn off a default feature is to turn them all off and put the rest back (very fragile), is not ideal. That said, yeah, a lot of machinery depends on being able to turn on all the features simultaneously and have everything still compile and work the same as before. Better to have an on-by-default feature than break convention, I think.

@romancardenas
Contributor

romancardenas commented Mar 21, 2025

To clarify the situation, because, as @ia0 said, I'm not sure if additive/subtractive is a good terminology:

In riscv-rt we need an abort routine to make our runtime work. Historically, riscv-rt provides an abort loop, which is the most common situation. Thus, the abort loop must be enabled by default.

When linking to other crates or C libraries (e.g., #197), there is a chance that the abort symbol is already defined somewhere else. Therefore, we want users to be able to opt out of the definition of the symbol. However, the abort symbol must still be defined somewhere else.

Option 1: abort feature

With this feature, riscv-rt defines its own abort loop. The feature must be part of the default feature.

Option 2: no-abort feature

riscv-rt will only define its own abort loop when this feature is not enabled.

My concerns on using default features

I am a bit worried about using a lot of default features to avoid defining symbols such as abort, because if any crate in the dependency chain "forgets" about disabling default features, then the linker will probably find duplicate symbols. As said by @ia0, this situation could be thought of as a "misuse" of riscv-rt, and therefore it would not be our fault. But still, I'm worried about ergonomics. The typical scenario when using svd2rust is

  • user-app (depends on riscv-rt)
  • Other crates (e.g., rtic) (may depend on riscv-rt)
  • board-bsp (depends on riscv-rt)
  • chip-hal
  • chip-pac (depends on riscv-rt)

Here, PACs, BSPs, and other crates must use default-features = false to let user applications decide. It is not such a big deal, but we must warn about this change, document it, etc.
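
The concern above follows from Cargo's feature unification: a crate that appears several times in the dependency graph is built once, with the union of the features every dependent asked for. Here is a minimal sketch of that rule, assuming Option 1 where `default` expands to an `abort` feature; `Dependent`, `DEFAULT`, and `unified_features` are hypothetical names for this model, not Cargo APIs.

```rust
use std::collections::BTreeSet;

/// One dependent's request for riscv-rt, modelled after a Cargo.toml entry.
pub struct Dependent {
    pub name: &'static str,
    /// `false` corresponds to `default-features = false`.
    pub default_features: bool,
    pub features: &'static [&'static str],
}

/// What `default` expands to in this sketch (assumption: Option 1).
const DEFAULT: &[&str] = &["abort"];

/// Returns the feature set riscv-rt would be built with: the union of
/// everything requested anywhere in the graph.
pub fn unified_features(dependents: &[Dependent]) -> BTreeSet<&'static str> {
    let mut enabled = BTreeSet::new();
    for dep in dependents {
        if dep.default_features {
            enabled.extend(DEFAULT.iter().copied());
        }
        enabled.extend(dep.features.iter().copied());
    }
    enabled
}
```

If user-app and chip-pac both opt out but board-bsp leaves defaults on, `abort` is still enabled for the whole build, so a single careless crate is enough to bring the riscv-rt definition back.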

We must be aware of #[linkage = "weak"]

I hope that the linkage feature (rust-lang/rust#29603) will eventually be stabilized, which will make this issue and PR outdated. I guess that we should vote for an approach that will make adapting the new riscv-rt to #[linkage = "weak"] smooth.

Let me know your thoughts :)

@ia0
Contributor

ia0 commented Mar 21, 2025

I also believe we shouldn't rely on default features because they only make sense for additive features. And I also believe they are a misfeature. They should only be enabled by default in final crates (binaries) and not in libraries, but that's out of our hands and even out of those of Cargo's team since it would be a major breaking change.

I think using no-abort is a very reasonable choice, because it follows the intent of default features but for non-additive features. The default behavior is to define an abort symbol since that's the most useful for most users. For the rare cases where it's an issue, users have the option to opt out by enabling no-abort. And in some sense, no-abort is additive, because if a crate knows there will be a conflict for the abort symbol if riscv-rt were to define it, then this impacts the whole binary, regardless of the crate taking the decision in the dependency graph.

For those who think no-abort is subtractive because it starts with no-, we could rename that feature to custom-abort, user-defined-abort, user-abort, conflicting-abort, or abort-conflict. Then it feels additive (and is somehow additive in the sense described above, that symbol conflicts are additive).

@romancardenas
Contributor

The custom-abort sounds good to me. Then we might also rename the no-interrupts and no-exceptions features to custom-*, as the use case is very similar.

@rmsyn
Contributor

rmsyn commented Mar 22, 2025

To clarify the situation, because, as @ia0 said, I'm not sure if additive/subtractive is a good terminology:

In riscv-rt we need an abort routine to make our runtime work. Historically, riscv-rt provides an abort loop, which is the most common situation. Thus, the abort loop must be enabled by default.

Since this would be a breaking change anyway (requiring a minor version release), we could still use an abort feature, and not have a default feature-set. We would want to provide documentation, and some loud messaging, that riscv-rt will now need the abort feature unless the user defines their own definitions.

Option 2: no-abort feature

riscv-rt will only define its own abort loop when this feature is not enabled.

I really dislike the no-abort option. Mostly because it leads to the cfg(not(feature = "no-abort")) double-negative. If it ends up being the most ergonomic solution from a user perspective, I can get over my dislike.

I also believe we shouldn't rely on default features because they only make sense for additive features. And I also believe they are a misfeature.

If you think that default features are a misfeature, maybe take it up with the main compiler team. I would very much like to get some popcorn for that conversation :)

I agree that default-features = false, and adding back features you want is not exactly the most elegant solution. However, it is a widely used convention in the Rust ecosystem, and many users find it a very familiar way to handle conditional compilation.

If we have mutually dependent features, we should group them together in another feature, and recommend the grouped feature. I don't know of a way to enforce feature invariants without build script magic.

For those who think no-abort is subtractive because it starts with no-

No, it's subtractive because it literally conditionally excludes code from the library. It "subtracts" the abort routine, and other code from the library.

The custom-abort sounds good to me.

I also think this is a good compromise, if we decide not to wait for #[linkage = "weak"] stabilization.

The feature-gate would then look like:

#[cfg(not(feature = "custom-abort"))]
// default-abort-routine

@romancardenas
Contributor

we could still use an abort feature, and not have a default feature-set.

Even if it is a breaking change, I would try my best not to break current "standard" applications. That is, the abort symbol must be included by default, and bumping the riscv-rt version should not break an application. So this means either using default + abort features or a custom-abort feature.

I personally prefer not to use default features: how they are propagated among dependencies is a bit wild, which makes it difficult for users to have full control. Also, if we adopt the custom-* approach, once #[linkage = "weak"] is stabilized, it is as easy as removing the features and letting the "unusual" users of these features know that they no longer need them, as the symbols are now weak and are automatically overridden by the linker.

So, my vote is custom-abort. You are welcome to change my mind :)

@ia0
Contributor

ia0 commented Mar 22, 2025

No, it's subtractive because it literally conditionally excludes code from the library. It "subtracts" the abort routine, and other code from the library.

That's also not what "additive feature" means. A feature is additive if enabling it doesn't break code (which happens through feature unification). It's true that for methods and functions (outside traits), a feature adding that method or function is additive. However this is not the case for trait methods and symbols which could all be considered "adding code".

In particular in the case of custom-abort, it is additive in the sense that if the feature is initially disabled, then a dependency is added that enables the feature (because it provides its own abort symbol), the feature is suddenly enabled for the whole build and the build doesn't break. However, if a dependency enables it without providing an abort symbol, then the build breaks, so it's not purely additive in that sense.

And note that the opposite feature, abort, is in no sense additive. If a project builds without the feature enabled, then enabling it (either by adding a dependency or because a dependency enables it) makes the project fail to build: it already had an abort symbol, so adding code breaks the build with a duplicate symbol.
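
The argument above can be checked with a tiny model of strong-symbol linking, in which the build succeeds only when `abort` is defined exactly once. This is a sketch under that simplifying assumption; `LinkResult`, `link_abort`, `with_custom_abort`, and `with_abort_feature` are hypothetical names, not part of any toolchain.

```rust
/// Outcome of linking, looking at the `abort` symbol only.
#[derive(Debug, PartialEq, Eq)]
pub enum LinkResult {
    Linked,
    UndefinedSymbol,
    DuplicateSymbol,
}

/// `rt_defines_abort`: whether riscv-rt emits its own strong `abort`.
/// `external_definitions`: how many other crates or C libraries define it.
pub fn link_abort(rt_defines_abort: bool, external_definitions: usize) -> LinkResult {
    match usize::from(rt_defines_abort) + external_definitions {
        0 => LinkResult::UndefinedSymbol,
        1 => LinkResult::Linked,
        _ => LinkResult::DuplicateSymbol,
    }
}

/// `custom-abort` scheme: riscv-rt defines `abort` unless the feature is on.
pub fn with_custom_abort(feature_enabled: bool, external_definitions: usize) -> LinkResult {
    link_abort(!feature_enabled, external_definitions)
}

/// `abort` scheme: riscv-rt defines `abort` only when the feature is on.
pub fn with_abort_feature(feature_enabled: bool, external_definitions: usize) -> LinkResult {
    link_abort(feature_enabled, external_definitions)
}
```

In this model, turning on custom-abort alongside a crate that provides `abort` links fine, turning it on without a provider leaves the symbol undefined, and turning on the abort feature in a project that already has an external `abort` produces a duplicate.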

@rmsyn
Contributor

rmsyn commented Mar 23, 2025

It's true that for methods and functions (outside traits), a feature adding that method or function is additive. However this is not the case for trait methods and symbols which could all be considered "adding code".

This makes no sense, you are literally saying the opposite thing in those two sentences. We are talking about a feature that would be used for defining a global assembly block. Rust code is compiled down into assembly. Explain the concrete difference.

From the Cargo Book:

A consequence of this is that features should be additive.

That is, enabling a feature should not disable functionality,
and it should usually be safe to enable any combination of features.

A feature should not introduce a [SemVer-incompatible change](https://doc.rust-lang.org/cargo/reference/features.html#semver-compatibility).

custom-abort, if enabled, disables the default abort implementation, which goes against the above convention. However, it is a convention, and uses the "should" (not "must") language. We should just be aware that the custom-abort is a subtractive feature.

Maybe it is the best solution given the situation that most (how are we measuring that?) current users expect a default abort implementation. So, allowing non-standard users to provide a custom abort requires us to either introduce default + abort, or the subtractive custom-abort.

Another notable exception to the above convention is the no_std attribute (not exactly a feature, but similar enough). It is an attribute that when positively defined in a library/binary, subtracts the alloc and std libraries from the prelude. Even here though, the above article recommends using a std feature to mark parts of a no_std-default library that require std capabilities. I've made that mistake in libraries I've worked on in the past, and been corrected for it.

no_std itself has a similar history, since the Rust language didn't even have the core, alloc, std in the beginning (libcore introduced in 1.6.0). This necessitated the introduction of the subtractive no_std attribute to handle use-cases of an already existing user-base. Had Rust started as what is now the core library with no OS abstractions, maybe the situation would be different. 🤷

Either way, we are breaking the convention by introducing a feature with a SemVer-incompatible change.

Honestly, I would prefer to wait for the proper fix provided by a stable #[linkage = "weak"].

@romancardenas
Contributor

It can take years before #[linkage = "weak"] becomes usable in stable. And the current state of riscv-rt makes it impossible to perform LTOs if someone overrides a weak symbol. I think we should provide an alternative to users using temporary custom-* features to make our ecosystem better until stable Rust provides a sane way of using weak symbols. Proper documentation should suffice, IMHO.

In general, symbols such as abort, pre_init_trap, pre_init, etc. are not overridden. Thus, riscv-rt must provide these symbols out-of-the-box unless users explicitly opt out of them.

@romancardenas
Contributor

romancardenas commented Mar 23, 2025

Well, I just thought that there is a third option, which is very similar to the current state of this PR: defining _default_* symbols.

In this PR, there is an _abort routine. This routine cannot be discarded. In the linker file, if the abort symbol is not defined, then it uses _abort. However, if another crate/library defines an abort symbol, then that symbol is used.

The main drawback here is that, even if the _abort symbol is not used, the final binary contains all these _default_* symbols, taking space in memory. Originally, riscv-rt was like this, but we moved to pure weak symbols as per #155 to prune unused symbols. However, this change led to breaking LTOs, which, in my opinion, is an important drawback.

So, even though there is no perfect solution until #[linkage = "weak"] is stable, considering that most of these symbols are not overridden, maybe it is better to go back to default symbols that can be overridden at linker file level. No default features nor no-*/custom-* features. Is this a fair compromise?
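
The linker-level fallback described above is the behaviour a linker script usually gets from a `PROVIDE(symbol = fallback)` entry: the alias is only created when no object file defines the symbol. The sketch below models that resolution on a host with a plain symbol table; `resolve` is a hypothetical helper, the addresses are made up, and the exact line used in the riscv-rt linker script is not shown in this thread.

```rust
use std::collections::HashMap;

/// Resolves `name` the way a PROVIDE-style fallback would:
/// a definition from any object file wins, otherwise `fallback` is used.
/// The map goes from symbol name to a made-up address.
pub fn resolve(defined: &HashMap<&str, u32>, name: &str, fallback: &str) -> Option<u32> {
    defined
        .get(name)
        .or_else(|| defined.get(fallback))
        .copied()
}
```

With only `_default_abort` in the table, `abort` resolves to the default; once a user crate inserts its own `abort`, that address is used instead, while the `_default_abort` entry is still present. That leftover entry is the binary-size cost mentioned in the comment.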

@romancardenas
Contributor

@rslawson could you please rename _abort to _default_abort and resolve conflicts? The symbols that are overridden more often are perhaps DefaultHandler and ExceptionHandler, so this PR would at least allow LTOs in those cases.

@rslawson
Contributor Author

Yep, will do. Sorry for the delay - was ill, and have a couple things on my plate at work before I can get to this. It's on the docket for sure though (:

Contributor

@romancardenas romancardenas left a comment

Conflicts solved, just needs to change _abort to _default_abort and is ready to go

Co-authored-by: Román Cárdenas Rodríguez
@ia0
Contributor

ia0 commented Apr 14, 2025

No surprises, but just for confirmation. This PR (at commit 58d4281, the latest when writing this comment) is fixing #247 in my project.

@romancardenas romancardenas added this pull request to the merge queue Apr 14, 2025
Merged via the queue into rust-embedded:master with commit 23be2d1 Apr 14, 2025
138 checks passed
5 participants