Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bioconductor packages run_exports suggestions #43905

Closed
dlaehnemann opened this issue Oct 26, 2023 · 6 comments
Closed

bioconductor packages run_exports suggestions #43905

dlaehnemann opened this issue Oct 26, 2023 · 6 comments

Comments

@dlaehnemann
Copy link
Member

It seems like I am the first person trying to include a build: run_exports: section for a bioconductor- package, as I can't find any bioconductor- recipe that contains the string run_exports in its code. But I am getting the respective lint in an (unrelated) fix to the bioconductor-rhdf5package:

ERROR: recipes/bioconductor-rhdf5/meta.yaml:0: missing_run_exports: Recipe should have a run_export statement that ensures correct pinning in downstream packages

(I also found one other bioconductor- PR that I assume is running into the same lint.)

The lint and its background is nicely explained in the docs, but I am a bit unsure to which level (major, minor or patch version) to pin the subpackage. Here's my suggestion for now:

build:
  run_exports:
    - {{ pin_subpackage(name, max_pin="x.x") }}

I figured I'd document this here and open it up for discussion, as it might make sense to either add this to the docs or even update all bioconductor- packages in a bulk rebuild?!

As the bioconda lint docs describe, the idea of requiring these run_exports is to ensure "that the package is automatically pinned to a compatible version if it is used as a dependency in another recipe. This is a conservative strategy to avoid breakage." So I am assuming we are trying to pin to the major version if possible, to allow for flexibility in solving environments, while resorting to minor or patch version pinning if we expect breaking changes in those.

For bioconductor, version numbering is described in their docs, but doesn't seem to follow semantic versioning. The only sentence I could find, that alludes to anything like a breaking change is this one:

As a special case, any package with version x.99.z is bumped to (x+1) . 0 . (z-z) in release, and (x+1) . 1 . 0 in devel. Thus authors wishing to signify a major change to their package should set y to 99 in their devel package.

This could imply, that breaking changes should only happen with major version updates, but I think the wording is too vague for package authors to necessarily understand it that way. Also, I found a recent master's thesis via web search, that proves that breaking changes in the 100 most downloaded bioconductor packages happen all the time, often without being properly reflected via semantic version numbering (section 5.3) and without being mentioned in NEWS.md files (section 5.4).

So for now I would take the conservative approach to pin the minor version by default, and would only suggest to only pin the major version in packages where we can be sure that semantic versioning is followed---either because the package author themselves maintain the bioconda recipe and can confirm that they follow semantic versioning, or somebody has inspected the package code (history) to confirm this.

In the long run, it might be interesting to try and automate an analysis to determine on which level to pin on a per-package basis. I think I found the repository (and the branch with the latest code) used in that master's thesis (but it is in Javascript, a language I am not very familiar with, so I didn't dig deeper):
https://github.com/Hemayet-Chowdhury/BCD-concat/tree/august_2022

So if somebody feels like creating such a tool for bioconductor (and/or CRAN?) packages in the future, this might be a good starting point.

@mfansler
Copy link
Member

mfansler commented Oct 26, 2023

Please consider modifying recipes to run unit tests if adding run_exports.

I think ensuring interoperability is a noble goal and would like to contribute to that end. However, my first reaction to hearing this aim is that imposing run_exports is premature and its primary outcome will be to prompt more "why won't Conda install the latest version?" grievances from end-users.

Specifically, I'm thinking that for nearly all R recipes there is no testing of compatibility during build to justify enforcing a version constraint at solve/runtime. When an R package builds in Conda, only the ability to load the library is tested, and that merely requires that the other packages be available.1 Unless there is some attempt at running unit tests to ensure interoperability exists in the first place, then pinning a version through run_exports is arbitrary and merely contingent on being contemporaneous. That seems like an unnecessary imposition on users.

Further, it is possible that merely adding run_exports without running tests could even lead to cases where we have to rebuild packages/patch metadata because the captured constraint is actually incompatible at runtime.


[1]: Exceptions here are LinkingTo: packages, e.g., Rcpp. We should maybe be adding run_exports on any such packages over on Conda Forge... 🤔

@johanneskoester
Copy link
Contributor

It is certainly a tradeoff. But post-hoc fixing of introduced breakages or having weird errors as a user is a worse experience than some conservative package dependencies. Also, bioconductor packages are frequently bulk (on the new bioconductor releases) updated so that the version incompatibilites should be quite limited.

@johanneskoester
Copy link
Contributor

And yes, unit tests should be run whenever possible, as long as they don't take more than a couple of seconds. PRs welcome!

@dlaehnemann
Copy link
Member Author

Recommendation by @grimbough, developer of the bioconductor-rhdf5* packages (and others, I think), and involved in the Bioconductor community:

If I was trying to pin version numbers I think I would aim to fix on the minor number for all packages across a specific Bioconductor release. It is expected that they will all be compatible with each other. I don't know how feasible that is in your infrastructure, but from those recipe files it looks like bioconda already has some concept of the Bioconductor release numbering etc. so maybe it's actually fairly simple. Things might get messier with CRAN dependencies added to the mix, but you might have to rely on package authors putting that in the DESCRIPTION files since it probably fundamentally affects the package.

Full comment with some more background is in this separate, originally more specific issue.

@aliciaaevans
Copy link
Contributor

As of the latest bulk update to Bioconductor 3.18, the Bioconductor recipes have run_exports with max_pin="x.x". See also bioconda/bioconda-utils#944

@dlaehnemann
Copy link
Member Author

Thanks, for pointing me to this. I cross-referenced a bit more and will close this issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants