Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transition to versioned bids functions #349

Merged
merged 25 commits into from
Feb 1, 2024

Conversation

pvandyken
Copy link
Contributor

@pvandyken pvandyken commented Dec 11, 2023

TODOS

  • test bidspec mocking docstring_formatter as unavailable to ensure docstring still added
  • exclude if TYPE_CHECKING: from codecov

Another large PR! The task ended up being more finicky than expected, but I'll try to justify some of my decisions below:

Version functions using configuration, rather than import

The versioned imports (e.g. import bids_v0_0_0 as bids) would have worked well in our snakemake context, where we typically directly import the bids function. However, it doesn't scale well to any other context. There's three particular problems to point out:

  1. If bids() is only imported once, as in snakemake, all is well, but in a standard python project, where bids must be repeatedly imported, now you have to ensure you import the same version in every file
  2. I'm hoping one day to have bids-like methods directly on BidsComponent that will derive new paths based on the old component. The versioned imports wouldn't help configure these methods.
  3. The long-term goal is to move path building methods from snakebids into rsbids. rsbids will have a much more robust configuration system, and the versioned imports wouldn't fit well in it.

So I decided instead to make the bids spec globally configurable via snakebids.set_bids_spec. The function takes a spec object (importable via snakebids.paths.specs.vx_x_x()), or the string name of one of the official specs (e.g. "vx_x_x"). Obviously, this introduces non-pure, state-dependent behaviour for bids(), but I think this type of thing is very typical for configuration in python libraries (e.g. numpy, matplotlib, etc all use similar global config objects).

Dynamic generation of python code and interface files

In an effort to reduce the impressive levels of code duplication, especially docstrings, that would have been required for the versioned bids functions, I used a lot of dynamic importing in the path module. Basically, you can define __getattr__ directly in a module, and that will be used for dynamic imports. Then, to avoid reading a bunch of specs that would never get used, I transitioned the entire project to lazy imports (made possible by a custom fork of mkinit that is awaiting a PR). Finally, to keep type checking working, I set up a script to automatically generate *.pyi files for the dynamically generated versioned functions.

And then... I decided to switch over to the configurable system. So now the above paragraph only pertains to the spec reading functions (in snakebids.paths.specs), which are still dynamically generated and versioned. So I'm keeping around all the code generation stuff, but if it seems like overkill, that's why.

Documentation updates

I did some reorganization of the documentation, converting the very flat structure into a more explicit "User guide" and "API" sections. The main api page has been subdivided into different subpages based on category. Unfortunately, the API is complex enough that I could no longer sanely rely on autogen to make the pages, so most of the main functions are explicitly list in the documentation source (although the "Internal" API page is mostly automatic). So something to keep in mind for future maintenance

Blanket deprecation of include_subject_dir and include_session_dir

My original idea was to keep these arguments only for the v0_0_0 spec and not have them for all future specs. But the logic to maintain that exception was getting too complex, so I decided to still keep those arguments for every spec, but deprecate them, with plans for removal in 1.0. I don't know that many people use these arguments anyway (maybe in some of @akhanf's workflows?)

@pvandyken pvandyken force-pushed the feat/bids/versioned branch 3 times, most recently from 0e17fbe to a20fc2b Compare December 12, 2023 18:56
@akhanf
Copy link
Member

akhanf commented Dec 12, 2023

Nice work! Haven't gone through what you've done yet, but looks really cool at a glance!

re: the deprecation of include_subject_dir and include_session_dir that sounds fine, I use it rarely enough that I am good to either keep those uses pinned to an early snakebids version, or adapt the code moving forward.. By the way, what is the preferred way to use e.g. the subject entity in a filename only and not in the folder?

@pvandyken
Copy link
Contributor Author

At this point a custom spec:

bids_factory(specs.v0_0_0(sub_dir=False))(subject="001")

If you need such paths frequently, otherwise use Path(...).name.

Is that sufficient? We can consider a more privileged API if it's sufficiently useful, but I'm hoping the above methods are enough.

@pvandyken
Copy link
Contributor Author

pvandyken commented Dec 13, 2023

One other thing to discuss before this is ready, I still want the bids function to return a Path object rather than a string (if/when this gets ported over to rsbids, it would return a BidsPath). I know I tried this before and broke some workflows (probably because there's workflows doing + operations?). So one, do people think this is a worthwhile goal to begin with, and two, what would be an acceptable transition strategy? If we go with this, the two likely options would be 1) keep a permanent exception for the v0_0_0 spec, and 2) keep a str return until 1.0, and then hard switch everything. 1) is more permanent, 2) is much easier to implement

@kaitj, @akhanf

@kaitj
Copy link
Contributor

kaitj commented Dec 14, 2023

Wow - a lot of great work done here! Will wait for the "ready for review" before going through it.

To your discussion / questions:

  1. I personally agree with returning a Path object, based on my own use. That said, I think this might be a discussion that is worth calling a dev meeting for to get some more opinions of others who are working with Snakebids.
  2. I think if we are make the switch, trying to keep a permanent exception might be a softer transition. The hard switch I think may cause workflows to break unexpectedly when the Snakebids upgrade ultimately done as you noted from your previous attempts.

@pvandyken
Copy link
Contributor Author

pvandyken commented Dec 14, 2023

I think if we are make the switch, trying to keep a permanent exception might be a softer transition. The hard switch I think may cause workflows to break unexpectedly when the Snakebids upgrade ultimately done as you noted from your previous attempts.

Yeah, so with a hard switch, it definitely wouldn't be until 1.0. Basically any workflows that break with the change would have to pin their snakebids version until they can be fixed.

One other advantage of a soft switch is that we wouldn't have to wait until 1.0 to get path returned. But it would make the typing difficult... I can't really type a function based on a configuration, so either we'd have to say it returns Path | str, which isn't super helpful (and not true anyway), or lie and say just Path. So that's not great either.

One other option: if/when this functionality gets ported to rsbids, I plan on renaming the function build. My intent would be to re-export it into snakebids as bids to preserve compatibility, but another option is to start introducing the name change now, so build would return a Path and bids would return a str, but otherwise be the same. To be honest I'm not a huge fan of this, as we have two nearly identical functions whose difference is not clearly indicated by the names, but it is an option.

@kaitj
Copy link
Contributor

kaitj commented Dec 14, 2023

Hmm, yeah, I guess I wasn't thinking about the type hinting too much there. The more I think about it, I think the hard switch is fine as long as we give ample warning (i.e. have a warning message now or whenever we decide that this is going to change come 1.0).

I agree about the redundancy - I'd rather choose one path and go down that way.

@pvandyken pvandyken force-pushed the feat/bids/versioned branch 6 times, most recently from 0b2a7aa to 02d07cb Compare December 20, 2023 02:33
@codecov-commenter
Copy link

codecov-commenter commented Dec 20, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (19215f8) 90.26% compared to head (c5ae653) 90.73%.
Report is 7 commits behind head on main.

❗ Current head c5ae653 differs from pull request most recent head ea55e1a. Consider uploading reports for the commit ea55e1a to get more accurate results

Files Patch % Lines
snakebids/paths/_utils.py 91.42% 3 Missing ⚠️
snakebids/utils/containers.py 97.84% 2 Missing ⚠️
snakebids/paths/_config.py 96.15% 1 Missing ⚠️
snakebids/paths/_templates/spec_func.py 97.77% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #349      +/-   ##
==========================================
+ Coverage   90.26%   90.73%   +0.47%     
==========================================
  Files          32       38       +6     
  Lines        1489     1652     +163     
==========================================
+ Hits         1344     1499     +155     
- Misses        145      153       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pvandyken pvandyken force-pushed the feat/bids/versioned branch 3 times, most recently from 66c8ef0 to 131e5ba Compare January 15, 2024 19:17
@pvandyken pvandyken marked this pull request as ready for review January 15, 2024 20:27
Copy link
Contributor

@kaitj kaitj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This largely looks good to me, but given its a fairly large PR, I'll take a second look through to make sure I didn't miss anything before approving.

This is a necessary migration to support our versioned bids functions,
so that the functions can be dynamically generated on demand

To preserve type safety, an __init__.pyi file will be generated for each
module, in addition to the __init__.py. A custom fork of mkinit manages
these files (PR to main mkinit repo pending)
This, along with moving objects from snakebids.types into
snakebids.utils.containers, should clean up the code and solve some
of the circular import problems
Versioned functions are created on the fly upon import. Correspondance
between the bids_functions and the available specs is kept by the
update_bids.py script and corresponding poe task. The `latest` spec
stays automatically updated to the latest of the specs.
Was previously with respect to the number of directories, not the number
of entities
Split out quality tasks, make use of sequence tasks. `update_params`
now automatically calls `mkinit`, and `mkinit` automatically calls
`isort`
Incorporates all of the bids 1.9.0 entities, as well as from, to, and
any other entities currently configured in pybids. Long and short names
are used for all relevant entities. Unknown entities are placed just
before description
Reorganize table of contents, simplifying overall structure into "User
Guide" and "API". Main API page is converted from a flat structure to a
multi-page layout, with one page for each major category (path building,
dataset generation, bids apps, dataset functions, and dataset types).

Path documentation features a doc section for each spec, and one generic
doc for the bids function
While explicitly versioned functions are convenient in the snakemake
context (where we typically directly import the bids function), it's not
a scalable solution. For instance, it will not be able to configure
potential path building methods on `BidsComponent`.

Therefore, convert from versioned functions to a global configuration
system. The `set_bids_spec` function will globally set the bids spec
across the runtime context. Any function or method can then read it
using `get_bids_func`.

bids_func template is no longer needed now that versioned bids funcs are
removed
Switch from versioned func to configuration documentation. Update bids
function page
Take a little more care in exposing API to make future updating easier
Warn if `use_subject_dir` or `use_session_dir` is specified, these will
be eliminated in 1.0

Warn if custom entitites are used in a spec when no spec is explicitly
identified
This simplifies some of the building logic by eliminating the special
v0_0_0 case. Use of these arguments is deprecated across the board and
will be eliminated in 1.0, including for the v0_0_0 spec
Check for interactive sessions using sys.ps1 rather than
__main__.__file__, as the latter was not working on github CI
Ensure running `poe update-bids` doesn't change any files
@pvandyken pvandyken force-pushed the feat/bids/versioned branch from ea55e1a to 5210c86 Compare February 1, 2024 15:52
@pvandyken pvandyken requested a review from kaitj February 1, 2024 15:53
@pvandyken
Copy link
Contributor Author

@kaitj, I expanded on the discussion of bids specs in the documentation. It's definitely not perfect: I don't discuss custom specs, and it's really too much prose now to be in the API section. I'd like to leave addressing of these imperfections for another time though, as mentioned above, the docs will likely be evolving quite a bit moving forward anyway.

@pvandyken
Copy link
Contributor Author

Okay, there's two live bugs in the main branch, the fixes for which are split between this PR and #365. So I'm going to push one of them in with failing checks so that I don't have to make the fixes twice.

@pvandyken pvandyken merged commit 7c23557 into khanlab:main Feb 1, 2024
29 of 34 checks passed
@pvandyken pvandyken deleted the feat/bids/versioned branch February 1, 2024 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants