Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Summarize Index Patch #302

Merged
merged 16 commits into from
Feb 27, 2024
Merged

BUG: Summarize Index Patch #302

merged 16 commits into from
Feb 27, 2024

Conversation

Clockwork-Rat
Copy link
Contributor

Fixes issue with sample IDs being lost to default indexes.

@Clockwork-Rat
Copy link
Contributor Author

@Oddant1 here is the patch

@lizgehret lizgehret changed the title Summarize Index Patch BUG: Summarize Index Patch Feb 22, 2024
@gregcaporaso
Copy link
Member

This issue is described in #303.

@lizgehret lizgehret linked an issue Feb 22, 2024 that may be closed by this pull request
@Oddant1
Copy link
Member

Oddant1 commented Feb 22, 2024

@Clockwork-Rat I tested it locally and it looks like it works. We should be able to add a unit test using this pandas method that reads tables from html into dataframes.

Not sure if this will be useful to you, but I used that method here to create a helper to assert that all tables in a given html contained no NaNs.

@lizgehret
Copy link
Member

Okay so just for posterity (in case this happens after I'm OOO) - once a test is added here and this is merged into dev, the merge commit will need to be cherry picked onto the Release-2024.2 branch.

cc @ebolyen

@ebolyen
Copy link
Member

ebolyen commented Feb 27, 2024

Hey @Clockwork-Rat and @Oddant1, can we get the tests added soon? Worst case we could put the tests in the dev branch and just cherry-pick this one commit.

text = fi.read()
tbl_rx = re.compile(
r'\{*[(S3)(S2)(S3)]*[(S3)(S2)(S3)]*[(S3)(S2)(S3)]*\}')
self.assertTrue(tbl_rx.search(text))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Clockwork-Rat, this approach feels a little too complicated.

It appears that the regex is redundant as it's the same capture-group group with an optional number of repeats, repeated 3 times. This makes me feel like I'm missing something about how the regex works, which also makes it less clear what the assertion is actually verifying.

(I assume regex.search is truthy when there is any match at all, but then why the very specific regex above?)


What we probably want to do is find the <script> tag with the right type="application/json". That won't be a fun thing to do, but we can probably tolerate a regex in this case. Once you have the contents, however that came to be, you ought to be able to load it into a pandas dataframe with some kind of from_json method which will make it super easy to test and work with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Oddant1, @ebolyen updated, I used the built in json.loads() function because the pd.read_json() function did not work, but the loads() function worked fine. If necessary I could pass the results into a DataFrame constructor which should work fine.

Copy link
Member

@ebolyen ebolyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @Clockwork-Rat!

@ebolyen ebolyen merged commit 9c79fab into qiime2:dev Feb 27, 2024
4 checks passed
ebolyen pushed a commit that referenced this pull request Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Completed
Development

Successfully merging this pull request may close these issues.

sample ids not displayed in summarized feature table
5 participants