-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Summarize Index Patch #302
Conversation
@Oddant1 here is the patch |
This issue is described in #303. |
@Clockwork-Rat I tested it locally and it looks like it works. We should be able to add a unit test using this pandas method that reads tables from html into dataframes. Not sure if this will be useful to you, but I used that method here to create a helper to assert that all tables in a given html contained no NaNs. |
Okay so just for posterity (in case this happens after I'm OOO) - once a test is added here and this is merged into dev, the merge commit will need to be cherry picked onto the Release-2024.2 branch. cc @ebolyen |
Hey @Clockwork-Rat and @Oddant1, can we get the tests added soon? Worst case we could put the tests in the dev branch and just cherry-pick this one commit. |
text = fi.read() | ||
tbl_rx = re.compile( | ||
r'\{*[(S3)(S2)(S3)]*[(S3)(S2)(S3)]*[(S3)(S2)(S3)]*\}') | ||
self.assertTrue(tbl_rx.search(text)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Clockwork-Rat, this approach feels a little too complicated.
It appears that the regex is redundant as it's the same capture-group group with an optional number of repeats, repeated 3 times. This makes me feel like I'm missing something about how the regex works, which also makes it less clear what the assertion is actually verifying.
(I assume regex.search is truthy when there is any match at all, but then why the very specific regex above?)
What we probably want to do is find the <script>
tag with the right type="application/json"
. That won't be a fun thing to do, but we can probably tolerate a regex in this case. Once you have the contents, however that came to be, you ought to be able to load it into a pandas dataframe with some kind of from_json
method which will make it super easy to test and work with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @Clockwork-Rat!
Fixes issue with sample IDs being lost to default indexes.