astartes
#120
Hi @JacksonBurns, After checking with the editorial board, I'm glad to confirm that your package is considered in scope with a pyOpenSci submission. In the meanwhile I have a first question for you, if you don't mind. |
@cmarmo thanks for the quick response and intro! Thank you for sharing the update about scope and reviewers. We will add a section to the documentation about applications to chemical engineering problems in particular - thank you for this suggestion. |
Hello @JacksonBurns, Below are the basic checks that your package needs to pass to begin our review. Please check our Python packaging guide for more information on the elements below.
Editor comments
The package is already in excellent shape, congratulations and thanks to all the co-authors for your hard work!
I think this package may be relevant for the scikit-learn user community, perhaps you might be interested in linking I'm going to look for the two reviewers now (good news, I already found one! :) ). |
Hi @cmarmo thanks for the speedy work!
I have just turned on the Zenodo automatic packaging releases (which I also just learned was a thing) so hopefully this doesn't fall behind again! I have also uploaded the current version (1.1.1) to Zenodo manually.
Thank you for the kind words!
The reference for the Kennard-Stone algorithm is in the Implemented Sampling Algorithms table but it is not at all obvious what the links in the As far as accessible articles, we can also try and find open-access versions of those which are pay-walled.
👍 will do.
We can definitely do that. We will perhaps try to migrate these to target Google Colab.
👍 sounds good!
👍 will do.
Didn't know this existed, thanks for sharing! We will do that.
🎉 |
@cmarmo quick question - we are getting these suggested changes made now (JacksonBurns/astartes#147). Since we submitted |
Would you be ok with making a quick release of the changes before the review starts and update the description? As far as the version does not change during the review process I think this is not an issue. Thanks for addressing the comments already! |
That sounds good - we will get the changes in and make a release for the reviewers (and update the description) asap. Thanks! |
Hi @cmarmo we have completed the edits mentioned above and pushed a new version (v1.1.2) and updated the description for this PR. Thanks again! |
Hi @JacksonBurns , thank you for your new release: I've still spotted some small issues in the documentation, but I think it's better to move forward with the review. |
Hello @JacksonBurns , I'm finally back! Thank you for your patience! @BerylKanali, @du-phan welcome! 👋 Please take some time to introduce yourself here before starting the review.
Please fill out our pre-review survey
Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.
The following resources will help you complete your review:
Note that we ask reviewers to complete reviews in three weeks, the review for Thanks again for your commitment! |
Hi team, |
Hi everyone, My name is Beryl Kanali, I am a Data Scientist and Open Source Advocate. I am also currently a graduate student. I like contributing to open source and I am happy to be a reviewer for @cmarmo I have completed the pre-review survey. |
Thanks @du-phan and @BerylKanali for volunteering your time for this review, we appreciate it! Continued thanks to @cmarmo as well for orchestrating the review process. We look forward to seeing your reviews. p.s. @du-phan I hear about Dataiku all the time - you are one of the key sponsors to my hometown National Public Radio Station WVXU! |
Hi team, |
Hi Jackson, Sorry that my review took a while; August is always a slow month. Thank you for your contribution! It is a nice package.
Package Review
Documentation
The package includes all the following forms of documentation:
Readme file requirements
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)
Usability
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Functionality
For packages also submitting to JOSS
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted. The package contains a
Final approval (post-review)
Estimated hours spent reviewing: 4-5
Review Comments:
Overview
The
Usage
Documentation
|
Hi @du-phan thank you for the review!
We have clarified this in the README, and in doing so referenced the demonstration notebook which you mentioned was useful as an explainer.
👍 added details to the docs!
We have added the ability to pass custom metrics to the function, and have ideas about adding a more generic
👍 added! The changes I have described here are in this draft pull request on Thanks again! |
Hi Jackson, Thank you for your patience, I am now settled and here is my review.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Readme file requirements
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)
Usability
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Functionality
For packages also submitting to JOSS
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted. The package contains a
Final approval (post-review)
Estimated hours spent reviewing: 4 hours 40 minutes
Review Comments
As a person who has used sklearn so many times, I am impressed by the
Documentation
Functionality Usability
|
@cmarmo, I have split the review responses into two small, easier to review pull requests. All updates to the documentation are in JacksonBurns/astartes#154 (statement of need, example updates, etc.) and fixes to the source code (single-source package version, remove pointless statements, etc.) are in JacksonBurns/astartes#155. I hope this is better - I can subdivide further if it would be helpful. |
Thank you @JacksonBurns , this is already more reviewer friendly... :) |
Hello @du-phan and @BerylKanali , do you mind letting us know if Jackson addressed all your concerns in JacksonBurns/astartes#154 and JacksonBurns/astartes#155? |
Hi @cmarmo and @JacksonBurns thanks for the reminder. Code I have updated the checklist above. Good job and congratulations on a job well done! |
Thank you @BerylKanali! @cmarmo I will wait for approval from @du-phan before merging any changes! |
Hi @JacksonBurns, The new document and code update are in great shape, thank you for your time and contribution! I have updated the check list in my initial comment! |
Sure! go ahead and congratulations! |
This pull request contains updates to the documentation as part of the PyOpenSci review (see pyOpenSci/software-submission#120). Each commit message contains additional clarifying details.
This pull request contains updates to the source code as part of the PyOpenSci review (see pyOpenSci/software-submission#120). Each commit message contains additional clarifying details.
Dear @JacksonBurns, 🎉 The actions needed to finalize this review are detailed below.
Author Wrap Up Tasks
There are a few things left to do to wrap up this submission:
It looks like you would like to submit this package to JOSS. Here are the next steps:
Editor Final Checks
Please complete the final steps to wrap up this review. Editor, please do the following:
If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide. |
🎉! @kspieks, @himaghna, and I are filling out the survey!
Already done 👍
Will do!
Will do!
👍 I will keep this issue up-to-date as the reviews progress. Thanks again to @cmarmo, @BerylKanali, and @du-phan for volunteering your time! |
@JacksonBurns congratulations! astartes has been accepted on JOSS! @BerylKanali and @du-phan thank you once more for your work as reviewers: if you have some time to fill out the post-review survey, this will help us a lot. I am going to close the issue now. |
Congratulations! @JacksonBurns @cmarmo I have responded to the post-review survey. |
Submitting Author: (@JacksonBurns)

All current maintainers: (@kspieks, @himaghna)
Package Name: astartes
One-Line Description of Package: Better Data Splits for Machine Learning
Repository Link: https://github.com/JacksonBurns/astartes
Version submitted: v1.1.2
Editor: @cmarmo
Reviewer 1: @BerylKanali
Reviewer 2: @du-phan
Archive:
Version accepted: v1.1.3
JOSS DOI:
Date accepted (month/day/year): 10/15/2023
Code of Conduct & Commitment to Maintain Package
Description
note: this is a selection from the abstract of the JOSS paper
Machine Learning (ML) has become an increasingly popular tool to accelerate traditional workflows. Critical to the use of ML is the process of splitting datasets into training, validation, and testing subsets that are used to develop and evaluate models. Common practice in the literature is to assign these subsets randomly. Although this approach is fast and efficient, it only measures a model's capacity to interpolate. Testing errors from random splits may be overly optimistic if given new data that is dissimilar to the scope of the training set; thus, there is a growing need to easily measure performance for extrapolation tasks. To address this issue, we report astartes, an open-source Python package that implements many similarity- and distance-based algorithms to partition data into more challenging splits. Separate from astartes, users can then use these splits to better assess out-of-sample performance with any ML model of choice.
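As an illustration of what a distance-based partition can look like, the sketch below implements the Kennard-Stone selection rule (the algorithm named in the review thread) in plain Python. It is a minimal sketch under stated assumptions, not the astartes implementation: the function names `kennard_stone_order` and `kennard_stone_split` are hypothetical, Euclidean distance is assumed, and the pairwise search is quadratic, so it only suits small examples.

```python
import math


def kennard_stone_order(points):
    """Return indices of `points` in Kennard-Stone selection order.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    n = len(points)
    if n < 2:
        return list(range(n))
    # Start from the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    # Track each remaining point's distance to its nearest selected point.
    nearest = {
        i: min(math.dist(points[i], points[first]),
               math.dist(points[i], points[second]))
        for i in remaining
    }
    while remaining:
        # Max-min rule: take the point farthest from everything selected so far.
        # Iterating in sorted order makes ties resolve to the lowest index.
        nxt = max(sorted(remaining), key=lambda i: nearest[i])
        selected.append(nxt)
        remaining.remove(nxt)
        del nearest[nxt]
        for i in remaining:
            nearest[i] = min(nearest[i], math.dist(points[i], points[nxt]))
    return selected


def kennard_stone_split(points, train_size=0.8):
    """Split indices into (train, test) by taking the first selections as train."""
    order = kennard_stone_order(points)
    n_train = round(train_size * len(points))
    return order[:n_train], order[n_train:]


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
    print(kennard_stone_order(pts))  # -> [0, 3, 2, 1]
```

Unlike a random shuffle, the result here is deterministic and depends only on the geometry of the data, which is the property that lets such splits be reproduced exactly.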
Scope
Please indicate which category or categories.
Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific & Community Partnerships
Community Partnerships
If your package is associated with an existing community please check below:
For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
The target audience is data scientists, machine learning scientists, and domain scientists using machine learning. The applications of astartes include rigorous ML model validation, automated featurization of chemical data (with flexibility to add others, and instructions for doing so), and reproducibility.
We position astartes as a replacement for scikit-learn's train_test_split function, but with greater flexibility for sampling algorithms, and availability of train_val_test_split for more rigorous validation.
@tag the editor you contacted: N/A
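To make the idea of a three-way split with a pluggable sampling order concrete, here is a minimal sketch in plain Python. It is not the astartes API: the function name `train_val_test_split_sketch` and its parameters (`order`, `random_state`) are assumptions made for illustration. Passing no `order` gives a seeded random split; passing an index ordering produced by some sampling algorithm gives that algorithm's split.

```python
import random


def train_val_test_split_sketch(items, train_size=0.8, val_size=0.1,
                                test_size=0.1, order=None, random_state=0):
    """Partition `items` into (train, val, test) lists.

    Hypothetical helper for illustration only. `order` is an optional list of
    indices giving the order in which samples are assigned; when omitted, a
    seeded random shuffle is used.
    """
    if abs(train_size + val_size + test_size - 1.0) > 1e-9:
        raise ValueError("train_size, val_size, and test_size must sum to 1")
    n = len(items)
    if order is None:
        order = list(range(n))
        random.Random(random_state).shuffle(order)
    n_train = round(train_size * n)
    n_val = round(val_size * n)
    # The first block of the ordering is train, the next is validation,
    # and whatever is left becomes the test set.
    train = [items[i] for i in order[:n_train]]
    val = [items[i] for i in order[n_train:n_train + n_val]]
    test = [items[i] for i in order[n_train + n_val:]]
    return train, val, test


if __name__ == "__main__":
    data = list("abcdefghij")
    train, val, test = train_val_test_split_sketch(data, order=list(range(10)))
    print(train, val, test)
```

Keeping the ordering separate from the partitioning is one possible design; it means the same splitting code serves both random and algorithm-driven sampling.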
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication Options
JOSS Checks
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/ (on a separate joss-paper branch).
Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Confirm each of the following by checking the box.
Please fill out our survey
submission and improve our peer review process. We will also ask our reviewers
and editors to fill this out.
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.
Footnotes
Please fill out a pre-submission inquiry before submitting a data visualization package. ↩