Support incremental benchmarking of datasets larger than memory + final config/logic alignment #180

Draft
wants to merge 15 commits into
base: large-scale

@ethanglaser (Contributor) commented Mar 24, 2025

Description

This PR merges the incremental logic updates into this branch instead of the main branch. It lets a user run incremental algorithms on a dataset larger than memory by running on batches of the same dataset, which is required to produce the submitted results.

Additionally, it makes some minor tweaks to configs and scripts to align the reproducer results with the report. These include:

  • fixed logreg strong config error
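
The batched approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `iter_batches` and `IncrementalMeanVariance` are hypothetical names, the data is synthetic, and a running mean/variance stands in for whichever incremental algorithm is actually benchmarked. The point is only that each batch is consumed and discarded, so memory use is bounded by the batch size rather than the dataset size.

```python
import math
from typing import Iterable, Iterator, List


def iter_batches(n_rows: int, batch_size: int) -> Iterator[List[float]]:
    """Hypothetical data source: yields one batch at a time so the full
    dataset never has to be held in memory."""
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        # Synthetic values stand in for rows read from disk.
        yield [float(i) for i in range(start, stop)]


class IncrementalMeanVariance:
    """Toy incremental estimator with a partial_fit-style interface
    (an assumption for illustration, not the benchmarked estimator)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def partial_fit(self, batch: Iterable[float]) -> "IncrementalMeanVariance":
        # Welford's update: only the running state survives between batches.
        for x in batch:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)
        return self

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.count if self.count else math.nan


est = IncrementalMeanVariance()
for batch in iter_batches(n_rows=1_000, batch_size=128):
    est.partial_fit(batch)  # each batch can be freed after this call
```

A benchmark built this way would time the loop over `partial_fit` calls plus any finalization step, rather than a single `fit` on the whole dataset.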

The PR should start as a draft, then move to the ready-for-review state after CI has passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes, or created a separate PR with the update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the respective label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended the testing suite if new functionality was introduced in this PR.

@ethanglaser ethanglaser requested a review from KateBlueSky March 24, 2025 14:35
@ethanglaser ethanglaser changed the title from "Support incremental benchmarking of datasets larger than memory" to "Support incremental benchmarking of datasets larger than memory + final config/logic alignment" Mar 25, 2025