Skip to content

Rust: regenerate MaD files using DCA #19674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open

Rust: regenerate MaD files using DCA #19674

wants to merge 19 commits into from

Conversation

redsun82
Copy link
Contributor

@redsun82 redsun82 commented Jun 5, 2025

  • rust autogenerated models now use the DCA strategy
  • models were regenerated from a recent DCA run
  • the bulk model generator got some changes:
    • the configuration files are now in YAML format, which is terser and more consistent with how we generally configure stuff
    • running the DCA strategy the generator will now take the last DB artifact for each project, which makes it compatible to run against comparing DCA runs
    • downloads from DCA are now run in parallel (up to a maximum of 8 workers), which scales much better with respect to the number of sources
    • the bulk generator cleans up extracted DB locations, which makes it rerunnable without any manual cleanup
    • the generator can now by run directly on POSIX, without needing an explicit python invocation

@github-actions github-actions bot added C++ Rust Pull requests that update Rust code labels Jun 5, 2025
@redsun82 redsun82 marked this pull request as ready for review June 5, 2025 10:54
@Copilot Copilot AI review requested due to automatic review settings June 5, 2025 10:54
@redsun82 redsun82 requested review from a team as code owners June 5, 2025 10:54
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the bulk model generator to use the DCA strategy for regenerating MaD files, switches configuration from JSON to YAML, and enhances parallelism and cleanup in the Python script.

  • Migrate bulk generation config files from JSON to a terser YAML format.
  • Refactor bulk_generate_mad.py to add a generic run_in_parallel helper for cloning and downloading in parallel, with cleanup of old artifacts.
  • Regenerate all Rust QL test expected files based on the new DCA outputs.

Reviewed Changes

Copilot reviewed 70 out of 70 changed files in this pull request and generated 1 comment.

File Description
misc/scripts/models-as-data/bulk_generate_mad.py Add run_in_parallel, parallel DCA downloads, YAML parsing, cleanup logic
rust/misc/bulk_generation_targets.yml New YAML config replacing JSON targets for Rust bulk generation
cpp/bulk_generation_targets.yml New YAML config replacing JSON targets for C++ bulk generation
Various .expected files under rust/ql/test Regeneration of QL test expectations to reflect new DCA outputs
Comments suppressed due to low confidence (1)

misc/scripts/models-as-data/bulk_generate_mad.py:115

  • The generic type parameters T and U are used in the function signature but not defined; add TypeVar definitions such as T = TypeVar('T') and U = TypeVar('U') before their use.
def run_in_parallel[T, U](


project_dirs = [project_dirs_map[project["name"]] for project in projects]

dirs = run_in_parallel(
Copy link
Preview

Copilot AI Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Exiting from within a utility function (via sys.exit in on_error handlers) can make the logic harder to test or reuse; consider returning errors and handling exit at the top level instead.

Suggested change
dirs = run_in_parallel(
failed = run_in_parallel(

Copilot uses AI. Check for mistakes.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Some comments / discussion, one test annotation needs fixing.

- name: rocket
- name: actix-web
- name: hyper
- name: clap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice simple list, once everything is merged and stable I'll add a bunch more targets to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing to keep in mind is that at the moment this list needs to be topologically ordered with respect to dependencies (so later additions should depend on earlier ones and not the other way around). Possibly worth a comment here, now that this is yaml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, just so you know, you can tweak what gets generated with any of

with-sinks: false
with-sources: false
with-summaries: false

(all are true by default)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the expected use cases for those three options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know, but you can ask Mathias once he's back from his PTO, two of them are used for the C++ generated models

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is there are certain libraries that produce a lot of inaccurate models of one type but not the others, and this gives us some additional control. @MathiasVP ? (no rush, not blocking this PR)

Also fix some minor things in `bulk_generate_mad.py`.
@redsun82 redsun82 requested a review from paldepind June 6, 2025 08:17
Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really great!

A few comments and I think you need to run black again as there's a few formatting changes from that.

@redsun82 redsun82 requested a review from paldepind June 10, 2025 10:11
paldepind
paldepind previously approved these changes Jun 10, 2025
@redsun82
Copy link
Contributor Author

@paldepind Can I get a reapproval after resolving a merge conflict? 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ Rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants