-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Rust: regenerate MaD files using DCA #19674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the bulk model generator to use the DCA strategy for regenerating MaD files, switches configuration from JSON to YAML, and enhances parallelism and cleanup in the Python script.
- Migrate bulk generation config files from JSON to a terser YAML format.
- Refactor
bulk_generate_mad.py
to add a genericrun_in_parallel
helper for cloning and downloading in parallel, with cleanup of old artifacts. - Regenerate all Rust QL test expected files based on the new DCA outputs.
Reviewed Changes
Copilot reviewed 70 out of 70 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
misc/scripts/models-as-data/bulk_generate_mad.py | Add run_in_parallel , parallel DCA downloads, YAML parsing, cleanup logic |
rust/misc/bulk_generation_targets.yml | New YAML config replacing JSON targets for Rust bulk generation |
cpp/bulk_generation_targets.yml | New YAML config replacing JSON targets for C++ bulk generation |
Various .expected files under rust/ql/test |
Regeneration of QL test expectations to reflect new DCA outputs |
Comments suppressed due to low confidence (1)
misc/scripts/models-as-data/bulk_generate_mad.py:115
- The generic type parameters T and U are used in the function signature but not defined; add TypeVar definitions such as
T = TypeVar('T')
andU = TypeVar('U')
before their use.
def run_in_parallel[T, U](
|
||
project_dirs = [project_dirs_map[project["name"]] for project in projects] | ||
|
||
dirs = run_in_parallel( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Exiting from within a utility function (via sys.exit in on_error handlers) can make the logic harder to test or reuse; consider returning errors and handling exit at the top level instead.
dirs = run_in_parallel( | |
failed = run_in_parallel( |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Some comments / discussion, one test annotation needs fixing.
- name: rocket | ||
- name: actix-web | ||
- name: hyper | ||
- name: clap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice simple list, once everything is merged and stable I'll add a bunch more targets to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing to keep in mind is that at the moment this list needs to be topologically ordered with respect to dependencies (so later additions should depend on earlier ones and not the other way around). Possibly worth a comment here, now that this is yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, just so you know, you can tweak what gets generated with any of
with-sinks: false
with-sources: false
with-summaries: false
(all are true by default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the expected use cases for those three options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know, but you can ask Mathias once he's back from his PTO, two of them are used for the C++ generated models
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is there are certain libraries that produce a lot of inaccurate models of one type but not the others, and this gives us some additional control. @MathiasVP ? (no rush, not blocking this PR)
rust/ql/test/query-tests/security/CWE-770/UncontrolledAllocationSize.expected
Outdated
Show resolved
Hide resolved
Also fix some minor things in `bulk_generate_mad.py`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really great!
A few comments and I think you need to run black
again as there's a few formatting changes from that.
@paldepind Can I get a reapproval after resolving a merge conflict? 🙌 |
Uh oh!
There was an error while loading. Please reload this page.