Rust: regenerate MaD files using DCA #19674

redsun82 · 2025-06-05T08:14:24Z

Rust autogenerated models now use the DCA strategy.
Models were regenerated from a recent DCA run.
The bulk model generator got some changes:
- the configuration files are now in YAML format, which is terser and more consistent with how we generally configure stuff
- when running the DCA strategy, the generator now takes the last DB artifact for each project, which makes it compatible with comparison DCA runs
- downloads from DCA now run in parallel (up to a maximum of 8 workers), which scales much better with the number of sources
- the bulk generator cleans up extracted DB locations, so it can be rerun without any manual cleanup
- the generator can now be run directly on POSIX, without needing an explicit python invocation
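As a rough illustration of the "last DB artifact for each project" rule, the sketch below assumes a hypothetical flat list of `(project_name, artifact_name)` pairs in the order the DCA run lists them. In a comparison run a project can appear more than once, and a dict naturally keeps only the final entry. The data shape and the `last_artifact_per_project` name are assumptions, not the script's actual code.

```python
def last_artifact_per_project(artifacts: list[tuple[str, str]]) -> dict[str, str]:
    """Map each project to the last artifact listed for it.

    `artifacts` is a hypothetical list of (project_name, artifact_name) pairs in
    DCA order. In a comparison run a project can appear once per compared
    variant; later entries overwrite earlier ones.
    """
    latest: dict[str, str] = {}
    for project, artifact in artifacts:
        latest[project] = artifact
    return latest


if __name__ == "__main__":
    run = [
        ("rocket", "rocket-db-baseline"),
        ("hyper", "hyper-db-baseline"),
        ("rocket", "rocket-db-compare"),
    ]
    print(last_artifact_per_project(run))
```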

Copilot

Pull Request Overview

This PR updates the bulk model generator to use the DCA strategy for regenerating MaD files, switches configuration from JSON to YAML, and enhances parallelism and cleanup in the Python script.

Migrate bulk generation config files from JSON to a terser YAML format.
Refactor bulk_generate_mad.py to add a generic run_in_parallel helper for cloning and downloading in parallel, with cleanup of old artifacts.
Regenerate all Rust QL test expected files based on the new DCA outputs.
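The cleanup of old artifacts is what makes the script rerunnable. A minimal sketch, assuming each database arrives as a zip archive that is decompressed into a per-project directory (both the zip format and the `decompress_fresh` name are assumptions): remove any previous output first, then unpack.

```python
import shutil
import zipfile
from pathlib import Path


def decompress_fresh(archive: Path, dest: Path) -> Path:
    """Decompress `archive` into `dest`, discarding anything left by a previous run."""
    if dest.exists():
        # A leftover directory from an earlier (possibly interrupted) run would
        # otherwise mix stale files into the new database.
        shutil.rmtree(dest)
    dest.mkdir(parents=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```

Calling it twice with the same arguments leaves the same directory contents, which is the property that removes the need for manual cleanup between runs.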

Reviewed Changes

Copilot reviewed 70 out of 70 changed files in this pull request and generated 1 comment.

File	Description
misc/scripts/models-as-data/bulk_generate_mad.py	Add `run_in_parallel`, parallel DCA downloads, YAML parsing, cleanup logic
rust/misc/bulk_generation_targets.yml	New YAML config replacing JSON targets for Rust bulk generation
cpp/bulk_generation_targets.yml	New YAML config replacing JSON targets for C++ bulk generation
Various `.expected` files under `rust/ql/test`	Regeneration of QL test expectations to reflect new DCA outputs

Comments suppressed due to low confidence (1)

misc/scripts/models-as-data/bulk_generate_mad.py:115

The generic type parameters T and U are used in the function signature but not defined; add TypeVar definitions such as T = TypeVar('T') and U = TypeVar('U') before their use.

def run_in_parallel[T, U](
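For reference, on Python 3.12 and later `def run_in_parallel[T, U](` is valid PEP 695 syntax that declares `T` and `U` inline, so no separate `TypeVar` is needed there; explicit `TypeVar`s are only required on older interpreters. A minimal sketch of such a helper in the older form, assuming it maps a function over items with at most 8 worker threads and returns results in input order (the signature is hypothetical, not the script's actual one). Threads rather than processes are an assumption that suits I/O-bound work like downloads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")

# Cap on concurrent workers, matching the "up to 8 workers" in the PR description.
MAX_WORKERS = 8


def run_in_parallel(func: Callable[[T], U], items: Iterable[T]) -> list[U]:
    """Apply `func` to every item on a thread pool and return results in input order."""
    pending = list(items)
    if not pending:
        # ThreadPoolExecutor rejects max_workers=0, so handle the empty case up front.
        return []
    with ThreadPoolExecutor(max_workers=min(MAX_WORKERS, len(pending))) as pool:
        # Executor.map yields results in the order of `pending`; if a call raised,
        # its exception is re-raised when that item's result is reached.
        return list(pool.map(func, pending))
```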

Copilot · 2025-06-05T10:56:45Z

misc/scripts/models-as-data/bulk_generate_mad.py

-
-    project_dirs = [project_dirs_map[project["name"]] for project in projects]
-
+    dirs = run_in_parallel(


[nitpick] Exiting from within a utility function (via sys.exit in on_error handlers) can make the logic harder to test or reuse; consider returning errors and handling exit at the top level instead.

Suggested change

dirs = run_in_parallel(

failed = run_in_parallel(
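Following the nitpick, one way to keep `sys.exit` out of the helper is to have it report failures and let the top level decide. A sketch, assuming each task either returns normally or raises; `run_collecting_failures`, `main`, and the placeholder `download` step are hypothetical names, not code from the PR.

```python
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_collecting_failures(
    func: Callable[[T], object], items: Iterable[T], max_workers: int = 8
) -> list[tuple[T, BaseException]]:
    """Run `func` on each item in parallel; return (item, error) for every failure.

    The helper never exits the process, which keeps it easy to test and reuse.
    """
    failures: list[tuple[T, BaseException]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(func, item): item for item in items}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures.append((futures[future], error))
    return failures


def main(projects: list[str]) -> int:
    """Hypothetical entry point: report failures, then turn them into an exit status."""

    def download(project: str) -> None:
        # Placeholder for the real per-project download step.
        if not project:
            raise ValueError("empty project name")

    failed = run_collecting_failures(download, projects)
    for project, error in failed:
        print(f"failed to download {project!r}: {error}", file=sys.stderr)
    return 1 if failed else 0
```

Failures come back in completion order rather than input order, because of `as_completed`. At the script's entry point, `sys.exit(main(...))` is then the only place the process exits.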

geoffw0

Looks great! Some comments/discussion; one test annotation needs fixing.

misc/scripts/models-as-data/bulk_generate_mad.py

geoffw0 · 2025-06-05T11:24:06Z

rust/bulk_generation_targets.yml

+- name: rocket
+- name: actix-web
+- name: hyper
+- name: clap


This is a nice, simple list. Once everything is merged and stable, I'll add a bunch more targets to it.

One thing to keep in mind is that at the moment this list needs to be topologically ordered with respect to dependencies (so later additions should depend on earlier ones, not the other way around). Possibly worth a comment here, now that this is YAML.
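Since the target list must be topologically ordered, a check along these lines could catch mistakes when adding targets. It assumes a hypothetical `deps` mapping from each target to the targets it depends on; the YAML config shown above does not record dependencies, so this only illustrates the ordering rule.

```python
def check_topological_order(targets: list[str], deps: dict[str, set[str]]) -> list[str]:
    """Return a message for every target listed before one of its dependencies.

    Only dependencies that are themselves in `targets` are checked; anything
    outside the list is not part of the ordering constraint.
    """
    position = {name: index for index, name in enumerate(targets)}
    problems = []
    for name in targets:
        for dep in sorted(deps.get(name, set())):
            if dep in position and position[dep] > position[name]:
                problems.append(f"{name} is listed before its dependency {dep}")
    return problems
```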

Also, just so you know, you can tweak what gets generated with any of

with-sinks: false
with-sources: false
with-summaries: false

(all are true by default)
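A sketch of how such per-target flags with a default of `true` might be read, assuming each target is a mapping like a parsed YAML entry (e.g. `{"name": "clap", "with-sinks": False}`). The `GenerationOptions` type and `options_for` function are hypothetical, not the script's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationOptions:
    with_sinks: bool = True
    with_sources: bool = True
    with_summaries: bool = True


def options_for(target: dict) -> GenerationOptions:
    """Read the optional with-* keys of one target; any missing key defaults to True."""
    return GenerationOptions(
        with_sinks=bool(target.get("with-sinks", True)),
        with_sources=bool(target.get("with-sources", True)),
        with_summaries=bool(target.get("with-summaries", True)),
    )
```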

What are the expected use cases for those three options?

I don't really know, but you can ask Mathias once he's back from PTO; two of them are used for the C++ generated models.

My guess is there are certain libraries that produce a lot of inaccurate models of one type but not the others, and this gives us some additional control. @MathiasVP ? (no rush, not blocking this PR)

The reason is simply that C++ doesn't yet autogenerate sources and sinks (for a couple of reasons, but mainly because I didn't bother to set that up properly yet). The MaD generator script (which this script invokes under the hood) already provides hooks to configure which kinds of models are generated, so I just lifted those hooks to this script in #19627.

rust/ql/test/query-tests/security/CWE-770/UncontrolledAllocationSize.expected

misc/scripts/models-as-data/bulk_generate_mad.py

Also fix some minor things in `bulk_generate_mad.py`.

paldepind

Looks really great!

A few comments. I also think you need to run black again, as there are a few formatting changes from that.

misc/scripts/models-as-data/bulk_generate_mad.py

rust/ql/test/query-tests/security/CWE-770/main.rs

Co-authored-by: Geoffrey White <[email protected]>

redsun82 · 2025-06-10T11:40:59Z

@paldepind Can I get a reapproval after resolving a merge conflict? 🙌

Paolo Tranquilli added 10 commits June 5, 2025 08:37

Bulk model generator: switch from json to yml configuration files

31d1604

MaD generator: only pick up last database on comparison DCAs

900a3b0

MaD generator: reformat

d5c16d6

MaD generator: make bulk generator executable

31954fa

MaD generator: move bulk generation config files one directory up

fbd5058

MaD: make bulk generator DCA strategy download DBs in parallel

4f47ee2

MaD: make bulk generator cleanup downloaded DBs

ee7eb86

MaD generator: some final minor tweaks

530b990

Rust: switch to DCA strategy for MaD bulk generation

f4bbef9

Rust: regenerate MaD models

ec77eb3

github-actions bot added the C++ and Rust labels Jun 5, 2025

Rust: accept test changes

6162cf5

redsun82 mentioned this pull request Jun 5, 2025

Rust: Use QL computed canonical paths in MaD Field tokens #19667

Merged

redsun82 marked this pull request as ready for review June 5, 2025 10:54

Copilot AI review requested due to automatic review settings June 5, 2025 10:54

redsun82 requested review from a team as code owners June 5, 2025 10:54

Copilot AI reviewed Jun 5, 2025

geoffw0 reviewed Jun 5, 2025

Rust: address review

e1eb1f6

Also fix some minor things in `bulk_generate_mad.py`.

redsun82 requested a review from paldepind June 6, 2025 08:17

paldepind requested changes Jun 6, 2025

misc/scripts/models-as-data/bulk_generate_mad.py (outdated, resolved)

misc/scripts/models-as-data/bulk_generate_mad.py (outdated, resolved)

misc/scripts/models-as-data/bulk_generate_mad.py (outdated, resolved)

MaD generator: use decompress terminology instead of extract

d6d13b9

geoffw0 reviewed Jun 9, 2025

rust/ql/test/query-tests/security/CWE-770/main.rs (outdated, resolved)

redsun82 and others added 3 commits June 10, 2025 10:52

Update rust/ql/test/query-tests/security/CWE-770/main.rs

e6056f9

Co-authored-by: Geoffrey White <[email protected]>

Merge branch 'main' into redsun82/mad

bcfc009

MaD generator: run formatter

ecc35e5

redsun82 requested a review from paldepind June 10, 2025 10:11

MaD generator: address review

ca99add

paldepind previously approved these changes Jun 10, 2025

Merge branch 'main' into redsun82/mad

0d03699

redsun82 dismissed paldepind’s stale review via 0d03699 June 10, 2025 11:39

redsun82 requested a review from paldepind June 10, 2025 11:40

Rust: accept test changes

4ac4e44

paldepind approved these changes Jun 11, 2025

redsun82 merged commit fbcd9ea into main Jun 11, 2025
30 of 35 checks passed

redsun82 deleted the redsun82/mad branch June 11, 2025 09:10

MathiasVP mentioned this pull request Jun 13, 2025

C++: Add more MaD summaries #19753

Merged

