v1.69.0 #154
mickgmdb commented Dec 5, 2025
- Reduced per-match memory usage by compacting stored source locations and interning repeated capture names.
- Stored optional validation response bodies as boxed strings to avoid allocating empty payloads and to streamline validator caches.
- Parallelized git cloning according to the configured job count; each repository is now scanned as soon as its clone finishes, reducing end-to-end scan times.
- Combined per-repository results into a single aggregate summary after scans complete.
- Added initial access-map support and an HTML report viewer (both currently beta).
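The cloning and aggregation changes above can be pictured with a small, std-only sketch. It is not kingfisher's implementation: `clone_repo`, `scan_repo`, and `clone_and_scan` are hypothetical names, cloning is simulated, and a per-repository "finding count" stands in for real results. At most `jobs` worker threads pull URLs from a shared queue and send each finished clone over a channel, so the consumer starts scanning the first repository that is ready and folds everything into one aggregate summary at the end.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a finished clone; a real tool would hold a path on disk.
struct ClonedRepo {
    url: String,
}

/// Hypothetical clone step (simulated here, no network access).
fn clone_repo(url: &str) -> ClonedRepo {
    ClonedRepo { url: url.to_string() }
}

/// Hypothetical scan step: returns a per-repository finding count.
fn scan_repo(repo: &ClonedRepo) -> usize {
    repo.url.len() % 3
}

/// Clones with at most `jobs` worker threads, scans each repository as soon as
/// its clone arrives, and returns per-repository counts plus an aggregate total.
fn clone_and_scan(urls: Vec<String>, jobs: usize) -> (BTreeMap<String, usize>, usize) {
    let queue = Arc::new(Mutex::new(urls));
    let (tx, rx) = mpsc::channel::<ClonedRepo>();

    let workers: Vec<_> = (0..jobs.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // Take the next URL; the lock guard is dropped at the end of this statement,
                // so the (slow) clone below runs without holding the lock.
                let next = queue.lock().unwrap().pop();
                match next {
                    Some(url) => tx.send(clone_repo(&url)).unwrap(),
                    None => break,
                }
            })
        })
        .collect();
    // Drop the original sender so the receiver loop ends once every worker is done.
    drop(tx);

    // Scanning starts with the first completed clone, not after all clones finish.
    let mut per_repo = BTreeMap::new();
    for repo in rx {
        let findings = scan_repo(&repo);
        per_repo.insert(repo.url, findings);
    }
    for worker in workers {
        worker.join().unwrap();
    }
    // Combine per-repository results into a single aggregate.
    let total = per_repo.values().sum();
    (per_repo, total)
}
```

With an unbounded `mpsc` channel the workers never block on the scanner; a bounded `sync_channel` would be one way to cap how many finished-but-unscanned clones sit on disk at once.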
Pull request overview
This PR introduces version 1.69.0 with significant performance improvements and a new access-map feature for cloud credential analysis.
Key Changes:
- Memory optimization through compact source location storage and interned capture names
- Parallel git repository cloning with streaming scan capability
- Beta access-map feature for mapping cloud credentials to identities and permissions
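As a rough illustration of the two memory optimizations listed above (not kingfisher's actual `CompactSourceSpan`, whose layout is not shown in this PR page), the sketch below stores byte offsets as `u32` instead of `usize` and interns capture names behind `Arc<str>` so repeated names share one allocation. `CompactSpan` and `Interner` are hypothetical names; the `u32` choice assumes no single scanned input exceeds 4 GiB, and `from_range` returns `None` otherwise.

```rust
use std::collections::HashSet;
use std::mem::size_of;
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical compact span: two u32 byte offsets (8 bytes) instead of a
/// Range<usize> (16 bytes on 64-bit targets).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CompactSpan {
    start: u32,
    end: u32,
}

impl CompactSpan {
    /// Returns None if either offset does not fit in a u32.
    fn from_range(r: Range<usize>) -> Option<Self> {
        let start = u32::try_from(r.start).ok()?;
        let end = u32::try_from(r.end).ok()?;
        Some(CompactSpan { start, end })
    }

    /// Expands back to a usize range for slicing the original input.
    fn to_range(self) -> Range<usize> {
        self.start as usize..self.end as usize
    }
}

/// Hypothetical interner: each distinct capture name is allocated once and shared.
#[derive(Default)]
struct Interner {
    names: HashSet<Arc<str>>,
}

impl Interner {
    fn intern(&mut self, name: &str) -> Arc<str> {
        // Arc<str> implements Borrow<str>, so lookup works with a plain &str.
        if let Some(existing) = self.names.get(name) {
            return Arc::clone(existing);
        }
        let shared: Arc<str> = Arc::from(name);
        self.names.insert(Arc::clone(&shared));
        shared
    }
}
```

Each match then carries an 8-byte span and a pointer-sized `Arc` clone instead of its own `String` copy of every capture name.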
Reviewed changes
Copilot reviewed 56 out of 58 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| Cargo.toml | Version bump to 1.69.0 and AWS SDK dependencies added |
| src/location.rs | Introduced CompactSourceSpan for memory-efficient source tracking |
| src/matcher.rs | Updated to use interned capture names and compact locations |
| src/validation_body.rs | New module for optimized validation response storage |
| src/access_map.rs | New access-map feature implementation |
| src/scanner/runner.rs | Parallel repository scanning with streaming clones |
| src/reporter.rs | Access-map integration into reporting |
| tests/* | Test updates for new access_map fields |
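The boxed-string change for validation bodies can be sketched as follows; `ValidationResult` and its fields are hypothetical and are not the contents of `src/validation_body.rs`. On a 64-bit target, `Option<Box<str>>` occupies two words (pointer plus length, with `None` encoded in the null-pointer niche) versus three for `String` (pointer, length, capacity), and `into_boxed_str` drops any spare capacity.

```rust
use std::mem::size_of;

/// Hypothetical validation record; `body` is None when no response body was kept.
#[derive(Debug)]
struct ValidationResult {
    status: u16,
    body: Option<Box<str>>,
}

impl ValidationResult {
    /// Keeps the body only when it is non-empty, shrinking it to its exact length.
    fn new(status: u16, body: String) -> Self {
        let body = if body.is_empty() {
            None
        } else {
            Some(body.into_boxed_str())
        };
        ValidationResult { status, body }
    }

    /// Callers that want a &str get "" for a missing body.
    fn body(&self) -> &str {
        self.body.as_deref().unwrap_or("")
    }
}
```

An empty `String` does not allocate by itself, so most of the per-record saving in this sketch comes from the smaller field and the trimmed capacity rather than from skipping empty bodies.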
Comments suppressed due to low confidence (1)
src/access_map/azure copy.rs:1
- This file appears to be a duplicate (note ' copy' in its name). It should either be renamed properly or removed if it is an accidental duplicate of azure.rs.
Makefile (outdated)

```makefile
darwin-dev:
	# @echo "Checking Rust for darwin-arm64..."
	# @$(MAKE) check-rust || ( \
	# echo "Rust not found or out-of-date. Installing via Homebrew..." && \
	# brew install rust \
	# )
	# @brew list cmake >/dev/null 2>&1 || brew install cmake
	# @brew list boost >/dev/null 2>&1 || brew install boost
	# @brew install gcc libpcap pkg-config ragel sqlite coreutils gnu-tar
	# @rustup target add aarch64-apple-darwin
	cargo build --profile=dev --target aarch64-apple-darwin --features system-alloc
	# @cd target/aarch64-apple-darwin/release && \
	# find ./$(PROJECT_NAME) -type f -not -name "*.d" -not -name "*.rlib" -exec shasum -a 256 {} \; > CHECKSUM.txt
	# @mkdir -p target/release
	# @cp target/aarch64-apple-darwin/release/$(PROJECT_NAME) target/release/
	# @cp target/aarch64-apple-darwin/release/CHECKSUM.txt target/release/CHECKSUM-darwin-arm64.txt
	# @cd target/release && \
	# rm -rf $(PROJECT_NAME)-darwin-arm64.tgz && \
	# $(ARCHIVE_CMD) $(PROJECT_NAME)-darwin-arm64.tgz $(PROJECT_NAME) CHECKSUM-darwin-arm64.txt && \
	# if [ -f $(PROJECT_NAME)-darwin-arm64.tgz ]; then \
	# shasum -a 256 $(PROJECT_NAME)-darwin-arm64.tgz >> CHECKSUM-darwin-arm64.txt; \
	# fi
	# $(MAKE) list-archives
```
Copilot AI commented Dec 5, 2025
Large blocks of commented-out code reduce maintainability. Either remove this commented code or document why it's being preserved for future use.
```toml
[profile.dev]
opt-level = 0
# debug = true
debug = true
```
Copilot AI commented Dec 5, 2025
[nitpick] Debug symbols are enabled for the dev profile. This is fine for development, but make sure it was intentional and not left enabled by accident.
README.md (outdated)

````markdown
## Identity mapping for cloud credentials

Use the `identity-map` command to understand the blast radius of cloud credentials by resolving the owning identity, attached roles (including inherited org/folder bindings), and risky permissions. The command prints a JSON summary to stdout by default and can optionally emit a standalone HTML report.

```bash
# Map AWS credentials using your default CLI environment (env vars, config files),
# write JSON to disk, and emit an interactive HTML report
kingfisher identity-map aws \
  --json-out identity-map.json \
  --html-out identity-map.html

# Map a GCP service account key and save JSON + HTML to disk
kingfisher identity-map gcp path/to/key.json \
  --json-out identity-map.json \
  --html-out identity-map.html
````
Copilot AI commented Dec 5, 2025
The documentation refers to 'identity-map' command but the code implementation uses 'access-map' (see src/cli/commands/access_map.rs and src/cli/global.rs line 66). Update documentation to use 'access-map' for consistency.
```diff
-## Identity mapping for cloud credentials
-Use the `identity-map` command to understand the blast radius of cloud credentials by resolving the owning identity, attached roles (including inherited org/folder bindings), and risky permissions. The command prints a JSON summary to stdout by default and can optionally emit a standalone HTML report.
-```bash
-# Map AWS credentials using your default CLI environment (env vars, config files),
-# write JSON to disk, and emit an interactive HTML report
-kingfisher identity-map aws \
-  --json-out identity-map.json \
-  --html-out identity-map.html
-# Map a GCP service account key and save JSON + HTML to disk
-kingfisher identity-map gcp path/to/key.json \
-  --json-out identity-map.json \
-  --html-out identity-map.html
+## Access mapping for cloud credentials
+Use the `access-map` command to understand the blast radius of cloud credentials by resolving the owning identity, attached roles (including inherited org/folder bindings), and risky permissions. The command prints a JSON summary to stdout by default and can optionally emit a standalone HTML report.
+```bash
+# Map AWS credentials using your default CLI environment (env vars, config files),
+# write JSON to disk, and emit an interactive HTML report
+kingfisher access-map aws \
+  --json-out access-map.json \
+  --html-out access-map.html
+# Map a GCP service account key and save JSON + HTML to disk
+kingfisher access-map gcp path/to/key.json \
+  --json-out access-map.json \
+  --html-out access-map.html
```
src/access_map/gcp.rs

```rust
pub async fn map_access(credential_path: Option<&Path>) -> Result<AccessMapResult> {
    let path = credential_path.ok_or_else(|| anyhow!("GCP access-map requires a key.json path"))?;
    let data = std::fs::read_to_string(path).context("Failed to read credential file")?;
```
Semgrep identified an issue in your code:
The application builds a file path from potentially untrusted data, which can lead to a path traversal vulnerability. An attacker can manipulate the path which the application uses to access files. If the application does not validate user input and sanitize file paths, sensitive files such as configuration or user data can be accessed, potentially creating or overwriting files. To prevent this vulnerability, validate and sanitize any input that is used to create references to file paths. Also, enforce strict file access controls. For example, choose privileges allowing public-facing applications to access only the required files.
Dataflow graph (src/access_map/gcp.rs at commit 078fa16): source `credential_path` ([line 32](https://github.com/mongodb/kingfisher/blob/078fa16e6a9511b47a5c72413ea567c76376207e/src/access_map/gcp.rs#L32)) → trace `path` (line 32) → sink `path` ([line 33](https://github.com/mongodb/kingfisher/blob/078fa16e6a9511b47a5c72413ea567c76376207e/src/access_map/gcp.rs#L33)).
To resolve this comment:
✨ Commit Assistant Fix Suggestion
- Validate that `credential_path` is within an expected directory, such as a specific credentials folder. Use `std::path::Path` methods to check that `credential_path` does not contain any parent-directory traversal components, such as `..`.
- Reject or sanitize any path input that is not a direct file name within the allowed directory. For example, canonicalize the path and verify that it starts with the base directory by using `canonicalize` and `starts_with`.
- If you expect only a fixed filename, ensure the path matches it exactly instead of accepting arbitrary input.
- Update your function to enforce this, for example:

```rust
// Placeholder base directory; canonicalize it too, or a symlinked base never matches.
let base_dir = Path::new("/safe/credentials/directory").canonicalize()?;
let path = credential_path.ok_or_else(|| anyhow!("GCP access-map requires a key.json path"))?;
// Resolve symlinks and `..` components before checking containment.
let canonical = path.canonicalize()?;
if !canonical.starts_with(&base_dir) {
    return Err(anyhow!("Access to this path is not allowed"));
}
let data = std::fs::read_to_string(&canonical).context("Failed to read credential file")?;
```
Alternatively, if credential files are only named `key.json`, ignore user input and always construct the path as `let path = base_dir.join("key.json");`.
Checking and sanitizing file paths prevents attackers from supplying a path like `../../../../etc/passwd` and accessing sensitive files.
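If the Semgrep suggestion were applied, a self-contained, std-only version might look like the sketch below. `resolve_within` and `read_credential` are hypothetical helpers, and they return `std::io::Error` instead of `anyhow` so the block compiles on its own. Both the base directory and the joined candidate are canonicalized before the `starts_with` check.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolves `candidate` and returns its canonical path only if it lies inside `base_dir`.
/// Canonicalizing both sides means symlinks and `..` components cannot escape the base.
fn resolve_within(base_dir: &Path, candidate: &Path) -> io::Result<PathBuf> {
    let base = base_dir.canonicalize()?;
    // Relative candidates are resolved against the base; an absolute candidate
    // replaces it entirely and is then caught by the containment check.
    let resolved = base.join(candidate).canonicalize()?;
    if resolved.starts_with(&base) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes the allowed directory",
        ))
    }
}

/// Reads a credential file only after the containment check passes.
fn read_credential(base_dir: &Path, candidate: &Path) -> io::Result<String> {
    fs::read_to_string(resolve_within(base_dir, candidate)?)
}
```

The check and the read are separate filesystem operations, so this does not guard against a file being swapped between them; it only rejects paths that resolve outside the base at check time. `canonicalize` also fails for paths that do not exist, which surfaces as an ordinary `NotFound` error.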
💬 Ignore this finding

Reply with Semgrep commands to ignore this finding: `/fp <comment>` for false positive, `/ar <comment>` for acceptable risk, `/other <comment>` for all other reasons.

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by tainted-path.

Help? Slack #semgrep-help or go/semgrep-help.

Resolution Options:
- Fix the code
- Reply `/fp $reason` (if security gap doesn’t exist)
- Reply `/ar $reason` (if gap is valid but intentional; add mitigations/monitoring)
- Reply `/other $reason` (e.g., test-only)

You can view more details about this finding in the Semgrep AppSec Platform.
/fp this path is constructed from credentials discovered and validated by kingfisher, which ensures that it's the file it is intended to be.
src/access_map/aws.rs

```rust
}

fn load_credentials_from_file(path: &Path) -> Result<Credentials> {
    let raw = std::fs::read_to_string(path).context("Failed to read AWS credential file")?;
```
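For context on what a loader like `load_credentials_from_file` has to handle, here is a minimal parser for the AWS shared-credentials format: INI-style `[profile]` sections with `aws_access_key_id`, `aws_secret_access_key`, and an optional `aws_session_token`. It assumes the file uses that format; `StaticCredentials` and `parse_profile` are hypothetical and are not kingfisher's `Credentials` type or its actual loading logic.

```rust
use std::collections::HashMap;

/// Hypothetical credential triple; kingfisher's own `Credentials` type may differ.
#[derive(Debug, PartialEq)]
struct StaticCredentials {
    access_key_id: String,
    secret_access_key: String,
    session_token: Option<String>,
}

/// Parses a shared-credentials file and returns the named profile, or None if the
/// profile is missing or lacks a key ID or secret.
fn parse_profile(raw: &str, profile: &str) -> Option<StaticCredentials> {
    let mut current = String::new();
    let mut values: HashMap<String, String> = HashMap::new();
    for line in raw.lines() {
        let line = line.trim();
        // Skip blank lines and `#` / `;` comments.
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        // A `[name]` header switches the current section.
        if let Some(name) = line.strip_prefix('[').and_then(|l| l.strip_suffix(']')) {
            current = name.trim().to_string();
            continue;
        }
        // Collect `key = value` pairs only from the requested profile.
        if current == profile {
            if let Some((key, value)) = line.split_once('=') {
                values.insert(key.trim().to_string(), value.trim().to_string());
            }
        }
    }
    Some(StaticCredentials {
        access_key_id: values.remove("aws_access_key_id")?,
        secret_access_key: values.remove("aws_secret_access_key")?,
        session_token: values.remove("aws_session_token"),
    })
}
```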
Semgrep identified an issue in your code:
The application builds a file path from potentially untrusted data, which can lead to a path traversal vulnerability. An attacker can manipulate the path which the application uses to access files. If the application does not validate user input and sanitize file paths, sensitive files such as configuration or user data can be accessed, potentially creating or overwriting files. To prevent this vulnerability, validate and sanitize any input that is used to create references to file paths. Also, enforce strict file access controls. For example, choose privileges allowing public-facing applications to access only the required files.
Dataflow graph (src/access_map/aws.rs at commit 078fa16): source `path` ([line 693](https://github.com/mongodb/kingfisher/blob/078fa16e6a9511b47a5c72413ea567c76376207e/src/access_map/aws.rs#L693)) → sink `path` (same line).
To resolve this comment:
✨ Commit Assistant Fix Suggestion
- Only allow file access inside a specific directory, such as a dedicated credentials folder. Define a base directory, for example, `let base = Path::new("/some/safe/dir");`.
- Before reading the file, join the provided `path` argument to the base directory: `let combined = base.join(path);`.
- Canonicalize the resulting path: `let canonical = combined.canonicalize()?;`.
- Check that the canonical path starts with the base directory: `if !canonical.starts_with(base) { return Err(anyhow!("Invalid file path")); }`.
- Use `canonical` for opening/reading files instead of the original `path`.
Alternatively, if the file name is provided by the user, validate that it only contains allowed characters (like alphanumeric characters and underscores) and does not contain `..`, `/`, or `\`. For example, use a regular expression to allow only safe patterns.
This protects against path traversal, where a malicious input like `../../etc/passwd` could access sensitive files outside the intended directory.
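The allow-list alternative can also be written without a regex crate, using a character check in place of the regular expression Semgrep mentions. `is_safe_file_name` and `credential_path` are hypothetical helpers. As an assumption beyond the suggested letters, digits, and underscores, the sketch also admits `-` and `.` so that names such as `key.json` pass, while still rejecting separators, `.`, and any name containing `..`.

```rust
use std::path::{Path, PathBuf};

/// Accepts only plain file names built from ASCII letters, digits, `_`, `-`, and `.`;
/// rejects empty names, `.`, anything containing `..`, and all path separators.
fn is_safe_file_name(name: &str) -> bool {
    !name.is_empty()
        && name != "."
        && !name.contains("..")
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.'))
}

/// Joins a validated file name onto the base directory, or returns None.
fn credential_path(base: &Path, name: &str) -> Option<PathBuf> {
    is_safe_file_name(name).then(|| base.join(name))
}
```

Because `/` and `\` are never accepted, the joined path can only name an entry directly inside `base`; symlinks inside the base are a separate concern that the canonicalize-and-compare approach handles.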
/fp this path is constructed from credentials discovered and validated by kingfisher, which ensures that it's the file it is intended to be.