
Conversation

@mickgmdb (Collaborator) commented Dec 5, 2025

  • Reduced per-match memory usage by compacting stored source locations and interning repeated capture names.
  • Stored optional validation response bodies as boxed strings to avoid allocating empty payloads and to streamline validator caches (a sketch of the idea follows this list).
  • Parallelized git cloning based on the configured job count and began scanning each repository as soon as its clone finishes, reducing end-to-end scan times.
  • Combined per-repository results into a single aggregate summary after all scans complete.
  • Added initial access-map support and a report viewer HTML file; both are currently beta features.
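
The boxed-string point above can be illustrated with a small standalone sketch. The `ValidationOutcome` type, its fields, and the constructor below are hypothetical stand-ins, not Kingfisher's actual validator types; the sketch only shows why `Option<Box<str>>` is cheaper than `Option<String>` when bodies are often absent.

```rust
// Illustrative only: Box<str> drops String's capacity field, so the Option is
// smaller, and storing None for empty payloads avoids allocating at all.
use std::mem::size_of;

#[derive(Debug)]
struct ValidationOutcome {
    status: u16,
    // None when the validator returned no body; nothing is allocated in that case.
    body: Option<Box<str>>,
}

impl ValidationOutcome {
    fn new(status: u16, body: String) -> Self {
        // Keep a body only when it is non-empty; into_boxed_str sheds spare capacity.
        let body = if body.is_empty() { None } else { Some(body.into_boxed_str()) };
        Self { status, body }
    }
}

fn main() {
    // On 64-bit targets: Option<Box<str>> is pointer + length (16 bytes),
    // while Option<String> also carries a capacity field (24 bytes).
    println!("Option<Box<str>>: {} bytes", size_of::<Option<Box<str>>>());
    println!("Option<String>:   {} bytes", size_of::<Option<String>>());

    let empty = ValidationOutcome::new(204, String::new());
    let full = ValidationOutcome::new(200, "{\"ok\":true}".to_string());
    println!("{empty:?}\n{full:?}");
}
```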

Copilot AI review requested due to automatic review settings December 5, 2025 06:13
Copilot AI (Contributor) left a comment


Pull request overview

This PR introduces version 1.69.0 with significant performance improvements and a new access-map feature for cloud credential analysis.

Key Changes:

  • Memory optimization through compact source location storage and interned capture names
  • Parallel git repository cloning with streaming scans as each clone completes (sketched after this list)
  • Beta access-map feature for mapping cloud credentials to identities and permissions
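
A rough sketch of the clone-then-scan pipeline described in the second bullet, using a tokio semaphore to bound concurrent clones. The repository list, job count, and helper functions are placeholders; the actual runner in src/scanner/runner.rs may be organized differently.

```rust
// Illustrative sketch: bounded parallel "clones" that each hand off to a scan as
// soon as they finish, with per-repository results combined at the end.
// Assumes tokio = { version = "1", features = ["full"] } in Cargo.toml.
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn clone_repo(url: &str) -> String {
    // Placeholder for `git clone`; returns a pretend checkout path.
    format!("/tmp/checkout/{}", url.rsplit('/').next().unwrap_or("repo"))
}

async fn scan_repo(path: &str) -> usize {
    // Placeholder for the secret scan; returns a pretend finding count.
    path.len() % 3
}

#[tokio::main]
async fn main() {
    let repos = vec!["https://example.com/a.git", "https://example.com/b.git"];
    let jobs = Arc::new(Semaphore::new(4)); // stand-in for the configured job count

    let mut handles = Vec::new();
    for url in repos {
        let jobs = Arc::clone(&jobs);
        handles.push(tokio::spawn(async move {
            // The permit bounds how many clones run at once; scanning starts the
            // moment this clone finishes instead of waiting for the whole batch.
            let _permit = jobs.acquire_owned().await.expect("semaphore closed");
            let checkout = clone_repo(url).await;
            scan_repo(&checkout).await
        }));
    }

    // Combine per-repository results into one aggregate summary.
    let mut total_findings = 0;
    for handle in handles {
        total_findings += handle.await.expect("scan task panicked");
    }
    println!("aggregate findings: {total_findings}");
}
```

The semaphore keeps at most the configured number of clones in flight while still letting every scan begin as soon as its own clone completes.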

Reviewed changes

Copilot reviewed 56 out of 58 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| Cargo.toml | Version bump to 1.69.0 and AWS SDK dependencies added |
| src/location.rs | Introduced CompactSourceSpan for memory-efficient source tracking |
| src/matcher.rs | Updated to use interned capture names and compact locations |
| src/validation_body.rs | New module for optimized validation response storage |
| src/access_map.rs | New access-map feature implementation |
| src/scanner/runner.rs | Parallel repository scanning with streaming clones |
| src/reporter.rs | Access-map integration into reporting |
| tests/* | Test updates for new access_map fields |
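
To make the src/location.rs entry concrete, here is a hypothetical shape for a compact span; the field names and widths are assumptions rather than the actual CompactSourceSpan definition, which may track different information.

```rust
// Hypothetical sketch: storing byte offsets as u32 keeps the span at 8 bytes,
// versus 16 or more for usize offsets or line/column pairs on 64-bit targets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CompactSourceSpan {
    start: u32, // byte offset of the match start
    end: u32,   // byte offset one past the match end
}

impl CompactSourceSpan {
    fn new(start: usize, end: usize) -> Option<Self> {
        // Refuse (rather than truncate) offsets that do not fit in u32.
        Some(Self { start: u32::try_from(start).ok()?, end: u32::try_from(end).ok()? })
    }

    fn len(&self) -> usize {
        (self.end - self.start) as usize
    }
}

fn main() {
    let span = CompactSourceSpan::new(128, 160).expect("offsets fit in u32");
    assert_eq!(span.len(), 32);
    println!("{span:?} occupies {} bytes", std::mem::size_of::<CompactSourceSpan>());
}
```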
Comments suppressed due to low confidence (1)

src/access_map/azure copy.rs:1

  • This file appears to be a duplicate with ' copy' in its name. This should either be renamed properly or removed if it's an accidental duplicate of azure.rs.


Makefile Outdated
Comment on lines 210 to 232
```makefile
darwin-dev:
# @echo "Checking Rust for darwin-arm64..."
# @$(MAKE) check-rust || ( \
# echo "Rust not found or out-of-date. Installing via Homebrew..." && \
# brew install rust \
# )
# @brew list cmake >/dev/null 2>&1 || brew install cmake
# @brew list boost >/dev/null 2>&1 || brew install boost
# @brew install gcc libpcap pkg-config ragel sqlite coreutils gnu-tar
# @rustup target add aarch64-apple-darwin
cargo build --profile=dev --target aarch64-apple-darwin --features system-alloc
# @cd target/aarch64-apple-darwin/release && \
# find ./$(PROJECT_NAME) -type f -not -name "*.d" -not -name "*.rlib" -exec shasum -a 256 {} \; > CHECKSUM.txt
# @mkdir -p target/release
# @cp target/aarch64-apple-darwin/release/$(PROJECT_NAME) target/release/
# @cp target/aarch64-apple-darwin/release/CHECKSUM.txt target/release/CHECKSUM-darwin-arm64.txt
# @cd target/release && \
# rm -rf $(PROJECT_NAME)-darwin-arm64.tgz && \
# $(ARCHIVE_CMD) $(PROJECT_NAME)-darwin-arm64.tgz $(PROJECT_NAME) CHECKSUM-darwin-arm64.txt && \
# if [ -f $(PROJECT_NAME)-darwin-arm64.tgz ]; then \
# shasum -a 256 $(PROJECT_NAME)-darwin-arm64.tgz >> CHECKSUM-darwin-arm64.txt; \
# fi
# $(MAKE) list-archives
```

Copilot AI Dec 5, 2025


Large blocks of commented-out code reduce maintainability. Either remove this commented code or document why it's being preserved for future use.

```toml
[profile.dev]
opt-level = 0
# debug = true
debug = true
```

Copilot AI Dec 5, 2025


[nitpick] Debug symbols are enabled for dev profile. This is fine for development, but ensure this was intentional and not accidentally left enabled.

README.md Outdated
Comment on lines 1050 to 1064
## Identity mapping for cloud credentials

Use the `identity-map` command to understand the blast radius of cloud credentials by resolving the owning identity, attached roles (including inherited org/folder bindings), and risky permissions. The command prints a JSON summary to stdout by default and can optionally emit a standalone HTML report.

```bash
# Map AWS credentials using your default CLI environment (env vars, config files),
# write JSON to disk, and emit an interactive HTML report
kingfisher identity-map aws \
  --json-out identity-map.json \
  --html-out identity-map.html

# Map a GCP service account key and save JSON + HTML to disk
kingfisher identity-map gcp path/to/key.json \
  --json-out identity-map.json \
  --html-out identity-map.html
```

Copilot AI Dec 5, 2025


The documentation refers to 'identity-map' command but the code implementation uses 'access-map' (see src/cli/commands/access_map.rs and src/cli/global.rs line 66). Update documentation to use 'access-map' for consistency.

Suggested change
```diff
-## Identity mapping for cloud credentials
-Use the `identity-map` command to understand the blast radius of cloud credentials by resolving the owning identity, attached roles (including inherited org/folder bindings), and risky permissions. The command prints a JSON summary to stdout by default and can optionally emit a standalone HTML report.
-```bash
-# Map AWS credentials using your default CLI environment (env vars, config files),
-# write JSON to disk, and emit an interactive HTML report
-kingfisher identity-map aws \
-  --json-out identity-map.json \
-  --html-out identity-map.html
-# Map a GCP service account key and save JSON + HTML to disk
-kingfisher identity-map gcp path/to/key.json \
-  --json-out identity-map.json \
-  --html-out identity-map.html
+## Access mapping for cloud credentials
+Use the `access-map` command to understand the blast radius of cloud credentials by resolving the owning identity, attached roles (including inherited org/folder bindings), and risky permissions. The command prints a JSON summary to stdout by default and can optionally emit a standalone HTML report.
+```bash
+# Map AWS credentials using your default CLI environment (env vars, config files),
+# write JSON to disk, and emit an interactive HTML report
+kingfisher access-map aws \
+  --json-out access-map.json \
+  --html-out access-map.html
+# Map a GCP service account key and save JSON + HTML to disk
+kingfisher access-map gcp path/to/key.json \
+  --json-out access-map.json \
+  --html-out access-map.html
```


src/access_map/gcp.rs

```rust
pub async fn map_access(credential_path: Option<&Path>) -> Result<AccessMapResult> {
    let path = credential_path.ok_or_else(|| anyhow!("GCP access-map requires a key.json path"))?;
    let data = std::fs::read_to_string(path).context("Failed to read credential file")?;
```


Semgrep identified an issue in your code:
The application builds a file path from potentially untrusted data, which can lead to a path traversal vulnerability. An attacker can manipulate the path which the application uses to access files. If the application does not validate user input and sanitize file paths, sensitive files such as configuration or user data can be accessed, potentially creating or overwriting files. To prevent this vulnerability, validate and sanitize any input that is used to create references to file paths. Also, enforce strict file access controls. For example, choose privileges allowing public-facing applications to access only the required files.

Dataflow graph (src/access_map/gcp.rs): `credential_path` (line 32, source) → `path` (line 32, trace) → `path` (line 33, sink)

To resolve this comment:

✨ Commit Assistant Fix Suggestion
  1. Validate that the credential_path is within an expected directory, such as a specific credentials folder. Use std::path::Path methods to check that credential_path does not contain any parent directory traversal components, such as `..`.

  2. Reject or sanitize any path input that is not a direct file name within the allowed directory. For example, you can canonicalize the path and verify that it starts with the base directory by using canonicalize and starts_with.

  3. If you expect only a fixed filename, ensure the path exactly matches it instead of accepting arbitrary input.

  4. Update your function to enforce this, for example:

```rust
let base_dir = Path::new("/safe/credentials/directory");
let path = credential_path.ok_or_else(|| anyhow!("GCP access-map requires a key.json path"))?;
let canonical = path.canonicalize()?;
if !canonical.starts_with(base_dir) {
    return Err(anyhow!("Access to this path is not allowed"));
}
let data = std::fs::read_to_string(canonical).context("Failed to read credential file")?;
```

Alternatively, if credential files are only named key.json, ignore user input and always construct the path as `let path = base_dir.join("key.json");`.

Checking and sanitizing file paths prevents attackers from supplying a path like ../../../../etc/passwd and accessing sensitive files.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by tainted-path.

Help? Slack #semgrep-help or go/semgrep-help.

Resolution Options:

  • Fix the code
  • Reply /fp $reason (if security gap doesn’t exist)
  • Reply /ar $reason (if gap is valid but intentional; add mitigations/monitoring)
  • Reply /other $reason (e.g., test-only)

You can view more details about this finding in the Semgrep AppSec Platform.

@mickgmdb (Collaborator, Author) replied:

/fp this path is constructed from credentials discovered and validated by kingfisher, which ensures that it's the file it is intended to be.

src/access_map/aws.rs

```rust
}

fn load_credentials_from_file(path: &Path) -> Result<Credentials> {
    let raw = std::fs::read_to_string(path).context("Failed to read AWS credential file")?;
```


Semgrep identified an issue in your code:
The application builds a file path from potentially untrusted data, which can lead to a path traversal vulnerability. An attacker can manipulate the path which the application uses to access files. If the application does not validate user input and sanitize file paths, sensitive files such as configuration or user data can be accessed, potentially creating or overwriting files. To prevent this vulnerability, validate and sanitize any input that is used to create references to file paths. Also, enforce strict file access controls. For example, choose privileges allowing public-facing applications to access only the required files.

Dataflow graph (src/access_map/aws.rs): `path` (line 693, source) → `path` (line 693, sink)

To resolve this comment:

✨ Commit Assistant Fix Suggestion
  1. Only allow file access inside a specific directory, such as a dedicated credentials folder. Define a base directory, for example `let base = Path::new("/some/safe/dir");`.
  2. Before reading the file, join the provided path argument to the base directory: `let combined = base.join(path);`.
  3. Canonicalize the resulting path: `let canonical = combined.canonicalize()?;`.
  4. Check that the canonical path starts with the base directory: `if !canonical.starts_with(base) { return Err(anyhow!("Invalid file path")); }`.
  5. Use `canonical` for opening/reading files instead of the original path.

Alternatively, if the file name is provided by the user, validate that it only contains allowed characters (such as alphanumerics and underscores) and does not contain `..`, `/`, or `\`. For example, use a regular expression to allow only safe patterns.

This protects against path traversal, where a malicious input like "../../etc/passwd" could access sensitive files outside the intended directory. A minimal sketch combining the steps above follows.
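
For illustration, a minimal sketch that combines the steps above into one function. The base directory and function name are placeholders, and anyhow is assumed only because the surrounding excerpts already use it; this is a sketch of the suggested check, not the project's implementation.

```rust
// Illustrative sketch: canonicalize the candidate path and require it to stay
// under a known base directory before reading the file.
use std::path::{Path, PathBuf};

use anyhow::{anyhow, Context, Result};

fn read_credential_file(base_dir: &Path, candidate: &Path) -> Result<String> {
    // Resolve symlinks and `..` components in both paths before comparing them.
    let base = base_dir.canonicalize().context("Failed to canonicalize base directory")?;
    let resolved: PathBuf =
        base.join(candidate).canonicalize().context("Failed to canonicalize credential path")?;

    if !resolved.starts_with(&base) {
        return Err(anyhow!("Credential path escapes the allowed directory"));
    }

    std::fs::read_to_string(&resolved).context("Failed to read credential file")
}

fn main() -> Result<()> {
    // Example usage with placeholder paths; errors simply propagate to the caller.
    let body = read_credential_file(Path::new("/safe/credentials"), Path::new("key.json"))?;
    println!("read {} bytes", body.len());
    Ok(())
}
```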

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by tainted-path.

Help? Slack #semgrep-help or go/semgrep-help.

Resolution Options:

  • Fix the code
  • Reply /fp $reason (if security gap doesn’t exist)
  • Reply /ar $reason (if gap is valid but intentional; add mitigations/monitoring)
  • Reply /other $reason (e.g., test-only)

You can view more details about this finding in the Semgrep AppSec Platform.

@mickgmdb (Collaborator, Author) replied:

/fp this path is constructed from credentials discovered and validated by kingfisher, which ensures that it's the file it is intended to be.

@mickgmdb merged commit 2f31157 into main on Dec 5, 2025
4 checks passed