
Tool: Implement SemGrep and CI workflow#36

Open
sandervonk wants to merge 2 commits into main from sandervonk/tool-semgrep

sandervonk commented Mar 12, 2026

Semgrep (Static Analysis)

Note: This PR text was generated with the help of the agent who integrated the tool with me.

Semgrep (docs) is an open-source, lightweight static analysis tool that matches code by its semantic meaning rather than its literal text (e.g., matching x + y == 2 even when the code first sets x = 1; y = x). It is used to find security vulnerabilities, correctness bugs, and coding-standard violations.


Evidence of Installation

  • .github/workflows/semgrep.yml: New CI workflow that runs Semgrep on every push/PR to main/master/develop via the official semgrep/semgrep Docker image (zero local install needed in CI).
  • UserGuide.md: New Tools section with usage docs, pro/con analysis, and local install instructions.

Evidence of Successful Execution

Installed locally via brew install semgrep (v1.155.0) and ran:

semgrep scan --config "p/javascript" --config "p/nodejs" .
  • 74 rules across 796 files, 24 findings in 7 files, completed in ~18 seconds.

Artifacts:

| File | Description |
| --- | --- |
| semgrep-evidence.txt | Human-readable terminal output with code snippets, rule IDs, and remediation links. |
| semgrep-results.json | Structured JSON output (68 KB) with all 24 findings. |
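
To make the JSON artifact easier to skim, here is a minimal sketch (not part of this PR) that tallies findings per rule. It assumes the report follows Semgrep's usual JSON layout, with a top-level `results` list whose entries carry a `check_id`. The helper names `load_report` and `count_by_rule` are hypothetical.

```python
import json
from collections import Counter


def load_report(path: str) -> dict:
    """Parse a Semgrep JSON report (e.g. semgrep-results.json) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_by_rule(report: dict) -> Counter:
    """Count findings per rule ID.

    Assumes each entry in report["results"] has a "check_id" key,
    as in Semgrep's JSON output.
    """
    return Counter(finding["check_id"] for finding in report.get("results", []))
```

Calling `count_by_rule(load_report("semgrep-results.json")).most_common()` would list the rules from most to fewest findings, which is one way to rebuild a per-category summary from the raw report.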

What does it catch?

Semgrep uses Taint Analysis to track untrusted user input (sources) as it flows into dangerous functions (sinks), identifying vulnerabilities that standard linters like ESLint typically miss.

| Category | Count | Example |
| --- | --- | --- |
| Session cookie hardening gaps | 12 | src/webserver.js, install/web.js |
| Path traversal risk (req.params → path.join) | 5 | src/controllers/admin/themes.js, src/middleware/assert.js |
| Unsafe res.sendFile with user input | 2 | src/controllers/write/users.js |
| Dynamic require() | 1 | install/web.js |
| TLS verification bypass | 1 | src/emailer.js |

False positives / negatives?

  • True positives you might not care about: 12 of the 24 findings (50%) are session cookie warnings on session() calls. NodeBB handles these dynamically via nconf/setupCookie(), so they are correct-in-isolation but false-positive-in-context. These can be suppressed to keep CI "green."
  • False negatives: The free tier only analyzes within single files. Cross-file data flows (e.g., user input entering in a route handler and reaching a sink in a utility module) are missed without the paid tier.
  • Overall: Every finding maps to a real code location with an actionable explanation and remediation link. No findings on test files, generated code, or vendored dependencies.
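
The claim that no findings land in test, generated, or vendored code can be spot-checked against the JSON report. The sketch below is illustrative only: it assumes each finding has a `path` field (as in Semgrep's JSON output) with forward-slash separators, and the directory names passed in are hypothetical examples, not a list taken from this repository.

```python
from pathlib import PurePosixPath


def findings_under(report: dict, directories: list[str]) -> list[str]:
    """Return the paths of findings located inside any of the named directories.

    A finding counts as a hit when one of its parent directory components
    equals a name in `directories`; the file name itself is ignored.
    """
    hits = []
    for finding in report.get("results", []):
        parents = PurePosixPath(finding["path"]).parts[:-1]
        if any(directory in parents for directory in directories):
            hits.append(finding["path"])
    return hits
```

An empty list from something like `findings_under(report, ["test", "node_modules", "vendor"])` would be consistent with the statement above.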

Customization

Before first use: None required. The community rulesets work out of the box (the p/ prefix selects a ruleset from the Semgrep Registry, e.g., p/javascript). The CI workflow is only 35 lines of YAML.

Over time:

  • Exclusions: Add .semgrepignore to exclude vendored/generated code.
  • Suppressions: Suppress confirmed false positives with inline // nosemgrep comments.
  • Custom Rules: Write project-specific patterns (e.g., ensuring privileges.check() is called before any data access logic).

Integration into development process

  • Continuous Integration (CI): The .github/workflows/semgrep.yml runs automatically on push/PR. It uses a baseline threshold of 24 findings—existing findings are tolerated, but any new findings will fail the build.
  • Local Development: Developers can run semgrep scan --config "p/javascript" --config "p/nodejs" . after installing via brew or pip.
  • Note: Because Semgrep scans the entire codebase (no incremental/watch mode), it is best used as a final check before pushing rather than a real-time git hook.
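
For illustration, a count-based baseline gate like the one described for CI could look like the following. This is a sketch under assumptions, not the logic in .github/workflows/semgrep.yml: it assumes the report uses Semgrep's top-level `results` list, and the `gate` function name is hypothetical. Only the baseline of 24 comes from this PR.

```python
BASELINE = 24  # finding count reported in this PR


def gate(report: dict, baseline: int = BASELINE) -> int:
    """Return a process exit code: 0 if the finding count is within the baseline, else 1."""
    total = len(report.get("results", []))
    if total > baseline:
        print(f"Semgrep: {total} findings exceed the baseline of {baseline}")
        return 1
    print(f"Semgrep: {total} findings (baseline {baseline})")
    return 0
```

One design caveat with any pure count threshold: if an old finding is fixed while a new one is introduced, the total stays the same and the gate still passes, so it may be worth lowering the baseline whenever findings are resolved.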

Pros / Cons

| Pros | Cons |
| --- | --- |
| Catches security issues (path traversal, TLS bypass) that ESLint cannot. | Not installable via npm; requires brew/pip separately. |
| Zero config: community registries work immediately. | 50% of findings are arguably false positives for this specific framework. |
| Fast (~18s for 796 files in a single Docker container). | No incremental/watch mode for real-time local feedback. |
| Low noise: ignores test/generated/vendored code by default. | Deep cross-file analysis requires the paid tier. |
| Actionable findings with links to remediation docs. | Smaller JS/Node ruleset (74 rules) vs. ESLint's ecosystem. |

Local run commands

# Installation
brew install semgrep         # macOS
pip install semgrep          # Linux / WSL

# Execution
semgrep scan --config "p/javascript" --config "p/nodejs" .

semgrep-results.txt
semgrep-results.json

sandervonk force-pushed the sandervonk/tool-semgrep branch from 06a64f1 to b659eaa on March 12, 2026 17:12