datadog-labs/agent-skills

Datadog Skills for AI Agents

Datadog skills for Claude Code, Codex CLI, Gemini CLI, Cursor, Windsurf, OpenCode, and other AI agents.

Skills

Skill                 Description
dd-pup                Primary CLI: commands, auth, PATH setup
dd-monitors           Create, manage, mute monitors
dd-logs               Search logs
dd-apm                Traces, services, performance, Single-Step Instrumentation
dd-docs               Search Datadog documentation
agent-observability   Agent Observability: experiments, eval RCA, evaluator generation, session classification
dd-browser-sdk        Browser SDK: RUM, Logs, Session Replay, profiling, product analytics, error tracking, version migration
dd-audit              Audit Trail investigations: who changed what, key compromise, cost spike root cause, compliance evidence (SOC 2/PCI), AI activity auditing
dd-software-delivery  CI/CD workflow skills: unblock PR pipelines, triage flaky tests (MCP + pup)
dd-apps               Build Datadog Apps: scaffold, run locally, upload, publish, CI/CD, DDSQL data access

Install

Setup Pup

# Homebrew (macOS/Linux) — recommended
brew tap datadog-labs/pack
brew install datadog-labs/pack/pup

# Or build from source
git clone https://github.com/datadog-labs/pup.git && cd pup
cargo build --release
mkdir -p ~/.local/bin                     # without this, cp would create a file named "bin"
cp target/release/pup ~/.local/bin/
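
A binary copied to ~/.local/bin is only found if that directory is on PATH. The following is a minimal sketch for a POSIX shell; ensure_on_path is a hypothetical helper name used for illustration and is not part of pup:

```shell
# Hypothetical helper (not part of pup): prepend a directory to PATH
# only when it is not already listed, so repeated sourcing stays idempotent.
ensure_on_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, nothing to do
    *) PATH="$1:$PATH" ;;        # otherwise put it first
  esac
}

ensure_on_path "$HOME/.local/bin"
export PATH
```

Adding these lines to a shell profile (for example ~/.bashrc or ~/.zshrc) makes the change persistent. Afterwards, command -v pup should print the installed path.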

Pre-built binaries are also available from the latest release.

# Authenticate
pup auth login

Add Skill(s)

For JUST dd-pup:

npx skills add datadog-labs/agent-skills \
  --skill dd-pup \
  --full-depth -y

For ALL skills:

npx skills add datadog-labs/agent-skills \
  --skill dd-pup \
  --skill dd-monitors \
  --skill dd-logs \
  --skill dd-apm \
  --skill dd-docs \
  --skill dd-browser-sdk \
  --skill dd-audit \
  --skill service-remapping \
  --skill agent-install \
  --skill enable-ssi \
  --skill verify-ssi \
  --skill troubleshoot-ssi \
  --skill onboarding-summary \
  --skill upgrade-browser-sdk-v7 \
  --skill dd-audit-security-investigation \
  --skill dd-audit-key-compromise \
  --skill dd-audit-cost-spike-investigation \
  --skill dd-audit-compliance-report \
  --skill dd-audit-ai-activity \
  --skill agent-observability-experiment-analyzer \
  --skill agent-observability-experiment-py-bootstrap \
  --skill agent-observability-trace-rca \
  --skill agent-observability-eval-bootstrap \
  --skill agent-observability-eval-pipeline \
  --skill agent-observability-session-classify \
  --skill k9-ownership-byod-setup \
  --full-depth -y
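
For a custom subset of skills, the repeated --skill flags can be generated rather than typed. This is a small sketch that only prints the command; skills_add_cmd is a hypothetical helper, and it uses only the flags already shown above:

```shell
# Hypothetical helper: print (not run) an `npx skills add` command for the
# given skill names, using only the flags shown in this README.
skills_add_cmd() {
  printf 'npx skills add datadog-labs/agent-skills'
  for skill in "$@"; do
    printf ' --skill %s' "$skill"
  done
  printf ' --full-depth -y\n'
}

skills_add_cmd dd-pup dd-logs dd-monitors
```

Review the printed command, then run it by pasting it into the shell or piping the output to sh.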

Agent Observability (LLMO)

The agent-observability directory contains six skills for working with Agent Observability data:

Skill                                        Purpose
agent-observability-experiment-analyzer      Analyze and compare offline LLM experiments
agent-observability-experiment-py-bootstrap  Generate self-contained Python experiment code using the ddtrace.llmobs SDK
agent-observability-trace-rca                Root-cause production failures using eval judge signal or runtime errors
agent-observability-eval-bootstrap           Generate evaluator code from traces, optionally seeded by RCA output. Also emits a dataset from traces in --emit-dataset mode.
agent-observability-eval-pipeline            Eight-phase pipeline: classify → RCA → bootstrap evaluators → create dataset → publish → generate experiment → run → analyze. Stop early with --stop-after.
agent-observability-session-classify         Classify whether user intent was satisfied in a session (trace + RUM signals)

Eval pipeline flow:

agent-observability-session-classify → agent-observability-trace-rca → agent-observability-eval-bootstrap
(classify sessions)                    (diagnose why)                  (build evals)

Run agent-observability-trace-rca to understand why an app is failing by analyzing eval judge verdicts or runtime errors across production traces. Then run agent-observability-eval-bootstrap to generate evaluator code that captures those failure patterns. Pass the RCA output directly to agent-observability-eval-bootstrap to seed it with the discovered failure taxonomy.

Use agent-observability-eval-pipeline to run these steps, followed by the dataset and experiment phases, in sequence with a checkpoint after each phase.

Use agent-observability-session-classify independently to evaluate whether individual assistant sessions satisfied user intent, combining Agent Observability trace data with RUM behavioral signals.

Use agent-observability-experiment-py-bootstrap to generate a self-contained Python experiment client that uses the ddtrace.llmobs SDK — runnable as a .py script or .ipynb notebook, with inline records, a CSV path, or a named Datadog dataset as the input.

Install

# Claude Code — copy any or all skills
cp -r agent-observability/agent-observability-experiment-analyzer ~/.claude/skills
cp -r agent-observability/agent-observability-experiment-py-bootstrap ~/.claude/skills
cp -r agent-observability/agent-observability-trace-rca ~/.claude/skills
cp -r agent-observability/agent-observability-eval-bootstrap ~/.claude/skills
cp -r agent-observability/agent-observability-eval-pipeline ~/.claude/skills
cp -r agent-observability/agent-observability-session-classify ~/.claude/skills
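
The per-skill copies above can be collapsed into a loop. The sketch below assumes every subdirectory of the given group directory is a skill; install_skills is a hypothetical helper name, not something this repository ships. It also creates the destination first, because cp -r into a missing ~/.claude/skills would copy the skill as a directory named skills.

```shell
# Hypothetical helper: copy every subdirectory of $1 into the skills
# directory ($2, defaulting to ~/.claude/skills).
install_skills() {
  src="$1"
  dest="${2:-$HOME/.claude/skills}"
  mkdir -p "$dest"
  for dir in "$src"/*/; do
    [ -d "$dir" ] || continue        # skip when the glob matches nothing
    # Strip the trailing slash: BSD cp would otherwise copy the directory's
    # contents instead of the directory itself.
    cp -r "${dir%/}" "$dest"/
  done
}
```

From the repository root, install_skills agent-observability copies all six skills in one step.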

MCP Requirements

All six skills require the LLMO toolset:

claude mcp add --scope user --transport http "datadog-llmo-mcp" 'https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=llmobs'

agent-observability-experiment-analyzer optionally uses the core toolset for notebook export. agent-observability-session-classify requires it for RUM behavioral analysis and efficient batched fetches of trace session spans:

claude mcp add --scope user --transport http "datadog-mcp-core" 'https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=core'

Usage

# Analyze experiments
/agent-observability-experiment-analyzer <experiment_id>                         # single experiment
/agent-observability-experiment-analyzer <baseline_id> <candidate_id>            # compare two experiments
/agent-observability-experiment-analyzer <id(s)> <question>                      # ask a specific question
/agent-observability-experiment-analyzer <id(s)> [question] --output notebook    # export to Datadog notebook

# Root-cause why an app is failing
What's wrong with <ml_app> based on its evals over the last 24h
Analyze eval failures for <eval_name> over the last week
Look at the errors on <ml_app> over the last 24h

# Generate evaluator code from production traces
/agent-observability-eval-bootstrap <ml_app>                                                     # cold start
/agent-observability-eval-bootstrap <ml_app> [paste agent-observability-trace-rca output here]   # seeded from RCA
/agent-observability-eval-bootstrap <ml_app> --data-only                                         # emit JSON spec instead of Python SDK code

# Generate a Python experiment client using the ddtrace.llmobs SDK
/agent-observability-experiment-py-bootstrap                                                  # 3-record inline sample
/agent-observability-experiment-py-bootstrap --dataset ./data/qa.json --format ipynb          # local JSON dataset, notebook
/agent-observability-experiment-py-bootstrap --dataset-name qa_v3 --project-name customer-qa  # existing Datadog dataset
/agent-observability-experiment-py-bootstrap --evaluator-style remote                         # server-side RemoteEvaluator stubs

# Classify a session
/agent-observability-session-classify <session_id>

# Guided end-to-end pipeline (classify → RCA → eval bootstrap → dataset → experiment → analyze)
/agent-observability-eval-pipeline <ml_app>
/agent-observability-eval-pipeline <ml_app> --timeframe now-30d --trace-limit 25 --format ipynb

Software Delivery (dd-software-delivery)

The dd-software-delivery directory contains workflow skills for CI/CD visibility and test reliability:

Skill              Purpose
unblock-pr         Investigate a failing PR CI pipeline: classify each failure as flaky, infra, or regression; fetch code coverage and PR quality/security insights; propose targeted actions
triage-flaky-test  Deep-dive on a specific flaky test: get history, blast radius, root cause category, and recommend a code fix or quarantine

Workflow:

unblock-pr → (if flaky failure) → triage-flaky-test → quarantine or fix

Run unblock-pr when CI is red on a PR to attribute each failing job. If a failure is classified as flaky, the skill hands off to triage-flaky-test for deeper investigation and a targeted fix or quarantine via pup test-optimization flaky-tests update.

Backend

Both skills auto-detect the available backend at runtime:

  • MCP mode (preferred): uses the Datadog software-delivery MCP tools (search_datadog_ci_pipeline_events, get_datadog_flaky_tests, retry_datadog_ci_job, etc.). Enables PR quality/security insights and native GitHub Actions retry.
  • pup mode (fallback): uses the pup CLI. PR quality/security data is not available; GitHub Actions retry falls back to gh run rerun.

Pass --backend pup to force pup mode regardless of MCP availability.
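
The selection order described above (forced pup, then MCP if available, then pup as fallback) can be sketched as follows. This is an illustration only: how the skills actually detect MCP is not documented here, and both function names are hypothetical. The check assumes the server was registered under the name datadog-mcp and that claude mcp list prints registered server names.

```shell
# Assumption: an MCP server registered as "datadog-mcp" appears in the
# output of `claude mcp list`.
mcp_available() {
  command -v claude >/dev/null 2>&1 &&
    claude mcp list 2>/dev/null | grep -q 'datadog-mcp'
}

# Hypothetical selection logic: $1 is the value passed to --backend, if any.
choose_backend() {
  if [ "${1:-}" = "pup" ]; then
    echo pup                     # forced, regardless of MCP availability
  elif mcp_available; then
    echo mcp                     # preferred when available
  else
    echo pup                     # fallback
  fi
}
```

Calling choose_backend pup always prints pup; calling it without arguments prints mcp or pup depending on what is registered.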

MCP Requirements

Connect the Datadog MCP server with the software-delivery toolset:

claude mcp add --scope user --transport http "datadog-mcp" \
  'https://mcp.datadoghq.com/api/unstable/mcp-server/mcp?toolsets=core,software-delivery'

Prerequisites

The pup CLI is required only for pup mode, which is also the fallback when MCP is unavailable. See Setup Pup.

Install

# Claude Code — copy any or all skills
cp -r dd-software-delivery/unblock-pr ~/.claude/skills
cp -r dd-software-delivery/triage-flaky-test ~/.claude/skills

Or via npx:

npx skills add datadog-labs/agent-skills \
  --skill dd-software-delivery/unblock-pr \
  --skill dd-software-delivery/triage-flaky-test \
  --full-depth -y

Usage

# Investigate a failing PR
unblock-pr                                     # auto-detects branch and repo from git
unblock-pr my-feature-branch                   # explicit branch
unblock-pr my-feature-branch github.com/org/repo

# Triage a specific flaky test
triage-flaky-test TestMyFunc
triage-flaky-test com.example.MyTest github.com/org/repo

Audit Trail (dd-audit)

The dd-audit directory contains five skills for investigating Datadog Audit Trail data:

Skill                     Purpose
security-investigation    Who changed what, user activity, login geo, deletions, permission changes
key-compromise            Investigate a potentially compromised API key: timeline, geo/IP, endpoints called
cost-spike-investigation  Correlate usage spike (Usage Metering) with config changes (Audit Trail) to find root cause
compliance-report         Generate SOC 2 / PCI DSS evidence from audit data
ai-activity-audit         Audit what the Bits AI / MCP assistant did in your org

Prerequisites

These skills use the Datadog Audit REST API directly (no pup audit command exists yet). You need an API key + App key with audit_logs_read scope:

export DD_API_KEY=<your-api-key>
export DD_APP_KEY=<your-app-key>
export DD_SITE=datadoghq.com   # or us3.datadoghq.com, us5.datadoghq.com, datadoghq.eu, ap1.datadoghq.com, ap2.datadoghq.com
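
To confirm the keys and scope work before running a skill, the Audit Logs search endpoint (POST /api/v2/audit/events/search) can be called directly. This is a minimal sketch: the function names are hypothetical, the time window and page size are arbitrary choices, and the skills themselves may issue different requests.

```shell
# Build the Audit Logs search endpoint for the configured site.
audit_search_url() {
  printf 'https://api.%s/api/v2/audit/events/search' "${DD_SITE:-datadoghq.com}"
}

# Hypothetical wrapper: $1 is an audit query string, $2 an optional start time.
# The query is embedded in the JSON body as-is, so it must not contain
# double quotes or backslashes.
search_audit_events() {
  curl -sS -X POST "$(audit_search_url)" \
    -H "Content-Type: application/json" \
    -H "DD-API-KEY: ${DD_API_KEY}" \
    -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
    -d "{\"filter\":{\"query\":\"$1\",\"from\":\"${2:-now-24h}\",\"to\":\"now\"},\"page\":{\"limit\":25}}"
}
```

Running search_audit_events '*' returns up to 25 events from the last 24 hours. A 403 response usually means the application key lacks the audit_logs_read scope.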

Install

# Claude Code — copy any or all skills
cp -r dd-audit/security-investigation ~/.claude/skills
cp -r dd-audit/key-compromise ~/.claude/skills
cp -r dd-audit/cost-spike-investigation ~/.claude/skills
cp -r dd-audit/compliance-report ~/.claude/skills
cp -r dd-audit/ai-activity-audit ~/.claude/skills

Usage

# Security investigation
Who deleted monitors in the last 24 hours?
What did user@example.com do this week?
Show login activity from unexpected locations

# Key compromise
Was API key <key_id> used from unexpected locations?
Investigate this API key: <key_id>

# Cost spike
Why did our Agent Observability usage spike on May 1?
What caused the cost increase this week?

# Compliance
Generate SOC 2 evidence for CC6.2 and CC6.3 for Q1 2026
Create a PCI DSS Requirement 10 report for the last 90 days

# AI activity
What did the Bits AI assistant do in my org this week?
Show me a governance report for AI tool calls in April

Datadog Apps (dd-apps)

The dd-apps directory contains a skill for building Datadog Apps: locally developed web apps built with TypeScript and React that integrate with Datadog surfaces.

Skill        Purpose
datadog-app  Scaffold, run locally, build, upload, publish, set up CI/CD, trigger Workflow Automation, and query data with DDSQL or Action Catalog

Prerequisites

A Datadog account with an API key and application key that have Actions API Access enabled. See App Builder Access and Authentication.

export DD_API_KEY="<YOUR_API_KEY>"
export DD_APP_KEY="<YOUR_APPLICATION_KEY>"

Node.js 20.19+ or 22.12+ is required. Use Volta, nvm, or fnm to manage versions.
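
The version requirement can be checked before scaffolding. The sketch below reads "20.19+ or 22.12+" as: 20.x from 20.19, 22.x from 22.12, and any later major. Treating 21.x as unsupported and 23+ as supported is an assumption, and node_version_ok is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the given Node.js version string
# (for example "v22.12.0") satisfies "20.19+ or 22.12+".
node_version_ok() {
  v="${1#v}"                     # drop the leading "v"
  major="${v%%.*}"
  rest="${v#*.}"
  minor="${rest%%.*}"
  if [ "$major" -eq 20 ] && [ "$minor" -ge 19 ]; then return 0; fi
  if [ "$major" -eq 22 ] && [ "$minor" -ge 12 ]; then return 0; fi
  [ "$major" -gt 22 ]            # assumption: later majors are accepted
}
```

A typical call is node_version_ok "$(node --version)" || echo "Unsupported Node.js version".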

Install

# Claude Code
cp -r dd-apps/datadog-app ~/.claude/skills

Or via npx:

npx skills add datadog-labs/agent-skills \
  --skill datadog-app \
  --full-depth -y

Usage

# Scaffold a new app
Scaffold a new Datadog App called my-app

# Run locally
Run my Datadog App locally

# Upload and publish
Upload my app to Datadog
How do I publish my app?

# Troubleshoot
I'm getting a 401 error when uploading
My backend function isn't working

# Query data
Query my app datastore with DDSQL
Trigger a Workflow Automation workflow from a backend function

Quick Reference

Task                                       Command
Search error logs                          pup logs search --query "status:error" --from 1h
List monitors                              pup monitors list
Schedule monitor downtime                  pup downtime create --file downtime.json
Find slow traces                           pup traces search --query "service:api @duration:>500ms" --from 1h
Query metrics                              pup metrics query --query "avg:system.cpu.user{*}"
List services for an env (--env required)  pup apm services list --env <env> --from 1h --to now
Check auth                                 pup auth status
Refresh token                              pup auth refresh

See the official pup docs for more commands.

Auth

# Check auth first (includes token time remaining)
pup auth status

# If commands fail with 401/403, try refresh first
pup auth refresh

# If refresh fails or no session exists, do full OAuth login
pup auth login

# Non-default site/org
pup auth login --site datadoghq.eu --org <org>

If the browser opens the wrong profile/window, use the one-time URL printed by pup auth login and open it manually in the correct session.
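
The status, refresh, login order above can be wrapped in one function for use at the top of a script. This sketch assumes that pup auth status and pup auth refresh exit non-zero on failure, which is not confirmed by this README; ensure_pup_auth is a hypothetical helper name.

```shell
# Hypothetical helper: try the cheapest auth step first and fall through
# to a full OAuth login only when the earlier steps fail.
# Assumption: pup exits non-zero when a step fails.
ensure_pup_auth() {
  if pup auth status >/dev/null 2>&1; then
    return 0                     # session is valid
  fi
  if pup auth refresh; then
    return 0                     # token refreshed
  fi
  pup auth login                 # interactive login as the last resort
}
```

Call ensure_pup_auth once before other pup commands. Its exit status is that of the last step attempted.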

More Skills

More skills are coming soon.

# List all available skills
npx skills add datadog-labs/agent-skills --list --full-depth

License

MIT