feat(mcp): add HTTP auth middleware and caller-provided backend tokens #1253

Open
sriaradhyula wants to merge 55 commits into 0.5.0 from 101-mcp-auth-caller-key
Conversation

@sriaradhyula
Member

Summary

  • New shared package ai_platform_engineering/agents/common/mcp-auth/ (mcp-agent-auth) providing MCPAuthMiddleware and get_request_token() for all MCP servers
  • Three auth modes selectable via MCP_AUTH_MODE env var: none (default, backward-compat), shared_key (constant-time HMAC compare vs MCP_SHARED_KEY), oauth2 (JWT validation via JWKS)
  • Caller-provided backend tokens: the Authorization: Bearer <token> header that authenticates the MCP connection is also used as the backend service API key — no per-server credential env vars required in HTTP deployments
  • All 10 MCP servers updated: argocd, backstage, confluence, jira, komodor, netutils, pagerduty, splunk, victorops, webex
  • 21 unit tests covering all auth modes, public path bypass, SSE error format, and get_request_token resolution order
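
The constant-time compare described for shared_key mode can be sketched as follows (a minimal illustration, not the actual mcp-agent-auth code; the helper name and header parsing are assumptions):

```python
import hmac
import os
from typing import Optional

def check_shared_key(authorization_header: Optional[str]) -> bool:
    """Hypothetical sketch of shared_key mode: compare the Bearer token
    against MCP_SHARED_KEY with hmac.compare_digest, which avoids leaking
    key length/prefix information through timing differences."""
    shared_key = os.environ.get("MCP_SHARED_KEY", "")
    if not shared_key or not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    return hmac.compare_digest(token.encode(), shared_key.encode())
```

A wrong key, a missing header, or a non-Bearer scheme all fail closed.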

How it works

Incoming auth (HTTP transport only):

MCP_AUTH_MODE=none           # pass-through (default)
MCP_AUTH_MODE=shared_key     # validates Bearer token == MCP_SHARED_KEY
MCP_AUTH_MODE=oauth2         # validates JWT via JWKS_URI / AUDIENCE / ISSUER

Backend token resolution (all transports):

# In HTTP mode: reads Authorization: Bearer header from live request
# In STDIO mode: falls back to env var (e.g. ARGOCD_API_TOKEN)
token = get_request_token("ARGOCD_API_TOKEN")

In shared_key mode, the same bearer token serves as both the MCP auth credential and the backend API key — a single credential for both hops.

Test plan

  • PYTHONPATH=. uv run pytest tests/test_mcp_auth_middleware.py -v — 21 tests pass
  • make lint — clean
  • STDIO mode: MCP_AUTH_MODE=none + backend token in env → tools work (backward compat)
  • HTTP mode, none: no Authorization header required → tools work with env var token
  • HTTP mode, shared_key: correct MCP_SHARED_KEY → 200; wrong/missing → 401
  • HTTP mode, shared_key, no env token: Authorization: Bearer <api-token> → used for backend calls
  • HTTP mode, oauth2: valid JWT → 200; invalid/expired → 401
  • /healthz public path bypasses auth in all modes

Notes

  • MCP_AUTH_MODE is a new env var; existing deployments continue to work unchanged (defaults to none)
  • Backend token env vars that previously raised ValueError at startup now emit logger.warning — servers can start without pre-set credentials when using caller-provided tokens
  • Webex special case: --auth-token CLI option changed from required=True to optional; _get_token() helper resolves per-request in HTTP mode, falls back to startup token in STDIO mode

🤖 Generated with Claude Code

sriaradhyula and others added 30 commits April 15, 2026 08:50
* fix(ci): bump Go 1.25→1.26 and langgraph 1.0.10→1.1.6

The github MCP Docker build fails because go.mod requires go>=1.26.2
but Dockerfile.mcp pins golang:1.25-alpine (running go 1.25.9).

Three sub-packages (agent_ontology, multi_agents, splunk/agent_splunk)
still pinned langgraph==1.0.10, which is incompatible with the
transitively resolved langgraph-prebuilt==1.0.9 that imports
ExecutionInfo from langgraph.runtime — a symbol only present in
langgraph>=1.1.0. Align these to 1.1.6 matching the rest of the repo.

Also bumps langchain 1.1.3→1.2.15 in agent_ontology (langchain 1.1.3
caps langgraph<1.1.0) and removes a duplicate langgraph line in
multi_agents/pyproject.toml.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* fix(ci): move Grype container scan and integration test to post-release

Remove tag-push triggers from security-scan.yml and
tests-quick-sanity-integration-on-latest-tag.yml — images are not yet
published when the tag event fires, causing MANIFEST_UNKNOWN failures.

Instead, the release-finalize workflow now dispatches both workflows
via workflow_dispatch after all CI builds pass and the release is
published, ensuring container images are available.

Also bumps release-finalize permissions from actions:read to
actions:write so it can dispatch workflows.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

---------

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Covers MongoDB schema changes (agent_configs→agent_skills rename,
feedback extraction, new collections), Helm value changes (Slack MCP
image swap, Langfuse removal, checkpoint persistence), environment
variable changes (DISTRIBUTED_AGENTS), RAG entity rename, dependency
upgrades, rollback scripts, and pre/post-upgrade checklists.

Assisted-by: Claude:claude-opus-4-6

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Bumps the npm_and_yarn group with 1 update in the /docs directory: [dompurify](https://github.com/cure53/DOMPurify).


Updates `dompurify` from 3.3.3 to 3.4.0
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](cure53/DOMPurify@3.3.3...3.4.0)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-version: 3.4.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
)

* SDPL-1601 fix(middleware): allow LLM to synthesize after RAG cap exhaustion

[GAI]

When the RAG search cap is exhausted and the agent keeps calling search,
DeterministicTaskMiddleware injects a ToolMessage telling the agent to
synthesize from what was already retrieved — but then immediately set
jump_to="end", skipping the LLM's chance to actually produce a response.
This caused "I've completed your request." empty responses.

Remove jump_to="end" from the RAG loop termination path so the agent
gets one more LLM turn to generate a real answer from retrieved context.

Signed-off-by: Erik Lutz <elutz@splunk.com>

* fix(middleware): improve RAG loop synthesis logging message

Update the log message from "terminating" to "allowing LLM one more turn to synthesize response"
to accurately reflect the behavior — we inject a synthesis prompt and return to the model,
we don't terminate the graph. The misleading log message was confusing during debugging.

Assisted-by: Claude:claude-haiku-4-5-20251001-v1
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* fix(middleware): add debug logging and inline comments for RAG loop detection

- Add debug log showing which RAG tools are capped (helps diagnose RAG loop scenarios)
- Add comprehensive inline comments explaining the RAG loop detection logic
- Change exception logging from debug to warning level with full traceback (exc_info=True)
- Clarify that we return to the model WITHOUT jump_to:"end" so LLM gets a synthesis turn

This improves debuggability when RAG caps are exhausted.

Assisted-by: Claude:claude-haiku-4-5-20251001-v1
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

---------

Signed-off-by: Erik Lutz <elutz@splunk.com>
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Erik Lutz <elutz@splunk.com>
Co-authored-by: Sri Aradhyula <sraradhy@cisco.com>
- Quote associative array keys in declare statements to prevent treating them as unbound variables under set -u
- Use parameter expansion default `[@]:-` syntax for array expansions that may be unset, matching bash best practices for nounset-safe iteration
- Fixes errors: "argocd: unbound variable" at line 1484 and "PF_PIDS[@]: unbound variable" at line 97
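
The nounset-safe patterns described above look like this in a generic sketch (not the script's actual lines):

```shell
set -u  # nounset: expanding an unset variable is a fatal error

# Quoted key in the declare statement; unquoted keys can be parsed as
# variable references and trip "unbound variable" under set -u.
declare -A agents=( ["argocd"]="enabled" )

PF_PIDS=()  # may legitimately be empty at cleanup time

# "${arr[@]:-}" substitutes an empty default instead of erroring when the
# array is unset or empty; the -n guard skips the resulting empty word.
count=0
for pid in "${PF_PIDS[@]:-}"; do
    [ -n "$pid" ] && count=$((count + 1))
done
```

With an empty array the loop body never counts anything, instead of aborting the whole script.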

Assisted-by: Claude:claude-haiku-4-5-20251001-v1:0

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
…nt writes (#1234)

Problem
-------
The caipe-dev supervisor pod crashes repeatedly (12+ restarts / 21h) with:

  Primary stream failed: update command document too large

Root cause: the LangGraph MongoDBSaver persists the FULL graph state on
every step. In single-node mode the state includes two large channels that
are always re-populated from the in-process skill cache at the start of
every graph invocation and therefore have no business being in MongoDB:

  files["/skills/*"]
    Raw SKILL.md content injected by
    skills_middleware.backend_sync.build_skills_files().
    56 skills across 3 hub sources produce 422 file entries (~1-3 MB).

  skills_metadata
    Processed skill catalogue built by
    deepagents.middleware.skills.SkillsMiddleware.before_model() from
    the files channel on every graph step (~1-2 MB).

As conversation state grows (messages + todos + tasks + files + skills_metadata),
the BSON update document exceeds MongoDB's hard 16 MB limit and the
supervisor crashes. Nginx then returns 503 until the pod recovers (~2 min).

Fix
---
Add two new helper functions alongside the existing _truncate_large_messages:

  _strip_skills_from_checkpoint(checkpoint)
    Called in _LazyAsyncMongoDBSaver.aput().
    - Removes all files entries whose key starts with "/skills/" while
      preserving user-created files (write_file / read_file output) so
      cross-turn file-passing continues to work.
    - Removes the skills_metadata channel entirely.

  _strip_skills_from_writes(writes)
    Called in _LazyAsyncMongoDBSaver.aput_writes().
    - Drops any ("skills_metadata", ...) write entry.
    - Strips "/skills/*" keys from any ("files", {...}) write entry.

Both functions only modify the copy that is persisted to MongoDB. The
live in-memory LangGraph state is unaffected, so SkillsMiddleware
continues to see the full skill catalogue for the duration of the
current turn.
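
A minimal sketch of the checkpoint-stripping logic (illustrative; the key names follow the description above, and the checkpoint layout is an assumption):

```python
from typing import Any, Dict

def strip_skills_from_checkpoint(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy safe to persist: drop /skills/* file entries and the
    skills_metadata channel, preserving user-created files so cross-turn
    file passing keeps working. The input dict is left untouched."""
    persisted = dict(checkpoint)
    channels = dict(persisted.get("channel_values", {}))
    files = channels.get("files")
    if isinstance(files, dict):
        channels["files"] = {
            k: v for k, v in files.items() if not k.startswith("/skills/")
        }
    channels.pop("skills_metadata", None)
    persisted["channel_values"] = channels
    return persisted
```

Only the copy handed to MongoDB is trimmed; the live in-memory state keeps the full catalogue for the current turn.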

Why this is safe
----------------
agent.py:AIPlatformEngineerA2ABinding.stream() (and the equivalent
deep_agent.py:serve()/serve_stream() paths) already inject

  state_dict["files"] = dict(self._mas_instance._skills_files)

on EVERY call before graph.astream(). LangGraph reads the checkpoint
ONCE at the start of an invocation and then merges the inputs dict into
the recovered state via file_reducer ({**checkpoint_files, **input_files}).
Skills are therefore always present in the in-memory state for the entire
turn even though they are absent from the persisted checkpoint. Between
turns (pod restarts, HITL resumes) the next stream() call re-injects them.
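
The merge semantics relied on here are a plain dict overlay (illustration of the described file_reducer behavior, not the library's actual code):

```python
def file_reducer(checkpoint_files: dict, input_files: dict) -> dict:
    """Later dict wins on key collisions: inputs overlay the recovered checkpoint."""
    return {**checkpoint_files, **input_files}

# Checkpoint was persisted without skills; stream() re-injects them per call:
checkpoint_files = {"/notes.txt": "user data"}            # skills stripped at aput()
injected = {"/skills/argocd/SKILL.md": "skill body"}      # from the in-process cache
merged = file_reducer(checkpoint_files, injected)
```

So the persisted checkpoint can stay skill-free while every invocation still sees the full file set.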

Observed impact
---------------
- files channel: -422 entries (~1-3 MB) stripped per checkpoint write
- skills_metadata channel: ~1-2 MB stripped per checkpoint write
- Total checkpoint size reduction: 2-5 MB per write
- With a 16 MB BSON limit, this gives ~3x more headroom for conversation
  state (messages, todos, tasks) before the limit is reached



Assisted-by: Claude:claude-sonnet-4-6

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
…1237)

PlatformEngineerResponse.is_task_complete, require_user_input, and
Metadata.user_input had no defaults. A partial LLM response (common
for read-only tasks like doc retrieval) omitting these fields failed
Pydantic validation, triggering ModelRetryMiddleware to retry 5× before
bailing with on_failure="continue" and emitting no response.

Add sensible defaults (is_task_complete=True, require_user_input=False,
user_input=False) so partial ResponseFormat calls succeed without retries.
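
The shape of the fix, as a hypothetical reconstruction (field names come from the message above; the real models carry more fields):

```python
from pydantic import BaseModel, Field

class Metadata(BaseModel):
    # Previously required; partial LLM output omitted it and failed validation.
    user_input: bool = False

class PlatformEngineerResponse(BaseModel):
    is_task_complete: bool = True       # default: treat a partial response as done
    require_user_input: bool = False
    metadata: Metadata = Field(default_factory=Metadata)

# A partial payload (e.g. a doc-retrieval answer) now validates cleanly:
resp = PlatformEngineerResponse()
```

With defaults in place, ModelRetryMiddleware no longer burns five retries on a structurally valid but partial response.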

Update test_response_format.py to reflect the new behavior and add
regression tests in test_supervisor_fetch_document_cap_e2e.py that
validate the exact partial-payload shapes that were previously failing.

Assisted-by: Claude:claude-sonnet-4-6

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
* fix(ui): don't persist session-expired errors in chat history

When the A2A client receives a 401, it throws a 'Session expired:' error.
Previously this was appended to the assistant message and persisted in
MongoDB, so recovering a chat would show the error inline. The
TokenExpiryGuard already handles the UX (modal + redirect), so skip
appendToMessage for session expiry errors across all 4 catch blocks in
ChatPanel and DynamicAgentChatPanel.

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* test(ui): add tests for session-expired error suppression in chat

Verify that appendToMessage is NOT called when the A2A client throws
'Session expired: …', and that other errors (network, backend) are
still appended inline as expected.

Also adds @testing-library/dom as an explicit dev dependency.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

---------

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
The devDependency used ^10.4.1 (semver range), which fails the
pinned-deps CI check that requires all Node deps to be exact versions.

Assisted-by: Claude:claude-sonnet-4-6

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
#1235)

* fix(a2a): isolate execution plan state per request to prevent cross-session leakage

The AIPlatformEngineerA2ABinding is a singleton shared across all
concurrent chat sessions. Four mutable instance attributes tracking
execution plan state (_execution_plan_sent, _previous_todos,
_task_plan_entries, _in_self_service_workflow) caused plan data from
one user's session to leak into another's when requests overlapped.

Introduces a PlanState dataclass instantiated as a local variable in
stream(), ensuring each request gets its own isolated plan tracking
state. Helper methods _build_todo_plan_text() and _build_task_plan_text()
now accept PlanState as a parameter instead of reading instance state.

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): correct plan_step_id assertion on post-plan streaming chunks

The test asserted post-plan streaming_result chunks should NOT carry
plan_step_id, but agent_executor.py:830-835 intentionally stamps it so
the UI nests content under plan steps. Updated assertion to match actual
behavior.

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused import

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* test(a2a): add concurrent session isolation tests for PlanState

Simulate two overlapping stream() calls on the same singleton binding
using asyncio barriers and verify plan entries, execution_plan_sent
flag, and sequential state are all per-session isolated.

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused PlanState import in concurrent test

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…nce (#1229)

* feat(auth): add dual auth middleware for shared key + OAuth2 coexistence

When both A2A_AUTH_SHARED_KEY and A2A_AUTH_OAUTH2=true are set,
the new DualAuthMiddleware allows both auth methods simultaneously:

1. Shared key → immediate access (A2A machine-to-machine clients)
2. OAuth2 JWT → validated via JWKS (UI OIDC flow)

Previously, setting A2A_AUTH_SHARED_KEY would override OAuth2 auth
entirely (if/elif), breaking the UI's OIDC-based backend calls.

The middleware priority chain in main.py is now:
- Both set → DualAuthMiddleware (shared key OR JWT)
- Only shared key → SharedKeyMiddleware
- Only OAuth2 → OAuth2Middleware
- Neither → no auth
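
The priority chain can be sketched as a small selector (illustrative; the class names come from the commit, the construction details are assumed):

```python
import os
from typing import Optional

def select_auth_middleware(env: Optional[dict] = None) -> str:
    """Mirror the documented priority chain in main.py (names only)."""
    env = env if env is not None else os.environ
    shared_key = env.get("A2A_AUTH_SHARED_KEY")
    oauth2 = env.get("A2A_AUTH_OAUTH2", "").lower() == "true"
    if shared_key and oauth2:
        return "DualAuthMiddleware"   # shared key OR JWT accepted
    if shared_key:
        return "SharedKeyMiddleware"
    if oauth2:
        return "OAuth2Middleware"
    return "NoAuth"
```

Checking the combined case first is what replaces the old if/elif that let the shared key silently disable OAuth2.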

verify_token is imported lazily inside dispatch() to avoid pulling
in oauth2_middleware module-level env validation at import time.

Includes unit tests for all auth paths.

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>

* fix(auth): address review findings in dual auth middleware

- Use hmac.compare_digest() for shared key comparison to prevent timing attacks
- Fix broken test mocking: verify_token mock was being restored before
  requests ran, so OAuth2 fallback tests were hitting real (uninitialized)
  verify_token. Moved mocking to test-level context managers.

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>

* fix: address PR review feedback

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>

---------

Signed-off-by: Arthur Drozdov <adrozdov@cisco.com>
Signed-off-by: Ubuntu <ubuntu@ip-10-175-48-77.us-east-2.compute.internal>
…-prefix

feat(helm): allow custom ingress path for supervisor ingress
…gent output

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
Kevin Kantesaria and others added 20 commits April 17, 2026 14:33
…repeated failures

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
…ming behavior

Signed-off-by: Kevin Kantesaria <kkantesaria@splunk.com>
[GAI]

Removes the diagnostic logger.info block that logged every A2A event
(type, artifact, text_len, preview, step statuses). This was added for
debugging but clutters production logs and was flagged in PR review.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Erik Lutz <elutz@splunk.com>
…e-io/ai-platform-engineering into fix/slack-streaming-restore-v0241-ux

# Conflicts:
#	ai_platform_engineering/integrations/slack_bot/utils/ai.py
…pts (SDPL-1601)

[GAI]

The system prompt instructed the LLM to call search(keyword_search=True/False)
but this parameter never existed in the tool schema. Pydantic rejected it on
every call, causing 7 traced production failures today where users received
"I ran into an issue" error messages instead of answers.

Fixes both locations where the bad instruction appeared:
- ai_platform_engineering/integrations/slack_bot/utils/config_models.py
- charts/ai-platform-engineering/data/prompt_config.rag.yaml

Replaces keyword_search references with equivalent phrasing guidance that
doesn't reference nonexistent tool parameters.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Erik Lutz <elutz@splunk.com>
…ompt

fix(slack-bot): remove nonexistent keyword_search param from RAG search prompts
…41-ux

fix(slack): restore v0.2.41 streaming UX and suppress duplicate sub-agent output
Establishes a consistent `appuser` (UID/GID 1001) for running applications within containers.

Signed-off-by: Louise Champ <myauie@gmail.com>
…1242)

* fix(supervisor): add curl tool to supervisor utility tools

The curl tool was implemented and exported but never added to the
supervisor utility_tools list in _build_graph_async. Without it,
the supervisor only had fetch_url (GET-only) for HTTP requests.
PUT/POST requests would fail with 405 or be silently mishandled,
causing the agent to hallucinate success on write API calls.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* fix(supervisor): suppress curl/fetch_url raw content from stream; restore response_format defaults

curl and fetch_url return large HTTP response bodies that were streaming
directly to clients (Slack appendStream failures, raw HTML in chat).
Add _SUPPRESS_CONTENT_TOOLS constant in agent.py (curl, fetch_url,
fetch_markdown, wget) and suppress their ToolMessage content the same
way RAG tools are suppressed — the tool_call completion notification
is sufficient for the client.

Also restore Metadata.user_input, PlatformEngineerResponse.is_task_complete,
and require_user_input defaults that were accidentally dropped in this
branch. Partial LLM responses omitting these fields were triggering Pydantic
ValidationError → ModelRetryMiddleware retry loop (5×) → bailout after 1 step.

Update test_missing_user_input_raises_validation_error → defaults_to_false
to reflect the lenient-parsing design.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* refactor(tools): replace fetch_url with curl; add strip_html param

fetch_url was a GET-only requests.get() wrapper — a strict subset of
what curl already handles. Remove it from utility_tools and add an
optional strip_html=True parameter to the curl tool that runs
BeautifulSoup stripping on the response, preserving the one unique
capability fetch_url had.

Also remove fetch_url from _SUPPRESS_CONTENT_TOOLS since it is no
longer a supervisor tool.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* fix(curl): add guardrails — https-only, block file-write flags

- Reject any URL with a scheme other than https:// to prevent
  http://, file://, ftp:// etc.
- Block -o/--output and --config/-K flags that write to disk or
  read curl config files from LLM-generated commands.
- Validation runs on parsed args before subprocess.run so blocked
  commands never reach the shell.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* fix(curl): replace terse errors with detailed user-facing messages

Guardrail rejections now explain what was blocked, why, and what
is supported — so the LLM can relay the information clearly instead
of surfacing a bare "ERROR:" string.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* fix(curl): unblock -o/--output, only block --config/-K

-o is needed for legitimate use cases (e.g. saving API responses
to pass to jq). Only --config/-K (reads curl config from disk)
remains blocked.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* fix(curl): simplify guardrails to https-only URL check

Remove flag blocklist entirely — only enforce https:// URLs.
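
After this simplification the guardrail reduces to a small pre-subprocess check, sketched here under the stated behavior (the function name and error wording are hypothetical):

```python
import shlex
from typing import Optional

def validate_curl_args(command: str) -> Optional[str]:
    """Return a user-facing error string, or None if the command may run.

    Runs on parsed args before subprocess.run, so rejected commands
    never reach the shell."""
    try:
        args = shlex.split(command)
    except ValueError as exc:
        return f"Could not parse command: {exc}"
    for arg in args:
        if "://" in arg:
            scheme = arg.split("://", 1)[0].lower()
            if scheme != "https":
                return (
                    f"Blocked: only https:// URLs are supported, got {scheme}://. "
                    "Use the service's https endpoint instead."
                )
    return None
```

Returning a descriptive string rather than a bare "ERROR:" lets the LLM relay why the call was refused.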

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* test(curl): add unit and e2e tests for curl tool guardrails and supervisor wiring

Unit tests (test_curl_tool.py):
- _validate_curl_args: https accepted, http/file/ftp rejected,
  error messages contain scheme and list supported alternatives,
  query strings stripped from error output
- Argument handling: curl prefix auto-prepended, not doubled,
  PUT/body args passed through, invalid shlex rejected
- Subprocess results: success, empty output, nonzero exit,
  stderr appended, timeout, curl not found
- strip_html: tags stripped, raw returned when False, default is False

E2E tests (TestCurlToolInSupervisor):
- curl present in tools after _build_graph()
- fetch_url absent from tools after _build_graph()
- strip_html and timeout params exposed on the tool
- http:// and file:// URLs return informative user-facing strings

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* feat(rag): add tool arbitration guidance — KB-first, curl for execution only

Add tool_arbitration_prompt section to prompt_config.rag.yaml that
explicitly tells the LLM to use the knowledge base (search/fetch_document)
first for discovery and documentation, and reserve curl for state-changing
API calls and live data not available in the KB.

Wire _TOOL_ARBITRATION_PROMPT into both _RAG_ONLY_INSTRUCTIONS and
_RAG_WITH_GRAPH_INSTRUCTIONS so the rule applies regardless of whether
graph RAG is enabled.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* feat(rag): add explicit-curl and live-data exceptions to arbitration rule

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* feat(rag): rephrase curl arbitration exception

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* feat(rag): finalize curl arbitration wording

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* refactor(agent): remove _SUPPRESS_CONTENT_TOOLS, inline curl check

The frozenset only covered curl in practice (fetch_markdown/wget are
not supervisor tools). Inline tool_name == "curl" directly.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* docs(curl): expand docstring with wget/fetch_markdown equivalent examples

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* docs(curl): remove internal reference to wget/fetch_markdown

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* test(curl): add file download test; fix stdout mock for -o flag

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>

* chore: bump chart versions for ai-platform-engineering

---------

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* feat: add vertical pod autoscaler support

- Add Vertical Pod Autoscaler capabilities for agent and supervisor-agent deployments, as well as for MCP containers when running in single-node mode

Signed-off-by: Louise Champ <myauie@gmail.com>

* feat(helm): add global.vpa to enable VPA across all agent sub-charts

Introduces global.vpa in the parent chart values so operators can enable
VPA on the supervisor and all agent sub-charts with a single flag instead
of repeating vpa.enabled=true under each agent section.

Signed-off-by: Louise Champ <myauie@gmail.com>

* chore: bump chart versions for agent supervisor-agent

* chore: bump chart versions for ai-platform-engineering

* feat(helm): enable VPA in more sub-charts

- Standardises VPA resource policies by setting `controlledValues` to `RequestsAndLimits` for all VPA resources.
- Extends VPA support by adding VPA templates and default configurations to:
  - `caipe-ui`
  - `caipe-ui-mongodb`
  - `dynamic-agents`
  - `langgraph-redis`
  - `slack-bot`
  - `agent-ontology`
  - `rag-ingestors` (creates one VPA per ingestor)
  - `rag-server`
- Separates VPA for the main agent container and the standalone MCP HTTP server in the `agent` chart, with a dedicated VPA resource for MCP when running as a deployment.
- Adjusts VPA rendering logic to handle single-node deployment mode and remote agents for better compatibility.

Signed-off-by: Louise Champ <myauie@gmail.com>

* chore: bump chart versions for agent caipe-ui-mongodb dynamic-agents langgraph-redis slack-bot supervisor-agent agent-ontology rag-ingestors rag-server

* chore: bump chart versions for ai-platform-engineering rag-stack

---------

Signed-off-by: Louise Champ <myauie@gmail.com>
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sri Aradhyula <sraradhy@cisco.com>
…tters to fix CVEs (#1248)

Addresses Dependabot security alerts:
- CVE: langchain-openai SSRF DNS rebinding (fix: >=1.1.14)
- CVE: langchain-text-splitters HTMLHeaderTextSplitter SSRF (fix: >=1.1.2)

Changes:
- langchain-openai: 1.0.3/1.1.0/1.1.1 → 1.1.14 across all agents and RAG packages
- langchain-core: 1.2.28 → 1.2.31 (required by langchain-openai 1.1.14)
- langchain-text-splitters: 1.0.0 → 1.1.2 in rag/ingestors and rag/server
- openai: 2.19.0 → 2.32.0 in rag/server (required by langchain-openai 1.1.14)
- Regenerated all 26 affected uv.lock files

Assisted-by: Claude:claude-sonnet-4-6

Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
Addresses remaining Dependabot security alerts (151 open):
- pypdf 6.10.0 → 6.10.2 (medium CVE)
- python-multipart, authlib (1.6.11), langsmith (0.7.31): transitive bumps via uv lock upgrade
- langchain-text-splitters: final remaining lock file updated

Changes:
- pypdf==6.10.2 in constraint-dependencies across 19 pyproject.toml files
- Regenerated 38 uv.lock files (incl. mcp/ subworkspaces) upgrading:
  pypdf, python-multipart, authlib, langsmith, langchain-text-splitters
- scrapy: no fix available upstream, left as-is

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
…smith-vulnerabilities

fix(deps): bump pypdf, python-multipart, authlib, langsmith to fix CVEs
All 10 first-party MCP servers previously had no authentication in HTTP
mode. This change adds three selectable auth modes via a new shared
package mcp-agent-auth, and allows the caller Authorization Bearer token
to serve as the backend service API key, eliminating the need for
per-server credential env vars in HTTP deployments.

New package ai_platform_engineering/agents/common/mcp-auth:
- MCPAuthMiddleware: Starlette BaseHTTPMiddleware supporting none,
  shared_key (hmac.compare_digest), and oauth2 (JWT via JWKS) modes
- get_request_token(): resolves token from HTTP request header at call
  time, falling back to env var for STDIO backward compatibility

All 10 MCP servers:
- mcp-agent-auth added as uv path dependency
- MCPAuthMiddleware injected when MCP_MODE=http
- api/client.py updated to use get_request_token()
- Module-level ValueError for token env vars changed to logger.warning
- Webex: per-request _get_token() helper; --auth-token made optional
- VictorOps: OrgCredentials.api_key made Optional
- 21 unit tests covering all auth modes and get_request_token behavior

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
@github-actions
Contributor

✅ No proprietary content detected. This PR is clear for review!

Add ContextVar-based token propagation so the bearer token that
authenticates each A2A request is automatically forwarded as the
Authorization header on every outbound MCP HTTP call made by
LangGraph — isolated per async Task with no bleed across requests.
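
The per-Task isolation this relies on can be shown in a minimal sketch (illustrative; the real code lives in token_context.py and the middlewares):

```python
import asyncio
from contextvars import ContextVar
from typing import Dict, Optional

current_bearer_token: ContextVar[Optional[str]] = ContextVar(
    "current_bearer_token", default=None
)

def outbound_headers() -> Dict[str, str]:
    """Read the ContextVar at call time, as the httpx client factory does."""
    token = current_bearer_token.get()
    return {"Authorization": f"Bearer {token}"} if token else {}

async def handle_request(token: str) -> Dict[str, str]:
    # Each asyncio Task runs in its own copy of the context, so concurrent
    # requests never observe each other's tokens.
    current_bearer_token.set(token)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return outbound_headers()

async def _demo():
    return await asyncio.gather(handle_request("alice"), handle_request("bob"))

results = asyncio.run(_demo())
```

The set() inside each task never escapes to the parent context, which is exactly the no-bleed property claimed above.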

Changes:
- utils/auth/token_context.py: new ContextVar current_bearer_token
- SharedKeyMiddleware, OAuth2Middleware, DualAuthMiddleware: set
  ContextVar after successful auth
- mcp_agent_auth/token_context.py: separate ContextVar for MCP-side
  token (supports MCP-to-MCP chaining)
- MCPAuthMiddleware: set mcp ContextVar after auth passes (all modes
  except none)
- base_langgraph_agent._build_httpx_client_factory: always returns a
  callable; reads ContextVar at call time to inject Authorization
  header; resolves TBD_USER_JWT stubs in _load_mcp_tools and
  _setup_mcp_and_graph
- tests/test_token_context.py: 23 unit tests covering ContextVar
  defaults, all middleware token injection, factory behavior, SSL
  verify, concurrent task isolation
- tests/test_auth_token_forwarding_e2e.py: 18 e2e tests covering the
  full A2A->LangGraph->MCP token forwarding chain

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Sri Aradhyula <sraradhy@cisco.com>
@github-actions
Contributor

✅ No proprietary content detected. This PR is clear for review!

import asyncio
import importlib
import os
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch

@pytest.mark.anyio
async def test_factory_token_does_not_leak_after_request(self):
    """After a request completes the ContextVar is restored to its prior value."""
    factory = MagicMock()

assert len(captured_client) == 1
auth = captured_client[0].headers.get("authorization")
assert auth == f"Bearer {SHARED_KEY}"
asyncio.run(captured_client[0].aclose())

# ContextVar is None (no middleware set it)
client = factory()
assert "authorization" not in {k.lower() for k in client.headers}
asyncio.run(client.aclose())
@sriaradhyula sriaradhyula marked this pull request as ready for review April 18, 2026 10:43
@sriaradhyula sriaradhyula changed the base branch from main to 0.5.0 April 20, 2026 03:36