
feat(kagent-adk): remove litellm as dependency from kagent-adk #1540

Merged
EItanya merged 9 commits into kagent-dev:main from jmhbh:feat/remove-litellm
Mar 26, 2026

Conversation

jmhbh (Contributor) commented Mar 24, 2026

Removes litellm as a dependency from kagent. litellm is now replaced with provider-specific SDKs.

Testing

ollama

  1. Deploy ollama:
```bash
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: kagent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        resources:
          requests:
            memory: "2Gi"
          limits:
            memory: "4Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: kagent
spec:
  selector:
    app: ollama
  ports:
  - port: 11434
    targetPort: 11434
EOF
```
  2. Pull a small model into the ollama deployment: `kubectl -n kagent exec -it deploy/ollama -- ollama pull llama3.2:1b`
  3. Create a model config and an agent:
```bash
kubectl apply -f - <<EOF
apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: ollama-test-config
  namespace: kagent
spec:
  provider: Ollama
  model: llama3.2:1b
  ollama:
    host: "http://ollama:11434"
    options:
      num_ctx: "2048"
      temperature: "0.7"
      top_k: "40"
---
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: ollama-test
  namespace: kagent
spec:
  type: Declarative
  description: "Ollama native SDK test"
  declarative:
    modelConfig: ollama-test-config
    systemMessage: "You are a helpful assistant. Answer concisely."
EOF
```
  4. Port-forward the UI and test the agent: `kubectl port-forward -n kagent svc/kagent-ui 3000:8080`
  5. Test memory recall by pulling an embedding model: `kubectl -n kagent exec -it deploy/ollama -- ollama pull nomic-embed-text`
  6. Create the embedding model config and an ollama memory test agent: `kubectl apply -f - <`

embedding - google

  1. Create the secret and model configs:
```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: gemini-api-key-secret
  namespace: kagent
type: Opaque
data:
  GOOGLE_API_KEY: <your_api_key>  # must be base64-encoded (or use stringData for plain text)
---
apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: gemini-2-flash-config
  namespace: kagent
spec:
  model: gemini-2.0-flash
  provider: Gemini
  apiKeySecret: gemini-api-key-secret
  apiKeySecretKey: GOOGLE_API_KEY
  gemini: {}
---
apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: gemini-embedding-config
  namespace: kagent
spec:
  model: gemini-embedding-001
  provider: Gemini
  apiKeySecret: gemini-api-key-secret
  apiKeySecretKey: GOOGLE_API_KEY
  gemini: {}
EOF
```
  2. Create the agent:
```bash
kubectl apply -f - <<EOF
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: memory-openai-test
  namespace: kagent
spec:
  type: Declarative
  description: "Memory with Gemini embedding"
  declarative:
    modelConfig: gemini-2-flash-config
    systemMessage: "You are a helpful assistant with memory."
    memory:
      modelConfig: gemini-embedding-config
EOF
```
  3. Port-forward the UI and test the agent: `kubectl port-forward -n kagent svc/kagent-ui 3000:8080`

bedrock

  1. Create the aws-credentials secret:
```bash
kubectl -n kagent create secret generic aws-credentials \
  --from-literal=AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  --from-literal=AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  --from-literal=AWS_DEFAULT_REGION="<your_region>" \
  --from-literal=AWS_SESSION_TOKEN="$AWS_SESSION_TOKEN" \
  --dry-run=client -o yaml | kubectl apply -f -
```
  2. Create the model config and agent:
```yaml
apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: bedrock-model-config
  namespace: kagent
spec:
  model: us.anthropic.claude-haiku-4-5-20251001-v1:0
  provider: Bedrock
  bedrock:
    region: us-east-1
---
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: bedrock-test
  namespace: kagent
spec:
  type: Declarative
  description: "Bedrock Converse API test"
  declarative:
    systemMessage: "You are a helpful assistant. Answer concisely."
    modelConfig: bedrock-model-config
    deployment:
      env:
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: AWS_ACCESS_KEY_ID
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: AWS_SECRET_ACCESS_KEY
      - name: AWS_DEFAULT_REGION
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: AWS_DEFAULT_REGION
      - name: AWS_SESSION_TOKEN
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: AWS_SESSION_TOKEN
```

jmhbh added 2 commits March 24, 2026 19:04
iplay88keys (Contributor) left a comment (marked as duplicate):
Minor — stale LiteLLM references in docstrings

A few docstrings in _memory_service.py still reference LiteLLM after this change:

  • Line 27 (class docstring): "Generates embeddings using LiteLLM"
  • Lines 60, 71 (add_session_to_memory / _add_session_to_memory_background docstrings): "Optional ADK model object (e.g., LiteLlm, OpenAI)"
  • Line 447 (_summarize_session_content_async docstring): same

These are cosmetic but worth updating for accuracy.

Comment left by Claude on behalf of @iplay88keys

jmhbh and others added 3 commits March 25, 2026 16:58
@jmhbh jmhbh marked this pull request as ready for review March 25, 2026 23:24
@jmhbh jmhbh requested a review from EItanya as a code owner March 25, 2026 23:24
Copilot AI review requested due to automatic review settings March 25, 2026 23:24
@jmhbh jmhbh requested review from peterj and yuval-k as code owners March 25, 2026 23:24
Copilot AI left a comment:
Pull request overview

This PR removes the litellm dependency from kagent-adk by replacing LiteLLM-based model/embedding usage with provider-specific SDK implementations (Anthropic, Ollama, Bedrock, and OpenAI SDK calls), and adds unit tests to validate the new dispatch behavior.

Changes:

  • Drop litellm from kagent-adk dependencies and lockfile.
  • Replace LiteLLM model creation with native provider model classes (Anthropic/Ollama/Bedrock) and update model dispatch in types.py.
  • Rework embedding generation to call provider SDKs directly, and add new unit tests for embeddings and the new model adapters.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Summary per file:

  • python/uv.lock: Removes litellm (and its transitive deps like fastuuid) from the workspace lock.
  • python/packages/kagent-adk/pyproject.toml: Removes the litellm dependency; retains/uses provider SDK deps (openai/anthropic/boto3/ollama/numpy).
  • python/packages/kagent-adk/src/kagent/adk/_memory_service.py: Replaces LiteLLM embedding calls with provider-specific SDK embedding dispatch.
  • python/packages/kagent-adk/src/kagent/adk/types.py: Updates _create_llm_from_model_config to instantiate native Anthropic/Ollama/Bedrock implementations.
  • python/packages/kagent-adk/src/kagent/adk/models/_anthropic.py: Adds KAgentAnthropicLlm with base_url/headers and API key passthrough support.
  • python/packages/kagent-adk/src/kagent/adk/models/_bedrock.py: Adds KAgentBedrockLlm using Bedrock Converse / ConverseStream APIs via boto3.
  • python/packages/kagent-adk/src/kagent/adk/models/_ollama.py: Adds KAgentOllamaLlm using the native Ollama SDK and tool/function-call conversions.
  • python/packages/kagent-adk/src/kagent/adk/models/__init__.py: Exports new model classes instead of the removed LiteLLM wrapper.
  • python/packages/kagent-adk/src/kagent/adk/models/_litellm.py: Deletes the LiteLLM wrapper model class.
  • python/packages/kagent-adk/tests/unittests/test_embedding.py: Adds unit tests for embedding dispatch/truncation/normalization without LiteLLM.
  • python/packages/kagent-adk/tests/unittests/models/test_anthropic.py: Adds unit tests for the Anthropic adapter behavior.
  • python/packages/kagent-adk/tests/unittests/models/test_bedrock.py: Adds unit tests for the Bedrock adapter and client region selection.
  • python/packages/kagent-adk/tests/unittests/models/test_ollama.py: Adds unit tests for the Ollama adapter and option/header forwarding.


jmhbh and others added 2 commits March 25, 2026 19:49
EItanya (Contributor) commented Mar 26, 2026:

I threw a claude team at this just to double check, lmk what you think. I'm happy to merge without everything fixed to get rid of litellm quickly, but we're definitely going to need to do deeper testing on bedrock

Review: feat(kagent-adk): remove litellm as dependency

Bugs

1. _normalize_l2 returns np.ndarray — JSON serialization will fail
_memory_service.py ~line 295. After truncation, embeddings are passed through _normalize_l2() which returns a numpy array. When httpx tries to serialize this via json.dumps, it will raise TypeError: Object of type ndarray is not JSON serializable. This affects any embedding model returning vectors > 768 dims (e.g., text-embedding-3-large at 3072).
Fix: `embedding = self._normalize_l2(embedding).tolist()`
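A minimal sketch of the proposed fix, assuming a numpy-based `_normalize_l2` helper like the one described above; the function names here are illustrative, not kagent's exact API:

```python
import numpy as np

def _normalize_l2(vec):
    """Return the L2-normalized vector as a numpy array."""
    arr = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(arr)
    return arr if norm == 0 else arr / norm

def truncate_and_normalize(embedding, dims=768):
    """Truncate to `dims` entries, re-normalize, and return a plain list.

    The trailing .tolist() is the fix: json.dumps cannot serialize an
    np.ndarray, so the result must be converted before it reaches httpx.
    """
    truncated = embedding[:dims]
    return _normalize_l2(truncated).tolist()
```

This only matters for models that return more than 768 dimensions, since shorter vectors skip the truncation path.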

2. Bedrock streaming blocks the event loop
_bedrock.py ~lines 177-218. asyncio.to_thread only wraps the initial converse_stream call, but the for event in stream_body loop iterates synchronously on the main thread. The entire streaming loop needs to be in a thread, or use aioboto3.
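One way to keep the whole loop off the event loop, sketched under the assumption that the client exposes a blocking `converse_stream()` returning a dict with an iterable `"stream"` (as boto3 does); all names here are illustrative:

```python
import asyncio
import queue
import threading

_SENTINEL = object()

async def iter_stream_in_thread(call_converse_stream):
    """Run both the converse_stream call and its iteration in a worker
    thread, yielding events back onto the event loop via a queue."""
    q: queue.Queue = queue.Queue()

    def worker():
        try:
            # The blocking call AND the blocking iteration both stay here.
            for event in call_converse_stream()["stream"]:
                q.put(event)
        finally:
            q.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        # q.get() blocks too, so hop to a thread for each item.
        event = await asyncio.to_thread(q.get)
        if event is _SENTINEL:
            break
        yield event
```

Using aioboto3, as the review suggests, would avoid the hand-rolled queue entirely.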

3. Bedrock region config field silently ignored
_bedrock.py:_get_bedrock_client() only reads AWS_DEFAULT_REGION/AWS_REGION env vars. The Bedrock model config's region field is never passed through from types.py:518. Pre-existing issue, but the new dedicated function is the right place to fix it.

4. api_key_passthrough causes AttributeError for Bedrock/Ollama
types.py ~lines 499, 518. Neither KAgentBedrockLlm nor KAgentOllamaLlm implement set_passthrough_key() or have api_key_passthrough. If a user configures api_key_passthrough: true, the passthrough plugin will raise AttributeError. Either add no-op implementations or guard in the plugin.
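A sketch of the guard-in-the-plugin option; `set_passthrough_key` mirrors the method named above, while the helper itself is hypothetical:

```python
def apply_passthrough_key(llm, key):
    """Set the passthrough API key only on models that support it.

    Returns True if the key was applied, False if the model (e.g. a
    Bedrock or Ollama adapter) has no passthrough concept.
    """
    setter = getattr(llm, "set_passthrough_key", None)
    if callable(setter):
        setter(key)
        return True
    return False
```

The alternative fix, no-op `set_passthrough_key` implementations on the Bedrock/Ollama adapters, would keep the plugin untouched.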

Issues

5. No finish_reason or usage_metadata in Ollama responses
_ollama.py ~lines 205-226. The Ollama adapter never populates these fields, even though ChatResponse has done_reason, prompt_eval_count, and eval_count. Token counts are lost.
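A sketch of the mapping being suggested, assuming an Ollama ChatResponse-shaped dict with `done_reason` / `prompt_eval_count` / `eval_count`; the output field names are illustrative:

```python
def extract_ollama_usage(resp: dict) -> dict:
    """Map Ollama response metadata onto finish-reason/usage fields."""
    prompt_tokens = resp.get("prompt_eval_count", 0)
    completion_tokens = resp.get("eval_count", 0)
    return {
        "finish_reason": resp.get("done_reason"),
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
```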

6. dimensions=768 incompatible with older OpenAI embedding models
_memory_service.py ~line 374. The dimensions parameter is only supported by text-embedding-3-* models. Passing it to text-embedding-ada-002 (common with Azure) will raise an API error. Pre-existing from the litellm code, but worth fixing now.
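A sketch of gating the parameter on the model family; the helper name and the prefix check are assumptions, not kagent code:

```python
def embedding_kwargs(model: str, dims: int = 768) -> dict:
    """Build embeddings.create kwargs, passing `dimensions` only to the
    text-embedding-3-* family that accepts it."""
    kwargs = {"model": model}
    if model.startswith("text-embedding-3"):
        kwargs["dimensions"] = dims
    # Older models (e.g. text-embedding-ada-002) reject `dimensions`;
    # rely on client-side truncation + re-normalization instead.
    return kwargs
```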

7. Missing Anthropic/Bedrock embedding providers
_memory_service.py ~lines 354-361. The embedding dispatch handles openai, azure_openai, ollama, vertex_ai, gemini — but not anthropic or bedrock. Bedrock supports embeddings (Titan). Should at minimum raise a clear error for unsupported providers rather than falling through silently.
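A sketch of the "raise a clear error" option; the provider set mirrors the dispatch described above, the function itself is hypothetical:

```python
SUPPORTED_EMBEDDING_PROVIDERS = {"openai", "azure_openai", "ollama", "vertex_ai", "gemini"}

def check_embedding_provider(provider: str) -> None:
    """Fail loudly for providers without embedding support instead of
    falling through silently."""
    if provider not in SUPPORTED_EMBEDDING_PROVIDERS:
        raise ValueError(
            f"Embedding provider {provider!r} is not supported; "
            f"choose one of {sorted(SUPPORTED_EMBEDDING_PROVIDERS)}"
        )
```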

8. New boto3 client created on every request
_bedrock.py ~line 152. Unlike Anthropic/Ollama which cache clients, Bedrock creates a fresh boto3.client per generate_content_async call. Should be cached.
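A sketch of per-region client caching; the factory callable stands in for `boto3.client("bedrock-runtime", region_name=...)` so the example stays self-contained:

```python
import threading

class ClientCache:
    """Create one client per region lazily and reuse it thereafter."""

    def __init__(self, factory):
        self._factory = factory  # e.g. lambda r: boto3.client("bedrock-runtime", region_name=r)
        self._clients = {}
        self._lock = threading.Lock()

    def get(self, region: str):
        with self._lock:
            client = self._clients.get(region)
            if client is None:
                client = self._factory(region)
                self._clients[region] = client
            return client
```

boto3 clients are documented as thread-safe for method calls, so sharing one per region is the usual pattern.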

9. Bedrock adapter silently ignores inferenceConfig
The Bedrock adapter doesn't forward inferenceConfig (temperature, maxTokens, topP, stopSequences) to the Converse API. Users' model config values are silently dropped with no indication.
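A sketch of assembling the Converse `inferenceConfig`; the camelCase keys follow the Bedrock Converse request schema, while the helper and its parameters are illustrative:

```python
def build_inference_config(temperature=None, max_tokens=None,
                           top_p=None, stop_sequences=None) -> dict:
    """Map model-config generation settings onto the Converse API's
    inferenceConfig shape, omitting anything unset."""
    config = {}
    if temperature is not None:
        config["temperature"] = temperature
    if max_tokens is not None:
        config["maxTokens"] = max_tokens
    if top_p is not None:
        config["topP"] = top_p
    if stop_sequences:
        config["stopSequences"] = list(stop_sequences)
    return config
```

An empty dict can simply be left out of the `converse()` call, so unset configs add nothing to the request.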

Comment left by Claude on behalf of @EItanya

supreme-gg-gg (Contributor) commented:

Quick comments based on my context:

  3. Bedrock region config field silently ignored
    _bedrock.py:_get_bedrock_client() only reads AWS_DEFAULT_REGION/AWS_REGION env vars. The Bedrock model config's region field is never passed through from types.py:518. Pre-existing issue, but the new dedicated function is the right place to fix it.

We already set the AWS_REGION env var during translation, so it should work fine

  6. dimensions=768 incompatible with older OpenAI embedding models
    _memory_service.py ~line 374. The dimensions parameter is only supported by text-embedding-3-* models. Passing it to text-embedding-ada-002 (common with Azure) will raise an API error. Pre-existing from the litellm code, but worth fixing now.

Some models do not allow configuring embedding dimensions (they return a fixed-size vector longer than 768); that is the purpose of truncation and re-normalization. According to prior research this works fine in most cases, as long as the call to the model returns a vector longer than 768.

  7. Missing Anthropic/Bedrock embedding providers
    _memory_service.py ~lines 354-361. The embedding dispatch handles openai, azure_openai, ollama, vertex_ai, gemini — but not anthropic or bedrock. Bedrock supports embeddings (Titan). Should at minimum raise a clear error for unsupported providers rather than falling through silently.

Probably out of scope, we might want to rework the embedding interface in the future for wider support

jmhbh added 2 commits March 26, 2026 16:51
@EItanya EItanya merged commit 9dcee3a into kagent-dev:main Mar 26, 2026
22 checks passed