
Conversation

liavweiss (Contributor) commented Dec 11, 2025

Re-enable Gemma Embedding Model Support

Task

Fix #790

Summary

This PR re-enables support for the google/embeddinggemma-300m gated model across the codebase, following the resolution of HuggingFace token access (HF_TOKEN is now configured by maintainers in CI).

Background

The EmbeddingGemma-300m model (google/embeddinggemma-300m) is a gated model on HuggingFace that requires authentication via HF_TOKEN. Due to CI/CD authentication limitations, Gemma support was previously disabled in work related to Issue #573 to allow tests to pass without the gated model.

Now that the maintainer has configured HF_TOKEN in the CI environment, we can restore full Gemma embedding model support.

Changes Made

1. Model Download Configuration (tools/make/models.mk)

  • ✅ Added embeddinggemma-300m to download-models-minimal target
  • ✅ Added embeddinggemma-300m to download-models-lora target
  • ✅ Updated comments to reflect that HF_TOKEN is now available in CI
  • ✅ Added graceful skip logic with informative messages when HF_TOKEN is not available

2. Go Test Constants (candle-binding/semantic-router_test.go)

  • ✅ Set GemmaEmbeddingModelPath = "../models/embeddinggemma-300m"
  • ✅ Removed t.Skip() from InitGemmaOnly test
  • ✅ Updated test assertions to handle both Qwen3 (1024-dim) and Gemma (768-dim) embeddings
  • ✅ Added isModelInitializationError checks for graceful test skipping when the model is unavailable (see the sketch below)
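
For illustration, a minimal Go sketch of the test shape this describes. GetEmbedding and the stub error are hypothetical stand-ins for the real candle-binding calls; the isModelInitializationError guard and the 1024/768 dimensions are the PR's own behavior:

```go
package candle_test

import (
	"errors"
	"strings"
	"testing"
)

// Stand-ins for the real binding; names and error text are hypothetical.
var errModelInit = errors.New("model initialization failed: models/embeddinggemma-300m not found")

func GetEmbedding(text string) ([]float32, error) { return nil, errModelInit }

func isModelInitializationError(err error) bool {
	return err != nil && strings.Contains(err.Error(), "initialization failed")
}

func TestEmbeddingDimensions(t *testing.T) {
	emb, err := GetEmbedding("hello world")
	if err != nil {
		if isModelInitializationError(err) {
			// Gated model not downloaded (e.g., no HF_TOKEN): skip, don't fail.
			t.Skip("embedding model unavailable; skipping:", err)
		}
		t.Fatalf("unexpected embedding error: %v", err)
	}
	// Qwen3 emits 1024-dim vectors and Gemma 768-dim; accept either,
	// depending on which backend was initialized.
	if len(emb) != 1024 && len(emb) != 768 {
		t.Fatalf("unexpected embedding dimension %d (want 1024 for Qwen3 or 768 for Gemma)", len(emb))
	}
}
```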

3. Rust Embedding Initialization (candle-binding/src/ffi/embedding.rs)

Key Fix: Made Gemma model initialization optional to prevent embedding initialization failures when Gemma cannot be loaded.

  • ✅ Modified init_embedding_models() to continue with Qwen3-only initialization if Gemma fails to load
  • ✅ Changed Gemma registration error handling from return false to warning log + continue
  • ✅ Added informative warning messages explaining that Gemma is optional
  • ✅ This fix ensures that embedding functionality remains available even when the Gemma model is not downloaded (e.g., missing HF_TOKEN for gated models); see the sketch below
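
The actual change lives in Rust (candle-binding/src/ffi/embedding.rs); purely as a language-neutral illustration of the continue-on-failure pattern, here is a Go sketch in which all function names and the Qwen3 path are hypothetical (only the Gemma path matches this PR's config):

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Hypothetical loaders standing in for the real Rust initialization.
func loadQwen3(path string) error { return nil }

func loadGemma(path string) error {
	return errors.New("gated model not downloaded (missing HF_TOKEN?)")
}

// initEmbeddingModels mirrors the described behavior: Qwen3 is required,
// Gemma is optional, and a Gemma failure is downgraded to a warning.
func initEmbeddingModels(qwen3Path, gemmaPath string) error {
	if err := loadQwen3(qwen3Path); err != nil {
		return fmt.Errorf("required Qwen3 model failed to load: %w", err)
	}
	if err := loadGemma(gemmaPath); err != nil {
		// Previously this aborted initialization; now we log and continue
		// so embedding stays available in Qwen3-only mode.
		log.Printf("warning: optional Gemma model not loaded: %v; continuing with Qwen3 only", err)
	}
	return nil
}

func main() {
	if err := initEmbeddingModels("models/qwen3-embedding", "models/embeddinggemma-300m"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("embedding initialized (Gemma optional)")
}
```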

4. Model Manager Graceful Skip Logic (src/model_manager/)

  • __init__.py: Added GatedModelError exception handling in ensure_all() to gracefully skip gated models when HF_TOKEN is not available

    • Logs warning messages instead of failing
    • Continues processing other models
    • Returns partial results (models that were successfully downloaded)
  • downloader.py: Enhanced error handling to properly detect and convert gated model errors:

    • Handles RepositoryNotFoundError (404) for gated models (HuggingFace returns 404 instead of 401 to avoid revealing repository existence)
    • Detects known gated models (e.g., "embeddinggemma", "gemma") in repository IDs
    • Converts authentication-related errors to GatedModelError for consistent handling
    • Handles GatedRepoError from the huggingface_hub library (see the sketch below)
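
As a sketch of the detection-and-conversion pattern described above, in Go (the files-changed list shows the Go-side downloader at src/semantic-router/pkg/modeldownload/downloader.go); the function and variable names here are assumptions, not the PR's actual identifiers:

```go
package modeldownload

import (
	"fmt"
	"strings"
)

// GatedModelError marks a download failure caused by a gated repository.
type GatedModelError struct {
	Repo string
	Err  error
}

func (e *GatedModelError) Error() string {
	return fmt.Sprintf("gated model %q requires HF_TOKEN: %v", e.Repo, e.Err)
}

func (e *GatedModelError) Unwrap() error { return e.Err }

// knownGatedHints are substrings of repository IDs known to be gated.
var knownGatedHints = []string{"embeddinggemma", "gemma"}

// classifyDownloadError converts authentication-shaped failures on known
// gated repositories into GatedModelError so callers can skip them uniformly.
// HuggingFace returns 404 (not 401) for gated repos to avoid revealing
// whether the repository exists, so 404 is treated as auth-shaped too.
func classifyDownloadError(repoID string, status int, err error) error {
	if err == nil {
		return nil
	}
	authShaped := status == 401 || status == 403 || status == 404
	if !authShaped {
		return err
	}
	for _, hint := range knownGatedHints {
		if strings.Contains(strings.ToLower(repoID), hint) {
			return &GatedModelError{Repo: repoID, Err: err}
		}
	}
	return err
}
```

A caller can then match with errors.As, log a warning, and continue with the remaining models, which is the ensure_all() behavior described above.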

5. E2E Profile Configurations

  • ✅ Updated e2e/profiles/ai-gateway/values.yaml: Added gemma_model_path
  • ✅ Updated e2e/profiles/dynamic-config/values.yaml:
    • Set gemma_model_path to "models/embeddinggemma-300m"
    • Changed EMBEDDING_MODEL_OVERRIDE from "qwen3" to "auto" for intelligent model selection
  • ✅ Verified e2e/profiles/routing-strategies/values.yaml: Already configured correctly

6. Helm Chart Configuration (deploy/helm/semantic-router/values.yaml)

  • ✅ Added embeddinggemma-300m to initContainer.models list
  • ✅ Configured initContainer.env to use HF_TOKEN from Kubernetes secret (hf-token-secret)
  • ✅ Set optional: true to allow deployment even if the secret doesn't exist (for local testing)
  • ✅ Simplified init container script to gracefully skip Gemma download if HF_TOKEN is not available

7. GitHub Actions Workflow Updates

The following GitHub Actions workflows were updated to include HF_TOKEN for downloading gated models:

  1. integration-test-k8s.yml - Kubernetes E2E integration tests
  2. test-and-build.yml - Main test and build workflow
  3. integration-test-docker.yml - Docker Compose integration tests
  4. performance-test.yml - Performance benchmarking tests
  5. performance-nightly.yml - Nightly performance baseline tests
  6. integration-test-helm.yml - Helm chart installation tests

All workflows now:

  • Pass HF_TOKEN: ${{ secrets.HF_TOKEN }} as an environment variable to the model download steps
  • Create Kubernetes secrets (hf-token-secret) when HF_TOKEN is available
  • Include graceful skip logic for forks where HF_TOKEN is not available

8. E2E Framework HF_TOKEN Secret Creation (e2e/pkg/framework/runner.go)

  • Added createHFTokenSecret() function: Creates a Kubernetes secret named hf-token-secret in the vllm-semantic-router-system namespace from the HF_TOKEN environment variable (a sketch follows this list)

    • Ensures the namespace exists (creating it if necessary) before creating the secret
    • Handles cases where the secret already exists (updates it) or the namespace doesn't exist yet
    • The secret is namespace-scoped and must be in the same namespace as the semantic-router deployment
  • Added secret creation call in Run() method: After creating the Kubernetes client, the framework checks for the HF_TOKEN environment variable and automatically creates the secret if present

    • Logs appropriate messages indicating whether the secret was created successfully, or if HF_TOKEN is not set
    • This fix resolves the 401 Unauthorized and GatedRepoError issues that were preventing Gemma model downloads in E2E tests
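
A hedged sketch of how such a helper might look with client-go; the secret and namespace names come from this PR, while the function signature and the HF_TOKEN key inside the secret are assumptions:

```go
package framework

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const (
	hfSecretName    = "hf-token-secret"
	routerNamespace = "vllm-semantic-router-system"
)

// createHFTokenSecret creates (or updates) hf-token-secret from the
// HF_TOKEN environment variable, ensuring the namespace exists first.
func createHFTokenSecret(ctx context.Context, client kubernetes.Interface) error {
	token := os.Getenv("HF_TOKEN")
	if token == "" {
		fmt.Println("HF_TOKEN not set; skipping secret creation (Gemma download will be skipped)")
		return nil
	}

	// Ensure the namespace exists before creating a namespaced secret.
	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: routerNamespace}}
	if _, err := client.CoreV1().Namespaces().Create(ctx, ns, metav1.CreateOptions{}); err != nil && !apierrors.IsAlreadyExists(err) {
		return fmt.Errorf("ensuring namespace %s: %w", routerNamespace, err)
	}

	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: hfSecretName, Namespace: routerNamespace},
		Type:       corev1.SecretTypeOpaque,
		StringData: map[string]string{"HF_TOKEN": token},
	}
	_, err := client.CoreV1().Secrets(routerNamespace).Create(ctx, secret, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		// Secret exists from a previous run; update it in place.
		_, err = client.CoreV1().Secrets(routerNamespace).Update(ctx, secret, metav1.UpdateOptions{})
	}
	return err
}
```

Using StringData lets the API server handle base64 encoding, and the update-on-exists branch covers reruns against the same cluster.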

9. Helm Deployment Template Fix (deploy/helm/semantic-router/templates/deployment.yaml)

  • ✅ Fixed a YAML indentation issue in the initContainer.env section (nindent 10 changed to nindent 8) that caused Helm installation failures when initContainer.env was actually defined
  • ✅ Simplified init container script to conditionally download Gemma only when HF_TOKEN is available

Testing

  • ✅ Go linter: Passed (0 issues)
  • ✅ Go mod tidy: Passed
  • ✅ Configuration files verified: All changes in place
  • ✅ Model download: Correctly attempts to download Gemma (401 expected without HF_TOKEN)
  • ✅ Graceful skip: Verified that the system continues to work with Qwen3-only when Gemma is unavailable
  • With HF_TOKEN available (maintainer CI): Gemma model downloads successfully, full embedding functionality available
  • Without HF_TOKEN (fork PRs): Gemma download is skipped gracefully, system continues with Qwen3-only mode, tests pass
  • E2E Tests: dynamic-config profile now passes because embedding initialization no longer fails when Gemma is unavailable

netlify bot commented Dec 11, 2025

Deploy Preview for vllm-semantic-router ready!

Name | Link
🔨 Latest commit | 31800e3
🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/6953b71bbd2b710008873d12
😎 Deploy Preview | https://deploy-preview-816--vllm-semantic-router.netlify.app

github-actions bot commented Dec 11, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/integration-test-docker.yml
  • .github/workflows/integration-test-k8s.yml
  • .github/workflows/test-and-build.yml

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/semantic-router_test.go
  • candle-binding/src/ffi/embedding.rs

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/docker-compose/docker-compose.ci.yml
  • deploy/docker-compose/docker-compose.yml
  • deploy/helm/semantic-router/templates/deployment.yaml
  • deploy/helm/semantic-router/values.yaml

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/pkg/framework/runner.go
  • e2e/profiles/ai-gateway/values.yaml
  • e2e/profiles/dynamic-config/values.yaml

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/modeldownload/downloader.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

liavweiss force-pushed the feature/gemma-model-enable branch 3 times, most recently from 45e4516 to 52451b1 on December 11, 2025 at 21:05
liavweiss marked this pull request as draft on December 11, 2025 at 21:18
liavweiss (Contributor, Author) commented:

This PR will remain in draft for now—I’m waiting to confirm whether an HF_TOKEN is already configured.

rootfs (Collaborator) commented Dec 12, 2025

@liavweiss the HF_TOKEN is configured

liavweiss force-pushed the feature/gemma-model-enable branch 2 times, most recently from 8aa1c5e to cec298c on December 13, 2025 at 18:46
liavweiss marked this pull request as ready for review on December 13, 2025 at 18:47
liavweiss marked this pull request as draft on December 13, 2025 at 18:54
liavweiss (Contributor, Author) commented Dec 13, 2025

> @liavweiss the HF_TOKEN is configured

@rootfs I added debug lines into test-ci-compose (in the download models section) and the output confirms the issue:

  • Repository context: vllm-project/semantic-router (workflow runs in upstream ✅)
  • Event: pull_request
  • PR head repo: liavweiss/semantic-router (PR is from a fork)
  • HF_TOKEN: Not available ❌

Root cause: GitHub Actions does not make secrets available in pull_request events when the PR is from a fork, even though the workflow runs in the upstream repository context. This is a security feature to prevent malicious fork code from accessing secrets.

Current Solution

The workflow gracefully skips the Gemma download when the token is unavailable (currently implemented only in the Docker Compose integration workflow), allowing PRs to pass while still running full tests on push events (after merge), where secrets are available.

Options

  1. Keep current approach (graceful skip) - Secure, but contributors can't test Gemma on PRs (unless they configure their own HF_TOKEN secret in their fork)
  2. Use pull_request_target - Allows secrets, but has security implications
  3. Hybrid: Keep graceful skip + document manual testing via workflow_dispatch with fork secrets

What do you think about these approaches?
My suggestion is to keep the current approach (graceful skip). It's secure and allows contributors to test manually via workflow_dispatch with their fork secrets (but we need to document it).

rootfs (Collaborator) commented Dec 15, 2025

@liavweiss thanks for the analysis. Let's skip it then. If there is a need to use gemma, we'll revisit this issue.

liavweiss force-pushed the feature/gemma-model-enable branch from cec298c to fee395f on December 15, 2025 at 20:07
liavweiss marked this pull request as ready for review on December 15, 2025 at 20:08
liavweiss force-pushed the feature/gemma-model-enable branch 3 times, most recently from f745a5c to ee7c0c5 on December 15, 2025 at 20:33
liavweiss marked this pull request as draft on December 15, 2025 at 20:46
liavweiss force-pushed the feature/gemma-model-enable branch from ee7c0c5 to a156f28 on December 16, 2025 at 09:35
liavweiss marked this pull request as ready for review on December 16, 2025 at 09:36
liavweiss force-pushed the feature/gemma-model-enable branch 12 times, most recently from 5812d70 to d08388d on December 21, 2025 at 13:27
liavweiss (Contributor, Author) commented:

PR is ready for review

liavweiss force-pushed the feature/gemma-model-enable branch 2 times, most recently from 1389599 to 2cdb2bc on December 23, 2025 at 07:47
liavweiss (Contributor, Author) commented:

After merging #862, my branch needs a deep refactor. Please don't review it at the moment.

liavweiss force-pushed the feature/gemma-model-enable branch from f8407d4 to 66b621c on December 24, 2025 at 14:31
liavweiss force-pushed the feature/gemma-model-enable branch from 66b621c to e13013f on December 25, 2025 at 08:31
liavweiss (Contributor, Author) commented:

PR is ready for review

Development

Successfully merging this pull request may close these issues:

  • [Feature] Re-enable EmbeddingGemma-300m Support with HF_TOKEN Configuration (#790)