feat: re-enable EmbeddingGemma-300m support #816
Conversation
This PR will remain in draft for now; I'm waiting to confirm whether an HF_TOKEN is already configured.
@liavweiss the HF_TOKEN is configured.
@rootfs I added debug lines into test-ci-compose (in the download-models section) and the output confirms the issue:

Root cause: GitHub Actions does not make secrets available to `pull_request` workflows triggered from forks.

Current solution: the workflow gracefully skips the Gemma download when the token is unavailable (currently implemented only in the Integration Docker Compose workflow), allowing PRs to pass while still running the full tests on push events (after merge), where secrets are available.

Options:

What do you think about these approaches?
@liavweiss thanks for the analysis. Let's skip it then. If there is a need to use Gemma, we'll revisit this issue.
PR is ready for review.
After merging #862, my branch needs a deep refactor.
PR is ready for review.

Re-enable Gemma Embedding Model Support
Task
Fix #790
Summary
This PR re-enables support for the gated `google/embeddinggemma-300m` model across the codebase, following the resolution of HuggingFace token access (`HF_TOKEN` is now configured by maintainers in CI).
Background
The EmbeddingGemma-300m model (`google/embeddinggemma-300m`) is a gated model on HuggingFace that requires authentication via `HF_TOKEN`. Due to CI/CD authentication limitations, Gemma support was previously disabled in work related to issue #573 so that tests could pass without the gated model. Now that the maintainer has configured `HF_TOKEN` in the CI environment, we can restore full Gemma embedding model support.
Changes Made
1. Model Download Configuration (`tools/make/models.mk`)
- Added `embeddinggemma-300m` to the `download-models-minimal` target
- Added `embeddinggemma-300m` to the `download-models-lora` target
- Downloads the gated model now that `HF_TOKEN` is available in CI
- Gracefully skips the download when `HF_TOKEN` is not available
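The guard described above could look something like this `models.mk` sketch. The target name comes from this PR's summary; the recipe itself is an assumption, using the standard `huggingface-cli download` command:

```make
# Hypothetical sketch: download the gated model only when HF_TOKEN is set.
download-models-minimal:
	@if [ -z "$(HF_TOKEN)" ]; then \
		echo "HF_TOKEN not set; skipping gated google/embeddinggemma-300m"; \
	else \
		huggingface-cli download google/embeddinggemma-300m \
			--local-dir models/embeddinggemma-300m; \
	fi
```

`huggingface-cli` picks up `HF_TOKEN` from the environment, so no flag is needed to authenticate.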
2. Go Test Constants (`candle-binding/semantic-router_test.go`)
- Restored `GemmaEmbeddingModelPath = "../models/embeddinggemma-300m"`
- Removed the `t.Skip()` from the `InitGemmaOnly` test
- Added `isModelInitializationError` checks for graceful test skipping when the model is unavailable
3. Rust Embedding Initialization (`candle-binding/src/ffi/embedding.rs`)
Key fix: made Gemma model initialization optional to prevent embedding initialization failures when Gemma cannot be loaded.
- Updated `init_embedding_models()` to continue with Qwen3-only initialization if Gemma fails to load
- Replaced the hard `return false` on Gemma failure with a warning log + continue (Gemma can fail to load without `HF_TOKEN` for gated models)
4. Model Manager Graceful Skip Logic (`src/model_manager/`)
✅ `__init__.py`: Added `GatedModelError` exception handling in `ensure_all()` to gracefully skip gated models when `HF_TOKEN` is not available
✅ `downloader.py`: Enhanced error handling to properly detect and convert gated-model errors:
- Treats `RepositoryNotFoundError` (404) as gated for gated models (HuggingFace returns 404 instead of 401 to avoid revealing repository existence)
- Converts it to `GatedModelError` for consistent handling
- Also handles `GatedRepoError` from the `huggingface_hub` library
5. E2E Profile Configurations
- `e2e/profiles/ai-gateway/values.yaml`: Added `gemma_model_path`
- `e2e/profiles/dynamic-config/values.yaml`: Set `gemma_model_path` to `"models/embeddinggemma-300m"`; changed `EMBEDDING_MODEL_OVERRIDE` from `"qwen3"` to `"auto"` for intelligent model selection
- `e2e/profiles/routing-strategies/values.yaml`: Already configured correctly
6. Helm Chart Configuration (`deploy/helm/semantic-router/values.yaml`)
- Added `embeddinggemma-300m` to the `initContainer.models` list
- Added `initContainer.env` to use `HF_TOKEN` from the Kubernetes secret (`hf-token-secret`)
- Marked the secret `optional: true` to allow deployment even if the secret doesn't exist (for local testing)
- Gracefully skips the gated download when `HF_TOKEN` is not available
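As a sketch, the values wiring above could look like this (the `initContainer.models`/`initContainer.env` key names come from this PR's summary; the surrounding schema is an assumption — `secretKeyRef.optional: true` is the standard Kubernetes mechanism that lets the pod start even when the secret is absent):

```yaml
initContainer:
  models:
    - embeddinggemma-300m
  env:
    - name: HF_TOKEN
      valueFrom:
        secretKeyRef:
          name: hf-token-secret
          key: HF_TOKEN
          optional: true   # deployment still succeeds if the secret is missing
```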
7. GitHub Actions Workflow Updates
The following GitHub Actions workflows were updated to include `HF_TOKEN` for downloading gated models:
- `integration-test-k8s.yml` - Kubernetes E2E integration tests
- `test-and-build.yml` - Main test and build workflow
- `integration-test-docker.yml` - Docker Compose integration tests
- `performance-test.yml` - Performance benchmarking tests
- `performance-nightly.yml` - Nightly performance baseline tests
- `integration-test-helm.yml` - Helm chart installation tests

All workflows now:
- Pass `HF_TOKEN: ${{ secrets.HF_TOKEN }}` as an environment variable to the model download steps
- Create the Kubernetes secret (`hf-token-secret`) when `HF_TOKEN` is available
- Skip gated downloads gracefully when `HF_TOKEN` is not available
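The shared pattern could be sketched as a workflow step like this (the `env` line is quoted from this PR; the step name and script body are illustrative, and the target name is the one from the `models.mk` changes):

```yaml
- name: Download models
  env:
    HF_TOKEN: ${{ secrets.HF_TOKEN }}
  run: |
    if [ -z "$HF_TOKEN" ]; then
      echo "HF_TOKEN unavailable (e.g. fork PR); skipping gated embeddinggemma-300m"
    else
      make download-models-minimal
    fi
```

Because secrets are empty in `pull_request` runs from forks, the same step passes in both contexts: it downloads on push events and skips on fork PRs.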
8. E2E Framework HF_TOKEN Secret Creation (`e2e/pkg/framework/runner.go`)
✅ Added a `createHFTokenSecret()` function: creates a Kubernetes secret named `hf-token-secret` in the `vllm-semantic-router-system` namespace from the `HF_TOKEN` environment variable
✅ Added a secret-creation call in the `Run()` method: after creating the Kubernetes client, the framework checks for the `HF_TOKEN` environment variable and automatically creates the secret if present
- Skips secret creation gracefully when `HF_TOKEN` is not set
- Fixes the `401 Unauthorized` and `GatedRepoError` issues that were preventing Gemma model downloads in E2E tests
9. Helm Deployment Template Fix (`deploy/helm/semantic-router/templates/deployment.yaml`)
- Fixed an indentation bug in the `initContainer.env` section (`nindent 10` changed to `nindent 8`) that caused Helm installation failures when `initContainer.env` was actually defined
- Ensures `HF_TOKEN` is passed to the init container when it is available
Testing
- With `HF_TOKEN` available (maintainer CI): the Gemma model downloads successfully and full embedding functionality is available
- Without `HF_TOKEN` (fork PRs): the Gemma download is skipped gracefully, the system continues in Qwen3-only mode, and tests pass
- The `dynamic-config` profile now passes because embedding initialization no longer fails when Gemma is unavailable