Closed
Description
System Info
Image: text-embeddings-inference:turing-1.6-grpc
Model ID: sentence-transformers/distiluse-base-multilingual-cased-v2
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
The container fails at startup with the following log:
dense-embed | 2025-04-23T03:14:03.297238Z INFO text_embeddings_router: router/src/lib.rs:188: Maximum number of tokens per request: 128
dense-embed | 2025-04-23T03:14:03.297495Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 48 tokenization workers
dense-embed | 2025-04-23T03:14:05.341111Z INFO text_embeddings_router: router/src/lib.rs:230: Starting model backend
dense-embed | 2025-04-23T03:14:05.918661Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:317: Starting DistilBertModel model on Cuda(CudaDevice(DeviceId(1)))
dense-embed | 2025-04-23T03:14:07.818830Z ERROR text_embeddings_backend: backends/src/lib.rs:255: Could not start Candle backend: Could not start backend: cannot find tensor encoder.layer.0.attention.q_lin.weight
dense-embed | Error: Could not create backend
dense-embed |
dense-embed | Caused by:
dense-embed | Could not start backend: Could not start a suitable backend
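The error says the backend looked for a tensor named encoder.layer.0.attention.q_lin.weight and did not find it. One way to check what the checkpoint actually contains is to list the tensor names stored in the weights file. The following is a minimal sketch, assuming the model's weights are available locally as a .safetensors file. The helper names and the script name are hypothetical, and it reads only the file header (an 8-byte little-endian length followed by a JSON index), so it needs nothing beyond the Python standard library.

```python
import json
import struct
import sys


def list_tensor_names(path):
    """Return the tensor names stored in a .safetensors file, sorted."""
    with open(path, "rb") as f:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by that many bytes of JSON describing each tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")


def find_suffix(names, suffix):
    """Return the stored names that end with the given suffix."""
    return [name for name in names if name.endswith(suffix)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python list_tensors.py model.safetensors
    names = list_tensor_names(sys.argv[1])
    print(f"{len(names)} tensors in {sys.argv[1]}")
    # Show how the query projection weights are actually named in this file.
    for name in find_suffix(names, "attention.q_lin.weight"):
        print(name)
```

If the printed names carry a different prefix than encoder.layer.0., that would suggest a naming mismatch between the checkpoint and what the backend expects, rather than missing weights. This is only a diagnostic aid and does not confirm the root cause.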
Expected behavior
The service starts successfully.