[NPU] Support NPUW for text-embedding models #3244
base: releases/2026/0
Pull request overview
This PR adds support for the Qwen3 embedding model release, enhancing text embedding functionality with improved configuration handling and NPU device support.
Changes:
- Added `pad_to_max_length` configuration option for text embeddings
- Fixed typo in function name from `get_argprser` to `get_argparser`
- Refactored NPU compilation logic to support text embedding models with dynamic inputs
- Added comprehensive test coverage for Qwen3 embedding model with various pooling types and configurations
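
For readers unfamiliar with the option, the following is a minimal sketch of what padding to a maximum length generally means for tokenized input. It is not code from this PR: the helper name `pad_to_max_length_sketch`, its parameters, and the default `pad_id` of 0 are hypothetical and chosen only for illustration.

```python
from typing import List


def pad_to_max_length_sketch(
    token_ids: List[int],
    max_length: int,
    pad_id: int = 0,  # assumed pad token id, for illustration only
    padding_side: str = "right",
) -> List[int]:
    """Return token_ids truncated or padded to exactly max_length entries.

    Hypothetical helper; not part of the OpenVINO GenAI API.
    """
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    # Drop tokens beyond max_length so the result never exceeds it.
    truncated = token_ids[:max_length]
    # Fill the remaining positions with the pad token.
    padding = [pad_id] * (max_length - len(truncated))
    if padding_side == "left":
        return padding + truncated
    return truncated + padding
```

With such an option enabled, every input ends up with the same length regardless of the prompt, and the padding side determines whether pad tokens come before or after the real tokens.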
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tools/llm_bench/task/text_embeddings.py | Updated tokenizer configuration to support pad_to_max_length parameter |
| tools/llm_bench/llm_bench_utils/ov_utils.py | Reorganized configuration logic and fixed padding_side parameter name |
| tools/llm_bench/llm_bench_utils/model_utils.py | Added emb_pad_to_max_length to model arguments |
| tools/llm_bench/benchmark.py | Fixed typo in get_argparser function name and added embedding_pad_to_max_length argument |
| tests/python_tests/test_rag.py | Added device and properties parameters to run_text_embedding_genai, parameterized validation threshold, and added extensive NPU tests |
| src/cpp/src/utils.hpp | Added declaration for compile_decoder_for_npu_text_embedding function |
| src/cpp/src/utils.cpp | Refactored NPU compilation into reusable functions and added text embedding specific configuration |
| src/cpp/src/rag/text_embedding_utils.hpp | Created new utility header for text embedding operations |
| src/cpp/src/rag/text_embedding_utils.cpp | Implemented utility functions for model reshaping and post-processing |
| src/cpp/src/rag/text_embedding_pipeline.cpp | Refactored to use utility functions and added NPU support with separate post-processing |
| src/cpp/src/rag/npu/text_embedding_pipeline.hpp | Added NPU-specific text embedding pipeline declarations |
| src/cpp/src/rag/npu/text_embedding_pipeline.cpp | Implemented NPU-specific text embedding pipeline creation |
| src/cpp/include/openvino/genai/rag/text_embedding_pipeline.hpp | Added documentation for NPU dynamic prompt input properties |
Description
This PR is a duplicate of PR #3088, which targets the master branch.
CVS-###
Fixes #(issue)
Checklist: