@mengweiguo mengweiguo commented Jan 29, 2026

Description

This PR duplicates PR #3088 from the master branch.

CVS-###

Fixes #(issue)

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

Copilot AI review requested due to automatic review settings January 29, 2026 01:50
@mengweiguo mengweiguo requested a review from Wovchena as a code owner January 29, 2026 01:50
@github-actions github-actions bot added the following labels Jan 29, 2026:

  • category: llm_bench (Label for tool/llm_bench folder)
  • category: CPP API (Changes in GenAI C++ public headers)
  • no-match-files
  • category: GGUF (GGUF file reader)
  • category: RAG (RAG pipeline components)

Copilot AI left a comment

Pull request overview

This PR adds support for the Qwen3 embedding model release, enhancing text embedding functionality with improved configuration handling and NPU device support.

Changes:

  • Added pad_to_max_length configuration option for text embeddings
  • Fixed typo in function name from get_argprser to get_argparser
  • Refactored NPU compilation logic to support text embedding models with dynamic inputs
  • Added comprehensive test coverage for Qwen3 embedding model with various pooling types and configurations
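
To illustrate the first bullet, here is a minimal, library-free sketch of what a `pad_to_max_length` switch typically controls. It is not the code from this PR: the helper `pad_batch`, the `pad_id` default, and the token values are assumptions made for illustration. With the flag off, the batch is padded only to its longest sequence; with it on, every batch is padded to `max_length`, so the input shape stays the same from call to call.

```python
from typing import List


def pad_batch(
    batch: List[List[int]],
    max_length: int,
    pad_to_max_length: bool,
    pad_id: int = 0,               # assumed pad token id, for illustration only
    padding_side: str = "right",
) -> List[List[int]]:
    """Truncate each sequence to max_length, then pad the batch to a common length."""
    if padding_side not in ("left", "right"):
        raise ValueError(f"unsupported padding_side: {padding_side!r}")

    truncated = [seq[:max_length] for seq in batch]

    # Fixed target length when the flag is set, otherwise the longest sequence.
    if pad_to_max_length:
        target = max_length
    else:
        target = max((len(seq) for seq in truncated), default=0)

    padded = []
    for seq in truncated:
        fill = [pad_id] * (target - len(seq))
        padded.append(fill + seq if padding_side == "left" else seq + fill)
    return padded


batch = [[101, 7, 8], [101, 9]]
dynamic = pad_batch(batch, max_length=6, pad_to_max_length=False)
fixed = pad_batch(batch, max_length=6, pad_to_max_length=True, padding_side="left")
print(dynamic)
print(fixed)
```

The `padding_side` argument is included because the file summary mentions a fix to that parameter name; which side a given embedding model expects depends on the model and is not stated in this PR.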

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tools/llm_bench/task/text_embeddings.py | Updated tokenizer configuration to support pad_to_max_length parameter |
| tools/llm_bench/llm_bench_utils/ov_utils.py | Reorganized configuration logic and fixed padding_side parameter name |
| tools/llm_bench/llm_bench_utils/model_utils.py | Added emb_pad_to_max_length to model arguments |
| tools/llm_bench/benchmark.py | Fixed typo in get_argparser function name and added embedding_pad_to_max_length argument |
| tests/python_tests/test_rag.py | Added device and properties parameters to run_text_embedding_genai, parameterized validation threshold, and added extensive NPU tests |
| src/cpp/src/utils.hpp | Added declaration for compile_decoder_for_npu_text_embedding function |
| src/cpp/src/utils.cpp | Refactored NPU compilation into reusable functions and added text embedding specific configuration |
| src/cpp/src/rag/text_embedding_utils.hpp | Created new utility header for text embedding operations |
| src/cpp/src/rag/text_embedding_utils.cpp | Implemented utility functions for model reshaping and post-processing |
| src/cpp/src/rag/text_embedding_pipeline.cpp | Refactored to use utility functions and added NPU support with separate post-processing |
| src/cpp/src/rag/npu/text_embedding_pipeline.hpp | Added NPU-specific text embedding pipeline declarations |
| src/cpp/src/rag/npu/text_embedding_pipeline.cpp | Implemented NPU-specific text embedding pipeline creation |
| src/cpp/include/openvino/genai/rag/text_embedding_pipeline.hpp | Added documentation for NPU dynamic prompt input properties |
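
The summary above says `benchmark.py` gained an `embedding_pad_to_max_length` argument and `model_utils.py` gained an `emb_pad_to_max_length` model argument. The sketch below shows one plausible way such a flag could be wired with `argparse`. Only those two names and `get_argparser` come from the summary; the flag type (`store_true`), the companion `--embedding_max_length` option, the `emb_max_length` key, and the helper `to_model_args` are assumptions made for illustration and may differ from the actual llm_bench code.

```python
import argparse


def get_argparser() -> argparse.ArgumentParser:
    """Build a parser with the embedding padding options (illustrative only)."""
    parser = argparse.ArgumentParser(description="embedding benchmark options (sketch)")
    # Assumed companion option: the length to truncate or pad to.
    parser.add_argument("--embedding_max_length", type=int, default=None,
                        help="maximum number of tokens per input")
    # Argument name taken from the file summary; store_true is an assumption.
    parser.add_argument("--embedding_pad_to_max_length", action="store_true",
                        help="pad every input to embedding_max_length")
    return parser


def to_model_args(args: argparse.Namespace) -> dict:
    """Map CLI arguments to model arguments (hypothetical helper)."""
    return {
        "emb_max_length": args.embedding_max_length,  # assumed key name
        "emb_pad_to_max_length": args.embedding_pad_to_max_length,
    }


cli = ["--embedding_max_length", "512", "--embedding_pad_to_max_length"]
model_args = to_model_args(get_argparser().parse_args(cli))
default_args = to_model_args(get_argparser().parse_args([]))
print(model_args)
print(default_args)
```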

@mengweiguo mengweiguo changed the title from "Qwen3 embedding release" to "[NPU] Support NPUW for text-embedding models" Jan 29, 2026

mengweiguo commented Jan 29, 2026

The Node.js bindings test failure on macOS is expected, since PR #3227 has not been merged into the release branch.

@mengweiguo mengweiguo requested a review from dmatveev January 29, 2026 05:14