Testing #3087 (Closed)

Changes from 14 commits

.github/workflows/linux.yml (8 changes: 7 additions & 1 deletion)

@@ -534,7 +534,7 @@ jobs:
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).tokenizers.test }}
   timeout: 60
 - name: 'API tests'
-  cmd: 'python -m pytest -v ./tests/python_tests/test_continuous_batching.py ./tests/python_tests/test_generation_config.py ./tests/python_tests/test_sampling.py ./tests/python_tests/test_text_streamer.py'
+  cmd: 'python -m pytest -v ./tests/python_tests/test_continuous_batching.py -k "not eagle3" ./tests/python_tests/test_generation_config.py ./tests/python_tests/test_sampling.py ./tests/python_tests/test_text_streamer.py'
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).continuous_batching.test || fromJSON(needs.smart_ci.outputs.affected_components).sampling.test || fromJSON(needs.smart_ci.outputs.affected_components).text_streamer.test }}
   timeout: 60
 - name: 'Rag tests'
@@ -551,6 +551,12 @@ jobs:
     python -m pytest -v ./tools/who_what_benchmark/tests -m nanollava
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
   timeout: 90
+- name: 'EAGLE3 speculative decoding tests'
+  cmd: |

  [Review comment: Copilot AI, Nov 30, 2025]
  Installing from a personal GitHub fork using a specific commit hash is fragile and not maintainable. Consider either: 1) merging these changes into the official optimum-intel repository and using a tagged release, or 2) documenting why this fork is necessary and when it can be removed.

  Suggested change:
  -  cmd: |
  +  cmd: |
  +    # FIXME: Installing from a personal fork is fragile. This is required because the official optimum-intel does not yet support EAGLE3 speculative decoding.
  +    # Remove this and use the official optimum-intel release once https://github.com/huggingface/optimum-intel/pull/XXX is merged and released.

+    python -m pip install git+https://github.com/xufang-lisa/optimum-intel.git@ea9607daf32919024cdd4390deec9693a7b64d23
+    python -m pytest -v ./tests/python_tests/test_continuous_batching.py -k "eagle3"
+  run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).speculative_decoding.test }}
+  timeout: 90
 defaults:
   run:
     shell: bash
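
The same -k name filter does the splitting in all three workflow files in this PR: the 'API tests' step now deselects every test whose name contains "eagle3", and the new step runs exactly that remainder after installing the patched optimum-intel. A minimal sketch of how pytest's -k substring matching performs this split, using hypothetical test names (the real ones live in test_continuous_batching.py):

# demo_k_filter.py: hedged illustration of pytest's -k substring filter.
# Run:
#   python -m pytest -v demo_k_filter.py -k "not eagle3"   # collects test_greedy_decoding only
#   python -m pytest -v demo_k_filter.py -k "eagle3"       # collects test_eagle3_speculative only

def test_greedy_decoding():
    # Name lacks the "eagle3" substring, so it stays in the 'API tests' step.
    assert True

def test_eagle3_speculative():
    # Name contains "eagle3", so only the dedicated EAGLE3 step selects it.
    assert True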

.github/workflows/manylinux_2_28.yml (8 changes: 7 additions & 1 deletion)

@@ -472,7 +472,7 @@ jobs:
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).tokenizers.test }}
   timeout: 60
 - name: 'API tests'
-  cmd: 'python -m pytest -v ./tests/python_tests/test_continuous_batching.py ./tests/python_tests/test_generation_config.py ./tests/python_tests/test_sampling.py ./tests/python_tests/test_text_streamer.py'
+  cmd: 'python -m pytest -v ./tests/python_tests/test_continuous_batching.py -k "not eagle3" ./tests/python_tests/test_generation_config.py ./tests/python_tests/test_sampling.py ./tests/python_tests/test_text_streamer.py'
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).continuous_batching.test || fromJSON(needs.smart_ci.outputs.affected_components).sampling.test || fromJSON(needs.smart_ci.outputs.affected_components).text_streamer.test }}
   timeout: 60
 - name: 'Rag tests'
@@ -489,6 +489,12 @@ jobs:
     python -m pytest -v ./tools/who_what_benchmark/tests -m nanollava
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
   timeout: 90
+- name: 'EAGLE3 speculative decoding tests'
+  cmd: |

  [Review comment: Copilot AI, Nov 30, 2025]
  Installing from a personal GitHub fork using a specific commit hash is fragile and not maintainable. Consider either: 1) merging these changes into the official optimum-intel repository and using a tagged release, or 2) documenting why this fork is necessary and when it can be removed.

  Suggested change:
  -  cmd: |
  +  cmd: |
  +    # FIXME: Using a personal fork of optimum-intel for EAGLE3 speculative decoding tests.
  +    # Reason: Required changes are not yet merged upstream. See https://github.com/huggingface/optimum-intel/pull/<PR_NUMBER> (replace with actual PR/issue link).
  +    # Remove this and use official optimum-intel release once changes are merged.

+    python -m pip install git+https://github.com/xufang-lisa/optimum-intel.git@ea9607daf32919024cdd4390deec9693a7b64d23
+    python -m pytest -v ./tests/python_tests/test_continuous_batching.py -k "eagle3"
+  run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).speculative_decoding.test }}
+  timeout: 90
 defaults:
   run:
     shell: bash

.github/workflows/windows.yml (8 changes: 7 additions & 1 deletion)

@@ -623,7 +623,7 @@ jobs:
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).tokenizers.test }}
   timeout: 60
 - name: 'API tests'
-  cmd: 'python -m pytest -s -v tests/python_tests/test_continuous_batching.py tests/python_tests/test_generation_config.py tests/python_tests/test_sampling.py tests/python_tests/test_text_streamer.py'
+  cmd: 'python -m pytest -s -v tests/python_tests/test_continuous_batching.py -k "not eagle3" tests/python_tests/test_generation_config.py tests/python_tests/test_sampling.py tests/python_tests/test_text_streamer.py'
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).continuous_batching.test || fromJSON(needs.smart_ci.outputs.affected_components).sampling.test || fromJSON(needs.smart_ci.outputs.affected_components).text_streamer.test }}
   timeout: 60
 - name: 'Rag tests'
@@ -640,6 +640,12 @@ jobs:
     python -m pytest -v ./tools/who_what_benchmark/tests -m nanollava
   run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).WWB.test }}
   timeout: 90
+- name: 'EAGLE3 speculative decoding tests'
+  cmd: |

  [Review comment: Copilot AI, Nov 30, 2025]
  Installing from a personal GitHub fork using a specific commit hash is fragile and not maintainable. Consider either: 1) merging these changes into the official optimum-intel repository and using a tagged release, or 2) documenting why this fork is necessary and when it can be removed.

  Suggested change:
  -  cmd: |
  +  cmd: |
  +    # TODO: Using a personal fork of optimum-intel for EAGLE3 speculative decoding tests.
  +    # Reason: [Add explanation here, e.g., "Required for feature X not yet merged upstream. See PR #123."]
  +    # Remove this and use official optimum-intel release when upstream PR is merged.

+    python -m pip install git+https://github.com/xufang-lisa/optimum-intel.git@ea9607daf32919024cdd4390deec9693a7b64d23
+    python -m pytest -v ./tests/python_tests/test_continuous_batching.py -k "eagle3"
+  run_condition: ${{ fromJSON(needs.smart_ci.outputs.affected_components).speculative_decoding.test }}
+  timeout: 90
 defaults:
   run:
     shell: pwsh

samples/python/visual_language_chat/benchmark_vlm.py (10 changes: 9 additions & 1 deletion)

@@ -35,6 +35,7 @@ def read_images(path: str) -> list[Tensor]:
 def main():
     parser = argparse.ArgumentParser(description="Help command")
     parser.add_argument("-m", "--model", type=str, help="Path to model and tokenizers base directory")
+    parser.add_argument("-dm", "--draft_model", type=str, help="Path to draft model and tokenizers base directory")
     parser.add_argument("-p", "--prompt", type=str, default=None, help="Prompt")
     parser.add_argument("-pf", "--prompt_file", type=str, help="Read prompt from file")
     parser.add_argument("-i", "--image", type=str, default="image.jpg", help="Image")
@@ -61,6 +62,7 @@ def main():
     # Perf metrics is stored in VLMDecodedResults.
     # In order to get VLMDecodedResults instead of a string input should be a list.
     models_path = args.model
+    draft_model_path = args.draft_model
     images = read_images(args.image)
     device = args.device
     num_warmup = args.num_warmup
@@ -76,7 +78,13 @@ def main():
     scheduler_config = ov_genai.SchedulerConfig()
     scheduler_config.enable_prefix_caching = False
     scheduler_config.max_num_batched_tokens = sys.maxsize
-    pipe = ov_genai.VLMPipeline(models_path, device, scheduler_config=scheduler_config)
+
+    print("draft_model_path=", draft_model_path)
+    print("device=", device)
+    draft_model = ov_genai.draft_model(str(draft_model_path), device)
+    #pipe = ov_genai.VLMPipeline(models_path, device, scheduler_config=scheduler_config)

  [Review comment: Copilot AI, Nov 30, 2025, on lines +82 to +85]
  Debug print statements and commented-out code should be removed before merging. These appear to be temporary testing code that shouldn't be in production.

  Suggested change:
  -    print("draft_model_path=", draft_model_path)
  -    print("device=", device)
  -    draft_model = ov_genai.draft_model(str(draft_model_path), device)
  -    #pipe = ov_genai.VLMPipeline(models_path, device, scheduler_config=scheduler_config)
  +    draft_model = ov_genai.draft_model(str(draft_model_path), device)

+    pipe = ov_genai.VLMPipeline(models_path, device, scheduler_config=scheduler_config, draft_model=draft_model)
+

     input_data = pipe.get_tokenizer().encode(prompt)
     prompt_token_size = input_data.input_ids.get_shape()[1]
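
In the spirit of the review comment above, the draft model could also be made optional so the benchmark still runs without -dm. A hedged sketch, not part of the PR: build_pipeline is a hypothetical helper, while the VLMPipeline and draft_model names follow the diff above.

import openvino_genai as ov_genai

def build_pipeline(models_path, device, scheduler_config, draft_model_path=None):
    # Forward draft_model only when a draft model directory was actually
    # provided, so plain (non-speculative) benchmarking keeps working.
    kwargs = {"scheduler_config": scheduler_config}
    if draft_model_path is not None:
        kwargs["draft_model"] = ov_genai.draft_model(str(draft_model_path), device)
    return ov_genai.VLMPipeline(models_path, device, **kwargs)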

@@ -65,13 +65,18 @@ class OPENVINO_GENAI_EXPORTS ContinuousBatchingPipeline {
     class ContinuousBatchingImpl;

     class ContinuousBatchingForSpeculativeDecodingImpl;
+    class ContinuousBatchingForEagle3DecodingImpl;
     class ContinuousBatchingForPromptLookupImpl;
     class SpeculativeDecodingImpl;
+    class Eagle3DecodingImpl;
     class PromptLookupImpl;

     friend class ContinuousBatchingForSpeculativeDecodingImpl;
+
     friend class ContinuousBatchingForPromptLookupImpl;
+    friend class ContinuousBatchingForEagle3DecodingImpl;
     friend class SpeculativeDecodingImpl;
+    friend class Eagle3DecodingImpl;
     friend class PromptLookupImpl;

     std::shared_ptr<IContinuousBatchingPipeline> m_impl;