Conversation

@vpetrovicTT (Collaborator)

Add Flux (FLUX.1-dev, FLUX.1-schnell) and Motif (Motif-Image-6B-Preview) model readiness for Blackhole QuietBox/GE devices
Follow the proper BH device types (P150X4, P150X8, P300X2)...

tstescoTT and others added 30 commits December 16, 2025 20:42
# Release v0.5.0

Co-authored-by: Djordje Madic <[email protected]>
Co-authored-by: Zeljana Torlak <[email protected]>
Co-authored-by: Filip Ivanovic <[email protected]>
Co-authored-by: Lana Jovanovic <[email protected]>
Co-authored-by: Igor Djuric <[email protected]>
Co-authored-by: Stephen Osborne <[email protected]>
Co-authored-by: Adam Roberge <[email protected]>
Co-authored-by: Nidhin Jose <[email protected]>
Co-authored-by: Marko Jeremic <[email protected]>
Co-authored-by: Benjamin Goel <[email protected]>
Co-authored-by: Samuel Adesoye <[email protected]>
Co-authored-by: Rico Zhu <[email protected]>
Co-authored-by: Aleksandar Cvejic <[email protected]>
Co-authored-by: Aniruddha Tupe <[email protected]>
Co-authored-by: Sam Tisi <[email protected]>
Co-authored-by: Pavle Popovic <[email protected]>
Co-authored-by: Veljko Maksimovic <[email protected]>
# Conflicts:
#	README.md
#	model_specs_output.json
…tal table on separate page (#1775)

* update landing page README.md and Model Support generation script, move experimental models to separate page

* address #1520 P150x8 not showing up in model support table

* truncate docker image tag str in display table to avoid column width being too large
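
A minimal sketch of the truncation described above; the helper name and the 24-character limit are illustrative, not the actual implementation:

```python
def truncate_tag(tag: str, max_len: int = 24) -> str:
    """Shorten a docker image tag for display so table columns stay narrow."""
    if len(tag) <= max_len:
        return tag
    # Keep the start of the tag, which usually carries the version info.
    return tag[: max_len - 1] + "…"
```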

* ruff format
…ver (#1757)

* Remove USER instruction from build stage for vllm cloud

* Fix tt-media-server Dockerfile

* Add PYTHON_ENV_DIR to PATH

* hardcode path

* try new fix

* revert

* try same fix on media

* try with root user

* try with same fix on vllm cloud

* Initial

* Use pip to install uv

* Run builder stage as root

* add chmod for venv

* Copy uv build results to runtime

* Optimize two into one layer

* try new fix

* polish

* revert

---------

Co-authored-by: Djordje Madic <[email protected]>
* Implement aiperf benchmarking

* Fix report generation

* Add documentation for aiperf

* For AIPerf, generate one report with detailed latency percentiles and another with a throughput comparison (stacking on top of the existing vLLM benchmark)

* Add proper warm-up for AIPerf, and fix the combined table so that it only includes Text benchmarks
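
A sketch of the warm-up idea, assuming an OpenAI-compatible completions endpoint; the URL, model name, and request count are illustrative:

```python
import requests

def warm_up_server(base_url: str, model: str, n_requests: int = 3) -> None:
    """Send a few untimed requests so the measured run is not skewed
    by model compilation or cache warm-up."""
    for _ in range(n_requests):
        requests.post(
            f"{base_url}/v1/completions",
            json={"model": model, "prompt": "warm-up", "max_tokens": 8},
            timeout=120,
        )

# warm_up_server("http://localhost:8000", "meta-llama/Llama-3.1-8B")
```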

* Enable image benchmarking using AIPerf

* Fix AIPerf image benchmark parsing to correctly extract image parameters and display targets

* Now searches for all 3 benchmark types

* Unified tables first, detailed tables second

* Clean up benchmark report generation

* Enable limit-samples-mode for aiperf and unify output directory for all 3 benchmarks

* Add --device and --model arguments to ensure CLI consistency

* Add documentation for changes in benchmarking

* Fix linting error

* Fix indentation error

* Run ruff format to automatically fix the formatting

* Rename run_benchmarks_aiperf.py -> run_aiperf_benchmarks.py

* Fix missing import: use benchmark_generate_report_helper alias

* Change report generation to separate tables per tool (text first, then image)

* Run ruff format to fix formatting

* Change genai-perf -> genai for final report

* Add support for text + image benchmarking for GenAI-Perf

* Remove duplicate aiperf runner file after rebase

* Restore CNN benchmark support and logging in summary_report.py

- Add back create_cnn_display_dict() function for CNN display formatting
- Restore CNN results processing in generate_report()
- Re-add logger import and informational log statements for debugging
- Ensures all task types (text, image, audio, embedding, CNN) are fully supported
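
create_cnn_display_dict() is named in the commit but its body is not shown; a minimal sketch of what such a formatter might look like, with hypothetical field names (the real keys live in summary_report.py):

```python
def create_cnn_display_dict(result: dict) -> dict:
    """Map a raw CNN benchmark result onto the columns shown in the report.
    Key names here are illustrative."""
    return {
        "Model": result.get("model", "N/A"),
        "Device": result.get("device", "N/A"),
        "Batch Size": result.get("batch_size", "N/A"),
        "Throughput (img/s)": round(result.get("throughput", 0.0), 2),
    }
```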

* Restore audio/embedding/CNN support and galaxy_t3k pattern to match dev branch

- Restore audio/embedding/CNN benchmark support in run_reports.py
  * Add back imports for create_audio_display_dict, create_embedding_display_dict, create_cnn_display_dict
  * Add back audio_sections, embedding_sections, cnn_sections lists
  * Restore processing for all task types in vLLM, AIPerf, and GenAI-Perf sections
  * Update section combining to include all task types

- Restore galaxy_t3k device pattern in summary_report.py
  * Add galaxy_t3k back to image_pattern and text_pattern device regex
  * This was inadvertently removed in commit 2654355e during rebase

These changes ensure non-VLM functionality matches dev branch exactly while preserving GenAI-Perf VLM additions.

* Use create_image_generation_display_dict for CNN to match dev branch

- Restore create_image_generation_display_dict function in summary_report.py
- Update run_reports.py to use create_image_generation_display_dict for CNN display
- Update section combining comment to match dev branch wording
- Ensures CNN display format matches dev branch exactly

* Add 20-second wait for the server to start

---------

Co-authored-by: Djordje Madic <[email protected]>
…#1773)

* try quick fix

* use uv pip

* change for media

* forge fix

* changes

* fix: use uv pip and run installs as root for permission fixes

- Switch from pip to uv pip to match tt-metal commit 29d59d1
- Run all pip installs as root to avoid permission denied errors
- Add --index-strategy unsafe-best-match for vllm to find cmake>=3.26.1
- Fix ownership after installs before switching to non-root user

* fix: recreate venv symlinks in runtime stage to fix broken Python symlinks

* fix: add venv symlink fix to media-server runtime stage

* fix: runtime issues - venv pip bootstrap and Media symlinks

- workflows/run_workflows.py: Add --upgrade-deps and --clear flags to
  ensure pip is properly installed in workflow venvs, fixing
  FileNotFoundError for pip
- tt-media-server/Dockerfile: Improve venv symlink fix in runtime stage
  by also removing pip symlinks and updating activate script VIRTUAL_ENV path
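
The --upgrade-deps and --clear behaviour maps onto the stdlib venv module; a sketch, assuming Python >= 3.9 (where upgrade_deps is available) and an illustrative venv path:

```python
import venv

# clear=True recreates the venv from scratch; upgrade_deps=True upgrades
# pip/setuptools in the fresh venv so `python -m pip` is guaranteed to work.
builder = venv.EnvBuilder(clear=True, with_pip=True, upgrade_deps=True)
builder.create("/path/to/workflow_venv")
```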

* fix: use ensurepip to bootstrap pip in workflow venvs

On externally-managed Python (PEP 668), venv may not include pip by
default. Use python -m ensurepip --upgrade to ensure pip is available
before installing uv.
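
A sketch of that bootstrap order; the venv path is illustrative:

```python
import subprocess

venv_python = "/path/to/workflow_venv/bin/python"

# On PEP 668 "externally managed" Pythons the venv may ship without pip,
# so bootstrap it with ensurepip before anything that shells out to pip.
subprocess.run([venv_python, "-m", "ensurepip", "--upgrade"], check=True)
subprocess.run([venv_python, "-m", "pip", "install", "uv"], check=True)
```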

* test Milos's changes

* revert media changes

* remove unnecessary comment

* temp change: try without sym links

* temp change: try without local share uv

* revert

* tmp change without symlinks and local share uv

* revert

* Revert forge and delete unnecessary instructions in vllm

* ruff format

---------

Co-authored-by: Aleksandar Cvejic <[email protected]>
* Add GenAI-Perf detailed percentiles section to benchmark reports

- Created genai_perf_benchmark_generate_report() function parallel to AIPerf
- Generates detailed percentile tables (mean, P50, P99) for TTFT, TPOT, E2EL
- Supports both text and image benchmarks
- Reuses existing aiperf_release_markdown() for consistent formatting
- Integrated into main report generation workflow

* Fix GenAI-Perf detailed percentiles extraction

- Changed from using benchmark_generate_report_helper() to direct JSON processing
- Now extracts ISL, OSL, Concurrency from filename (like AIPerf does)
- Properly extracts percentile data (median, p99) from JSON
- Separates text and image benchmarks by filename pattern
- Fixes missing data in GenAI-Perf detailed percentiles tables
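
A sketch of the filename/JSON extraction described above; the filename pattern and JSON keys are assumptions, not the actual GenAI-Perf schema:

```python
import json
import re
from pathlib import Path

# Assumed filename convention, e.g. "..._isl-128_osl-128_concurrency-32.json".
PATTERN = re.compile(r"isl-(\d+)_osl-(\d+)_concurrency-(\d+)")

def extract_percentiles(path: Path) -> dict:
    """Pull ISL/OSL/concurrency from the filename and median/p99 latencies
    from the result JSON. Keys below are illustrative."""
    match = PATTERN.search(path.name)
    if match is None:
        raise ValueError(f"unexpected filename: {path.name}")
    isl, osl, concurrency = map(int, match.groups())
    data = json.loads(path.read_text())
    ttft = data["time_to_first_token"]
    return {
        "isl": isl,
        "osl": osl,
        "concurrency": concurrency,
        "ttft_p50_ms": ttft["p50"],
        "ttft_p99_ms": ttft["p99"],
    }
```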

* Generate detailed percentile reports for GenAI-Perf benchmarks

* Add image dimension columns to detailed percentile tables for image benchmarks

* Fix images_per_prompt field name to match standard convention

* Fix sort key to use images_per_prompt instead of images

* Run ruff format on run_reports.py

* resolve merge conflict

* local uv share

* final change

---------

Co-authored-by: Aleksandar Cvejic <[email protected]>
* Check tt liveness instead of waiting 5 seconds
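
A sketch of polling a liveness endpoint instead of sleeping a fixed 5 seconds; the endpoint path and timeout are assumptions:

```python
import time
import requests

def wait_for_liveness(base_url: str, timeout_s: float = 60.0) -> None:
    """Poll the server until it reports live, instead of sleeping blindly."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/tt-liveness", timeout=2).ok:
                return
        except requests.ConnectionError:
            pass  # server not accepting connections yet
        time.sleep(0.5)
    raise TimeoutError(f"server at {base_url} never became live")
```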

* Keep server logs for 1 day

* Make test fail

* Fix summary when test fails

* Remove bottleneck on purpose

* Rename const
…1729)

* refactor: use ModelSpec JSON for model registration instead of env vars

* load ModelSpec JSON once at import time and use impl_id for model registration
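
A sketch of the import-time pattern described above; the registry shape is illustrative, though impl_id and model_specs_output.json come from this PR:

```python
import json
from pathlib import Path

# Loaded once at import time so every consumer sees the same spec,
# with no per-call environment-variable lookups.
_SPECS = json.loads(Path("model_specs_output.json").read_text())

def register_models(registry: dict) -> None:
    """Register each model under its impl_id instead of reading env vars."""
    for spec in _SPECS:
        registry[spec["impl_id"]] = spec
```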

---------

Co-authored-by: Benjamin Goel <[email protected]>
* add qwen image

* format

* add qwen-image-2512

* ruff format

* ruff format modelspec

* use self.settings

* cleanup flux model spec
…ls and benchmarks (#1797)

* Add DeepSeek-R1-0528 model and eval config (64k)

* add default commits from pprajapati/vllm_tracing

* adding dual and quad WH Galaxy device types in inference-server for DeepSeek-R1-0528

* fix non-DeepSeek-R1-0528 unintended changes

* adding deepseek_r1_galaxy_impl

* register TTDeepseekV3ForCausalLM

* ruff format
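
The TTDeepseekV3ForCausalLM registration presumably goes through vLLM's out-of-tree model registry; a sketch, with a hypothetical import path for the model class:

```python
from vllm import ModelRegistry

# Hypothetical import path; the real module lives in the TT model code.
from tt_models.deepseek_v3 import TTDeepseekV3ForCausalLM

ModelRegistry.register_model("TTDeepseekV3ForCausalLM", TTDeepseekV3ForCausalLM)
```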

---------

Co-authored-by: Mark O'Connor <[email protected]>
…ing name, ID, log file path, and service port. (#1801)
* Add model readiness check before job creation

Check if the model is ready before creating a video job.

* Check if model is ready before submitting job

Add model readiness check before job submission.

* Fix indentation for HTTPException raise

* Remove model readiness check from fine tuning

* Remove model readiness check in video job submission

Removed model readiness check before job creation.

* Add model readiness check before job creation

Check if the model is ready before creating a job.

* Check if model is ready before listing jobs

* Remove redundant model readiness checks
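
A sketch of the readiness gate described above, assuming a FastAPI server; is_model_ready, the route, and the 503 detail text are illustrative:

```python
from fastapi import FastAPI, HTTPException

app = FastAPI()

def is_model_ready() -> bool:
    """Hypothetical readiness probe; the real check lives in the server."""
    return True

@app.post("/jobs")
def create_job(payload: dict):
    # Reject job creation early instead of letting it fail mid-run.
    if not is_model_ready():
        raise HTTPException(status_code=503, detail="Model is not ready")
    return {"status": "queued", "job": payload}
```
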
* Removed uv install since it is part of the base metal image

* cleanup
* Use vllm bench serve for vLLM http client

* Remove TODO about truncate_prompt_tokens

* Consolidate older vLLM HTTP and vLLM embeddings venvs

* Add BENCHMARKS_VLLM venv type
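
A sketch of invoking the new client path; `vllm bench serve` is a real subcommand, but the flag names below mirror the old benchmark_serving.py script and should be checked against the installed vLLM version:

```python
import subprocess

cmd = [
    "vllm", "bench", "serve",
    "--model", "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    "--base-url", "http://localhost:8000",
    "--dataset-name", "random",
    "--random-input-len", "128",
    "--random-output-len", "128",
    "--num-prompts", "32",
]
subprocess.run(cmd, check=True)
```
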
* Stdout logs

* Refactor build_docker_images to have a separate function for listing sha combinations

* Apply suggestion from @bgoelTT

Co-authored-by: Benjamin Goel <[email protected]>

---------

Co-authored-by: Benjamin Goel <[email protected]>
* feat: add video client

* feat: enable video inference running for eval/benchmark

* feat: add model spec and perf

* feat: update benchmark flow for video generation

* test: add unit test for video_client

* feat: update test_media_client_factory

* fix: update test

* test: update test
github-actions bot commented Feb 12, 2026

✅ Test Coverage Report

Coverage of Changed Lines

| Metric    | Value     |
|-----------|-----------|
| Coverage  | 100%      |
| Threshold | 50%       |
| Status    | ✅ PASSED |

💡 This checks coverage of newly added/modified lines only, not total codebase coverage.

github-actions bot commented Feb 12, 2026

✅ Test Results - PASSED

Summary

| Component           | Total | Passed | Skipped | Failed | Status |
|---------------------|-------|--------|---------|--------|--------|
| tt-inference-server | 392   | 392    | 0       | 0      | ✅     |
| tt-media-server     | 467   | 467    | 0       | 0      | ✅     |
| Overall             | 859   | 859    | 0       | 0      | ✅     |

Details

  • Python Version: 3.10
  • Workflow: Test Gate
  • Commit: 3e942fa
  • Run ID: 21958214792

🎉 All tests passed! This PR is ready for review.

@vpetrovicTT changed the base branch from main to dev February 12, 2026 17:58
@vpetrovicTT linked an issue Feb 12, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

[Model Readiness Support]: Flux on BH QB GE