Ensure Flux readiness on blackhole quietbox devices #2104
Draft
vpetrovicTT wants to merge 167 commits into dev from vpetrovic/feature/747-flux-bh-qbge-readiness-v2
Conversation
# Release v0.5.0
Co-authored-by: Djordje Madic <[email protected]>
Co-authored-by: Zeljana Torlak <[email protected]>
Co-authored-by: Filip Ivanovic <[email protected]>
Co-authored-by: Lana Jovanovic <[email protected]>
Co-authored-by: Igor Djuric <[email protected]>
Co-authored-by: Stephen Osborne <[email protected]>
Co-authored-by: Adam Roberge <[email protected]>
Co-authored-by: Nidhin Jose <[email protected]>
Co-authored-by: Marko Jeremic <[email protected]>
Co-authored-by: Benjamin Goel <[email protected]>
Co-authored-by: Samuel Adesoye <[email protected]>
Co-authored-by: Rico Zhu <[email protected]>
Co-authored-by: Aleksandar Cvejic <[email protected]>
Co-authored-by: Aniruddha Tupe <[email protected]>
Co-authored-by: Sam Tisi <[email protected]>
Co-authored-by: Pavle Popovic <[email protected]>
Co-authored-by: Veljko Maksimovic <[email protected]>
# Conflicts:
# README.md
# model_specs_output.json
Release v0.7.0
…tal table on separate page (#1775)
* update landing page README.md and Model Support generation script, move experimental models to separate page
* address #1520 P150x8 not showing up in model support table
* truncate docker image tag str in display table to avoid column width being too large
* ruff format
Release v0.8.0
…ver (#1757)
* Remove USER instruction from build stage for vllm cloud
* Fix tt-media-server Dockerfile
* Add PYTHON_ENV_DIR to PATH
* hardcode path
* try new fix
* revert
* try same fix on media
* try with root user
* try with same fix on vllm cloud
* Initial
* Use pip to install uv
* Run builder stage as root
* add chmod for venv
* Copy uv build results to runtime
* Optimize two into one layer
* try new fix
* polish
* revert
---------
Co-authored-by: Djordje Madic <[email protected]>
* Implement aiperf benchmarking
* Fix report generation
* Add documentation for aiperf
* For aiperf, generate one report with AIPerf Detailed Latency Percentiles and another with Throughput Comparison (stacking on top of the current vLLM benchmark we have)
* Add proper warm-up for AIPerf, and fix the combined table so that it only includes text benchmarks
* Enable image benchmarking using AIPerf
* Fix AIPerf image benchmark parsing to correctly extract image parameters and display targets
* Now searches for all 3 benchmark types
* Unified tables first, detailed tables second
* Clean up benchmark report generation
* Enable limit-samples-mode for aiperf and unify output directory for all 3 benchmarks
* Add --device and --model arguments to ensure CLI consistency
* Add documentation for changes in benchmarking
* Fix linting error
* Fix indentation error
* Run ruff format to automatically fix the formatting
* Rename run_benchmarks_aiperf.py -> run_aiperf_benchmarks.py
* Fix missing import: use benchmark_generate_report_helper alias
* Change report generation to separate tables per tool (text first, then image)
* Run ruff format to fix formatting
* Change genai-perf -> genai for final report
* Add support for text + image benchmarking for GenAI-Perf
* Remove duplicate aiperf runner file after rebase
* Restore CNN benchmark support and logging in summary_report.py
  - Add back create_cnn_display_dict() function for CNN display formatting
  - Restore CNN results processing in generate_report()
  - Re-add logger import and informational log statements for debugging
  - Ensures all task types (text, image, audio, embedding, CNN) are fully supported
* Restore audio/embedding/CNN support and galaxy_t3k pattern to match dev branch
  - Restore audio/embedding/CNN benchmark support in run_reports.py
    * Add back imports for create_audio_display_dict, create_embedding_display_dict, create_cnn_display_dict
    * Add back audio_sections, embedding_sections, cnn_sections lists
    * Restore processing for all task types in vLLM, AIPerf, and GenAI-Perf sections
    * Update section combining to include all task types
  - Restore galaxy_t3k device pattern in summary_report.py
    * Add galaxy_t3k back to image_pattern and text_pattern device regex
    * This was inadvertently removed in commit 2654355e during rebase
  These changes ensure non-VLM functionality matches dev branch exactly while preserving GenAI-Perf VLM additions.
* Use create_image_generation_display_dict for CNN to match dev branch
  - Restore create_image_generation_display_dict function in summary_report.py
  - Update run_reports.py to use create_image_generation_display_dict for CNN display
  - Update section combining comment to match dev branch wording
  - Ensures CNN display format matches dev branch exactly
* Add 20-second wait for the server to start
---------
Co-authored-by: Djordje Madic <[email protected]>
…#1773)
* try quick fix
* use uv pip
* change for media
* forge fix
* changes
* fix: use uv pip and run installs as root for permission fixes
  - Switch from pip to uv pip to match tt-metal commit 29d59d1
  - Run all pip installs as root to avoid permission-denied errors
  - Add --index-strategy unsafe-best-match for vllm to find cmake>=3.26.1
  - Fix ownership after installs before switching to non-root user
* fix: recreate venv symlinks in runtime stage to fix broken Python symlinks
* fix: add venv symlink fix to media-server runtime stage
* fix: runtime issues - venv pip bootstrap and media symlinks
  - workflows/run_workflows.py: Add --upgrade-deps and --clear flags to ensure pip is properly installed in workflow venvs, fixing FileNotFoundError for pip
  - tt-media-server/Dockerfile: Improve venv symlink fix in runtime stage by also removing pip symlinks and updating activate script VIRTUAL_ENV path
* fix: use ensurepip to bootstrap pip in workflow venvs
  On externally-managed Python (PEP 668), venv may not include pip by default. Use python -m ensurepip --upgrade to ensure pip is available before installing uv.
* test milos's changes
* revert media changes
* remove unnecessary comment
* temp change: try without symlinks
* temp change: try without local share uv
* revert
* tmp change without symlinks and local share uv
* revert
* Revert forge and delete unnecessary instructions in vllm
* ruff format
---------
Co-authored-by: Aleksandar Cvejic <[email protected]>
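The ensurepip fix above can be sketched in plain Python. This is a minimal illustration, not the repository's actual `run_workflows.py` code; the function name `create_workflow_venv` is hypothetical. The idea: on externally-managed interpreters (PEP 668) a freshly created venv may lack pip, so it is bootstrapped explicitly with `python -m ensurepip --upgrade` before anything else is installed.

```python
import subprocess
import venv
from pathlib import Path


def create_workflow_venv(venv_dir: Path) -> Path:
    """Create a fresh venv and guarantee pip is available inside it.

    On externally-managed Python (PEP 668) a bare venv may ship without
    pip, so bootstrap it explicitly via ensurepip. (Illustrative sketch;
    the real workflow code may differ.)
    """
    venv.create(venv_dir, clear=True, with_pip=False)
    py = venv_dir / "bin" / "python"
    # Bootstrap pip even when the base interpreter did not bundle it.
    subprocess.run([str(py), "-m", "ensurepip", "--upgrade"], check=True)
    return py
```

With pip guaranteed present, a subsequent `py -m pip install uv` no longer hits the FileNotFoundError mentioned in the commit message.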
* Add GenAI-Perf detailed percentiles section to benchmark reports
  - Created genai_perf_benchmark_generate_report() function parallel to AIPerf
  - Generates detailed percentile tables (mean, P50, P99) for TTFT, TPOT, E2EL
  - Supports both text and image benchmarks
  - Reuses existing aiperf_release_markdown() for consistent formatting
  - Integrated into main report generation workflow
* Fix GenAI-Perf detailed percentiles extraction
  - Changed from using benchmark_generate_report_helper() to direct JSON processing
  - Now extracts ISL, OSL, Concurrency from filename (like AIPerf does)
  - Properly extracts percentile data (median, p99) from JSON
  - Separates text and image benchmarks by filename pattern
  - Fixes missing data in GenAI-Perf detailed percentiles tables
* Generate detailed percentile reports for GenAI-Perf benchmarks
* Add image dimension columns to detailed percentile tables for image benchmarks
* Fix images_per_prompt field name to match standard convention
* Fix sort key to use images_per_prompt instead of images
* Run ruff format on run_reports.py
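Extracting ISL, OSL, and Concurrency from a benchmark filename, as the commit above describes, is a small regex job. The filename convention below (`isl<N>_osl<N>_c<N>`) is an assumption for illustration; the real files may encode these fields differently.

```python
import re

# Hypothetical filename convention, e.g. "benchmark_isl128_osl1024_c32.json";
# the repository's actual naming scheme may differ.
_PATTERN = re.compile(r"isl(?P<isl>\d+)_osl(?P<osl>\d+)_c(?P<concurrency>\d+)")


def parse_benchmark_filename(name: str) -> dict:
    """Pull input sequence length, output sequence length, and
    concurrency out of a benchmark result filename."""
    m = _PATTERN.search(name)
    if m is None:
        raise ValueError(f"unrecognized benchmark filename: {name!r}")
    return {key: int(value) for key, value in m.groupdict().items()}
```

Parsing these fields from the filename (rather than the JSON body) matches the AIPerf behavior the commit says GenAI-Perf was aligned with.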
* try quick fix
* use uv pip
* change for media
* forge fix
* changes
* fix: use uv pip and run installs as root for permission fixes
  - Switch from pip to uv pip to match tt-metal commit 29d59d1
  - Run all pip installs as root to avoid permission-denied errors
  - Add --index-strategy unsafe-best-match for vllm to find cmake>=3.26.1
  - Fix ownership after installs before switching to non-root user
* fix: recreate venv symlinks in runtime stage to fix broken Python symlinks
* fix: add venv symlink fix to media-server runtime stage
* fix: runtime issues - venv pip bootstrap and media symlinks
  - workflows/run_workflows.py: Add --upgrade-deps and --clear flags to ensure pip is properly installed in workflow venvs, fixing FileNotFoundError for pip
  - tt-media-server/Dockerfile: Improve venv symlink fix in runtime stage by also removing pip symlinks and updating activate script VIRTUAL_ENV path
* fix: use ensurepip to bootstrap pip in workflow venvs
  On externally-managed Python (PEP 668), venv may not include pip by default. Use python -m ensurepip --upgrade to ensure pip is available before installing uv.
* test milos's changes
* revert media changes
* remove unnecessary comment
* temp change: try without symlinks
* temp change: try without local share uv
* revert
* tmp change without symlinks and local share uv
* revert
* Revert forge and delete unnecessary instructions in vllm
* ruff format
* resolve merge conflict
* local uv share
* final change
---------
Co-authored-by: Aleksandar Cvejic <[email protected]>
* Check tt liveness instead of waiting 5 seconds
* Keep server logs for 1 day
* Make test fail
* Fix summary when test fails
* Remove bottleneck on purpose
* Rename const
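Replacing a fixed 5-second sleep with an actual liveness check, as the first bullet above does, usually means polling a health endpoint until it answers. This is a generic sketch under the assumption that the server exposes an HTTP endpoint returning 200 when live; `wait_until_live` and the endpoint URL are illustrative, not the repository's actual API.

```python
import time
import urllib.error
import urllib.request


def wait_until_live(url: str, timeout_s: float = 60.0, interval_s: float = 0.5) -> bool:
    """Poll a liveness URL until it returns HTTP 200 or the deadline passes.

    Unlike a fixed sleep, this returns as soon as the server is actually
    up, and fails loudly (returns False) if it never comes up.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet; retry
        time.sleep(interval_s)
    return False
```

The caller then gates the test run on the boolean instead of hoping 5 seconds was enough.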
…_data = data.get("benchmarks: ", data) (#1790)
…1729)
* refactor: use ModelSpec JSON for model registration instead of env vars
* load ModelSpec JSON once at import time and use impl_id for model registration
---------
Co-authored-by: Benjamin Goel <[email protected]>
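"Load the ModelSpec JSON once" is a classic memoization pattern; one simple way to get load-once semantics in Python is `functools.lru_cache`. The sketch below is illustrative only: `load_model_specs` and the default path are assumptions, not the repository's actual loader.

```python
import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1)
def load_model_specs(path: str = "model_specs_output.json") -> dict:
    """Read the ModelSpec JSON once; repeated calls with the same path
    return the same cached dict instead of re-reading the file."""
    return json.loads(Path(path).read_text())
```

Registration code can then call `load_model_specs()` freely (e.g. to look up an `impl_id` per model) without paying the file-read cost more than once per process.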
* add qwen image
* format
* add qwen-image-2512
* ruff format
* ruff format modelspec
* use self.settings
* cleanup flux model spec
…ls and benchmarks (#1797)
* Add DeepSeek-R1-0528 model and eval config (64k)
* add default commits from pprajapati/vllm_tracing
* adding dual and quad WH Galaxy device types in inference-server for DeepSeek-R1-0528
* fix non-DeepSeek-R1-0528 unintended changes
* adding deepseek_r1_galaxy_impl
* register TTDeepseekV3ForCausalLM
* ruff format
---------
Co-authored-by: Mark O'Connor <[email protected]>
…ing name, ID, log file path, and service port. (#1801)
* Add model readiness check before job creation
  Check if the model is ready before creating a video job.
* Check if model is ready before submitting job
  Add model readiness check before job submission.
* Fix indentation for HTTPException raise
* Remove model readiness check from fine tuning
* Remove model readiness check in video job submission
  Removed model readiness check before job creation.
* Add model readiness check before job creation
  Check if the model is ready before creating a job.
* Check if model is ready before listing jobs
* Remove redundant model readiness checks
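The readiness guard these commits add and then consolidate boils down to one check at the front of the job-creation path. A minimal framework-free sketch, assuming a status map keyed by model name; `ModelNotReadyError`, `create_job`, and the `"ready"` status string are hypothetical names, and in the real service the failure would surface as an HTTPException rather than a plain exception.

```python
class ModelNotReadyError(RuntimeError):
    """Raised when a job targets a model that has not finished loading."""


def create_job(model_status: dict, job_payload: dict) -> dict:
    """Refuse to enqueue a job unless the target model reports ready.

    Illustrative sketch: the actual service raises an HTTPException
    from its request handler instead of this custom error.
    """
    model = job_payload["model"]
    if model_status.get(model) != "ready":
        raise ModelNotReadyError(f"model {model!r} is not ready")
    return {"status": "queued", "model": model}
```

Doing the check once at creation time (and dropping the redundant copies elsewhere, as the last bullet does) keeps the failure mode in one place.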
* Removed uv install since it is part of base metal image
* cleanup
* Use vllm bench serve for vLLM http client
* Remove TODO about truncate_prompt_tokens
* Consolidate older vLLM HTTP and vLLM embeddings venvs
* Add BENCHMARKS_VLLM venv type
* STD out logs
* Refactor build_docker_images to have a separate function for listing sha combinations
* Apply suggestion from @bgoelTT
  Co-authored-by: Benjamin Goel <[email protected]>
---------
Co-authored-by: Benjamin Goel <[email protected]>
* feat: add video client
* feat: enable video inference running for eval/benchmark
* feat: add model spec and perf
* feat: update benchmark flow for video generation
* test: add unit test for video_client
* feat: update test_media_client_factory
* fix: update test
* test: update test
Contributor
✅ Test Coverage Report: Coverage of Changed Lines
Contributor
✅ Test Results - PASSED
🎉 All tests passed! This PR is ready for review.
Add Flux (FLUX.1-dev, FLUX.1-schnell) and Motif (Motif-Image-6B-Preview) model readiness for Blackhole QuietBox/GE devices
Follow proper BH device types (P150X4, P150X8, P300X2)...
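A readiness table keyed by (model, device type) is one straightforward way to express what this PR enables. The sketch below is purely illustrative: the model names and BH device types come from the PR description, but the table structure and the `is_model_ready` helper are assumptions, not the repository's actual registration mechanism.

```python
# Hypothetical readiness table; the real mapping lives in the ModelSpec
# registry, not in a hard-coded dict like this.
BH_DEVICE_TYPES = {"P150X4", "P150X8", "P300X2"}

READY_MODELS = {
    "FLUX.1-dev": BH_DEVICE_TYPES,
    "FLUX.1-schnell": BH_DEVICE_TYPES,
    "Motif-Image-6B-Preview": BH_DEVICE_TYPES,
}


def is_model_ready(model: str, device: str) -> bool:
    """Return True if the given model is marked ready on the given
    Blackhole device type; unknown models or devices are not ready."""
    return device in READY_MODELS.get(model, set())
```

Keeping the device-type set explicit (P150X4, P150X8, P300X2) makes it easy to extend readiness to further BH configurations later.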