MLPerf inference results SCC24 PKU 2 #82

Merged
1 change: 1 addition & 0 deletions open/pku/code/stable-diffusion-xl/README.md
@@ -0,0 +1 @@
TBD
@@ -0,0 +1,3 @@
| Model | Scenario | Accuracy | Throughput | Latency (in ms) |
|---------------------|------------|----------------------|--------------|-------------------|
| stable-diffusion-xl | offline | (14.02827, 84.33062) | 8.281 | - |
@@ -0,0 +1,60 @@
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).

*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-5.14.0-427.33.1.el9_4.x86_64-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53) [GCC 9.4.0]
* MLCommons CM version: 3.4.1

## CM Run Command

See [CM installation guide](https://docs.mlcommons.org/inference/install/).

```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@cm4mlops --checkout=852b297c18a90edb8a9c975dd7ee7cf731e1e347

cm run script \
--tags=run-mlperf,inference,_r4.1-dev,_scc24-main \
--model=sdxl \
--implementation=nvidia \
--max_query_count=5000 \
--min_query_count=504 \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--max_batchsize=8 \
--quiet \
--rerun
```
*Note that if you want to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts),
you should reload mlcommons@cm4mlops without the pinned checkout and clean the CM cache as follows:*

```bash
cm rm repo mlcommons@cm4mlops
cm pull repo mlcommons@cm4mlops
cm rm cache -f
```

## Results

Platform: mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main

Model Precision: int8

### Accuracy Results
`CLIP_SCORE`: `14.02827`, Required accuracy for closed division `>= 31.68632` and `<= 31.81332`
`FID_SCORE`: `84.33062`, Required accuracy for closed division `>= 23.01086` and `<= 23.95008`
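The closed-division windows quoted above can be applied mechanically. A minimal sketch (assumed helper, not part of the CM tooling; windows treated as inclusive) checking the measured scores against those bounds:

```python
# Illustrative check: compare measured scores against the
# closed-division accuracy windows quoted in this README.
def in_window(score: float, lo: float, hi: float) -> bool:
    """Return True if score falls inside the inclusive [lo, hi] window."""
    return lo <= score <= hi

clip_ok = in_window(14.02827, 31.68632, 31.81332)  # CLIP_SCORE window
fid_ok = in_window(84.33062, 23.01086, 23.95008)   # FID_SCORE window
print(clip_ok, fid_ok)  # False False
```

Both scores fall outside the closed-division windows, consistent with this submission living under `open/pku` rather than the closed division.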

### Performance Results
`Samples per second`: `8.2807`
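As a back-of-envelope sanity check (illustrative only, not from the harness), the reported throughput implies roughly a minute of pure inference for the 500-sample accuracy run seen in the console log, excluding warm-up and engine-loading overhead:

```python
# Relate the reported Offline throughput to expected wall time
# for the 500-sample accuracy run.
samples = 500           # accuracy-run sample count from the log
throughput = 8.2807     # samples per second, from the result summary
wall_time = samples / throughput
print(f"~{wall_time:.1f} s for {samples} samples")  # ~60.4 s
```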
@@ -0,0 +1,88 @@
[2024-11-18 22:46:22,137 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
[2024-11-18 22:46:22,217 main.py:229 INFO] Detected system ID: KnownSystem.sc1
[2024-11-18 22:46:24,991 generate_conf_files.py:107 INFO] Generated measurements/ entries for sc1_TRT/stable-diffusion-xl/Offline
[2024-11-18 22:46:24,991 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/home/lry/CM/repos/local/cache/6c0ba4746fa74e77/test_results/mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=8 --mlperf_conf_path="/home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=true --user_conf_path="/home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/0fe769204cb64955852be59f43b33ad5.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-11-18 22:46:24,991 __init__.py:53 INFO] Overriding Environment
[2024-11-18 22:46:27,765 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
2024-11-18 22:46:30,481 INFO worker.py:1567 -- Connecting to existing Ray cluster at address: 10.0.0.1:6379...
2024-11-18 22:46:30,489 INFO worker.py:1743 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265 
[2024-11-18 22:46:30,733 harness.py:207 INFO] Start Warm Up!
(SDXLCore pid=220850) [2024-11-18 22:46:34,300 backend.py:428 INFO] initialized
(SDXLCore pid=220850) [2024-11-18 22:46:34,402 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:34,654 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:35,018 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan.
(SDXLCore pid=220850) [2024-11-18 22:46:39,168 backend.py:97 INFO] Enabling cuda graphs for unet
(SDXLCore pid=220850) [2024-11-18 22:46:39,604 backend.py:155 INFO] captured graph for BS=1
(SDXLCore pid=18778, ip=10.0.0.3) [2024-11-18 22:46:32,459 backend.py:428 INFO] initialized [repeated 7x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(SDXLCore pid=220848) [2024-11-18 22:46:37,641 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 29x across cluster]
(SDXLCore pid=220850) [2024-11-18 22:46:40,416 backend.py:155 INFO] captured graph for BS=2
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:40,449 backend.py:97 INFO] Enabling cuda graphs for unet [repeated 8x across cluster]
(SDXLCore pid=220852) [2024-11-18 22:46:44,655 backend.py:155 INFO] captured graph for BS=6 [repeated 45x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:36,615 backend.py:428 INFO] initialized
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:38,113 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 4x across cluster]
[2024-11-18 22:47:06,644 harness.py:209 INFO] Warm Up Done!
[2024-11-18 22:47:06,644 harness.py:211 INFO] Start Test!
[2024-11-18 22:47:06,794 backend.py:852 INFO] 500
(SDXLCore pid=18774, ip=10.0.0.3) [2024-11-18 22:47:03,586 backend.py:630 INFO] generate_images
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:46:45,509 backend.py:155 INFO] captured graph for BS=8 [repeated 25x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:11,266 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:18,975 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:26,675 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:47:35,394 backend.py:630 INFO] generate_images [repeated 4x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:47:45,024 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [2024-11-18 22:47:49,957 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=220848) [2024-11-18 22:48:04,367 backend.py:630 INFO] generate_images [repeated 9x across cluster]
[2024-11-18 22:48:16,996 backend.py:901 INFO] [Server] Received 500 total samples
[2024-11-18 22:48:16,999 backend.py:911 INFO] [Device 0] Reported 56 samples
[2024-11-18 22:48:17,001 backend.py:911 INFO] [Device 1] Reported 56 samples
[2024-11-18 22:48:17,002 backend.py:911 INFO] [Device 2] Reported 56 samples
[2024-11-18 22:48:17,004 backend.py:911 INFO] [Device 3] Reported 56 samples
[2024-11-18 22:48:17,006 backend.py:911 INFO] [Device 4] Reported 56 samples
[2024-11-18 22:48:17,008 backend.py:911 INFO] [Device 5] Reported 55 samples
[2024-11-18 22:48:17,009 backend.py:911 INFO] [Device 6] Reported 55 samples
[2024-11-18 22:48:17,011 backend.py:911 INFO] [Device 7] Reported 55 samples
[2024-11-18 22:48:17,013 backend.py:911 INFO] [Device 8] Reported 55 samples
[2024-11-18 22:48:17,013 harness.py:214 INFO] Test Done!
[2024-11-18 22:48:17,013 harness.py:216 INFO] Destroying SUT...
[2024-11-18 22:48:17,013 harness.py:219 INFO] Destroying QSL...
(SDXLCore pid=220847) [2024-11-18 22:48:06,100 backend.py:630 INFO] generate_images [repeated 4x across cluster]
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/data
gpu_batch_size : 8
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/lry/CM/repos/local/cache/3443882dd9374096/repo/closed/NVIDIA/build/logs/2024.11.18-22.46.18
mlperf_conf_path : /home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf
model_path : /home/lry/CM/repos/local/cache/d2b9079c1073417b/models/SDXL/
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9684X 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=791.59486, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=791594860000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 80GB HBM3', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=700.0, pci_id='0x233010DE', compute_sm=90): 5})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='sc1')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : True
user_conf_path : /home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/0fe769204cb64955852be59f43b33ad5.conf
system_id : sc1
config_name : sc1_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=220850) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan
(SDXLCore pid=18776, ip=10.0.0.3) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan [repeated 30x across cluster]
(SDXLCore pid=18776, ip=10.0.0.3) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan [repeated 3x across cluster]
[2024-11-18 22:48:18,480 run_harness.py:166 INFO] Result: Accuracy run detected.

======================== Result summaries: ========================
