Commit ee0c899

Merge pull request #80 from woonyee28/mlperf-inference-results-scc24
Results on system scc7 NTUHPC
2 parents c15219b + 43e5df2 commit ee0c899

28 files changed

+13611
-0
lines changed
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
TBD
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
| Model               | Scenario | Accuracy (CLIP_SCORE, FID_SCORE) | Throughput (samples/s) | Latency (in ms) |
|---------------------|----------|----------------------------------|------------------------|-----------------|
| stable-diffusion-xl | offline  | (16.50375, 232.23582)            | 4.188                  | -               |
@@ -0,0 +1,7 @@
{
  "starting_weights_filename": "https://github.com/mlcommons/cm4mlops/blob/main/script/get-ml-model-stable-diffusion/_cm.json#L174",
  "retraining": "no",
  "input_data_types": "int32",
  "weight_data_types": "int8",
  "weight_transformations": "quantization, affine fusion"
}
@@ -0,0 +1,61 @@
This experiment was generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).

*Check the [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-6.5.0-27-generic-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53)
  [GCC 9.4.0]
* MLCommons CM version: 3.4.1

## CM Run Command

See [CM installation guide](https://docs.mlcommons.org/inference/install/).

```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@cm4mlops --checkout=636343e1980e79ff6f3820e66b6b2f08add3ce46

cm run script \
	--tags=run-mlperf,inference,_r4.1-dev,_short,_scc24-main \
	--model=sdxl \
	--implementation=nvidia \
	--framework=tensorrt \
	--category=datacenter \
	--scenario=Offline \
	--execution_mode=test \
	--device=cuda \
	--quiet \
	--target_qps=4.9 \
	--offline_target_qps=4.9 \
	--batch_size=8 \
	--test_query_count=500 \
	--clean
```
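
For scripted or repeated runs, the same invocation can be assembled programmatically. The sketch below only builds and prints the command line from the tags and flags shown above; it does not run anything. The helper name `build_cm_command` is hypothetical and not part of CM.

```python
import shlex


def build_cm_command(tags, flags):
    """Assemble a `cm run script` argv list (hypothetical helper, not part of CM).

    `flags` maps a flag name to its value; a value of None means a bare switch.
    Insertion order of the dict is preserved in the output.
    """
    argv = ["cm", "run", "script", "--tags=" + ",".join(tags)]
    for name, value in flags.items():
        argv.append(f"--{name}" if value is None else f"--{name}={value}")
    return argv


argv = build_cm_command(
    tags=["run-mlperf", "inference", "_r4.1-dev", "_short", "_scc24-main"],
    flags={
        "model": "sdxl",
        "implementation": "nvidia",
        "framework": "tensorrt",
        "category": "datacenter",
        "scenario": "Offline",
        "execution_mode": "test",
        "device": "cuda",
        "quiet": None,
        "target_qps": 4.9,
        "offline_target_qps": 4.9,
        "batch_size": 8,
        "test_query_count": 500,
        "clean": None,
    },
)

# shlex.join quotes each argument so the result can be pasted into a shell.
print(shlex.join(argv))
```

Keeping the flags in a dictionary makes it easy to vary one setting (for example `batch_size`) between runs while leaving the rest of the command untouched.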

*Note: to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts),
re-pull mlcommons@cm4mlops without the `--checkout` option and clean the CM cache as follows:*

```bash
cm rm repo mlcommons@cm4mlops
cm pull repo mlcommons@cm4mlops
cm rm cache -f
```

## Results

Platform: Coffeepot-nvidia_original-gpu-tensorrt-vdefault-scc24-main

Model Precision: int8
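
The `int8` label appears to describe the UNet only: the harness log for this run loads four TensorRT engines whose file names carry different precision suffixes. The following is a minimal sketch, assuming the file-name pattern seen in that log (`stable-diffusion-xl-<component>-<scenario>-gpu-b<batch>-<precision>.<config>.plan`), that extracts the precision per component. The regular expression and the function name are illustrative and not part of the NVIDIA harness.

```python
import re

# Pattern inferred from the engine paths in the harness log (an assumption,
# not a documented naming contract).
ENGINE_RE = re.compile(
    r"stable-diffusion-xl-(?P<component>\w+)-(?P<scenario>\w+)-gpu-"
    r"b(?P<batch>\d+)-(?P<precision>\w+)\.(?P<config>\w+)\.plan$"
)

ENGINE_PATHS = [
    "./build/engines/newFourH100/stable-diffusion-xl/Offline/"
    "stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan",
    "./build/engines/newFourH100/stable-diffusion-xl/Offline/"
    "stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan",
    "./build/engines/newFourH100/stable-diffusion-xl/Offline/"
    "stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan",
    "./build/engines/newFourH100/stable-diffusion-xl/Offline/"
    "stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan",
]


def engine_precisions(paths):
    """Map each pipeline component to the precision encoded in its engine file name."""
    precisions = {}
    for path in paths:
        match = ENGINE_RE.search(path)
        if match is None:
            raise ValueError(f"unrecognised engine file name: {path}")
        precisions[match["component"]] = match["precision"]
    return precisions


precisions = engine_precisions(ENGINE_PATHS)
for component, precision in precisions.items():
    print(f"{component}: {precision}")
```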
55+
56+
### Accuracy Results
57+
`CLIP_SCORE`: `16.50375`, Required accuracy for closed division `>= 31.68632` and `<= 31.81332`
58+
`FID_SCORE`: `232.23582`, Required accuracy for closed division `>= 23.01086` and `<= 23.95008`
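
Each metric has to fall inside a closed interval to meet the closed-division requirement. The sketch below applies that check to the two scores reported here; both land outside their ranges. Note that the run command uses `--execution_mode=test` with the `_short` tag, and the harness log shows only 50 samples received, which may be related. The class name `AccuracyCheck` is illustrative and not part of any MLPerf tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyCheck:
    """One metric with its closed-division acceptance interval (illustrative type)."""
    name: str
    measured: float
    lower: float
    upper: float

    def passed(self) -> bool:
        # The requirement is inclusive on both ends (>= lower and <= upper).
        return self.lower <= self.measured <= self.upper


checks = [
    AccuracyCheck("CLIP_SCORE", 16.50375, 31.68632, 31.81332),
    AccuracyCheck("FID_SCORE", 232.23582, 23.01086, 23.95008),
]

for check in checks:
    status = "PASS" if check.passed() else "FAIL"
    print(f"{check.name}: {check.measured} in [{check.lower}, {check.upper}] -> {status}")
```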

### Performance Results

`Samples per second`: `4.18764`
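
To put the throughput in context, it can be compared with the `--offline_target_qps=4.9` value passed in the run command. The sketch below is plain arithmetic on numbers from this page. Whether the performance run issued exactly `--test_query_count=500` queries is not shown here, so the duration is only a rough estimate under that assumption.

```python
measured_qps = 4.18764   # "Samples per second" reported above
target_qps = 4.9         # --offline_target_qps from the run command
test_query_count = 500   # --test_query_count from the run command (assumed to apply)

# Fraction of the requested target rate that was actually achieved.
ratio = measured_qps / target_qps

# Time needed to process the assumed query count at the measured rate.
estimated_seconds = test_query_count / measured_qps

print(f"measured / target throughput: {ratio:.1%}")
print(f"estimated time for {test_query_count} samples: {estimated_seconds:.1f} s")
```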
@@ -0,0 +1,96 @@
[2024-11-18 09:47:54,310 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
[2024-11-18 09:47:54,311 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC1. Skipping.
[2024-11-18 09:47:54,311 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC2. Skipping.
[2024-11-18 09:47:54,422 main.py:229 INFO] Detected system ID: KnownSystem.newFourH100
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
[2024-11-18 09:47:56,249 generate_conf_files.py:107 INFO] Generated measurements/ entries for newFourH100_TRT/stable-diffusion-xl/Offline
[2024-11-18 09:47:56,250 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/home/cmuser/CM/repos/local/cache/6712b485075c4fe8/test_results/1b41d1041a1b-nvidia_original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=8 --mlperf_conf_path="/home/cmuser/CM/repos/local/cache/02144893f8ce40a0/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=false --user_conf_path="/home/cmuser/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/6216a4579b7f4250b03d916b96039c13.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan,./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-11-18 09:47:56,250 __init__.py:53 INFO] Overriding Environment
[2024-11-18 09:47:58,985 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
[2024-11-18 09:47:58,985 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC1. Skipping.
[2024-11-18 09:47:58,985 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC2. Skipping.
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
/home/cmuser/.local/lib/python3.8/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
  warnings.warn(_BETA_TRANSFORMS_WARNING)
[2024-11-18 09:48:00,997 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
[2024-11-18 09:48:01,144 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
[2024-11-18 09:48:01,856 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan.
[2024-11-18 09:48:03,677 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan.
[2024-11-18 09:48:06,073 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
[2024-11-18 09:48:06,206 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
[2024-11-18 09:48:06,915 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan.
[2024-11-18 09:48:08,694 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan.
[2024-11-18 09:48:11,036 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
[2024-11-18 09:48:11,172 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
[2024-11-18 09:48:11,884 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan.
[2024-11-18 09:48:13,622 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan.
[2024-11-18 09:48:15,958 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
[2024-11-18 09:48:16,091 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
[2024-11-18 09:48:16,803 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan.
[2024-11-18 09:48:18,551 backend.py:71 INFO] Loading TensorRT engine: ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan.
[2024-11-18 09:48:20,157 harness.py:207 INFO] Start Warm Up!
[2024-11-18 09:49:19,420 harness.py:209 INFO] Warm Up Done!
[2024-11-18 09:49:19,420 harness.py:211 INFO] Start Test!
[2024-11-18 09:49:34,322 backend.py:801 INFO] [Server] Received 50 total samples
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 0] Reported 8 samples
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 1] Reported 16 samples
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 2] Reported 16 samples
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 3] Reported 10 samples
[2024-11-18 09:49:34,322 harness.py:214 INFO] Test Done!
[2024-11-18 09:49:34,322 harness.py:216 INFO] Destroying SUT...
[2024-11-18 09:49:34,322 harness.py:219 INFO] Destroying QSL...
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/cmuser/CM/repos/local/cache/5a62a909e14a4c17/data
gpu_batch_size : 8
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/cmuser/CM/repos/local/cache/bcfcc3f269b147a7/repo/closed/NVIDIA/build/logs/2024.11.18-09.47.48
mlperf_conf_path : /home/cmuser/CM/repos/local/cache/02144893f8ce40a0/inference/mlperf.conf
model_path : /home/cmuser/CM/repos/local/cache/5a62a909e14a4c17/models/SDXL/
offline_expected_qps : 4.0
precision : int8
preprocessed_data_dir : /home/cmuser/CM/repos/local/cache/5a62a909e14a4c17/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9654 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=1): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=1.584936672, byte_suffix=<ByteSuffix.TB: (1000, 4)>, _num_bytes=1584936672000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 PCIe', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=350.0, pci_id='0x233110DE', compute_sm=90): 4})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='newFourH100')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : False
user_conf_path : /home/cmuser/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/6216a4579b7f4250b03d916b96039c13.conf
system_id : newFourH100
config_name : newFourH100_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan
[I] Loading bytes from ./build/engines/newFourH100/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan
[2024-11-18 09:49:37,158 run_harness.py:166 INFO] Result: Accuracy run detected.

======================== Result summaries: ========================

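
The accuracy log above reports a server-side sample total followed by one count per GPU. A quick consistency check is to confirm that the per-device counts add up to that total. The sketch below does this on the four relevant log lines, assuming the message format shown in this log stays stable. The regular expressions and the function name `summarize_samples` are illustrative and not part of the harness.

```python
import re

# Lines copied from the harness log above.
LOG_EXCERPT = """\
[2024-11-18 09:49:34,322 backend.py:801 INFO] [Server] Received 50 total samples
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 0] Reported 8 samples
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 1] Reported 16 samples
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 2] Reported 16 samples
[2024-11-18 09:49:34,322 backend.py:809 INFO] [Device 3] Reported 10 samples
"""

TOTAL_RE = re.compile(r"\[Server\] Received (\d+) total samples")
DEVICE_RE = re.compile(r"\[Device (\d+)\] Reported (\d+) samples")


def summarize_samples(log_text):
    """Return (server_total, {device_id: samples}) parsed from harness log text."""
    total_match = TOTAL_RE.search(log_text)
    if total_match is None:
        raise ValueError("no '[Server] Received N total samples' line found")
    per_device = {int(dev): int(count) for dev, count in DEVICE_RE.findall(log_text)}
    return int(total_match.group(1)), per_device


total, per_device = summarize_samples(LOG_EXCERPT)
consistent = sum(per_device.values()) == total
print(f"server total: {total}, per device: {per_device}, consistent: {consistent}")
```

The same function can be pointed at a full log file by passing the result of reading the file as text.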
