
benchmark on audio_transcriptions fails #522

@tukwila

Description


Describe the bug

Precondition:
The following script generates the audio test dataset.

import numpy as np
import pandas as pd
import wave
import struct

def generate_and_save_wav_with_metadata():
    """generate WAV format file and save it into CSV file"""
    
    # audio parameters
    sample_rate = 44100
    duration = 3.0
    frequency = 523.25  # Hz (C5)
    
    # generate audio data
    t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    audio_data = 0.5 * np.sin(2 * np.pi * frequency * t)
    
    # convert to 16-bit PCM
    audio_int16 = np.int16(audio_data * 32767)
    
    # save as a WAV file
    with wave.open('test_audio.wav', 'w') as wav_file:
        wav_file.setnchannels(1)  # mono (single channel)
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(audio_int16.tobytes())
    
    # create metadata for CSV
    metadata = pd.DataFrame([{
        'filename': 'test_audio.wav',
        'sample_rate': sample_rate,
        'duration': duration,
        'frequency_hz': frequency,
        'channels': 1,
        'bits_per_sample': 16,
        'num_samples': len(audio_data),
        'max_amplitude': float(np.max(np.abs(audio_data))),
        'rms': float(np.sqrt(np.mean(audio_data**2)))
    }])
    
    metadata.to_csv('audio_metadata.csv', index=False)
    
    print(f"WAV save into: test_audio.wav")
    print(f"metadata save into: audio_metadata.csv")
    print("\n metadata content:")
    print(metadata.T)


generate_and_save_wav_with_metadata()
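As a sanity check, the generated file can be read back with the stdlib wave module. A minimal sketch (it writes its own copy of the tone to check_audio.wav so it is self-contained and does not depend on the script above having run):

```python
import math
import struct
import wave

sample_rate = 44100
duration = 3.0
frequency = 523.25
n = int(sample_rate * duration)

# synthesize the same 16-bit PCM sine tone, one frame at a time
frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * frequency * i / sample_rate)))
    for i in range(n)
)
with wave.open("check_audio.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(sample_rate)
    wf.writeframes(frames)

# read the header back and confirm the parameters match the metadata CSV
with wave.open("check_audio.wav", "rb") as wf:
    assert wf.getnchannels() == 1
    assert wf.getsampwidth() == 2
    assert wf.getframerate() == sample_rate
    print(wf.getnframes())  # 132300 frames = 3.0 s at 44.1 kHz
```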

Benchmark steps:

  1. Check out the branch from https://github.com/vllm-project/guidellm/pull/521, then run: pip install -e ./[dev]
  2. Start the mock server: guidellm mock-server --host 0.0.0.0 --port 8080
  3. Add some console prints in src/guidellm/scheduler/scheduler.py; otherwise the benchmark result contains no useful request info.
  4. Run the audio-generation script to create the WAV file and metadata CSV in the local path

  5. Run the benchmark:

guidellm benchmark \
    --target "http://localhost:8080" \
    --request-type "audio_transcriptions" \
    --rate-type "throughput" \
    --rate 1 \
    --max-requests 1 \
    --data "./audio_metadata.csv"

Benchmark console output:

 request_info in scheduler  request_id='7562ef0e-8c58-435c-a48a-1292511e8d7f'
status='queued' scheduler_node_id=-1 scheduler_process_id=0
scheduler_start_time=1766641850.768449
timings=RequestTimings(targeted_start=None, queued=1766641852.279289,
dequeued=None, scheduled_at=None, resolve_start=None, request_start=None,
first_request_iteration=None, first_token_iteration=None,
last_token_iteration=None, last_request_iteration=None, request_iterations=0,
token_iterations=0, request_end=None, resolve_end=None, finalized=None)
error=None started_at=None completed_at=None

 request_info in scheduler  request_id='7562ef0e-8c58-435c-a48a-1292511e8d7f'
status='pending' scheduler_node_id=-1 scheduler_process_id=0
scheduler_start_time=1766641850.768449
timings=RequestTimings(targeted_start=1766641850.768449,
queued=1766641852.279289, dequeued=1766641852.2864509, scheduled_at=None,
resolve_start=None, request_start=None, first_request_iteration=None,
first_token_iteration=None, last_token_iteration=None,
last_request_iteration=None, request_iterations=0, token_iterations=0,
request_end=None, resolve_end=None, finalized=None) error=None started_at=None
completed_at=None

 request_info in scheduler  request_id='7562ef0e-8c58-435c-a48a-1292511e8d7f'
status='in_progress' scheduler_node_id=-1 scheduler_process_id=0
scheduler_start_time=1766641850.768449
timings=RequestTimings(targeted_start=1766641850.768449,
queued=1766641852.279289, dequeued=1766641852.2864509,
scheduled_at=1766641852.2864509, resolve_start=1766641852.286586,
request_start=None, first_request_iteration=None, first_token_iteration=None,
last_token_iteration=None, last_request_iteration=None, request_iterations=0,
token_iterations=0, request_end=None, resolve_end=None, finalized=None)
error=None started_at=1766641852.286586 completed_at=None

 request_info in scheduler  request_id='7562ef0e-8c58-435c-a48a-1292511e8d7f'
status='errored' scheduler_node_id=-1 scheduler_process_id=0
scheduler_start_time=1766641850.768449
timings=RequestTimings(targeted_start=1766641850.768449,
queued=1766641852.279289, dequeued=1766641852.2864509,
scheduled_at=1766641852.2864509, resolve_start=1766641852.286586,
request_start=1766641852.286649, first_request_iteration=None,
first_token_iteration=None, last_token_iteration=None,
last_request_iteration=None, request_iterations=0, token_iterations=0,
request_end=None, resolve_end=1766641852.287371, finalized=1766641852.290621)
error="Invalid type for value. Expected primitive type, got <class 'dict'>:
{'include_usage': True}" started_at=1766641852.286649
completed_at=1766641852.287371
╭─ Benchmarks ─────────────────────────────────────────────────────────────────╮
│ [1… thr… (c… Req:    0.0 req/s,    0.00s Lat,     0.0 Conc,       0 Comp,  … │
│              Tok:    0.0 gen/s,    0.0 tot/s,   0.0ms TTFT,    0.0ms ITL,  … │
╰──────────────────────────────────────────────────────────────────────────────╯
Generating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ (1/1) [ 0:00:02 < 0:00:00 ]
25-12-25 13:50:53|DEBUG            |guidellm.utils.text:load_text:228 - Loading text: https://blog.vllm.ai/guidellm/ui/v0.5.0/index.html


ℹ Run Summary Info
|============|==========|==========|=====|======|======|======|=====|=====|======|=====|=====|
| Benchmark  | Timings                             ||||| Input Tokens   ||| Output Tokens  |||
| Strategy   | Start    | End      | Dur | Warm | Cool | Comp | Inc | Err | Comp | Inc | Err |
|            |          |          | Sec | Sec  | Sec  | Tot  | Tot | Tot | Tot  | Tot | Tot |
|------------|----------|----------|-----|------|------|------|-----|-----|------|-----|-----|
| throughput | 13:50:50 | 13:50:52 | 1.5 | 0.0  | 0.0  | 0.0  | 0.0 | 0.0 | 0.0  | 0.0 | 0.0 |
|============|==========|==========|=====|======|======|======|=====|=====|======|=====|=====|


ℹ Audio Metrics Statistics (Completed Requests)
|============|=======|======|======|======|=======|======|======|======|=======|======|======|======|
| Benchmark  | Input Samples           |||| Input Seconds           |||| Input Bytes             ||||
| Strategy   | Per Request || Per Second || Per Request || Per Second || Per Request || Per Second ||
|            | Mdn   | p95  | Mdn  | Mean | Mdn   | p95  | Mdn  | Mean | Mdn   | p95  | Mdn  | Mean |
|------------|-------|------|------|------|-------|------|------|------|-------|------|------|------|
| throughput | 0.0   | 0.0  | 0.0  | 0.0  | 0.0   | 0.0  | 0.0  | 0.0  | 0.0   | 0.0  | 0.0  | 0.0  |
|============|=======|======|======|======|=======|======|======|======|=======|======|======|======|


ℹ Request Token Statistics (Completed Requests)
|============|======|=====|======|======|======|=====|=======|======|=========|========|
| Benchmark  | Input Tok || Output Tok || Total Tok || Stream Iter || Output Tok      ||
| Strategy   | Per Req   || Per Req    || Per Req   || Per Req     || Per Stream Iter ||
|            | Mdn  | p95 | Mdn  | p95  | Mdn  | p95 | Mdn   | p95  | Mdn     | p95    |
|------------|------|-----|------|------|------|-----|-------|------|---------|--------|
| throughput | 0.0  | 0.0 | 0.0  | 0.0  | 0.0  | 0.0 | 0.0   | 0.0  | 0.0     | 0.0    |
|============|======|=====|======|======|======|=====|=======|======|=========|========|


ℹ Request Latency Statistics (Completed Requests)
|============|=========|========|=====|=====|=====|=====|=====|=====|
| Benchmark  | Request Latency || TTFT     || ITL      || TPOT     ||
| Strategy   | Sec             || ms       || ms       || ms       ||
|            | Mdn     | p95    | Mdn | p95 | Mdn | p95 | Mdn | p95 |
|------------|---------|--------|-----|-----|-----|-----|-----|-----|
| throughput | 0.0     | 0.0    | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
|============|=========|========|=====|=====|=====|=====|=====|=====|


ℹ Server Throughput Statistics
|============|=====|======|=======|======|=======|=======|========|=======|=======|=======|
| Benchmark  | Requests               |||| Input Tokens || Output Tokens || Total Tokens ||
| Strategy   | Per Sec   || Concurrency || Per Sec      || Per Sec       || Per Sec      ||
|            | Mdn | Mean | Mdn   | Mean | Mdn   | Mean  | Mdn    | Mean  | Mdn   | Mean  |
|------------|-----|------|-------|------|-------|-------|--------|-------|-------|-------|
| throughput | 0.0 | 0.0  | 0.0   | 0.0  | 0.0   | 0.0   | 0.0    | 0.0   | 0.0   | 0.0   |
|============|=====|======|=======|======|=======|=======|========|=======|=======|=======|



✔ Benchmarking complete, generated 1 benchmark(s)

The final request_info reports the error:
error="Invalid type for value. Expected primitive type, got <class 'dict'>:
{'include_usage': True}"
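The wording of this error matches the primitive-type check that HTTP clients such as httpx apply to multipart form-field values, which suggests the stream_options dict ({'include_usage': True}) is being put into the form data as-is instead of being JSON-encoded first. A minimal stdlib sketch of that check and the usual fix; the check_form_value helper is hypothetical and only mirrors the validation, it is not the actual guidellm/httpx call site:

```python
import json

def check_form_value(name, value):
    # multipart form-field values must be primitives; this mirrors the
    # validation that produces the error seen in the benchmark output
    if value is not None and not isinstance(value, (str, bytes, int, float)):
        raise TypeError(
            f"Invalid type for value. Expected primitive type, got {type(value)}: {value!r}"
        )

# a nested dict triggers the same error text as the benchmark run
try:
    check_form_value("stream_options", {"include_usage": True})
except TypeError as e:
    print(e)  # Invalid type for value. Expected primitive type, got <class 'dict'>: {'include_usage': True}

# JSON-encoding the nested options before adding them to the form avoids it
check_form_value("stream_options", json.dumps({"include_usage": True}))
```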

Expected behavior

The audio_transcriptions benchmark should run successfully against the mock server.

Environment
Include all relevant environment information:

  1. OS: macOS 12.7.6
  2. Python version: 3.11.5
  3. guidellm version: 0.5.0.dev0

To Reproduce
See the benchmark steps above.


