Metrics definition and settings used for LLM benchmarks #2012

@vineel96

Description

Hello,
I have a query regarding the metric definitions and settings used in the MLCommons LLM benchmarks (https://mlcommons.org/benchmarks/inference-datacenter/).
For the LLM-Q/A task in the benchmark table at the link above:

  1. How are TPOT and TTFT calculated? Can you point to the source code for them? For TPOT, over how many generated tokens is it measured?
  2. For the OpenOrca dataset, is the input prompt "system_prompt" + "question", or "question" only? The OpenOrca dataset has both columns.
  3. For the quality metrics in the table, is the ROUGE-1 value its precision, recall, or F-measure?
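For context on question 1, a common (though not necessarily MLCommons-official) way these two latency metrics are defined: TTFT is the time from request submission to the first generated token, and TPOT is the average inter-token gap over the remaining tokens. A minimal sketch under that assumption (the function names and the fake token stream are hypothetical, not the actual LoadGen implementation):

```python
import time

def measure_latency(stream):
    """Measure TTFT and TPOT for an iterable token stream.

    Assumed definitions (not necessarily MLCommons' exact ones):
      TTFT = time from request start to arrival of the first token
      TPOT = (total_time - TTFT) / (n_tokens - 1), i.e. the mean gap
             between consecutive tokens after the first one
    """
    start = time.perf_counter()
    first_token_time = None
    n_tokens = 0
    for _ in stream:  # consuming the iterator simulates receiving tokens
        now = time.perf_counter()
        if first_token_time is None:
            first_token_time = now
        n_tokens += 1
    end = time.perf_counter()
    ttft = first_token_time - start
    tpot = (end - first_token_time) / (n_tokens - 1) if n_tokens > 1 else 0.0
    return ttft, tpot

def fake_stream(n=5, delay=0.01):
    """Hypothetical stand-in for a model's streaming output."""
    for i in range(n):
        time.sleep(delay)  # simulate per-token generation latency
        yield f"tok{i}"
```

For example, `measure_latency(fake_stream(5, 0.01))` returns a TTFT of roughly one delay and a TPOT close to the per-token delay.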
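For context on question 3, the three ROUGE-1 variants differ only in the denominator used over the unigram overlap. A minimal sketch of all three (illustrative only; the reference scoring scripts may tokenize and aggregate differently):

```python
from collections import Counter

def rouge1(candidate, reference):
    """Return (precision, recall, F-measure) for unigram (ROUGE-1) overlap.

    precision = overlap / candidate length
    recall    = overlap / reference length
    F-measure = harmonic mean of precision and recall
    """
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram match count
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f
```

For example, `rouge1("the cat sat", "the cat sat on mat")` gives precision 1.0, recall 0.6, and F-measure 0.75, showing how the same overlap yields three different scores.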
