Hello,
I have a query about the metric definitions and settings used in the LLM benchmarks from MLCommons (https://mlcommons.org/benchmarks/inference-datacenter/).
For the LLM-Q/A task in the benchmark table at the above link:
- How are TPOT and TTFT calculated? Can you point me to the source code for them? For TPOT, over how many generated tokens is it measured?
- For the OpenOrca dataset, is the input prompt "system_prompt" + "question", or "question" only? The OpenOrca dataset has both columns.
- For the quality metrics in the table, is the ROUGE-1 value the precision, the recall, or the F-measure?
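To make the questions concrete, here is my current understanding of these metrics as a minimal sketch. The TTFT/TPOT formulas and the unigram ROUGE-1 computation below are my assumptions, not taken from the MLPerf source code; I would like to confirm whether they match the official definitions.

```python
from collections import Counter

def ttft(first_token_time: float, request_time: float) -> float:
    # Time To First Token (assumed): latency from issuing the request
    # to receiving the first generated token.
    return first_token_time - request_time

def tpot(total_latency: float, ttft_s: float, n_output_tokens: int) -> float:
    # Time Per Output Token (assumed): decode time averaged over all
    # output tokens after the first one.
    return (total_latency - ttft_s) / (n_output_tokens - 1)

def rouge1(reference: str, candidate: str) -> dict:
    # Unigram ROUGE-1 with per-token clipped overlap; returns all three
    # values so it is clear which one the benchmark table might report.
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": f}
```

For example, `rouge1("the cat sat on the mat", "the cat on the mat")` gives precision 1.0 but recall 5/6, so which of the three is reported materially changes the score.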