Evaluation with OpenCompass

Hi, thanks for the great work.

We are the OpenCompass team (https://github.com/internLM/OpenCompass/), and we focus on LLM evaluation.


OpenCompass is a one-stop platform for large model evaluation that aims to provide a fair, open, and reproducible benchmark. Its main features include:

- **Comprehensive support for models and datasets**: Built-in support for 20+ HuggingFace and API models, plus an evaluation scheme of 50+ datasets with about 300,000 questions that assesses model capabilities across five dimensions.

- **Efficient distributed evaluation**: A single command handles task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours.

- **Diversified evaluation paradigms**: Support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue-type prompt templates, to elicit the best performance from various models.

- **Modular design with high extensibility**: Want to add new models or datasets, customize an advanced task division strategy, or even support a new cluster management system? Every part of OpenCompass can be easily extended!

- **Experiment management and reporting mechanism**: Config files fully record each experiment, and results can be reported in real time.
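
To make the zero-shot, few-shot, and chain-of-thought paradigms mentioned above concrete, here is a minimal plain-Python sketch of how the three prompt styles differ. It does not use OpenCompass's API: the `build_prompt` helper, its parameters, and the `Q:`/`A:` template are assumptions made purely for illustration.

```python
def build_prompt(question, examples=(), chain_of_thought=False):
    """Build a plain-text prompt (hypothetical helper, not part of OpenCompass)."""
    parts = []
    # Few-shot: prepend solved examples before the real question.
    for example_question, example_answer in examples:
        parts.append(f"Q: {example_question}\nA: {example_answer}")
    # Chain-of-thought: nudge the model to reason before answering.
    answer_prefix = "A: Let's think step by step." if chain_of_thought else "A:"
    parts.append(f"Q: {question}\n{answer_prefix}")
    return "\n\n".join(parts)


zero_shot = build_prompt("What is 2 + 3?")
few_shot = build_prompt("What is 2 + 3?", examples=[("What is 1 + 1?", "2")])
cot = build_prompt("What is 2 + 3?", chain_of_thought=True)
```

With no examples and `chain_of_thought=False` the prompt is zero-shot; passing `examples` makes it few-shot, and the two options can be combined.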


We would like to support the evaluation of open_llama with OpenCompass. If you have any ideas or suggestions, feel free to raise an issue or contact us at opencompass@pjlab.org.cn.

Evaluation with OpenCompass #89