Description
Hi, thanks for the great work.
We are the OpenCompass team (https://github.com/internLM/OpenCompass/), and we focus on LLM evaluation.
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark. Its main features include:
- Comprehensive support for models and datasets: Built-in support for 20+ HuggingFace and API models, plus an evaluation scheme of 50+ datasets with about 300,000 questions that assesses model capabilities across five dimensions.
- Efficient distributed evaluation: A one-line command handles task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours.
- Diversified evaluation paradigms: Support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue-type prompt templates, to easily elicit the best performance from various models.
- Modular design with high extensibility: Want to add new models or datasets, customize an advanced task division strategy, or even support a new cluster management system? Everything in OpenCompass can be easily extended!
- Experiment management and reporting mechanism: Config files fully record each experiment, and results can be reported in real time.
We would like to support the evaluation of open_llama with OpenCompass. If you have any ideas or suggestions, feel free to raise an issue or contact us at [email protected]