Modle Evaluation

Use commercial models to evaluate the quality of local models for roleplay or novel generation.

This evaluation simulates how SillyTavern communicates with Text Generation Web UI, which means you need to deploy both tools to evaluate the local models' text quality using commercial models like DeepSeek R1, ChatGPT, or other models.

Why this tool?

There are no datasets specifically evaluating novel text quality. You typically have to do it manually or use commercial models to evaluate the novel outputs or roleplay outputs.

For the first approach, it takes too much time to finish the evaluation. The latter one can batch generate the outputs and evaluate them automatically, saving significant time and effort. :)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
evaluation		evaluation
helper		helper
model		model
.env		.env
README.md		README.md
entry.py		entry.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Modle Evaluation

Why this tool?

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Modle Evaluation

Why this tool?

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages