
ASR evaluation should be done with "cleaned" text #63

Open
keighrim opened this issue Jul 18, 2024 · 1 comment
Labels
✨N New feature or request

Comments

@keighrim
Member

New Feature Summary

The current evaluate.py in the asr_eval subproject reads the text content directly from the "gold" transcript files, but as we've seen, the "gold" files are quite noisy and need some clean-up (clamsproject/clams-utils#2) before being used for ASR evaluation.

Since we now have a new cleaner implementation (clamsproject/clams-utils#3), it's time to update evaluate.py to use the cleaned copies of the transcript files.
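
A minimal sketch of what the updated flow could look like, assuming a hypothetical `clean_transcript()` stand-in for the clams-utils cleaner and `jiwer` as a placeholder WER backend (both names are illustrative, not the actual APIs used here):

```python
# Sketch only: `clean_transcript` is a placeholder for the clams-utils cleaner,
# and `jiwer` stands in for whatever WER backend evaluate.py actually uses.
import pathlib

import jiwer


def clean_transcript(raw_text: str) -> str:
    # Placeholder cleanup; the real logic lives in clamsproject/clams-utils.
    return " ".join(raw_text.split())


def score(gold_path: str, hypothesis_text: str) -> float:
    raw_gold = pathlib.Path(gold_path).read_text(encoding="utf-8")
    gold = clean_transcript(raw_gold)  # evaluate against cleaned text, not raw "gold"
    return jiwer.wer(gold, hypothesis_text)
```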

Related

No response

Alternatives

No response

Additional context

No response

@keighrim keighrim added the ✨N New feature or request label Jul 18, 2024
@clams-bot clams-bot added this to infra Jul 18, 2024
@github-project-automation github-project-automation bot moved this to Todo in infra Jul 18, 2024
@keighrim
Member Author

Additionally, we could add more normalization, like the Whisper normalizers: https://github.com/openai/whisper/tree/main/whisper/normalizers
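
For example, Whisper ships an `EnglishTextNormalizer` that could be applied to both gold and hypothesis text before scoring (assuming the openai-whisper package is installed; whether its rules fit our transcripts is still an open question):

```python
# Applies Whisper's English text normalizer to a string; requires the
# openai-whisper package. Shown only to illustrate the kind of
# normalization the comment above refers to.
from whisper.normalizers import EnglishTextNormalizer

normalizer = EnglishTextNormalizer()
print(normalizer("It's 5 o'clock, Dr. Smith!"))  # lowercased, punctuation stripped, etc.
```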
