
ASR evaluation should be done with "cleaned" text #63

Open
keighrim opened this issue Jul 18, 2024 · 1 comment
Labels
✨N New feature or request

Comments

@keighrim
Member

New Feature Summary

The current evaluate.py in the asr_eval subproject reads the text content directly from the "gold" transcript files, but as we've seen, the "gold" files are quite noisy and need some clean-up (clamsproject/clams-utils#2) before being used for ASR evaluation.

Since we now have a new cleaner implementation (clamsproject/clams-utils#3), it's time to update evaluate.py to use the cleaned copies of the transcript files.
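
A minimal sketch of what the updated flow could look like, assuming a hypothetical `clean_transcript()` stand-in for the clams-utils cleaner and `jiwer` as a placeholder WER backend (both names are illustrative, not the actual APIs used here):

```python
# Sketch only: `clean_transcript` is a placeholder for the clams-utils cleaner,
# and `jiwer` stands in for whatever WER backend evaluate.py actually uses.
import pathlib

import jiwer


def clean_transcript(raw_text: str) -> str:
    # Placeholder cleanup; the real logic lives in clamsproject/clams-utils.
    return " ".join(raw_text.split())


def score(gold_path: str, hypothesis_text: str) -> float:
    raw_gold = pathlib.Path(gold_path).read_text(encoding="utf-8")
    gold = clean_transcript(raw_gold)  # evaluate against cleaned text, not raw "gold"
    return jiwer.wer(gold, hypothesis_text)
```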

Related

No response

Alternatives

No response

Additional context

No response

@keighrim keighrim added the ✨N New feature or request label Jul 18, 2024
@clams-bot clams-bot added this to infra Jul 18, 2024
@github-project-automation github-project-automation bot moved this to Todo in infra Jul 18, 2024
@keighrim
Member Author

Additionally, we could add more normalization, like the Whisper normalizers: https://github.com/openai/whisper/tree/main/whisper/normalizers
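
For example, Whisper ships an `EnglishTextNormalizer` that could be applied to both gold and hypothesis text before scoring (assuming the openai-whisper package is installed; whether its rules fit our transcripts is still an open question):

```python
# Applies Whisper's English text normalizer to a string; requires the
# openai-whisper package. Shown only to illustrate the kind of
# normalization the comment above refers to.
from whisper.normalizers import EnglishTextNormalizer

normalizer = EnglishTextNormalizer()
print(normalizer("It's 5 o'clock, Dr. Smith!"))  # lowercased, punctuation stripped, etc.
```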
