gold format for speaker diarization #96

keighrim · 2024-08-08T16:41:08Z

New Feature Summary

It seems we could fairly easily create gold evaluation data for the speaker diarization (SD) task by combining the time-sync annotations with the speaker turn markers in our "gold" transcript files.

Alternatives

No response

Additional context

No response

keighrim · 2024-08-09T14:39:21Z

Since the text segmentation (lines) used in the sync annotation data doesn't exactly match the speaker turns, we need some additional steps to decide the turn boundaries within text lines where two speakers' utterances are mixed or overlapping.
A few ideas:

1. Use the majority speaker as "the" speaker. E.g., [ A:[you jim] B:[I am good] ] << mark the whole line as B (B has more tokens than A). For a 50-50 tie, we could assign arbitrarily, e.g., at random or always to the first speaker.
2. Use the token counts to divide the total time duration. E.g., A spoke 2 tokens, B spoke 2 tokens, and the annotation for those 4 tokens spans 1s-3s << A: 1-2s, B: 2-3s.
3. Actually run a forced alignment (FA) algorithm to find the best model prediction and use it as "silver".
4. Use an FA algorithm, then manually review the results to make them fully "gold".
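
The first two ideas could be sketched roughly as below. This is only an illustration under assumptions: the input shape (a list of `(speaker, text)` pairs per time-synced line), whitespace tokenization, and the function names `majority_speaker` and `split_by_token_count` are all hypothetical and not taken from the actual annotation format. Ties are resolved by "always the first" speaker, which is just one of the arbitrary options mentioned above.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of one time-synced line:
# a list of (speaker, text) pairs in the order they appear in the line.
Turn = Tuple[str, str]


def majority_speaker(turns: List[Turn]) -> str:
    """Idea 1: label the whole line with the speaker who has the most tokens.

    Ties go to the speaker who appears first in the line.
    """
    counts: Dict[str, int] = {}
    for speaker, text in turns:
        # Assumption: tokens are whitespace-separated words.
        counts[speaker] = counts.get(speaker, 0) + len(text.split())
    # dicts keep insertion order and max() returns the first maximum,
    # so the earliest speaker wins a tie.
    return max(counts, key=counts.get)


def split_by_token_count(
    start: float, end: float, turns: List[Turn]
) -> List[Tuple[str, float, float]]:
    """Idea 2: divide the line's duration in proportion to token counts."""
    token_counts = [len(text.split()) for _, text in turns]
    total = sum(token_counts)
    if total == 0:
        return []
    segments = []
    cursor = start
    seen = 0
    for (speaker, _), n in zip(turns, token_counts):
        seen += n
        # Compute each boundary from the cumulative count to avoid drift.
        seg_end = start + (end - start) * seen / total
        segments.append((speaker, cursor, seg_end))
        cursor = seg_end
    return segments


# The example from idea 1: B has 3 tokens, A has 2, so the line goes to B.
line = [("A", "you jim"), ("B", "I am good")]
print(majority_speaker(line))  # B

# The example from idea 2: 2 tokens each over 1s-3s.
even = [("A", "hi there"), ("B", "good morning")]
print(split_by_token_count(1.0, 3.0, even))  # [('A', 1.0, 2.0), ('B', 2.0, 3.0)]
```

Note that the proportional split assumes a uniform speaking rate and no overlap or silence within the line, so its boundaries are approximations rather than true gold.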

keighrim added the ✨N New feature or request label Aug 8, 2024

clams-bot added this to infra Aug 8, 2024

github-project-automation bot moved this to Todo in infra Aug 8, 2024

selenasong added a commit that referenced this issue Aug 23, 2024

fix issue #96

deecd8c

keighrim linked a pull request Nov 26, 2024 that will close this issue

Generating speaker-id gold annotations from NewsHour transcripts #97

Open
