gold format for speaker diarization #96

Open
keighrim opened this issue Aug 8, 2024 · 1 comment · May be fixed by #97
Labels
✨N New feature or request

Comments

@keighrim
Member

keighrim commented Aug 8, 2024

New Feature Summary

It seems that we could relatively easily create some gold evaluation data for the speaker diarization (SD) problem by combining the time-sync annotations with the speaker turn markers in our "gold" transcript files.

Related

There's the "cleaner" code that removes the speaker markers (clamsproject/clams-utils#2). We should be able to "reverse" that functionality to extract the speaker markers and associate them with the time frames of each speaker's series of utterances.
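
To make the "reverse" idea concrete, here is a minimal sketch that keeps the speaker markers instead of dropping them. It is not the code from clams-utils; the marker format (an upper-case name followed by a colon, e.g. `JIM LEHRER:`) and the names `SPEAKER_MARKER` and `split_turns` are assumptions made purely for illustration, and the real transcripts may use a different convention.

```python
import re

# ASSUMPTION: a speaker marker is an upper-case name followed by a colon.
# The actual "gold" transcripts may mark speakers differently.
SPEAKER_MARKER = re.compile(r"\b([A-Z][A-Z .'-]*):")


def split_turns(transcript):
    """Return (speaker, utterance) pairs in transcript order."""
    # re.split with a capture group yields
    # [preamble, speaker1, text1, speaker2, text2, ...]
    parts = SPEAKER_MARKER.split(transcript)
    turns = []
    for i in range(1, len(parts), 2):
        turns.append((parts[i].strip(), parts[i + 1].strip()))
    return turns


example = "JIM: How are you? ROBERT MACNEIL: I am good."
turns = split_turns(example)
```

Each resulting turn would still need to be matched against the time-sync annotation to obtain its time frame; that step is not shown here.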

Alternatives

No response

Additional context

No response

@keighrim keighrim added the ✨N New feature or request label Aug 8, 2024
@clams-bot clams-bot added this to infra Aug 8, 2024
@github-project-automation github-project-automation bot moved this to Todo in infra Aug 8, 2024
@keighrim
Member Author

keighrim commented Aug 9, 2024

Since the text segmentation (lines) used in the sync annotation data doesn't exactly match the speaker turns, we need some additional steps to decide the turn boundaries within text lines where two speakers' utterances are mixed or overlapping.
A few ideas:

  1. Use the majority speaker as "the" speaker, e.g., [ A:[you jim] B:[I am good] ] would be marked entirely as B, since B has more tokens than A. A 50-50 split would need some arbitrary rule, such as random assignment or always picking the first speaker.
  2. Use the number of tokens to divide the total time duration, e.g., if A spoke 2 tokens and B spoke 2 tokens, and the annotation spans 1s-3s for those 4 tokens, then A: 1-2s, B: 2-3s.
  3. Actually run a forced alignment (FA) algorithm to find the best model prediction and use it as "silver" data.
  4. Use an FA algorithm, then manually review the results to make them fully "gold".
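
Ideas 1 and 2 are simple enough to sketch directly. The snippet below is only an illustration under assumptions: the input shape (a list of `(speaker, n_tokens)` pairs in spoken order for one sync-annotated line) and the function names `majority_speaker` and `proportional_split` are hypothetical, and ties in idea 1 are resolved by "always the first", which is just one of the arbitrary options mentioned above.

```python
from collections import Counter


def majority_speaker(tokens_by_speaker):
    """Idea 1: label the whole line with the speaker who has the most tokens.

    tokens_by_speaker: list of (speaker, n_tokens) pairs in spoken order.
    Ties go to the speaker who appears first in the line.
    """
    totals = Counter()
    for speaker, n in tokens_by_speaker:
        totals[speaker] += n
    if not totals:
        return None
    best = max(totals.values())
    for speaker, _ in tokens_by_speaker:
        if totals[speaker] == best:
            return speaker


def proportional_split(tokens_by_speaker, start, end):
    """Idea 2: divide [start, end] among turns in proportion to token counts."""
    total = sum(n for _, n in tokens_by_speaker)
    if total == 0:
        return []
    segments = []
    seen = 0
    for speaker, n in tokens_by_speaker:
        seg_start = start + (end - start) * seen / total
        seen += n
        seg_end = start + (end - start) * seen / total
        segments.append((speaker, seg_start, seg_end))
    return segments


# The two examples from the list above.
line_1 = [("A", 2), ("B", 3)]   # [ A:[you jim] B:[I am good] ]
line_2 = [("A", 2), ("B", 2)]   # 4 tokens annotated as 1s-3s
label = majority_speaker(line_1)
segments = proportional_split(line_2, 1.0, 3.0)
```

Idea 2 assumes every token takes the same amount of time, so its boundaries are only approximate; ideas 3 and 4 would replace that assumption with forced-alignment output.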

selenasong added a commit that referenced this issue Aug 23, 2024