New Feature Summary
It seems that we could relatively easily create some gold evaluation data for the speaker diarization (SD) problem by combining the time-sync annotation with the speaker turn markers in our "gold" transcript files.
Related
There's the "cleaner" code that removes the speaker markers (clamsproject/clams-utils#2). We should be able to "reverse" that functionality to extract the speaker markers instead, and then associate each speaker with the time frames of their series of utterances.
Alternatives
No response
Additional context
No response
Since the text segmentation (lines) used in the sync annotation data doesn't exactly match the speaker turns, we need some additional steps to decide the turn boundaries within any text line where two speakers' utterances are mixed or overlapping.
A few ideas:
Use the majority speaker as "the" speaker for the line. E.g., for [ A:[you jim] B:[I am good] ], mark the whole line as B, since B has more tokens than A. For a 50-50 split, pick by some arbitrary rule, such as random assignment or always taking the first speaker.
Use the token counts to divide the total time duration. E.g., if A spoke 2 tokens and B spoke 2 tokens, and the annotation for those 4 tokens is 1s-3s, assign A: 1-2s and B: 2-3s.
Actually run a forced alignment (FA) algorithm to find the best model prediction and use it as "silver" data.
Use an FA algorithm, then manually review the results to make them fully "gold".
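To make the first two ideas concrete, here is a rough Python sketch. It assumes each synced line has already been parsed into `(speaker, token)` pairs in spoken order; that input shape and the function names `majority_speaker` and `split_by_token_share` are hypothetical, not something from clams-utils. Ties in the majority vote go to the first speaker, which is just one of the arbitrary options mentioned above.

```python
from collections import Counter


def majority_speaker(tokens):
    """Return the speaker with the most tokens in one synced line.

    `tokens` is a list of (speaker, token) pairs (an assumed input format).
    Ties are resolved in favor of the speaker who appears first.
    """
    if not tokens:
        return None
    counts = Counter(speaker for speaker, _ in tokens)
    top = max(counts.values())
    # Counter keeps first-seen order, so the first match is the earliest speaker.
    return next(s for s in counts if counts[s] == top)


def split_by_token_share(tokens, start, end):
    """Divide [start, end] among consecutive speaker runs by token count.

    Returns a list of (speaker, seg_start, seg_end) tuples in spoken order.
    """
    # Collapse the token stream into runs of consecutive same-speaker tokens.
    runs = []
    for speaker, _ in tokens:
        if runs and runs[-1][0] == speaker:
            runs[-1][1] += 1
        else:
            runs.append([speaker, 1])

    total = sum(n for _, n in runs)
    segments = []
    seen = 0
    for speaker, n in runs:
        seg_start = start + (end - start) * seen / total
        seen += n
        seg_end = start + (end - start) * seen / total
        segments.append((speaker, seg_start, seg_end))
    return segments


if __name__ == "__main__":
    line = [("A", "you"), ("A", "jim"), ("B", "I"), ("B", "am"), ("B", "good")]
    print(majority_speaker(line))
    print(split_by_token_share(line[:4], 1.0, 3.0))
```

Note that the proportional split treats every token as equally long, so it only approximates the real turn boundary; the FA-based options would be needed for anything tighter.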