Separate “Other” into “None of the above” and “Algorithm can’t decide” #10
Comments
PS: it's neat that the sensemaking-tools/src/tasks/categorization.test.ts Lines 125 to 168 in d722ce8
@jucor This is a good callout, and an interesting suggested approach! Thanks for sharing. Clarifying question: from your testing so far, do you see the primary benefit of the 'Algorithm can't decide' category as covering cases where the underlying comment is too ambiguous? The linked issue also mentions categorization failures; how much has that been an issue in your experience? We can test this out and see how it impacts the resulting category numbers.
Thanks @cianbrassilg, great to move the convo here. As mentioned in the other thread, compdemocracy/polis#1876 (comment): yes. For example, when I ran the BG2018-short "2018 BG with vote tallies (filtered) - comments-with-votes-small" example spreadsheet provided by @metasoarous, more than 70% of the comments ended in "algorithm could not determine". I suspect (but did not verify) that this is because the spreadsheet had a lot of comments that were not filtered out but whose content had been deleted. I also remember @DZNarayanan mentioning that "Other" is often pretty big, and discussing that it's often the biggest category. So as we're investigating why, I think ruling out "Algorithm could not determine" would be the first thing to check for (and since doing it automatically is just a code change, that'd be easier than doing it manually :) ).
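To make the check described above concrete, here is a minimal sketch of how one might measure what share of comments lands in each top-level category. The function name `shareByCategory` and the plain string labels are assumptions made for illustration; they are not part of sensemaking-tools.

```typescript
// Hypothetical helper (not from sensemaking-tools): given one top-level
// category label per comment, return the fraction of comments in each category.
function shareByCategory(labels: string[]): Map<string, number> {
  // First pass: count how many comments carry each label.
  const counts = new Map<string, number>();
  for (const label of labels) {
    counts.set(label, (counts.get(label) ?? 0) + 1);
  }
  // Second pass: turn counts into fractions of the total.
  const shares = new Map<string, number>();
  counts.forEach((count, label) => {
    shares.set(label, count / labels.length);
  });
  return shares;
}

// Example with made-up labels: three of four comments could not be decided.
const exampleShares = shareByCategory([
  "Algorithm can't decide",
  "Algorithm can't decide",
  "None of the above",
  "Algorithm can't decide",
]);
console.log(exampleShares.get("Algorithm can't decide"));
```

If the "can't decide" share dominates, that would point at failed or empty inputs rather than at genuinely off-topic comments.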
Dear Jigsaw team,
As discussed by email, it would be very useful if the topic analysis part of the Sensemaking tools could distinguish between “Other” (in the sense of “None of the above”) and “The Algorithm Can’t Decide”. See the rationale at compdemocracy/polis#1878
I believe this can be achieved in src/tasks/categorization.ts, in function assignDefaultCategory (https://github.com/Jigsaw-Code/sensemaking-tools/blame/8eb482e35c44d2399ab68d684a12d51a74472ad4/src/tasks/categorization.ts#L381), called by function categorizeWithRetry.
I do note that SenseMaker does in some cases provide a form of distinction by using the Uncategorized sub-topic of the category Other. However, this is affected by includeSubtopics of categorizeWithRetry being set to False, due to sensemaking-tools/src/tasks/categorization.ts, lines 393 to 395 in 8eb482e, and even when includeSubtopics is True it is easy to overlook.
It would thus be terrific to move Uncategorized to a top-level topic rather than a subtopic.
Thanks!
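The requested split could look roughly like the sketch below. Everything in it is an assumption made for illustration: the types `Topic`, `CommentRecord`, and `ModelOutcome`, the function `assignFallbackCategory`, and the two label constants are hypothetical and do not reproduce the actual assignDefaultCategory code in sensemaking-tools.

```typescript
// Hypothetical shapes for illustration; not the actual sensemaking-tools types.
interface Topic {
  name: string;
}

interface CommentRecord {
  id: string;
  text: string;
  topics?: Topic[];
}

// Two distinct top-level labels instead of a single "Other".
const NONE_OF_THE_ABOVE = "None of the above";
const ALGORITHM_CANT_DECIDE = "Algorithm can't decide";

// Assumed outcome of categorizing one comment (after any retries):
// the model picked topics, the model said nothing fits, or no valid answer came back.
type ModelOutcome =
  | { kind: "assigned"; topics: Topic[] }
  | { kind: "no_match" }
  | { kind: "failed" };

// Attach topics to a comment, keeping "nothing fits" separate from "could not decide".
function assignFallbackCategory(
  comment: CommentRecord,
  outcome: ModelOutcome
): CommentRecord {
  switch (outcome.kind) {
    case "assigned":
      return { ...comment, topics: outcome.topics };
    case "no_match":
      return { ...comment, topics: [{ name: NONE_OF_THE_ABOVE }] };
    case "failed":
      return { ...comment, topics: [{ name: ALGORITHM_CANT_DECIDE }] };
  }
}

// Example: a comment whose content was deleted and could not be categorized.
const deleted: CommentRecord = { id: "42", text: "" };
console.log(assignFallbackCategory(deleted, { kind: "failed" }).topics);
```

Keeping both labels at the top level means the distinction stays visible whether or not subtopics are requested.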