Separate “Other” into “None of the above” and “Algorithm can’t decide” #10

jucor · 2025-01-21T20:47:55Z

Dear Jigsaw team

As discussed by email, it would be very useful if the topic analysis part of the Sensemaking tools could distinguish between “Other” (in the sense of “None of the above”) and “The Algorithm Can’t Decide”. See the rationale at compdemocracy/polis#1878

I believe this can be achieved in src/tasks/categorization.ts, in the function assignDefaultCategory https://github.com/Jigsaw-Code/sensemaking-tools/blame/8eb482e35c44d2399ab68d684a12d51a74472ad4/src/tasks/categorization.ts#L381 , which is called by the function categorizeWithRetry.

I do note that SenseMaker already provides a form of this distinction in some cases, by using the Uncategorized subtopic of the Other category. However, this:

does not work for conversations that do not need subtopics, i.e. when the includeSubtopics argument of categorizeWithRetry is set to false, because of these lines:

sensemaking-tools/src/tasks/categorization.ts

Lines 393 to 395 in 8eb482e

    
    includeSubtopics
      ? ({ name: "Other", subtopics: [{ name: "Uncategorized" }] } as NestedTopic)
      : ({ name: "Other" } as FlatTopic),

In those cases, there is no distinction at all.

is not documented (but yay open-source, it can be found in the code :) ),
so even when includeSubtopics is true it is easy to overlook,
and because of that it can very easily confuse users and their interpretation of the different topics, be it for debugging purposes (“Why is ‘Other’ so big?”) or for actual use.

It would thus be terrific to make Uncategorized a top-level topic rather than a subtopic of Other.
Thanks!
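
For illustration, here is a minimal sketch of the proposed fallback. It is not the current implementation: the types are simplified stand-ins for the ones in categorization.ts, and the function name defaultTopicForFailure and the constant UNCATEGORIZED_TOPIC are hypothetical. The idea is that a comment whose categorization failed gets a top-level Uncategorized topic in both modes, so that Other is left to mean only “None of the above”:

```typescript
// Simplified stand-ins for the FlatTopic / NestedTopic types in the repo (assumption).
interface FlatTopic {
  name: string;
}
interface NestedTopic {
  name: string;
  subtopics: FlatTopic[];
}
type Topic = FlatTopic | NestedTopic;

// Hypothetical name for the proposed top-level topic.
const UNCATEGORIZED_TOPIC = "Uncategorized";

// Hypothetical fallback: returns a top-level "Uncategorized" topic whether or
// not subtopics are in use, instead of nesting it under "Other".
function defaultTopicForFailure(includeSubtopics: boolean): Topic {
  return includeSubtopics
    ? { name: UNCATEGORIZED_TOPIC, subtopics: [] }
    : { name: UNCATEGORIZED_TOPIC };
}
```

With something along these lines, the includeSubtopics = false case would no longer fold failed comments into Other.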


jucor · 2025-01-21T21:55:00Z

PS: it's neat that the Uncategorized behavior, when includeSubtopics is enabled, has its own unit test in

sensemaking-tools/src/tasks/categorization.test.ts

Lines 125 to 168 in d722ce8

    
    it('should assign "Other" topic and "Uncategorized" subtopic to comments that failed categorization after max retries', async () => {
      const comments: Comment[] = [
        { id: "1", text: "Comment 1" },
        { id: "2", text: "Comment 2" },
        { id: "3", text: "Comment 3" },
      ];
      const topics = '[{"name": "Topic 1", "subtopics": []}]';
      const instructions = "Categorize the comments based on these topics: " + topics;
      const includeSubtopics = true;
      const topicsJson = [{ name: "Topic 1", subtopics: [] }];
      // Mock the model to always return an empty response. This simulates a
      // categorization failure.
      mockGenerateData.mockReturnValue(Promise.resolve([]));
      const commentRecords = await categorizeWithRetry(
        new VertexModel("project", "location", "gemini-1000"),
        instructions,
        comments,
        includeSubtopics,
        topicsJson
      );
      expect(mockGenerateData).toHaveBeenCalledTimes(3);
      const expected = [
        {
          id: "1",
          text: "Comment 1",
          topics: [{ name: "Other", subtopics: [{ name: "Uncategorized" }] }],
        },
        {
          id: "2",
          text: "Comment 2",
          topics: [{ name: "Other", subtopics: [{ name: "Uncategorized" }] }],
        },
        {
          id: "3",
          text: "Comment 3",
          topics: [{ name: "Other", subtopics: [{ name: "Uncategorized" }] }],
        },
      ];
      expect(commentRecords).toEqual(expected);
    });

:)

cianbrassilg · 2025-01-22T18:15:33Z

@jucor This is a good callout, and an interesting suggested approach! Thanks for sharing. Clarifying question: from your testing so far, do you see the primary benefit of the 'Algorithm can't decide' category as covering cases where the underlying comment is too ambiguous? The linked issue also mentions categorization failures, and I'm curious how much of an issue this has been in your experience. We can test this out and see how it impacts the resulting category numbers.

jucor · 2025-01-27T11:11:00Z

Thanks @cianbrassilg -- great to move the convo here. As mentioned in the other thread compdemocracy/polis#1876 (comment) , yes. For example, when I ran the BG2018-short "2018 BG with vote tallies (filtered) - comments-with-votes-small" example spreadsheet provided by @metasoarous , more than 70% of the comments ended up in "algorithm could not determine". I suspect (but did not verify) that this is because the spreadsheet had a lot of comments that were not filtered out but whose content had been deleted.

I also remember @DZNarayanan mentioning that "Other" is often pretty big, and discussing that it's often the biggest category -- so as we're investigating why, I think ruling out "Algorithm could not determine" would be the first thing to check (and since doing it automatically is just a code change, that'd be easier than doing it manually :) ).
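
As a rough sketch of that first check: assuming categorization failures were exposed as their own top-level topic, the share of comments per top-level topic could be tallied as below. The CategorizedComment shape and the topicShares helper are hypothetical, made up for illustration, and the sample data is invented rather than taken from the spreadsheet above.

```typescript
// Hypothetical minimal shape of a categorized comment (assumption, not the repo's type).
interface CategorizedComment {
  id: string;
  topics: { name: string }[];
}

// Returns, for each top-level topic name, the fraction of comments assigned to it.
// A comment with several topics counts once per topic, so fractions can sum to more than 1.
function topicShares(comments: CategorizedComment[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const comment of comments) {
    for (const topic of comment.topics) {
      counts[topic.name] = (counts[topic.name] ?? 0) + 1;
    }
  }
  const shares: Record<string, number> = {};
  for (const name of Object.keys(counts)) {
    shares[name] = counts[name] / comments.length;
  }
  return shares;
}

// Invented sample: three failed comments and one genuine "Other".
const sample: CategorizedComment[] = [
  { id: "1", topics: [{ name: "Uncategorized" }] },
  { id: "2", topics: [{ name: "Uncategorized" }] },
  { id: "3", topics: [{ name: "Uncategorized" }] },
  { id: "4", topics: [{ name: "Other" }] },
];
const sampleShares = topicShares(sample);
```

A large Uncategorized share would point at categorization failures (for instance, empty comments) rather than at comments that truly fit none of the topics.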

jucor mentioned this issue Jan 22, 2025

[Topic Models] Understand various shades of “Other/None of the Above” compdemocracy/polis#1876

Open
