GaganVM/ACL26-PluralEval

ACL26-PluralEval

PluralEval Framework

Evaluating Language Model Pluralism through In-the-wild Crowd Discussions


Tip

LLMs tend to hold back contradiction when you call an idea yours. Try "a recipe I found" instead of "my recipe idea" for the honest take.

We study whether LLMs faithfully represent the diversity of public opinions or collapse toward sycophantic agreement with user-stated beliefs. PluralEval builds reference opinion sets from real Reddit discussions and clusters them into stance groups. It then measures how LLM outputs degrade under biased prompting in three experiments: MCQ identification, ranking, and open-ended generation.
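To make the framing effect from the tip concrete, here is a minimal sketch that builds the same question under an "owned" and a "detached" framing, so the two model responses can be compared side by side. The function name `framing_pair` and the prompt wording are illustrative assumptions, not the prompts used in this repository.

```python
def framing_pair(thing: str, question: str) -> dict[str, str]:
    """Return one question under two framings of the same item.

    Hypothetical helper for illustration only; the wording mirrors the
    "my recipe idea" vs. "a recipe I found" example from the tip.
    """
    return {
        # The user claims ownership, which may invite agreement.
        "owned": f"Here is my {thing} idea. {question}",
        # The user is detached from the item.
        "detached": f"Here is a {thing} I found. {question}",
    }


pair = framing_pair("recipe", "What are its weaknesses?")
```

Sending both strings to the same model and comparing how much criticism each response contains gives a quick, informal check of the behaviour described above.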

The pipeline has four stages:

  1. 0_data/ — scrape raw Reddit submissions + comments into per-submission CSVs.
  2. 1_opinion_generation/ — extract one atomic opinion per comment.
  3. 2_clustering/ — group opinions into LLM-summarised clusters.
  4. 3_evaluation/ — three experiments measuring plurality awareness:
    • mcq_popularity_identification/
    • ranking_degradation/
    • sycophancy_detection/

See each sub-folder's README for inputs, outputs, and run commands.
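The layout above can be checked before running anything. The following is a minimal sketch that lists the expected stage folders and reports any that are missing. The folder names come from the list above; the helper names `expected_dirs` and `missing_dirs`, and the assumption that the three experiment folders sit directly under `3_evaluation/`, are ours.

```python
from pathlib import Path

# Stage folders, in pipeline order, as named in this README.
STAGES = ["0_data", "1_opinion_generation", "2_clustering", "3_evaluation"]

# Experiment folders, assumed to be direct children of 3_evaluation/.
EXPERIMENTS = [
    "mcq_popularity_identification",
    "ranking_degradation",
    "sycophancy_detection",
]


def expected_dirs(repo_root: str = ".") -> list[Path]:
    """Return every stage and experiment folder the pipeline expects."""
    root = Path(repo_root)
    dirs = [root / stage for stage in STAGES]
    dirs += [root / "3_evaluation" / exp for exp in EXPERIMENTS]
    return dirs


def missing_dirs(repo_root: str = ".") -> list[Path]:
    """Return the expected folders that do not exist under repo_root."""
    return [d for d in expected_dirs(repo_root) if not d.is_dir()]
```

Calling `missing_dirs()` from the repository root should return an empty list on a complete checkout.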

Setup

conda env create -f environment.yml
conda activate pluraleval
export OPENAI_API_KEY=...
export GEMINI_API_KEY=...
export ANTHROPIC_API_KEY=...
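
A missing key usually surfaces only when the first API call fails, so it can help to check the environment up front. This is a minimal sketch using the three variable names exported above; the helper `missing_keys` is hypothetical, and whether every stage needs all three keys is an assumption.

```python
import os

# Variable names taken from the setup commands above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None) -> list[str]:
    """Return the required keys that are unset or empty.

    `env` defaults to the process environment; pass a dict to test.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        raise SystemExit("Missing API keys: " + ", ".join(absent))
```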

Citation

@inproceedings{mundada26evaluating,
  title     = "Evaluating language model pluralism through in-the-wild crowd discussions",
  author    = "Gagan Mundada and Rohan Surana and Nandhini Swaminathan and Bodhisattwa Prasad Majumder and Junda Wu and Julian McAuley and Zhouhang Xie",
  year      = "2026",
  booktitle = "ACL"
}
