Tip
LLMs tend to soften disagreement when you present an idea as your own. For a more candid assessment, say "a recipe I found" rather than "my recipe idea".
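The tip can be tried out by sending the same question under both framings and comparing the replies. The snippet below is a minimal sketch that only builds the two prompt strings; the function name `framing_pair` and the prompt wording are illustrative assumptions, not part of PluralEval.

```python
def framing_pair(item: str, question: str) -> dict[str, str]:
    """Build one question under two framings of the same item.

    Hypothetical helper: the template wording is an assumption chosen
    to mirror the "my recipe idea" vs "a recipe I found" contrast.
    """
    return {
        # The user claims ownership of the item.
        "owned": f"This is my {item}. {question}",
        # The user presents the item as someone else's.
        "detached": f"I found this {item}. {question}",
    }


pair = framing_pair("recipe", "What are its weaknesses?")
```

Sending `pair["owned"]` and `pair["detached"]` to the same model with identical item text attached makes it easier to see whether the criticism changes with the framing alone.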
We study whether LLMs faithfully represent the diversity of public opinions or collapse toward sycophantic agreement with user-stated beliefs. PluralEval builds reference opinion sets from real Reddit discussions, clusters them into stance groups, then measures how LLM outputs degrade under biased prompting through MCQ identification, ranking, and open-ended generation experiments.
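As a rough illustration of what degradation under biased prompting could look like in the ranking setting, the sketch below compares a model's ranking of stance clusters against a reference ranking using Kendall's tau, once for a neutral prompt and once for a biased one. This is an assumption for illustration only: the source does not state which metric PluralEval uses, and the cluster labels and rankings here are made up.

```python
from itertools import combinations


def kendall_tau(reference: list[str], predicted: list[str]) -> float:
    """Kendall rank correlation between two orderings of the same items.

    Assumes no ties and at least two items. Returns 1.0 for identical
    orderings and -1.0 for fully reversed ones.
    """
    if len(reference) < 2:
        raise ValueError("need at least two items to compare rankings")
    if len(set(reference)) != len(reference) or sorted(reference) != sorted(predicted):
        raise ValueError("rankings must contain the same unique items")

    position = {item: i for i, item in enumerate(predicted)}
    concordant = discordant = 0
    # In each pair, `a` precedes `b` in the reference ranking.
    for a, b in combinations(reference, 2):
        if position[a] < position[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical stance clusters, ordered by popularity in the reference set.
reference = ["A", "B", "C", "D"]
neutral = ["A", "B", "D", "C"]  # hypothetical ranking under a neutral prompt
biased = ["D", "A", "B", "C"]   # hypothetical ranking when the user endorses D

drop = kendall_tau(reference, neutral) - kendall_tau(reference, biased)
```

A positive `drop` would indicate that the biased prompt pulled the model's ranking further from the reference than the neutral prompt did.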
The pipeline has four stages:
- 0_data/: scrape raw Reddit submissions + comments into per-submission CSVs.
- 1_opinion_generation/: extract one atomic opinion per comment.
- 2_clustering/: group opinions into LLM-summarised clusters.
- 3_evaluation/: three experiments measuring plurality awareness:
  - mcq_popularity_identification/
  - ranking_degradation/
  - sycophancy_detection/
See each sub-folder's README for inputs, outputs, and run commands.
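Before running a stage, it can help to confirm that the checkout has the expected layout. The following is a minimal sketch, assuming the three experiment folders sit directly under `3_evaluation/`; the helper name `missing_stage_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Stage folders as listed in this README.
STAGES = ["0_data", "1_opinion_generation", "2_clustering", "3_evaluation"]
# Assumed to be nested directly under 3_evaluation/.
EXPERIMENTS = [
    "mcq_popularity_identification",
    "ranking_degradation",
    "sycophancy_detection",
]


def missing_stage_dirs(repo_root: str) -> list[str]:
    """Return expected pipeline folders that are absent under repo_root."""
    root = Path(repo_root)
    expected = [root / stage for stage in STAGES]
    expected += [root / "3_evaluation" / exp for exp in EXPERIMENTS]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]


if __name__ == "__main__":
    for name in missing_stage_dirs("."):
        print(f"missing: {name}/")
```

Running the script from the repository root prints one line per missing folder and prints nothing when the layout is complete.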
conda env create -f environment.yml
conda activate pluraleval
export OPENAI_API_KEY=...
export GEMINI_API_KEY=...
export ANTHROPIC_API_KEY=...

@inproceedings{mundada26evaluating,
title = "Evaluating language model pluralism through in-the-wild crowd discussions",
author = "Gagan Mundada and Rohan Surana and Nandhini Swaminathan and Bodhisattwa Prasad Majumder and Junda Wu and Julian McAuley and Zhouhang Xie",
year = "2026",
booktitle = "ACL"
}