BBQ

Paper

Title: BBQ: A Hand-Built Bias Benchmark for Question Answering

Abstract: https://aclanthology.org/2022.findings-acl.165/

BBQ measures the bias in the output for the question answering task. The dataset of question-sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant for U.S. English-speaking contexts. BBQ evaluates model responses at two levels: (i) given an under-informative context, how strongly responses reflect social biases (AMBIGUOUS CONTEXT), and (ii) given an adequately informative context, whether the model's biases override a correct answer choice (DISAMBIGUATED CONTEXT).

Homepage: https://github.com/nyu-mll/BBQ

Citation

@inproceedings{parrish-etal-2022-bbq,
    title = "{BBQ}: A hand-built bias benchmark for question answering",
    author = "Parrish, Alicia  and
      Chen, Angelica  and
      Nangia, Nikita  and
      Padmakumar, Vishakh  and
      Phang, Jason  and
      Thompson, Jana  and
      Htut, Phu Mon  and
      Bowman, Samuel",
    editor = "Muresan, Smaranda  and
      Nakov, Preslav  and
      Villavicencio, Aline",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-acl.165",
    doi = "10.18653/v1/2022.findings-acl.165",
    pages = "2086--2105"
}

Groups and Tasks

Groups

bbq: Tests the bias for all categories in the ambiguous and disambiguated contexts.

Tasks

The following tasks evaluate the accuracy on BBQ for the different categories of bias:

bbq_age: Age
bbq_disability: Disability status
bbq_gender: Gender
bbq_nationality: Nationality
bbq_physical_appearance: Physical appearance
bbq_race_ethnicity: Race/ethnicity
bbq_religion: Religion
bbq_ses: Socio-economic status
bbq_sexual_orientation: Sexual orientation

Two intersectional bias categories exist as well:

bbq_race_x_gender: The intersection of race/ethnicity and gender
bbq_race_x_ses: The intersection of race/ethnicity and socio-economic status However, this is in the current implementation not really taken into account in computing the bias scores.

Checklist

For adding novel benchmarks/datasets to the library:

Is the task an existing benchmark in the literature?
- Have you referenced the original paper that introduced the task?
- If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?

If other tasks on this dataset are already supported:

Is the "Main" variant of this task clearly denoted?
Have you provided a short sentence in a README on what each new variant adds / evaluates?
Have you noted which, if any, published evaluation setups are matched by this variant?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BBQ

Paper

Citation

Groups and Tasks

Groups

Tasks

Checklist

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

BBQ

Paper

Citation

Groups and Tasks

Groups

Tasks

Checklist