Disagreement Matters: Preserving Label Diversity by Jointly Modeling Item and Annotator Label Distributions with DisCo

Jul 1, 2023

Tharindu Cyril Weerasooriya

Alexander G. Ororbia II

Raj Bhensadadia

Ashiqur KhudaBukhsh

Christopher M. Homan

Abstract

Annotator disagreement is common whenever human judgment is needed for supervised learning. It is conventional to assume that one label per item represents ground truth. However, this obscures minority opinions, if present. We regard "ground truth" as the distribution of all labels that a population of annotators could produce, if asked (and of which we only have a small sample). We next introduce DisCo (Distribution from Context), a simple neural model that learns to predict this distribution. The model takes annotator-item pairs, rather than items alone, as input, and performs inference by aggregating over all annotators. Despite its simplicity, our experiments show that, on six benchmark datasets, our model is competitive with, and frequently outperforms, other, more complex models that either do not model specific annotators or were not designed for label distribution learning.
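
The two ideas in the abstract, treating the label distribution as the target and aggregating per-annotator predictions at inference time, can be illustrated with a minimal sketch. This is not the DisCo implementation: the function names, the toy labels, and the choice of a uniform average over annotators are all assumptions made for illustration, since the abstract does not specify how aggregation is done.

```python
from collections import Counter

def empirical_label_distribution(labels, classes):
    """Turn one item's sampled annotator labels into a probability vector.

    This is the small-sample estimate of the label distribution,
    as opposed to a single majority-vote label.
    """
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]

def majority_vote(labels):
    """Conventional single-label ground truth; discards minority opinions."""
    return Counter(labels).most_common(1)[0][0]

def aggregate_over_annotators(per_annotator_probs):
    """Average per-annotator predicted distributions for one item.

    ASSUMPTION: a uniform mean over annotators; the actual model may
    aggregate differently.
    """
    n = len(per_annotator_probs)
    k = len(per_annotator_probs[0])
    return [sum(p[j] for p in per_annotator_probs) / n for j in range(k)]

# Hypothetical toy data: five annotators label one item.
classes = ["not_offensive", "offensive"]
labels = ["offensive", "not_offensive", "not_offensive",
          "not_offensive", "offensive"]

print(majority_vote(labels))                          # single label
print(empirical_label_distribution(labels, classes))  # full distribution

# Hypothetical per-annotator model outputs for the same item.
preds = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
print(aggregate_over_annotators(preds))
```

In this toy case the majority vote reports only the majority class, while the distribution retains the two dissenting annotators as probability mass on the minority class.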

Publication

Findings of the Association for Computational Linguistics: ACL 2023

Last updated on Jul 1, 2023

Label Distribution Learning, Annotator Disagreement, Neural Networks, Deep Learning, NLP
