Annotator Response Distributions as a Sampling Frame

Jan 1, 2022

Christopher Homan

Tharindu Cyril Weerasooriya

Lora Aroyo

Chris Welty

Abstract

Annotator disagreement is often dismissed as noise or as the result of a poor-quality annotation process. Others have argued that it can be meaningful. But without a rigorous statistical foundation, the analysis of disagreement patterns can resemble a high-tech form of tea-leaf reading. We contribute a framework for analyzing the variation in per-item annotator response distributions in data for humans-in-the-loop machine learning. We use the framework to analyze the variance in a crowdsourced dataset of hard-to-classify examples from the OpenImages archive, and we provide visualizations of that dataset.

Publication

Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022

Last updated on Jan 1, 2022

Annotator Disagreement, Statistical Analysis, Crowdsourcing, NLP
