Tharindu Cyril Weerasooriya
Publications
Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive
Offensive speech detection is a key component of content moderation. However, what is offensive can be highly subjective. This paper …
Tharindu Cyril Weerasooriya, Sujan Dutta, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh
Disagreement Matters: Preserving Label Diversity by Jointly Modeling Item and Annotator Label Distributions with DisCo
Annotator disagreement is common whenever human judgment is needed for supervised learning. It is conventional to assume that one label …
Tharindu Cyril Weerasooriya, Alexander G. Ororbia II, Raj Bhensadadia, Ashiqur KhudaBukhsh, Christopher M. Homan
Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning
Human-annotated data plays a critical role in the fairness of AI systems, including those that deal with life-altering decisions or …
Tharindu Cyril Weerasooriya, Sarah Luger, Saloni Poddar, Ashiqur KhudaBukhsh, Christopher M. Homan
Findings from the Bambara - French Machine Translation Competition (BFMT 2023)
Orange Silicon Valley hosted a low-resource machine translation (MT) competition with monetary prizes. The goals of the competition …
Ninoh Agostinho Da Silva, Tunde Oluwaseyi Ajayi, Alexander Antonov, Panga Azazia Kamate, Moussa Coulibaly, Mason Del Rio, Yacouba Diarra, Sebastian Diarra, Chris Emezue, Joel Hamilcaro, Christopher M. Homan, Alexander Most, Joseph Mwatukange, Peter Ohue, Michael Pham, Abdoulaye Sako, Sokhar Samb, Yaya Sy, Tharindu Cyril Weerasooriya, Yacine Zahidi, Sarah Luger
Annotator Response Distributions as a Sampling Frame
Annotator disagreement is often dismissed as noise or the result of poor annotation process quality. Others have argued that it can be …
Christopher Homan, Tharindu Cyril Weerasooriya, Lora Aroyo, Chris Welty
Improving Label Quality by Joint Probabilistic Modeling of Items and Annotators
We propose a fully Bayesian framework for learning ground truth labels from noisy annotators. Our framework ensures scalability by …
Tharindu Cyril Weerasooriya, Alexander G Ororbia, Christopher M Homan
A framework for automated corpus compilation for KeyXtract: Twitter model
The corpus is a limiting factor for a keyword extraction process with a word matching stage. This paper proposes a framework to …
Tharindu Weerasooriya, Nandula Perera, S. R. Liyanage
A method to extract essential keywords from a tweet using NLP tools
A tweet is an authentic use of Natural Language where the user has to deliver the message in 140 characters or less. According to …
Tharindu Weerasooriya, Nandula Perera, S. R. Liyanage
KeyXtract Twitter Model - An Essential Keywords Extraction Model for Twitter Designed using NLP Tools
Since a tweet is limited to 140 characters, it is ambiguous and difficult for traditional Natural Language Processing (NLP) tools to …
Tharindu Weerasooriya, Nandula Perera, S. R. Liyanage