Hello there. I study how machine learning can predict human disagreement during annotation. This work helps us model disagreement that is conventionally dismissed as annotation noise. Recent machine learning research has also surfaced instances where algorithms are biased against specific groups. I'm a PhD student at the Lab for Population Intelligence at RIT, led by Professor Christopher Homan.
I've interned at Meta (Facebook) in Summer 2022 and at RPI (IBM Watson Project) in Summer 2019.
In parallel, I'm also working with the University of Kelaniya in Sri Lanka to build an electronic medical record system for the country.
My previous research is in sociolinguistics, studying the evolution of Sri Lankan English across multiple generations.
I enjoy the DevOps side of systems and building end-to-end systems.
When I'm not at my desk, I enjoy traveling.
PhD in Computer Science, Current
Rochester Institute of Technology
BSc in Computer Science, 2017
University of Kelaniya
This paper examines social web content moderation from two key perspectives: automated methods (machine moderators) and human evaluators (human moderators). We conduct a noise audit at an unprecedented scale using nine machine moderators trained on well-known offensive speech data sets, evaluated on a corpus sampled from 92 million YouTube comments discussing a multitude of issues relevant to US politics. We introduce a first-of-its-kind data set of vicarious offense. We ask annotators: (1) whether they find a given social media post offensive; and (2) how offensive annotators sharing different political beliefs would find the same content. Our experiments with machine moderators reveal that moderation outcomes vary wildly across different machine moderators. Our experiments with human moderators suggest that (1) political leanings considerably affect first-person offense perspective; (2) Republicans are the worst predictors of vicarious offense; (3) predicting vicarious offense for Republicans is more challenging than predicting it for Independents and Democrats; and (4) disagreement across political identity groups considerably increases when sensitive issues such as reproductive rights or gun control/rights are discussed. Both experiments suggest that offense is, indeed, highly subjective and raise important questions concerning content moderation practices.
Annotator disagreement is often dismissed as noise or the result of poor annotation process quality. Others have argued that it can be meaningful. But lacking a rigorous statistical foundation, the analysis of disagreement patterns can resemble a high-tech form of tea-leaf reading. We contribute a framework for analyzing the variation of per-item annotator response distributions in data for humans-in-the-loop machine learning. We provide visualizations for, and use the framework to analyze the variance in, a crowdsourced dataset of hard-to-classify examples from the OpenImages archive.
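For concreteness, here is a minimal Python sketch of the basic objects such a framework operates on: empirical per-item annotator response distributions, with entropy as one simple per-item disagreement statistic. The function name and data layout are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np
from collections import defaultdict

def item_response_distributions(annotations, n_classes):
    """Empirical per-item label distributions from raw annotations.

    annotations: iterable of (item_id, label) pairs.
    Returns {item_id: probability vector over labels} plus each
    item's entropy, a simple per-item disagreement statistic.
    """
    counts = defaultdict(lambda: np.zeros(n_classes))
    for item, label in annotations:
        counts[item][label] += 1

    dists, entropies = {}, {}
    for item, c in counts.items():
        p = c / c.sum()
        dists[item] = p
        # Entropy is 0 for unanimous items and log(n_classes)
        # for maximally split ones.
        entropies[item] = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return dists, entropies
```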
We propose a fully Bayesian framework for learning ground truth labels from noisy annotators. Our framework ensures scalability by factoring a generative, Bayesian soft clustering model over label distributions into the classic Dawid and Skene joint annotator-data model. Earlier research along these lines has neither fully incorporated label distributions nor explored clustering by annotators only or data only. Our framework incorporates all of these properties within a graphical model designed to provide better ground truth estimates of annotator responses as input to any black-box supervised learning algorithm. We conduct supervised learning experiments with variations of our models and compare them to the performance of several baseline models.
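The Dawid and Skene model referenced above can be estimated with a simple EM loop. The sketch below shows that classic baseline estimator only, not the fully Bayesian, clustered extension from the paper; it assumes items and annotators are densely indexed from zero and that every item has at least one label.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """Classic Dawid & Skene (1979) EM estimator.

    labels: integer array of (item, annotator, label) triples.
    Returns per-item posterior class probabilities and
    per-annotator confusion matrices.
    """
    n_items = labels[:, 0].max() + 1
    n_annotators = labels[:, 1].max() + 1

    # Initialize item posteriors with per-item label proportions
    # (a soft majority vote).
    T = np.zeros((n_items, n_classes))
    for i, _, l in labels:
        T[i, l] += 1
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and annotator confusion matrices,
        # with a small additive smoothing term.
        pi = T.mean(axis=0)
        conf = np.full((n_annotators, n_classes, n_classes), 1e-6)
        for i, a, l in labels:
            conf[a, :, l] += T[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute item posteriors from log-likelihoods.
        logT = np.tile(np.log(pi), (n_items, 1))
        for i, a, l in labels:
            logT[i] += np.log(conf[a, :, l])
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)

    return T, conf
```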
Supervised machine learning often requires human-annotated data. While annotator disagreement is typically interpreted as evidence of noise, population-level label distribution learning (PLDL) treats the collection of annotations for each data item as a sample of the opinions of a population of human annotators, among whom disagreement may be proper and expected, even with no noise present. From this perspective, a typical training set may contain a large number of very small-sized samples, one for each data item, none of which, by itself, is large enough to be considered representative of the underlying population’s beliefs about that item. We propose an algorithmic framework and new statistical tests for PLDL that account for sampling size. We apply them to previously proposed methods for sharing labels across similar data items. We also propose new approaches for label sharing, which we call neighborhood-based pooling.
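As an illustration of the neighborhood-based pooling idea, the sketch below smooths each item's small annotation sample with the label counts of its k nearest neighbors in an embedding space. The distance metric, the value of k, and the function names are assumptions made for this example, not the exact method or tests from the paper.

```python
import numpy as np

def neighborhood_pooling(label_counts, embeddings, k=5):
    """Pool each item's label counts with those of its k nearest
    neighbors, so small per-item annotation samples are smoothed
    by similar items.

    label_counts: (n_items, n_classes) raw annotation counts.
    embeddings:   (n_items, d) item feature vectors.
    """
    # Pairwise Euclidean distances (fine for small n_items).
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude the item itself

    pooled = label_counts.astype(float).copy()
    for i in range(len(label_counts)):
        neighbors = np.argsort(dists[i])[:k]
        pooled[i] += label_counts[neighbors].sum(axis=0)

    # Normalize to per-item label distributions.
    return pooled / pooled.sum(axis=1, keepdims=True)
```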
Sri Lankan English (SLE) has unique phonological, morphological, lexical, and syntactic features which have gradually developed since the introduction of English to Sri Lanka. Vocabulary is one of the first features to develop in SLE. Although the SLE vocabulary has been studied and recorded, its generational difference has not been examined. The objective of the study was to investigate whether the 'generational change' observable in the SLE vocabulary could be considered an evolution. This was done through a qualitative, comparative analysis of the vocabulary used in the decades 1955–1965 and 2005–2015. The theoretical base of the research was defined using two theories of language evolution: the apparent-time hypothesis and age-gradedness. The primary data was taken from the Ceylon Observer of the decade 1955–1965 and the Sunday Observer of the decade 2005–2015. The words were used in a questionnaire survey of 60 participants, of which 30 were aged 15–25 years and 30 were aged 65–75 years. The results of the survey were then analyzed in detail through 10 interviews. The surveys and the interviews were conducted to prove or disprove the age-gradedness of the SLE vocabulary and the apparent-time hypothesis in relation to the SLE vocabulary. Most of the vocabulary used disproved age-gradedness. The usages of these terms were found to be generation-specific, supporting the conclusion that the SLE vocabulary is not age-graded. The interviews supported the apparent-time hypothesis, as the older generation showed that their vocabulary had not changed significantly over the years. From these observations, it could be concluded that, within the scope of the research, the generational difference observable in the SLE vocabulary over 60 years could be termed an evolution.
Worked on the Facebook Creators Wellbeing Team on Public Conversations. Oversaw models for improving comment recommendation and ranking on Facebook Pages with varying populations of followers from around the globe.
Project: Introduced a multi-label, multi-task model to assist page administrators with comment management.
Collaborative project with the Faculty of Medicine and the Colombo North Teaching Hospital, Sri Lanka.