Hello there. I study how machine learning can predict disagreement among human annotators. This work is helpful when we want to model human disagreement, which is conventionally treated as annotation noise. Following recent breakthroughs in machine learning, research has shown instances of algorithms being biased towards specific groups. I’m a PhD student at the Lab for Population Intelligence at RIT, led by Professor Christopher Homan.
In parallel, I’m also working with the University of Kelaniya in Sri Lanka to build an electronic medical record system for the whole of Sri Lanka.
My previous research was in sociolinguistics, studying the evolution of Sri Lankan English across multiple generations.
I enjoy the DevOps side of systems and building systems that are end to end.
When I’m not at my desk, I enjoy traveling.
PhD in Computer Science, Current
Rochester Institute of Technology
BSc in Computer Science, 2017
University of Kelaniya
Supervised machine learning often requires human-annotated data. While annotator disagreement is typically interpreted as evidence of noise, population-level label distribution learning (PLDL) treats the collection of annotations for each data item as a sample of the opinions of a population of human annotators, among whom disagreement may be proper and expected, even with no noise present. From this perspective, a typical training set may contain a large number of very small-sized samples, one for each data item, none of which, by itself, is large enough to be considered representative of the underlying population’s beliefs about that item. We propose an algorithmic framework and new statistical tests for PLDL that account for sampling size. We apply them to previously proposed methods for sharing labels across similar data items. We also propose new approaches for label sharing, which we call neighborhood-based pooling.
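To make the idea of sharing labels across similar data items concrete, here is a minimal sketch of one way pooling over a neighborhood could look. It is an illustration under stated assumptions, not the method from the paper: the function name `neighborhood_pool`, the use of Euclidean distance over item feature vectors, and the fixed `radius` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def neighborhood_pool(features, label_counts, radius):
    """Pool the label counts of every item within `radius` of each item.

    Hypothetical illustration: `features` is (n_items, n_dims), `label_counts`
    is (n_items, n_labels) with raw annotation counts per label. Each item is
    assumed to have at least one annotation.
    """
    features = np.asarray(features, dtype=float)
    label_counts = np.asarray(label_counts, dtype=float)

    # Pairwise Euclidean distances between all items.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))

    # neighbors[i, j] is 1.0 when item j lies in item i's neighborhood
    # (an item is always its own neighbor, since its distance is 0).
    neighbors = (dists <= radius).astype(float)

    # Sum the counts over each neighborhood, then normalize each row
    # into a label distribution.
    pooled = neighbors @ label_counts
    return pooled / pooled.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
    label_counts = [[3, 1], [1, 3], [0, 4]]
    print(neighborhood_pool(features, label_counts, radius=1.5))
```

In this toy example the first two items are close together, so each receives the pooled distribution [0.5, 0.5] built from eight annotations instead of four, while the third item has no neighbors and keeps its own distribution [0.0, 1.0].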
Sri Lankans made over 100 million visits to public and private outpatient departments (OPD) during 2015, a number estimated to double by 2027. However, these visits have no records, either paper or electronic. Medical records are essential to provide continuity of care, and computer-based medical records were identified as essential technology in 1990 by the Institute of Medicine. The main initiative of the Ministry of Health addresses either the OPD health information system or the inward system, but it is limited to a few selected hospitals. There are no electronic health records (EHR) that can track patients as they crisscross between different primary care providers in public and private sectors, which is the normal behaviour of the majority of our patients. This paper gives a snapshot of the current healthcare system in Sri Lanka, notes the existing projects related to primary care health information systems, briefly reviews the current status of the global primary care EHR and describes our solution of a generic, cloud-based, open-source EHR for use across public and private sectors, focusing on a patient-centred electronic ‘personal health record’. We opted to modify a time-tested software solution, OpenEMR (https://www.open-emr.org/). OpenEMR is a free and open-source, ONC-certified electronic health records and medical practice management application featuring fully integrated electronic health records, practice management, scheduling, electronic billing, internationalization, and multilingual support. Sri Lanka OpenEMR (SLOEMR) is now used at the University Family Medicine Centre, Faculty of Medicine, University of Kelaniya at Ragama. Paper medical records of more than a decade were converted to the electronic format. We are in the planning process of piloting the SLOEMR in the Ragama Medical Officer of Health Area with a population of 70,000, with a single electronic record for each person across all private and public sector healthcare providers.
Sri Lankan English (SLE) has unique phonological, morphological, lexical and syntactic features which have gradually developed since the introduction of English to Sri Lanka. Vocabulary is one of the first features to develop in SLE. Although the SLE vocabulary has been studied and recorded, its generational difference has not been examined. The objective of the study was to investigate whether the ‘generational change’ observable in the SLE vocabulary could be considered an evolution. This was done through a qualitative, comparative analysis of the vocabulary used in the decades 1955 – 1965 and 2005 – 2015. The theoretical base of the research was defined using two theories of language evolution: the apparent-time hypothesis and age-gradedness. The primary data was taken from the Ceylon Observer of the decade 1955 – 1965 and the Sunday Observer of the decade 2005 – 2015. The words were used in a questionnaire survey of 60 participants, of whom 30 were aged 15 – 25 years and 30 were aged 65 – 75 years. The results of the survey were then analyzed in detail through 10 interviews. The surveys and the interviews were conducted to prove or disprove the age-gradedness of the SLE vocabulary and to prove or disprove the apparent-time hypothesis in relation to the SLE vocabulary. Most of the vocabulary used disproved age-gradedness. The usages of these terms were found to be generation-specific, which supported the conclusion that the SLE vocabulary is not age-graded. The interviews supported the apparent-time hypothesis, as the older generation showed that their vocabulary has not changed significantly over the years. From these observations, it could be concluded that, within the scope of the research, the generational difference observable in the SLE vocabulary over 60 years could be termed an evolution.
Collaborative project with the Faculty of Medicine and Colombo North Teaching Hospital, Sri Lanka.