Cyril Weerasooriya (සිරිල් වීරසූරිය)

PhD Student

Lab for Population Intelligence

Rochester Institute of Technology

Doctoral Student Association (RIT)

I’ll be at EMNLP 2023 presenting work on the VOICED dataset. I recently presented two papers at ACL 2023, DisCo and CrowdOpinion.

Hello there. I study how machine learning can predict human disagreement during data annotation. This work is helpful when we want to model human disagreement, which is conventionally treated as annotation noise. Following recent breakthroughs, machine learning research has shown instances of algorithms being biased toward specific groups. I’m a PhD student at the Lab for Population Intelligence at RIT, led by Professor Christopher Homan.

I’m currently on the job market. I’ve interned at Amazon Ads as an Applied Scientist Intern (2023), at Meta (Facebook) in Summer 2022, and at RPI (IBM Watson project) in Summer 2019.

In parallel, I’m also working with the University of Kelaniya in Sri Lanka to build an electronic medical record system for the entirety of Sri Lanka.

My previous research was in sociolinguistics, studying the evolution of Sri Lankan English across multiple generations.

I enjoy the DevOps side of systems and building end-to-end systems.

When I’m not at my desk, I enjoy traveling.

Interests
  • Label Distribution Learning
  • Computational Linguistics & Natural Language Processing
  • DevOps
  • Data Science
  • Machine Learning
  • Photography
Education
  • PhD in Computer Science, Current

    Rochester Institute of Technology

  • BSc in Computer Science, 2017

    University of Kelaniya

Recent Publications


Experience

Meta (formerly known as Facebook)
Machine Learning Research Engineer (Summer Intern)
May 2022 – Aug 2022 Menlo Park, CA

Worked on the Facebook Creators Wellbeing Team on Public Conversations. Worked on models for improving comment recommendation and ranking on Facebook Pages with follower populations of varying sizes from around the globe.

Project - Introduced a multi-label, multi-task model to assist Page administrators with comment management.

  • Experimented with real-time Facebook user data, amounting to millions of actions per day, for model building.
  • Built big data pipelines with Presto (a distributed SQL query engine) to collect and process data for the model.
  • The model outperformed the existing individual action-based models by 40% on ROC AUC and PR AUC.
  • Evaluated the model in production with A/B testing on 4% of Facebook's global user base.
 
Rochester Institute of Technology
Research Assistant
Aug 2018 – Present Rochester, NY

Responsibilities include:

  • Working on research for predicting human disagreements on natural language social media datasets.
  • Building research pipelines for deployment on Google Cloud using the Python machine learning stack and MongoDB.
  • Presenting work at ECAI 2020 and LREC 2022.
 
Rochester Institute of Technology
Adjunct Lecturer
May 2020 – Aug 2020 Rochester, NY
  • Taught CS635 - Introduction to Machine Learning (virtual).
 
University of Kelaniya
Tech Lead
Mar 2018 – Present Sri Lanka

Collaborative project with the Faculty of Medicine and Colombo North Teaching Hospital, Sri Lanka.

  • Web-based electronic patient record management system tailored for Sri Lanka, built with a PHP backend and a MySQL database on AWS.
  • Currently being used to identify potential COVID-19 patients, as it is the only widely used EMR in Sri Lanka.
  • The work also contributes to the open-source EMR project OpenEMR.
  • Research published at NITC 2019; served on the workshop organizing committee for WONCA 2020.
 
HEALS Project (part of IBM AI Horizons Network) at Rensselaer Polytechnic Institute
Intern Research Assistant
May 2019 – Aug 2019 Troy, NY
  • Research in natural language processing and information retrieval.
  • Conducted research on aggregating food-related data sources into a food knowledge graph used with IBM Watson.
  • The resulting tool could generate nutritional content per FDA guidelines from crowdsourced recipes.

Contact