A method to extract essential keywords from a tweet using NLP tools

Abstract

A tweet is an authentic use of natural language in which the user must deliver a message in 140 characters or fewer. According to previous researchers, this restriction increases the potential ambiguity of a tweet, making it difficult for traditional Natural Language Processing (NLP) tools to analyze. This research enhances the machine-learning-based Stanford CoreNLP Part-of-Speech (POS) tagger with the Twitter model to extract essential keywords from a tweet. The system was enhanced using two rule-based parsers and a corpus. The research was conducted on tweets containing customer service requests sent to a telecommunication company, and a domain-specific corpus was compiled after analyzing these tweets. The POS tagger extracted the keywords, while the parsers removed possible noise and recovered keywords missed by the POS tagger. The system was evaluated using the Turing Test, and the proposed system was tested and compared against Stanford CoreNLP. The testing was conducted using six test cases, each consisting of a human keyword generator and a supervisor. To ensure impartiality and intellectual diversity, the keyword generators and supervisors were drawn from six different fields. As a result of the enhancements, the Turing Test score of the system increased from 50.00% to 83.33%. The accuracy of the system could be further improved by using a complete domain-specific corpus. Since the approach uses theoretical linguistic features of a sentence, the same method could be employed with other NLP tools.
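To make the pipeline described above concrete, the sketch below shows the first stage only: tagging a tweet with the Stanford CoreNLP POS tagger and keeping content-word tags as candidate keywords. This is a minimal illustration, not the paper's implementation; the Twitter model path and the choice of noun/verb tags as keyword candidates are assumptions, and the paper's two rule-based parsers and domain-specific corpus, which filter noise and recover missed keywords, are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

/**
 * Minimal sketch: POS-tag a tweet with Stanford CoreNLP and collect
 * candidate keywords. Model path and tag whitelist are illustrative
 * assumptions, not values taken from the paper.
 */
public class TweetKeywordSketch {

    public static List<String> extractCandidates(String tweet) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos");
        // Assumed path: point this at a Twitter-trained POS model
        // (e.g. the GATE Twitter model) available locally.
        props.setProperty("pos.model", "models/twitter-pos.model");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        CoreDocument doc = new CoreDocument(tweet);
        pipeline.annotate(doc);

        List<String> candidates = new ArrayList<>();
        for (CoreLabel token : doc.tokens()) {
            String tag = token.tag();
            // Keep nouns and verbs as candidate keywords (an assumption);
            // in the paper, rule-based parsers would further remove noise
            // and recover keywords the tagger misses.
            if (tag.startsWith("NN") || tag.startsWith("VB")) {
                candidates.add(token.word());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println(extractCandidates(
            "My internet connection has been down since yesterday, please fix it"));
    }
}
```

In practice, the tag whitelist and any noise-removal rules would be tuned to the customer-service domain, which is where the paper's domain-specific corpus and rule-based parsers come in.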

Publication
16th International Conference on Advances in ICT for Emerging Regions, ICTer 2016 - Conference Proceedings