Automating Twitter Data Annotation Process for Sentiment Analysis


Hasanein Alharbi

Abstract

Background:


Sentiment analysis algorithms require high-quality annotated data during the training phase. Meeting this requirement, however, depends on a complex, time-consuming and costly manual annotation process. To address these challenges, this research proposes an automatic data annotation process for sentiment analysis.


Materials and Methods:


Three semantic orientation measures (Pointwise Mutual Information, Latent Semantic Analysis, and Word2Vec), five classification algorithms (K-Nearest Neighbors, Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine), and the NRC lexicon thesaurus are used to automate tweet annotation for sentiment analysis.
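The abstract does not detail how the semantic orientation scores are computed. As a rough illustration only, the sketch below follows the classic SO-PMI idea: a word's orientation is its summed PMI with positive seed words minus its summed PMI with negative seed words, with co-occurrence counted at the tweet level, and a tweet is labelled by the sign of its words' total orientation. The seed words, the smoothing constant, the whitespace tokenizer and the function names (`cooccurrence_counts`, `pmi`, `semantic_orientation`, `label_tweet`) are hypothetical assumptions, not the author's method; in practice the seeds might be drawn from a lexicon such as NRC.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tweets):
    """Count, per tweet, which words occur and which word pairs co-occur."""
    word_df = Counter()
    pair_df = Counter()
    for tweet in tweets:
        words = set(tweet.lower().split())  # assumption: naive whitespace tokenizer
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_df, pair_df, len(tweets)


def pmi(w1, w2, word_df, pair_df, n, smoothing=0.01):
    """Tweet-level PMI in bits; additive smoothing (hypothetical value) avoids log(0)."""
    joint = word_df[w1] if w1 == w2 else pair_df[frozenset((w1, w2))]
    p_joint = (joint + smoothing) / n
    p1 = (word_df[w1] + smoothing) / n
    p2 = (word_df[w2] + smoothing) / n
    return math.log2(p_joint / (p1 * p2))


def semantic_orientation(word, counts, pos_seeds, neg_seeds):
    """SO-PMI: association with positive seeds minus association with negative seeds."""
    return (sum(pmi(word, p, *counts) for p in pos_seeds)
            - sum(pmi(word, q, *counts) for q in neg_seeds))


def label_tweet(tweet, counts, pos_seeds, neg_seeds):
    """Label a tweet by the sign of the summed orientation of its words."""
    score = sum(semantic_orientation(w, counts, pos_seeds, neg_seeds)
                for w in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical toy corpus and seed lists, for illustration only.
    corpus = ["good movie great", "great day good", "bad movie awful", "awful service bad"]
    counts = cooccurrence_counts(corpus)
    for t in ["great movie", "awful movie"]:
        print(t, label_tweet(t, counts, ["good"], ["bad"]))
```

Words never seen in the corpus get equal PMI with every seed of the same frequency, so they contribute no orientation; a realistic setup would need far larger corpora and seed lists than this toy example.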


Results:


Tweets were annotated using five classifiers and three semantic orientation measures, forming fifteen combinations. The Inter-Annotator Agreement (IAA) among these combinations was evaluated using Cohen’s Kappa. The results show that the Pointwise Mutual Information + Logistic Regression and Pointwise Mutual Information + Naïve Bayes combinations achieved the highest agreement score, 0.7008.
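Cohen’s Kappa corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of tweets on which two annotation runs agree and p_e is the agreement expected if each run labelled independently according to its own label frequencies. The minimal sketch below computes κ for every pair of runs; the run names and labels in the usage example are hypothetical, and with fifteen classifier/measure combinations there would be 105 such pairs.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long, non-empty label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each run's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Convention chosen here: both runs used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa(annotations):
    """Map each pair of run names to the Kappa between their label lists."""
    return {(a, b): cohens_kappa(annotations[a], annotations[b])
            for a, b in combinations(sorted(annotations), 2)}


if __name__ == "__main__":
    # Hypothetical labels from three of the fifteen annotation runs.
    runs = {
        "PMI+LR": ["pos", "neg", "neg", "pos", "neu"],
        "PMI+NB": ["pos", "neg", "neg", "pos", "pos"],
        "W2V+SVM": ["neg", "neg", "pos", "pos", "neu"],
    }
    scores = pairwise_kappa(runs)
    best_pair = max(scores, key=scores.get)
    print(best_pair, round(scores[best_pair], 4))
```

Computing p_e from each run's own marginals, rather than from pooled frequencies, is what distinguishes Cohen’s Kappa from related statistics such as Scott’s Pi.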


Conclusion:


These findings show that corpus-based semantic orientation measures deliver substantive results. However, the approach could be further improved by using a broader vocabulary, incorporating contextual information, and applying the latest deep learning algorithms.


How to Cite

“Automating Twitter Data Annotation Process for Sentiment Analysis”, JUBPAS, vol. 33, no. 4, pp. 114–136, Jan. 2026, doi: 10.29196/jubpas.v33i4.6146.
