This study aims to validate the NAYALex emotion lexicon, a comprehensive lexicon containing 245,822 emotion-word relationships across 6,469 English words, each mapped to at least one of 38 distinct emotions. It addresses a critical gap in existing emotion lexicons such as the National Research Council (NRC) lexicon, which are limited in capturing emotions that reflect personality traits.
The study takes a quantitative approach, with experiments on two data sets: a test set of 11,880 Instagram posts and a validation set of 26,600 sentence-emotion pairs evaluated by human judges. The analysis applies machine learning algorithms, including Naive Bayes, support vector machines (SVM) and K-nearest neighbors, to assess the lexicon’s performance.
The results show that NAYALex achieves an average validation rate of 77% and extracts approximately four times as many emotions as existing lexicons, with a coverage rate of 24.7% compared with 6.5% for NRC. Among the tested algorithms, SVM achieved the highest classification accuracy, 93% on the validation data set, supporting the lexicon’s applicability for personality analysis.
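As a rough illustration of how a coverage comparison of this kind could be computed, the sketch below assumes that coverage means the share of tokens in a set of posts that have at least one lexicon entry. That definition, the function name `coverage_rate`, and the toy lexicons and posts are assumptions made for illustration only; they are not taken from NAYALex, NRC or the study's data.

```python
from typing import Dict, Iterable, Set


def coverage_rate(posts: Iterable[str], lexicon: Dict[str, Set[str]]) -> float:
    """Return the fraction of tokens that have at least one lexicon entry.

    Assumed definition of coverage, used here only for illustration.
    """
    total = 0
    matched = 0
    for post in posts:
        for token in post.lower().split():
            # Strip simple surrounding punctuation; real tokenization would differ.
            word = token.strip(".,!?;:\"'()")
            if not word:
                continue
            total += 1
            if word in lexicon:
                matched += 1
    return matched / total if total else 0.0


# Made-up toy lexicons mapping words to emotion labels (hypothetical data).
toy_small = {"happy": {"joy"}}
toy_large = {"happy": {"joy"}, "calm": {"serenity"}, "proud": {"pride"}}

# Made-up example posts (hypothetical data).
posts = ["So happy and proud today!", "Feeling calm."]

small_cov = coverage_rate(posts, toy_small)
large_cov = coverage_rate(posts, toy_large)
print(f"small lexicon coverage: {small_cov:.3f}")
print(f"large lexicon coverage: {large_cov:.3f}")
```

Under the same assumed definition, the reported figures give a ratio of 24.7 / 6.5, or about 3.8, which is consistent with the "approximately four times" comparison.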
This research contributes the most comprehensive emotion lexicon to date, significantly enhancing the capacity for emotion and personality trait analysis from text. The findings support advanced applications in computational personality profiling and social media analytics, and provide a basis for future emotion-based research.
