Additional information
Author | Daraje Kaba Gurmessa |
---|---|
ISBN | 978-1-63902-174-1 |
Language | |
Number of pages | 114 |
Publisher | |
Publication year |
€29.50
The Afaan Oromo fake news detection system designed in this work consists of several stages. Preprocessing covers tokenization, normalization, stop-word removal, and abbreviation resolution. Feature extraction uses term frequency (TF), term frequency-inverse document frequency (TF-IDF), and hashing to weigh how important a word is within a news item relative to the corpus, together with n-grams, a powerful natural language processing technique for capturing semantic and syntactic sequences. On the extracted features, several classification algorithms are trained: multinomial Naive Bayes, random forest, gradient boosting, and passive-aggressive. The final model combines the n-grams, the extracted and transformed features, and the classifiers. The models were assessed and compared on the same news dataset using the most common metrics for machine learning performance: classification accuracy, the error (confusion) matrix, the classification report (precision and recall), and the area under the receiver operating characteristic curve (ROC AUC). Linguistic-based features were applied to the dataset to determine a truth confidence score for each news item. Although the dataset remains a major limitation, the linear passive-aggressive model with a TF-IDF vector and unigrams performed unexpectedly well, with the highest accuracy of 97.2%, a sensitivity of 97.9%, and a ROC AUC score of 97.5%. Passive-aggressive outperformed the other algorithms. Because passive-aggressive is an online algorithm and fake news detection is itself a challenging online problem, the two fit each other well. The model still produced some errors, with an error rate of about 2.8%. Given that Afaan Oromo is a morphologically rich and complex language, this is a considerable result: it shows that the system can operate with low error rates on highly inflected languages such as Afaan Oromo.
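The best-performing combination described above (unigram TF-IDF features fed to a passive-aggressive classifier) can be illustrated with a small, dependency-free Python sketch. It is not the author's implementation: the function names, the smoothed IDF formula, the PA-I update variant, the values of `C` and `epochs`, and the four English placeholder sentences are all assumptions chosen for illustration, and none of the data comes from the Afaan Oromo dataset.

```python
import math
from collections import Counter

def ngrams(text, n=1):
    """Lowercase, split on whitespace, and return word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fit_idf(docs, n=1):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n)))
    total = len(docs)
    return {t: math.log((1 + total) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf, n=1):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in ngrams(doc, n) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def score(w, x):
    """Dot product of the weight vector and a sparse feature vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())

def train_pa(vectors, labels, C=1.0, epochs=5):
    """Online passive-aggressive (PA-I) training; labels are +1 (fake) or -1 (real)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            loss = max(0.0, 1.0 - y * score(w, x))   # hinge loss on this example
            sq_norm = sum(v * v for v in x.values())
            if loss > 0.0 and sq_norm > 0.0:
                tau = min(C, loss / sq_norm)         # step size capped by C
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + tau * y * v
    return w

def predict(w, x):
    return 1 if score(w, x) >= 0 else -1

# Hypothetical placeholder corpus (English, invented for this sketch).
docs = [
    "shocking miracle cure hidden by doctors",
    "ministry publishes annual budget report",
    "secret miracle trick shocks officials",
    "council approves budget for new school",
]
labels = [1, -1, 1, -1]

idf = fit_idf(docs)                        # unigram vocabulary and IDF weights
vectors = [tfidf(d, idf) for d in docs]
w = train_pa(vectors, labels)

predictions = [predict(w, v) for v in vectors]
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
true_pos = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
sensitivity = true_pos / sum(y == 1 for y in labels)   # recall on the fake class
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f}")
```

The update inside `train_pa` processes one example at a time and changes the weights only when the current example is misclassified or falls inside the margin, which is what makes the algorithm suitable for an online setting in which news items arrive continuously. Passing `n=2` to `fit_idf` and `tfidf` switches the features from unigrams to bigrams.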