Offensive and abusive language detection in Arabic texts.

Author

Al-Mandhari, Salim Mohammed Salim.

Publisher

University of Wolverhampton

Year (Gregorian)

2019

Language

English

Subject

Arabic texts

Social media

English abstract

Document classification is one of the most popular applications of Sentiment Analysis. Among the many subjects studied in this area, the spread of offensive and abusive language on social media has drawn observers' interest in reducing its risks to users, children in particular. Although many studies have addressed this issue in English texts, the number of Sentiment Analysis studies on Arabic texts remains small. Therefore, this paper focuses on offensive and abusive language detection in Arabic social media texts using three machine learning algorithms: Naïve Bayes (NB), Support Vector Machine (SVM) and fastText. The algorithms were applied to a corpus of 47K YouTube comments. The results demonstrated that fastText outperformed the other algorithms, reaching an F-score of 82%, and that it is viable for Arabic social media classification. It was also shown that word tri-gram features enhance classification performance, though other techniques such as TF-IDF and grid search were also applied. Similarly, the results illustrated a positive correlation with the difference between the definitions of offence and abuse.
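As a rough illustration of two terms used in the abstract, the sketch below extracts word n-gram features up to tri-grams and computes an F-score for a binary labelling. It is not the thesis code: the function names, the whitespace tokenization, and the choice of the positive-class F1 variant are assumptions made here for illustration, since the record does not state which tokenizer or F-score variant was used.

```python
from collections import Counter


def word_ngrams(text, max_n=3):
    """Count word n-grams for n = 1..max_n (assumes whitespace tokenization)."""
    tokens = text.split()
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the comment.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def f_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With `max_n=3`, a three-word comment yields three uni-grams, two bi-grams and one tri-gram. Classifiers such as NB or SVM would consume these counts (optionally reweighted by TF-IDF) as features.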

Collection

Theses and dissertations

Source URL

https://hdl.handle.net/20.500.12408/4167

Item template

University theses and dissertations