Document

Offensive and abusive language detection in Arabic texts.

Publisher
University of Wolverhampton
Year (Gregorian)
2019
Language
English
Subject
English abstract
Document classification is one of the most popular applications of sentiment analysis. Among the many subjects studied in this area, the spread of offensive and abusive language on social media has drawn observers' interest in reducing its risks to users, children in particular. Although many studies have addressed this issue in English texts, the number of sentiment analysis studies on Arabic texts remains small. This paper therefore focuses on detecting offensive and abusive language in Arabic social media texts using three machine learning algorithms: Naïve Bayes (NB), Support Vector Machine (SVM) and fastText. The algorithms were applied to a corpus of 47K YouTube comments. The results demonstrated that fastText outperformed the other algorithms, reaching an F-score of 82%, and that it is viable for Arabic social media classification. Word tri-gram features were also shown to enhance classification performance, although other techniques such as TF-IDF and grid search were applied as well. Similarly, the results showed a positive correlation with the difference between the definitions of offence and abuse.
Item type
Theses and dissertations
