Classifying Hate Speech Using a Two-Layer Model
Yiwen Tang and Nicole Dalzell
Statistics and Public Policy, 2019, vol. 6, issue 1, 80-86
Abstract:
Social media and other online sites are increasingly scrutinized as platforms for cyberbullying and hate speech. Many machine learning algorithms, such as support vector machines, have been adopted to build classification tools that identify, and potentially filter, patterns of negative speech. While effective for prediction, these methods yield models that are difficult to interpret. In addition, many studies classify comments only as negative or neutral, rather than further separating negative comments into subcategories. To address both concerns, we introduce a two-stage model for classifying text. With this model, we illustrate the use of internal lexicons: collections of words, generated from a pre-classified training dataset of comments, that are specific to several subcategories of negative comments. In the first stage, a machine learning algorithm classifies each comment as negative or neutral (more generally, target or nontarget). The second stage leverages the internal lexicons (called L2CLs) to create features specific to each subcategory. These features, along with others, are then used in a random forest model to classify the comments into the subcategories of interest. We demonstrate our approach on two datasets. Supplementary materials for this article are available online.
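The abstract describes the pipeline only at a high level. Below is a minimal sketch, using only the Python standard library, of how internal lexicons and lexicon-based features could plug into a two-stage classifier. Everything here is an assumption for illustration, not the authors' L2CL construction. That includes the lexicon-selection rule (the `min_count` and `ratio` thresholds with add-one smoothing), the feature definition (share of a comment's tokens found in each lexicon), and all function names. The stage-1 classifier and the stage-2 random forest are represented by hypothetical callables.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and extract runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_lexicons(comments, labels, min_count=2, ratio=2.0):
    """Build one internal lexicon per subcategory from pre-classified training comments.

    Hypothetical rule (not the paper's): a word joins subcategory c's lexicon if it
    occurs at least `min_count` times in c's comments and its relative frequency in c
    is at least `ratio` times its add-one-smoothed relative frequency in all other
    subcategories combined.
    """
    counts = {c: Counter() for c in set(labels)}
    for text, c in zip(comments, labels):
        counts[c].update(tokenize(text))

    lexicons = {}
    for c, cnt in counts.items():
        # Pool the word counts of every other subcategory.
        other = Counter()
        for d, dcnt in counts.items():
            if d != c:
                other.update(dcnt)
        total_in = sum(cnt.values())
        total_out = sum(other.values())
        vocab_size = len(set(cnt) | set(other))
        lexicon = set()
        for word, n in cnt.items():
            if n < min_count:
                continue
            p_in = n / total_in
            # Add-one smoothing keeps words unseen elsewhere from dividing by zero.
            p_out = (other[word] + 1) / (total_out + vocab_size)
            if p_in >= ratio * p_out:
                lexicon.add(word)
        lexicons[c] = lexicon
    return lexicons


def lexicon_features(text, lexicons):
    """Fraction of a comment's tokens found in each lexicon, ordered by subcategory name."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * len(lexicons)
    return [sum(t in lexicons[c] for t in tokens) / len(tokens) for c in sorted(lexicons)]


def two_stage_predict(text, is_target, subcategory_model, lexicons):
    """Stage 1 screens out nontarget comments; stage 2 assigns a subcategory.

    `is_target` stands in for the stage-1 classifier and `subcategory_model` for the
    stage-2 random forest; both are hypothetical callables supplied by the caller.
    """
    if not is_target(text):
        return "nontarget"
    return subcategory_model(lexicon_features(text, lexicons))


if __name__ == "__main__":
    # Made-up toy comments and subcategory labels, for illustration only.
    train = ["you are stupid and dumb", "such a dumb stupid idea",
             "go back to your country", "people from that country are lazy"]
    labels = ["insult", "insult", "identity", "identity"]
    lex = build_lexicons(train, labels)
    print({c: sorted(words) for c, words in sorted(lex.items())})
    # {'identity': ['country'], 'insult': ['dumb', 'stupid']}

    all_words = set().union(*lex.values())
    names = sorted(lex)

    def is_target(t):
        # Toy stage-1 stand-in: flag any comment containing a lexicon word.
        return bool(all_words & set(tokenize(t)))

    def pick_max(features):
        # Toy stage-2 stand-in: choose the subcategory with the largest lexicon share.
        return names[features.index(max(features))]

    for text in ["have a nice day", "what a dumb and stupid comment", "leave this country"]:
        print(text, "->", two_stage_predict(text, is_target, pick_max, lex))
```

In a fuller version, `lexicon_features` would be concatenated with other comment features and passed to a trained random forest (for example, scikit-learn's `RandomForestClassifier`) in place of the toy `pick_max` rule. Keeping the features as simple lexicon proportions is one way to preserve the interpretability the abstract emphasizes.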
Date: 2019
Downloads: http://hdl.handle.net/10.1080/2330443X.2019.1660285 (text/html). Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:usppxx:v:6:y:2019:i:1:p:80-86
Ordering information: This journal article can be ordered from http://www.tandfonline.com/pricing/journal/uspp20
DOI: 10.1080/2330443X.2019.1660285
Statistics and Public Policy is currently edited by Eric Sampson
More articles in Statistics and Public Policy from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.