Gender classification of microblog text based on authorial style

Shubhadeep Mukherjee and Pradip Kumar Bala
Shubhadeep Mukherjee: Indian Institute of Management Ranchi
Pradip Kumar Bala: Indian Institute of Management Ranchi

Information Systems and e-Business Management, 2017, vol. 15, issue 1, No 6, 117-138

Abstract: Gender profiling of unstructured text data has several applications in areas such as marketing, advertising, legal investigation, and recommender systems. The automatic detection of gender in microblogs, such as Twitter, is a difficult task. It requires a system that can use knowledge to interpret the linguistic styles used by the genders. In this paper, we try to provide this knowledge for such a system by considering different sets of features that are relatively independent of the text, such as function words and part-of-speech n-grams. We test a range of feature sets using two classifiers, namely Naïve Bayes and maximum entropy algorithms. Our results show that the gender detection task benefits from the inclusion of features that capture the authorial style of the microblog authors. We achieve an accuracy of approximately 71%, which outperforms the classification accuracy of commercially available gender detection software such as Gender Genie and Gender Guesser.
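
As a rough illustration of the approach the abstract describes, the sketch below counts function words in short texts and feeds those counts to a multinomial Naïve Bayes classifier. It is not the authors' implementation: the function-word list, the smoothing constant `alpha`, the helper names, and the toy training texts with placeholder labels "A" and "B" are all assumptions made for this example, and part-of-speech n-grams and the maximum entropy classifier are left out.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny function-word list; the paper's actual list is not given here.
FUNCTION_WORDS = ["i", "you", "the", "a", "of", "and", "to", "my", "so", "but"]


def function_word_counts(text):
    """Return the count of each function word in the lowercased text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    return [counts[w] for w in FUNCTION_WORDS]


def train_naive_bayes(texts, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing."""
    model = {}
    for c in sorted(set(labels)):
        rows = [function_word_counts(t) for t, y in zip(texts, labels) if y == c]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * len(FUNCTION_WORDS)
        model[c] = {
            "log_prior": math.log(len(rows) / len(texts)),
            "log_likelihood": [math.log((n + alpha) / denom) for n in totals],
        }
    return model


def predict(model, text):
    """Return the class with the highest posterior log-score for the text."""
    x = function_word_counts(text)
    scores = {
        c: p["log_prior"] + sum(n * ll for n, ll in zip(x, p["log_likelihood"]))
        for c, p in model.items()
    }
    return max(scores, key=scores.get)


# Toy data with placeholder author groups "A" and "B" (assumed, not from the paper).
train_texts = [
    "i love my dog and my cat",
    "so i told my friend",
    "the price of the stock",
    "the result of a test",
]
train_labels = ["A", "A", "B", "B"]
model = train_naive_bayes(train_texts, train_labels)
```

Calling `predict(model, "my my i")` on this toy model selects group "A", because those function words appear only in the "A" training texts. A real experiment would need a labeled microblog corpus, a much larger function-word list, and held-out evaluation.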

Keywords: Text mining; Twitter; Natural language processing; Gender classification; Knowledge discovery; Supervised learning; Artificial intelligence; Business intelligence
Date: 2017

Downloads: (external link)
http://link.springer.com/10.1007/s10257-016-0312-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.


Persistent link: https://EconPapers.repec.org/RePEc:spr:infsem:v:15:y:2017:i:1:d:10.1007_s10257-016-0312-0

Ordering information: This journal article can be ordered from
http://www.springer. ... ystems/journal/10257

DOI: 10.1007/s10257-016-0312-0

Information Systems and e-Business Management is currently edited by Jörg Becker and Michael J. Shaw

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:infsem:v:15:y:2017:i:1:d:10.1007_s10257-016-0312-0