Improving author verification based on topic modeling

Nektaria Potha and Efstathios Stamatatos

Journal of the Association for Information Science & Technology, 2019, vol. 70, issue 10, 1074-1088

Abstract: Authorship analysis attempts to reveal information about authors of digital documents enabling applications in digital humanities, text forensics, and cyber‐security. Author verification is a fundamental task where, given a set of texts written by a certain author, we should decide whether another text is also by that author. In this article we systematically study the usefulness of topic modeling in author verification. We examine several author verification methods that cover the main paradigms, namely, intrinsic (attempt to solve a one‐class classification task) and extrinsic (attempt to solve a binary classification task) methods as well as profile‐based (all documents of known authorship are treated cumulatively) and instance‐based (each document of known authorship is treated separately) approaches combined with well‐known topic modeling methods such as Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA). We use benchmark data sets and demonstrate that LDA is better combined with extrinsic methods, while the most effective intrinsic method is based on LSI. Moreover, topic modeling seems to be particularly effective for profile‐based approaches and the performance is enhanced when latent topics are extracted from an enriched set of documents. The comparison to state‐of‐the‐art methods demonstrates the great potential of the approaches presented in this study. It also demonstrates that even when genre‐agnostic external documents are used, the proposed extrinsic models are very competitive.
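
The distinction the abstract draws between profile-based and instance-based approaches can be illustrated with a minimal sketch. This is not the authors' method: plain term-frequency vectors stand in for the LSI or LDA topic-space representations studied in the article, and the function names, the cosine similarity measure, and the 0.5 threshold are all assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tf_vector(text):
    """Lowercased whitespace-token counts (a stand-in for a topic-space vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def profile_based_score(known_texts, unknown_text):
    """Treat all known texts cumulatively: build one profile, compare once."""
    profile = tf_vector(" ".join(known_texts))
    return cosine(profile, tf_vector(unknown_text))


def instance_based_score(known_texts, unknown_text):
    """Treat each known text separately, then average the similarities."""
    unknown = tf_vector(unknown_text)
    sims = [cosine(tf_vector(t), unknown) for t in known_texts]
    return sum(sims) / len(sims)


def verify(score, threshold=0.5):
    """Intrinsic-style decision using only the known author's texts.

    The threshold is an arbitrary illustrative value, not one from the article.
    """
    return score >= threshold


known = ["the cat sat on the mat", "the dog sat on the log"]
unknown = "the cat sat on the log"
same_author_profile = verify(profile_based_score(known, unknown))
same_author_instance = verify(instance_based_score(known, unknown))
```

An extrinsic variant would additionally compare the unknown text against documents by other authors and decide between the two classes, which this sketch does not attempt.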

Date: 2019

Downloads: (external link)
https://doi.org/10.1002/asi.24183

Persistent link: https://EconPapers.repec.org/RePEc:bla:jinfst:v:70:y:2019:i:10:p:1074-1088

Ordering information: This journal article can be ordered from
http://www.blackwell ... bs.asp?ref=2330-1635

Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jinfst:v:70:y:2019:i:10:p:1074-1088