Improving author verification based on topic modeling
Nektaria Potha and Efstathios Stamatatos
Journal of the Association for Information Science & Technology, 2019, vol. 70, issue 10, 1074-1088
Abstract:
Authorship analysis attempts to reveal information about authors of digital documents enabling applications in digital humanities, text forensics, and cyber‐security. Author verification is a fundamental task where, given a set of texts written by a certain author, we should decide whether another text is also by that author. In this article we systematically study the usefulness of topic modeling in author verification. We examine several author verification methods that cover the main paradigms, namely, intrinsic (attempt to solve a one‐class classification task) and extrinsic (attempt to solve a binary classification task) methods as well as profile‐based (all documents of known authorship are treated cumulatively) and instance‐based (each document of known authorship is treated separately) approaches combined with well‐known topic modeling methods such as Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA). We use benchmark data sets and demonstrate that LDA is better combined with extrinsic methods, while the most effective intrinsic method is based on LSI. Moreover, topic modeling seems to be particularly effective for profile‐based approaches and the performance is enhanced when latent topics are extracted by an enriched set of documents. The comparison to state‐of‐the‐art methods demonstrates the great potential of the approaches presented in this study. It also demonstrates that even when genre‐agnostic external documents are used, the proposed extrinsic models are very competitive.
Date: 2019
Downloads: https://doi.org/10.1002/asi.24183
Persistent link: https://EconPapers.repec.org/RePEc:bla:jinfst:v:70:y:2019:i:10:p:1074-1088