Author Identification from Literary Articles with Visual Features: A Case Study with Bangla Documents

Dhar, Ankita; Mukherjee, Himadri; Sen, Shibaprasad; Sk, Md Obaidullah; Biswas, Amitabha; Gonçalves, Teresa; Roy, Kaushik

Ankita Dhar, Himadri Mukherjee, Shibaprasad Sen, Md Obaidullah Sk, Amitabha Biswas, Teresa Gonçalves and Kaushik Roy
Additional contact information
Ankita Dhar: Department of Computational Science, Brainware University, Kolkata 700125, India
Himadri Mukherjee: Department of Computer Science, West Bengal State University, Kolkata 700126, India
Shibaprasad Sen: Techno Main Saltlake, Kolkata 700091, India
Md Obaidullah Sk: Department of Computer Science and Engineering, Aliah University, Kolkata 700156, India
Amitabha Biswas: Department of Computer Science, West Bengal State University, Kolkata 700126, India
Teresa Gonçalves: Department of Computer Science, University of Évora, 7000-671 Évora, Portugal
Kaushik Roy: Department of Computer Science, West Bengal State University, Kolkata 700126, India

Future Internet, 2022, vol. 14, issue 10, 1-20

Abstract: Author identification is an important aspect of literary analysis, studied in natural language processing (NLP). It helps identify the most probable author of articles, news texts, or social media comments and tweets, for example. It can also be applied to other domains such as criminal and civil cases, cybersecurity, forensics, identifying plagiarists, and many more. An automated system in this context can thus be very beneficial to society. In this paper, we propose a convolutional neural network (CNN)-based author identification system for literary articles. The system uses visual features along with a five-layer CNN to identify authors. The prime motivation behind this approach was the possibility of identifying distinct writing styles through a visualization of writing patterns. Experiments were performed on 1200 articles from 50 authors, achieving a maximum accuracy of 93.58%. Furthermore, to see how the system performed on different volumes of data, experiments were also performed on partitions of the dataset. The system outperformed standard handcrafted feature-based techniques as well as established works on publicly available datasets.

Keywords: author identification; statistical-based features; image-based features; deep learning; CNN
JEL-codes: O3
Date: 2022

Downloads:
https://www.mdpi.com/1999-5903/14/10/272/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/10/272/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:10:p:272-:d:923732

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.