Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to “Shakespeare Authorship Question”
Refat Aljumily
Additional contact information
Refat Aljumily: School of English Literature, Language and Linguistics, University of Newcastle, Newcastle upon Tyne, Tyne and Wear NE1 7RU, UK
Social Sciences, 2015, vol. 4, issue 3, 1-42
Abstract:
A few literary scholars have long claimed that Shakespeare did not write some of his best plays (history plays and tragedies) and have proposed, at one time or another, various alternative candidate authors. Most modern-day Shakespeare scholars reject this claim, arguing that there is strong evidence that Shakespeare wrote the plays and poems, since his name appears on them as the author. This has led to a long-running scholarly debate. Stylometry is a fast-growing field often used to attribute authorship to anonymous or disputed texts. Stylometric attempts to resolve this literary puzzle have raised interesting questions over the past few years. This paper contributes to “the Shakespeare authorship question” by using a mathematically based methodology to examine the hypothesis that Shakespeare wrote all the disputed plays traditionally attributed to him. More specifically, the methodology is based on Mean Proximity, as a linear hierarchical clustering method, and on Principal Components Analysis, as a non-hierarchical linear clustering method. It is also based, for the first time in the domain, on Self-Organizing Map U-Matrix and Voronoi Map, as non-linear clustering methods, to cover the possibility that the data contain significant non-linearities. The Vector Space Model (VSM) is used to convert texts into vectors in a high-dimensional space. The aim is to compare the degrees of similarity within and between limited samples of text (the disputed plays): the various works and plays assumed to have been written by Shakespeare and by possible alternative authors, notably Sir Francis Bacon, Christopher Marlowe, John Fletcher, and Thomas Kyd. Here “similarity” is defined in terms of a correlation/distance coefficient measure based on frequency-of-usage profiles of function words, word bi-grams, and character triple-grams.
The claim that Shakespeare authored all the disputed plays traditionally attributed to him is falsified in favor of the alternative authors, according to the stylistic criteria and analytic methodology used. The result of this validated analysis is empirically based and objective, and it provides replicable evidence that can be used in conjunction with existing arguments to resolve the question of whether or not Shakespeare of Stratford-upon-Avon wrote all the disputed plays traditionally attributed to him.
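To make the vector-space step in the abstract more concrete, the following is a minimal sketch of how a text might be turned into length-normalized frequency profiles and compared. It is not the paper's implementation: the function names, the tokenization rule, and the choice of cosine similarity as the similarity coefficient are all assumptions made for illustration, since the abstract only says that a correlation/distance coefficient is used.

```python
import re
from collections import Counter
from math import sqrt


def char_trigram_profile(text):
    """Relative frequencies of character triple-grams in a text.

    Dividing by the total number of trigrams is a simple form of
    text-length normalization (an assumption; the paper may normalize differently).
    """
    cleaned = " ".join(text.lower().split())  # collapse whitespace
    grams = [cleaned[i:i + 3] for i in range(len(cleaned) - 2)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def function_word_profile(text, function_words):
    """Relative frequency of each listed function word among all word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())  # hypothetical tokenization rule
    total = len(tokens)
    counts = Counter(t for t in tokens if t in function_words)
    return {w: (counts[w] / total if total else 0.0) for w in function_words}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)
```

Profiles built this way for each play or text sample could then be arranged as rows of a matrix and passed to a clustering or dimensionality-reduction routine; the sketch covers only the profile and similarity step.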
Keywords: stylometry; text-length normalization; dimensionality-reduction; dendrogram; word bi-grams; character triple-grams; correlation matrix; centroid analysis; clustering tendency test; vector space
JEL-codes: A B N P Y80 Z00
Date: 2015
Downloads: (external link)
https://www.mdpi.com/2076-0760/4/3/758/pdf (application/pdf)
https://www.mdpi.com/2076-0760/4/3/758/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jscscx:v:4:y:2015:i:3:p:758-799:d:55888
Social Sciences is currently edited by Ms. Yvonne Chu
Bibliographic data for series maintained by MDPI Indexing Manager.