Conditional complexity of compression for authorship attribution
Mikhail B. Malyutov, Chammi Irosha Wickramasinghe and Sufeng Li
No 2007-057, SFB 649 Discussion Papers from Humboldt University Berlin, Collaborative Research Center 649: Economic Risk
Abstract:
We introduce new stylometry tools based on the sliced conditional compression complexity of literary texts. They are inspired by the nearly optimal application of the incomputable Kolmogorov conditional complexity, which the compression-based statistic presumably approximates. Whereas other stylometry tools can occasionally give very close values for different authors, our statistic is apparently strictly minimal for the true author, provided that the query and training texts are sufficiently large, the compressor is sufficiently good, and sampling bias is avoided (as in poll sampling). We tune the method and test its performance on attributing the Federalist Papers (Madison vs. Hamilton). Our results confirm the earlier attribution of the Federalist Papers to Madison by Mosteller and Wallace (1964), who used the Naive Bayes classifier, as well as the same attribution obtained with alternative classifiers such as SVM and the second-order Markov model of language. We then apply our method to study the attribution of the early poems from the Shakespeare Canon and of the continuation of Marlowe's poem 'Hero and Leander' ascribed to G. Chapman.
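The idea of conditional compression complexity can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the compressor or the slicing scheme, so the use of `zlib`, the fixed-size slicing, the default `slice_size`, and all function names below are assumptions made for illustration only. The conditional complexity of a query given a training text is approximated as the growth in compressed size when the query is appended to the training text, and the candidate author with the smallest mean value over the query slices is selected.

```python
import zlib
from statistics import mean


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (zlib is a stand-in compressor)."""
    return len(zlib.compress(data, 9))


def conditional_complexity(query: str, training: str) -> int:
    """Approximate C(query | training) as C(training + query) - C(training)."""
    t = training.encode("utf-8")
    q = query.encode("utf-8")
    return compressed_size(t + q) - compressed_size(t)


def slices(text: str, size: int):
    """Split text into consecutive slices of at most `size` characters (assumed scheme)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def attribute(query: str, training_by_author: dict, slice_size: int = 44):
    """Return (best_author, scores); a lower mean conditional complexity is better."""
    scores = {
        author: mean(
            conditional_complexity(s, training) for s in slices(query, slice_size)
        )
        for author, training in training_by_author.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy texts only; real use needs large training and query texts.
    corpus = {
        "A": "the quick brown fox jumps over the lazy dog " * 20,
        "B": "lorem ipsum dolor sit amet consectetur adipiscing elit " * 20,
    }
    best, scores = attribute("the quick brown fox jumps over the lazy dog " * 3, corpus)
    print(best, scores)
```

Note that zlib only looks back over a 32 KB window, so for training texts of realistic size a compressor with a longer context would be needed for the training text to influence the result; the abstract's condition that the compressor be "sufficiently good" applies here.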
Keywords: compression complexity; authorship attribution
JEL-codes: C12 C15 C63
Date: 2007
Downloads: https://www.econstor.eu/bitstream/10419/25229/1/558616682.PDF (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:zbw:sfb649:sfb649dp2007-057