Mathematical Data Models and Context-Based Features for Enhancing Historical Degraded Manuscripts Using Neural Network Classification

Pasquale Savino and Anna Tonazzini
Additional contact information
Pasquale Savino: Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via G. Moruzzi, 1, 56124 Pisa, Italy
Anna Tonazzini: Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via G. Moruzzi, 1, 56124 Pisa, Italy

Mathematics, 2024, vol. 12, issue 21, 1-13

Abstract: A common cause of deterioration in historic manuscripts is ink transparency or bleeding from the opposite page. Philologists and paleographers can significantly benefit from minimizing these interferences when attempting to decipher the original text. Computer-aided text analysis can also benefit from such text enhancement. In previous work, we proposed the use of neural networks (NNs) in combination with a data model that characterizes the damage when both sides of a page have been digitized. This approach offers the distinct advantage of allowing the creation of an artificial training set that teaches the NN to differentiate between clean and damaged pixels. We tested this concept using a shallow NN, which proved effective in classifying texts with varying levels of deterioration. In this study, we adapt the NN design to tackle the remaining classification uncertainties caused by areas of text overlap, inhomogeneity, and peaks of degradation. Specifically, we introduce a new output class for pixels within overlapping text areas and incorporate additional features related to pixel context information to promote the same classification for adjacent pixels. Our experiments demonstrate that these enhancements significantly improve the classification accuracy. This improvement is evident in the quality of both binarization, which aids text analysis, and virtual restoration, which aims to recover the manuscript’s original appearance. Tests conducted on a public dataset, using standard quality indices, reveal that the proposed method outperforms both our previous proposals and other notable methods found in the literature.

Keywords: ancient manuscript virtual restoration; degraded document binarization; shallow multilayer neural networks
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/21/3402/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/21/3402/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:21:p:3402-:d:1510653

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:12:y:2024:i:21:p:3402-:d:1510653