
An End-to-End Formula Recognition Method Integrated Attention Mechanism

Mingle Zhou, Ming Cai, Gang Li and Min Li
Additional contact information
Mingle Zhou: Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
Ming Cai: Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
Gang Li: Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
Min Li: Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China

Mathematics, 2022, vol. 11, issue 1, 1-17

Abstract: Formula recognition is widely used in intelligent document processing and can significantly shorten the time required to input mathematical formulas, but the accuracy of traditional methods is limited. To reduce the complexity of formula input, an end-to-end encoder-decoder framework with an attention mechanism is proposed that converts formulas in images into LaTeX sequences. A Vision Transformer (ViT) is employed as the encoder to convert the original input image into a set of semantic vectors. Because mathematical formulas are two-dimensional, positional embedding is introduced to accurately capture the relative positions and spatial characteristics of the formula characters and to ensure that each character position is unique. The decoder is an attention-based Transformer that translates the encoded vectors into the target LaTeX characters. The encoder and decoder are trained jointly with Cross-Entropy as the loss function, and the model is evaluated on the im2latex-100k dataset and CROHME 2014. Experiments show that on im2latex-100k the model reaches a BLEU score of 92.11, a MED of 0.90, and an Exact Match (EM) of 0.62. This paper's contribution is to apply machine translation to formula recognition and to realize the end-to-end transformation from the formula image to the LaTeX sequence, providing a new approach to formula recognition based on deep learning.
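
The pipeline described in the abstract (ViT patch encoder with learned positional embedding, attention-based Transformer decoder, joint training with Cross-Entropy) can be illustrated with a short sketch. The following is a minimal PyTorch version, assuming fixed-size grayscale formula images; the class name Image2LaTeX and all hyperparameters are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Image2LaTeX(nn.Module):
    """Sketch of a ViT encoder + Transformer decoder for image-to-LaTeX."""
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=4,
                 patch=16, img_size=224, max_len=200):
        super().__init__()
        # Encoder: split the image into patches and project each patch
        # to a d_model-dimensional semantic vector.
        self.patch_embed = nn.Conv2d(1, d_model, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        # Learned positional embedding: keeps every patch position unique,
        # preserving the 2-D spatial layout of the formula.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers)
        # Decoder: autoregressive Transformer attending to the encoder output.
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.tgt_pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        dec = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tgt_tokens):
        # images: (B, 1, H, W); tgt_tokens: (B, T) LaTeX token ids.
        x = self.patch_embed(images).flatten(2).transpose(1, 2)   # (B, N, d)
        memory = self.encoder(x + self.pos_embed)
        t = tgt_tokens.size(1)
        tgt = self.tok_embed(tgt_tokens) + self.tgt_pos[:, :t]
        # Causal mask: each position may attend only to earlier tokens.
        mask = torch.triu(torch.full((t, t), float('-inf'),
                                     device=tgt.device), diagonal=1)
        h = self.decoder(tgt, memory, tgt_mask=mask)
        return self.out(h)                                        # (B, T, vocab)

# Joint encoder-decoder training with Cross-Entropy (teacher forcing):
# logits = model(images, tokens[:, :-1])
# loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
#                        tokens[:, 1:].reshape(-1))
```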

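Of the metrics quoted above, BLEU is the standard machine-translation score, while Exact Match and MED operate on whole token sequences. Below is a small self-contained sketch of the latter two; the function names and the normalization convention (1 minus edit distance over the longer sequence) are assumptions for illustration, since the paper's exact definition is not given here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences (one-row DP)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                        # delete a[i-1]
                        dp[j - 1] + 1,                    # insert b[j-1]
                        prev + (a[i - 1] != b[j - 1]))    # substitute
            prev = cur
    return dp[n]

def med_score(pred, ref):
    """Edit distance normalized to [0, 1]; 1.0 is a perfect match."""
    if not pred and not ref:
        return 1.0
    return 1.0 - edit_distance(pred, ref) / max(len(pred), len(ref))

def exact_match(preds, refs):
    """Fraction of predicted sequences identical to their references."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)
```
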
Keywords: formula recognition; neural network; vision transformer; encoder-decoder; attention mechanism
JEL-codes: C
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2227-7390/11/1/177/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/1/177/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2022:i:1:p:177-:d:1019082

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

Handle: RePEc:gam:jmathe:v:11:y:2022:i:1:p:177-:d:1019082