EconPapers    

Neural Architecture Comparison for Bibliographic Reference Segmentation: An Empirical Study

Rodrigo Cuéllar Hidalgo, Raúl Pinto Elías, Juan-Manuel Torres-Moreno, Osslan Osiris Vergara Villegas, Gerardo Reyes Salgado and Andrea Magadán Salazar
Additional contact information
Rodrigo Cuéllar Hidalgo: Biblioteca Daniel Cosío Villegas, El Colegio de México, Carretera Picacho Ajusco 20, Mexico City 14110, Mexico
Raúl Pinto Elías: Tecnológico Nacional de México/CENIDET, Cuernavaca 62490, Mexico
Juan-Manuel Torres-Moreno: Laboratoire Informatique d’Avignon, Université d’Avignon, 339 Chemin des Meinajariès, CEDEX 9, 84911 Avignon, France
Osslan Osiris Vergara Villegas: Industrial and Manufacturing Engineering Department, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez 32310, Mexico
Gerardo Reyes Salgado: Departamento de Informática y Estadística, Universidad Rey Juan Carlos, Av. del Alcalde de Móstoles, 28933 Madrid, Spain
Andrea Magadán Salazar: Tecnológico Nacional de México/CENIDET, Cuernavaca 62490, Mexico

Data, 2024, vol. 9, issue 5, 1-24

Abstract: In the realm of digital libraries, efficiently managing and accessing scientific publications necessitates automated bibliographic reference segmentation. This study addresses the challenge of accurately segmenting bibliographic references, a task complicated by the varied formats and styles of references. Focusing on the empirical evaluation of Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory with CRF (BiLSTM + CRF), and Transformer Encoder with CRF (Transformer + CRF) architectures, this research employs Byte Pair Encoding and Character Embeddings for vector representation. The models underwent training on the extensive Giant corpus and subsequent evaluation on the Cora Corpus to ensure a balanced and rigorous comparison, maintaining uniformity across embedding layers, normalization techniques, and Dropout strategies. Results indicate that the BiLSTM + CRF architecture outperforms its counterparts by adeptly handling the syntactic structures prevalent in bibliographic data, achieving an F1-Score of 0.96. This outcome highlights the necessity of aligning model architecture with the specific syntactic demands of bibliographic reference segmentation tasks. Consequently, the study establishes the BiLSTM + CRF model as a superior approach within the current state-of-the-art, offering a robust solution for the challenges faced in digital library management and scholarly communication.

Keywords: reference mining; BiLSTM; transformers; byte-pair encoding; Conditional Random Fields
JEL-codes: C8 C80 C81 C82 C83
Date: 2024

Downloads: (external link)
https://www.mdpi.com/2306-5729/9/5/71/pdf (application/pdf)
https://www.mdpi.com/2306-5729/9/5/71/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:9:y:2024:i:5:p:71-:d:1397326

Data is currently edited by Ms. Cecilia Yang

Bibliographic data for series maintained by MDPI Indexing Manager.

Page updated 2025-03-19
Handle: RePEc:gam:jdataj:v:9:y:2024:i:5:p:71-:d:1397326