Structured Element Extraction from Official Documents Based on BERT-CRF and Knowledge Graph-Enhanced Retrieval

Siyuan Chen, Liyuan Niu, Jinning Li, Xiaomin Zhu, Xuebin Zhuang and Yanqing Ye
Additional contact information
Siyuan Chen: School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou 510220, China
Liyuan Niu: Strategic Assessment and Consultation Institute, Military Academy of Sciences, Beijing 100071, China
Jinning Li: School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou 510220, China
Xiaomin Zhu: Strategic Assessment and Consultation Institute, Military Academy of Sciences, Beijing 100071, China
Xuebin Zhuang: School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou 510220, China
Yanqing Ye: Strategic Assessment and Consultation Institute, Military Academy of Sciences, Beijing 100071, China

Mathematics, 2025, vol. 13, issue 17, 1-24

Abstract: The growth of e-government has rendered automated element extraction from official documents a critical bottleneck for administrative efficiency. The core challenge lies in unifying deep semantic understanding with the structured domain knowledge required to interpret complex formats and specialized terminology. To address the limitations of existing methods, we propose a hybrid framework. Our approach leverages a BERT-CRF model for robust sequence labeling, a knowledge graph (KG)-driven retrieval system to ground the model in verifiable facts, and a large language model (LLM) as a reasoning engine to resolve ambiguities and identify complex relationships. Validated on the DovDoc-CN dataset, our framework achieves a macro-average F1 score of 0.850, outperforming the BiLSTM-CRF baseline by 2.41 percentage points, and demonstrates high consistency, with a weighted F1 score of 0.984. The low standard deviation in the validation set further indicates the model’s stable performance across different subsets. These results confirm that our integrated approach provides an efficient and reliable solution for intelligent document processing, effectively handling the format diversity and specialized knowledge characteristic of government documents.
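The abstract reports both a macro-average F1 (0.850) and a weighted F1 (0.984). The sketch below illustrates, under stated assumptions, why these two averages can differ so much on a sequence-labeling task: macro averaging treats every label equally, while weighted averaging scales each label's F1 by its support, so frequent labels dominate. The label names (`O`, `TITLE`) and the toy sequences are hypothetical and are not taken from the paper, and the token-level computation here may differ from the evaluation protocol the authors actually used.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return a dict mapping each label to its token-level F1 score."""
    labels = sorted(set(gold) | set(pred))
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1: every label counts equally."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(gold, pred):
    """Mean of per-class F1 weighted by each label's frequency in gold."""
    scores = per_class_f1(gold, pred)
    support = Counter(gold)
    return sum(scores[label] * support[label] for label in scores) / len(gold)


# Hypothetical toy data: 96 background tokens and 4 tokens of a rare element.
gold = ["O"] * 96 + ["TITLE"] * 4
# The tagger recovers only half of the rare element's tokens.
pred = ["O"] * 96 + ["TITLE"] * 2 + ["O"] * 2

print(f"macro F1:    {macro_f1(gold, pred):.3f}")
print(f"weighted F1: {weighted_f1(gold, pred):.3f}")
```

In this toy case the rare label's low F1 pulls the macro average down to roughly 0.83, while the weighted average stays near 0.98 because the frequent label is tagged almost perfectly. A gap of this kind is one plausible reading of the two figures in the abstract, not a confirmed explanation.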

Keywords: document element extraction; BERT-CRF; knowledge graph; hybrid retrieval; large language model
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/17/2779/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/17/2779/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:17:p:2779-:d:1736943

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-10-04
Handle: RePEc:gam:jmathe:v:13:y:2025:i:17:p:2779-:d:1736943