Structured Element Extraction from Official Documents Based on BERT-CRF and Knowledge Graph-Enhanced Retrieval
Siyuan Chen,
Liyuan Niu,
Jinning Li,
Xiaomin Zhu,
Xuebin Zhuang and
Yanqing Ye
Additional contact information
Siyuan Chen: School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou 510220, China
Liyuan Niu: Strategic Assessment and Consultation Institute, Military Academy of Sciences, Beijing 100071, China
Jinning Li: School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou 510220, China
Xiaomin Zhu: Strategic Assessment and Consultation Institute, Military Academy of Sciences, Beijing 100071, China
Xuebin Zhuang: School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou 510220, China
Yanqing Ye: Strategic Assessment and Consultation Institute, Military Academy of Sciences, Beijing 100071, China
Mathematics, 2025, vol. 13, issue 17, 1-24
Abstract:
The growth of e-government has rendered automated element extraction from official documents a critical bottleneck for administrative efficiency. The core challenge lies in unifying deep semantic understanding with the structured domain knowledge required to interpret complex formats and specialized terminology. To address the limitations of existing methods, we propose a hybrid framework. Our approach leverages a BERT-CRF model for robust sequence labeling, a knowledge graph (KG)-driven retrieval system to ground the model in verifiable facts, and a large language model (LLM) as a reasoning engine to resolve ambiguities and identify complex relationships. Validated on the DovDoc-CN dataset, our framework achieves a macro-average F1 score of 0.850, outperforming the BiLSTM-CRF baseline by 2.41 percentage points, and demonstrates high consistency, with a weighted F1 score of 0.984. The low standard deviation in the validation set further indicates the model’s stable performance across different subsets. These results confirm that our integrated approach provides an efficient and reliable solution for intelligent document processing, effectively handling the format diversity and specialized knowledge characteristic of government documents.
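The abstract reports two aggregate scores that differ noticeably: a macro-average F1 of 0.850 and a weighted F1 of 0.984. The sketch below illustrates, on invented toy data, why such a gap is typical in sequence labeling. Macro averaging gives every label equal weight, while weighted averaging scales each label's F1 by its support, so a frequent and easy label dominates. The label names (`O`, `TITLE`, `DATE`), the toy sequences, and the helper functions are assumptions made for illustration; they are not taken from the paper, and the paper's own evaluation protocol (for example, entity-level versus token-level scoring) may differ.

```python
from collections import Counter

def per_label_f1(gold, pred):
    """Return {label: (f1, support)} for every label that occurs in gold.

    Token-level scoring; labels that appear only in pred are not averaged.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    support = Counter(gold)
    scores = {}
    for label, n in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, n)
    return scores

def macro_f1(scores):
    # Unweighted mean: every label counts equally.
    return sum(f1 for f1, _ in scores.values()) / len(scores)

def weighted_f1(scores):
    # Support-weighted mean: frequent labels dominate.
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total

# Hypothetical toy data: 96 easy "O" tokens and 4 rare element tokens,
# half of which the tagger misses.
gold = ["O"] * 96 + ["TITLE", "TITLE", "DATE", "DATE"]
pred = ["O"] * 96 + ["TITLE", "O", "DATE", "O"]

scores = per_label_f1(gold, pred)
macro = macro_f1(scores)
weighted = weighted_f1(scores)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

In this toy case the rare labels each score about 0.667 while `O` scores about 0.990, so the macro average falls well below the weighted average. Read this way, the macro figure is the more informative indicator of performance on infrequent element types.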
Keywords: document element extraction; BERT-CRF; knowledge graph; hybrid retrieval; large language model
JEL-codes: C
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/17/2779/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/17/2779/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:17:p:2779-:d:1736943
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.