From Proposals to Outcomes: Concept-Aligned Chunking for Cross-Document Relevance Assessment in Research Funding Review
Fengchi Yuan,
Keqin Guan,
Siyu Chen,
Bokui Chen and
Wai Kin Victor Chan
Additional contact information
Fengchi Yuan: Tsinghua Shenzhen International Graduate School, Tsinghua University, Institute of Data and Information
Keqin Guan: Tsinghua Shenzhen International Graduate School, Tsinghua University, Institute of Data and Information
Siyu Chen: Tsinghua Shenzhen International Graduate School, Tsinghua University, Institute of Data and Information
Bokui Chen: Tsinghua Shenzhen International Graduate School, Tsinghua University, Institute of Data and Information
Wai Kin Victor Chan: Tsinghua Shenzhen International Graduate School, Tsinghua University, Institute of Data and Information
A chapter in AI, Society and Digital Transformation, 2026, pp 66-77 from Springer
Abstract:
Government-funded science and technology innovation projects are vital for driving industrial development and supporting talent cultivation. However, evaluating their outcomes remains a significant challenge, especially when some researchers misattribute unrelated publications to funded projects, raising concerns about research integrity and transparency. This paper focuses on the task of assessing the relevance between project proposals and research outputs, formulated as a long-text matching problem. Because even valid research outputs often address only subtopics of the original project objectives, traditional methods, which typically compare entire documents, often fail to provide accurate relevance assessments. To address this, we propose ConceptSplitter, a concept-based chunking method inspired by long-text structuring strategies. Within a retrieval-augmented generation (RAG) pipeline, ConceptSplitter serves as the chunking module, improving retrieval precision and contextual relevance in large language model inference. To support robust evaluation, we also construct a domain-diverse dataset that mirrors real-world funding scenarios. Experiments on this dataset show that ConceptSplitter outperforms traditional methods by enhancing chunking quality, improving the accuracy of relevance classification, and providing more reliable confidence estimation in large language model outputs.
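The abstract's point that whole-document comparison can understate relevance when an output covers only one subtopic of a proposal can be illustrated with a minimal sketch. This is not the authors' ConceptSplitter, whose details are not given here. It assumes a hypothetical paragraph-based splitter (`split_into_chunks`), a bag-of-words vector standing in for a real embedding model, and invented example texts.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_into_chunks(text):
    """Hypothetical splitter: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def whole_document_score(proposal, output):
    """Baseline: compare the entire proposal with the research output."""
    return cosine(bag_of_words(proposal), bag_of_words(output))


def best_chunk_score(proposal, output):
    """Chunk-level view: score each proposal chunk and keep the best match."""
    out_vec = bag_of_words(output)
    chunks = split_into_chunks(proposal)
    return max((cosine(bag_of_words(c), out_vec) for c in chunks), default=0.0)


# Invented example: a proposal with three unrelated objectives and an
# output that addresses only the second one.
proposal = """Objective one: develop battery electrode materials with higher energy density.

Objective two: train graph neural networks to forecast urban traffic congestion.

Objective three: survey regional talent cultivation programs and workforce policy."""

output = "We train graph neural networks to forecast traffic congestion in urban road networks."

doc_score = whole_document_score(proposal, output)
chunk_score = best_chunk_score(proposal, output)
print(f"whole-document: {doc_score:.3f}, best chunk: {chunk_score:.3f}")
```

In this toy case the best-matching chunk scores higher than the whole proposal, because the unrelated objectives dilute the document-level vector. A concept-based chunker would presumably group text by topic rather than by paragraph breaks, which this sketch does not attempt.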
Keywords: Research funding evaluation; Long-text matching; Retrieval-augmented generation; Pretrained language models
Date: 2026
Persistent link: https://EconPapers.repec.org/RePEc:spr:lnopch:978-3-032-13116-4_6
Ordering information: This item can be ordered from
http://www.springer.com/9783032131164
DOI: 10.1007/978-3-032-13116-4_6
More chapters in Lecture Notes in Operations Research from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.