AORO: Auto-Optimizing Reasoning Order for Multi-Hop Question Answering

Shaobo Li, Ziyi Cao, Kun Bu and Zhenzhou Ji
Additional contact information
Shaobo Li: Department of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai 264209, China
Ziyi Cao: Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
Kun Bu: Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
Zhenzhou Ji: Department of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai 264209, China

Mathematics, 2025, vol. 13, issue 21, 1-22

Abstract: Answering multi-hop questions requires first retrieving a sequence of supporting facts, and the order in which these facts are retrieved significantly affects retriever performance. To achieve a clearer reasoning order, it is beneficial to address the easier facts first and then move to the more difficult ones. However, current orders are usually pre-defined during data construction or specified manually, which restricts the model’s reasoning potential. This paper proposes Auto-Optimizing Reasoning Order (AORO), a method that automatically optimizes the reasoning order for each sample, where difficulty is determined by a retrieval model trained with carefully curated data. First, a retriever is trained on data that covers all possible combinations of reasoning orders. The trained retriever is then used to assess the difficulty of each fact, and the least difficult fact is placed at the beginning of the sequence. Next, the retrieval model is retrained on these optimized sequences, which are empirically better suited to its capabilities. These steps are repeated until all facts are reordered, creating an iterative self-debiasing paradigm. Experiments conducted on two multi-hop QA benchmarks, QASC and MultiRC, demonstrate the effectiveness of AORO, which outperforms strong baselines using the same PTM and further enables advanced PTMs to achieve improvements of up to 1.6 points in Recall@10 and 3.7 points in F1 score. Additional case analyses reveal empirical patterns in the optimal reasoning order: the pattern appears independent of the dataset and the underlying pre-trained model, and the sequence proceeds by confirming the truth of the question, answering the question, and filling in any gaps, which aligns with human reasoning.
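
The abstract above outlines an iterative greedy reordering loop. The sketch below is a minimal, hypothetical Python illustration of that ordering step only: `score_difficulty` is a toy stand-in for the trained retriever's difficulty estimate (the paper derives difficulty from a retriever trained on all reasoning-order combinations and retrained after each round), and the function and variable names are illustrative, not taken from the paper.

```python
# Minimal sketch of AORO-style reasoning-order optimization (illustrative only,
# not the authors' code). The retraining between reordering rounds is omitted.

def score_difficulty(question: str, fact: str) -> float:
    """Toy difficulty proxy: lower lexical overlap with the question means
    harder to retrieve. A real system would use the trained retriever's
    score or ranking loss instead."""
    q_tokens = set(question.lower().split())
    f_tokens = set(fact.lower().split())
    overlap = len(q_tokens & f_tokens) / max(len(f_tokens), 1)
    return 1.0 - overlap  # higher value = more difficult


def optimize_reasoning_order(question: str, facts: list[str]) -> list[str]:
    """Greedily place the least difficult remaining fact next, mirroring the
    reordering loop described in the abstract."""
    remaining = list(facts)
    ordered = []
    while remaining:
        easiest = min(remaining, key=lambda f: score_difficulty(question, f))
        ordered.append(easiest)
        remaining.remove(easiest)
    return ordered


if __name__ == "__main__":
    q = "What can be used to generate electricity from wind?"
    supporting_facts = [
        "A turbine converts kinetic energy into electrical energy.",
        "Wind turns the blades of a wind turbine.",
    ]
    for i, fact in enumerate(optimize_reasoning_order(q, supporting_facts), 1):
        print(f"hop {i}: {fact}")
```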

Keywords: natural language processing; question answering; information retrieval; neural network; machine learning
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/21/3489/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/21/3489/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:21:p:3489-:d:1785086

Handle: RePEc:gam:jmathe:v:13:y:2025:i:21:p:3489-:d:1785086