Web document ranking via active learning and kernel principal component analysis

Cai, Fei; Chen, Honghui; Shu, Zhen

Fei Cai, Honghui Chen and Zhen Shu
Additional contact information
Fei Cai: Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, P. R. China
Honghui Chen: Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, P. R. China
Zhen Shu: Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, P. R. China

International Journal of Modern Physics C (IJMPC), 2015, vol. 26, issue 04, 1-18

Abstract: Web document ranking arises in many information retrieval (IR) applications, such as search engines, recommendation systems and online advertising. A challenging issue is how to select representative query-document pairs and informative features so as to learn better ranking models that produce an acceptable ranked list of candidate documents for each query. In this study, we propose a ranking model based on active sampling (AS) plus kernel principal component analysis (KPCA), viz. AS-KPCA Regression, to study document ranking for a retrieval system, i.e. how to choose the representative query-document pairs and features for learning. More precisely, we gradually add documents to the training set by AS, choosing at each step the documents that would incur the highest expected DCG loss if left unselected. Then, KPCA is performed by projecting the selected query-document pairs onto p-principal components in the feature space to complete the regression. Hence, we can cut down the computational overhead and reduce the impact of noise simultaneously. To the best of our knowledge, we are the first to perform document ranking via reduction in two dimensions simultaneously, namely the number of documents and the number of features. Our experiments demonstrate that our approach outperforms the baseline methods on the public LETOR 4.0 datasets. Our approach improves on RankBoost as well as the other baselines by nearly 20% in terms of the MAP metric, with smaller improvements in terms of P@K and NDCG@K. Moreover, our approach is particularly suitable for document ranking on noisy datasets in practice.
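
The abstract selects training documents by expected DCG loss and reports results in MAP, P@K and NDCG@K. As a rough illustration of those metrics only, the following minimal sketch assumes the commonly used gain 2^rel - 1 with a log2 position discount, and treats any label above zero as relevant for P@K and MAP. The function names are hypothetical, and the paper's exact definitions and its expected-loss estimator are not given in this listing, so they are not reproduced here.

```python
import math


def dcg_at_k(relevances, k):
    """DCG@k for graded relevance labels listed in ranked order.

    Assumes gain 2^rel - 1 and discount log2(position + 1).
    """
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def precision_at_k(relevances, k):
    """Fraction of the top-k positions holding a label above zero."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def average_precision(relevances):
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for position, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average precision averaged over the rankings of all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

Each list holds the relevance labels of one query's documents in the order the ranker returned them, so `mean_average_precision` takes one such list per query.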

Keywords: Information retrieval; document ranking; learning to rank; active learning; noise reduction
Date: 2015

Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0129183115500412
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:ijmpcx:v:26:y:2015:i:04:n:s0129183115500412

DOI: 10.1142/S0129183115500412

International Journal of Modern Physics C (IJMPC) is currently edited by H. J. Herrmann

More articles in International Journal of Modern Physics C (IJMPC) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.