Efficient List Intersection Algorithm for Short Documents by Document Reordering
Lianyin Jia,
Dongyang Li,
Haihe Zhou and
Fengling Xia ()
Additional contact information
Lianyin Jia: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Dongyang Li: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Haihe Zhou: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Fengling Xia: Faculty of Civil Aviation and Aeronautics, Kunming University of Science and Technology, Kunming 650500, China
Mathematics, 2024, vol. 12, issue 9, 1-13
Abstract:
List intersection plays a pivotal role in various domains such as search engines, database systems, and social networks. Efficient indexes and query strategies can significantly enhance the efficiency of list intersection. Existing inverted index-based algorithms fail to utilize the length information of documents and require excessive list intersections, resulting in lower efficiency. To address this issue, in this paper, we propose the LDRpV (Length-based Document Reordering plus Verification) algorithm. LDRpV filters out documents that are unlikely to satisfy the intersection results by reordering documents based on their length, thereby reducing the number of candidates. Additionally, to minimize the number of list intersection operations, an intersection and verification strategy is designed, where only the first m lists are intersected, and the resulting candidate set is directly verified. This approach effectively improves the efficiency of list intersection. Experimental results on four real datasets demonstrate that LDRpV can achieve a maximum efficiency improvement of 46.69% compared to the most competitive counterparts.
Keywords: LDRpV; list intersection; inverted index; document reordering; verification (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2024
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2227-7390/12/9/1328/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/9/1328/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:9:p:1328-:d:1384103
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().