A Method for Filtering Pages by Similarity Degree based on Dynamic Programming

Ziyun Deng and Tingqin He
Ziyun Deng: College of Economics and Trade, Changsha Commerce & Tourism College, Changsha 410116, China
Tingqin He: National Supercomputing Center in Changsha, Hunan University, Changsha 410116, China

Future Internet, 2018, vol. 10, issue 12, 1-12

Abstract: To obtain target webpages from a large set of webpages, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method uses one of three proposed same relationships between two nodes, so we first define these three same relationships. The biggest innovation of MFPSDDP is that it does not need to know the structures of the webpages in advance. First, we describe the design, which is built on a queue and two threads. Then, we present a dynamic programming algorithm for calculating the length of the longest common subsequence and a formula for calculating similarity. Further, to obtain webpages containing detailed information from 200,000 webpages downloaded from the well-known website “www.jd.com”, we choose the Completely Same Relationship (CSR) and set the similarity threshold to 0.2. Among the four filtering methods compared, the Recall Ratio (RR) of MFPSDDP ranks in the middle. When nearly 200,000 webpages are filtered, MFPSDDP has the highest Precision Ratio (PR) of the four methods, reaching 85.1%, which is 13.3 percentage points higher than the PR of a Method for Filtering Pages by Containing Strings (MFPCS).
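
The abstract names two computational pieces: a dynamic program for the length of the longest common subsequence (LCS) and a similarity formula compared against a threshold. The Python sketch below shows one way these could fit together; it is not the authors' implementation. Several parts are assumptions: each page is reduced to a flat sequence of (tag, attributes) node tuples; the hypothetical `completely_same` function treats two nodes as matching only when they are identical, as a stand-in for CSR (whose exact definition is in the paper); and similarity is normalized as 2·LCS/(m+n), a Dice-style choice that may differ from the paper's formula. Pages whose similarity to a sample page reaches the 0.2 threshold are kept.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical node representation: (tag name, tuple of (attribute, value) pairs).
Node = Tuple[str, Tuple[Tuple[str, str], ...]]


def completely_same(x: object, y: object) -> bool:
    """Assumed stand-in for CSR: two nodes match only if they are identical."""
    return x == y


def lcs_length(a: Sequence, b: Sequence,
               same: Callable[[object, object], bool] = completely_same) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if same(a[i - 1], b[j - 1]):
                # Matching nodes extend the LCS of both shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last node of one sequence or the other.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def similarity(a: Sequence, b: Sequence,
               same: Callable[[object, object], bool] = completely_same) -> float:
    """Assumed Dice-style normalization: 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty pages are treated as identical (assumption)
    return 2 * lcs_length(a, b, same) / total


def filter_pages(sample: Sequence[Node], pages: Dict[str, Sequence[Node]],
                 threshold: float = 0.2) -> List[str]:
    """Keep pages whose similarity to the sample page reaches the threshold."""
    return [url for url, nodes in pages.items()
            if similarity(sample, nodes) >= threshold]


# Hypothetical sample detail page and candidate pages (illustrative data only).
sample: List[Node] = [
    ("html", ()), ("body", ()),
    ("div", (("class", "sku-name"),)), ("span", (("class", "price"),)),
]
pages: Dict[str, List[Node]] = {
    "item/1.html": [("html", ()), ("body", ()),
                    ("div", (("class", "sku-name"),)), ("p", ()),
                    ("span", (("class", "price"),))],
    "list.html": [("html", ()), ("body", ())] + [("li", ())] * 20,
    "help.html": [("svg", ()), ("g", ()), ("path", ())],
}

print(filter_pages(sample, pages))  # ['item/1.html']
```

Filling the full table costs O(mn) time and memory for sequences of lengths m and n; if only the length is needed, keeping two rows reduces memory to O(n). The node equality test is passed in as a parameter so that a different same relationship could be swapped in without touching the dynamic program.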

Keywords: method for filtering pages; similarity degree; dynamic programming; combination method
JEL-codes: O3
Date: 2018

Downloads:
https://www.mdpi.com/1999-5903/10/12/124/pdf (application/pdf)
https://www.mdpi.com/1999-5903/10/12/124/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:10:y:2018:i:12:p:124-:d:190446

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

Page updated 2025-03-19
Handle: RePEc:gam:jftint:v:10:y:2018:i:12:p:124-:d:190446