Scaled Pearson’s Correlation Coefficient for Evaluating Text Similarity Measures
Issa Atoum
Modern Applied Science, 2019, vol. 13, issue 10, 26
Abstract:
Despite ever-increasing interest in text similarity methods, the development of adequate methods is lagging. Some methods perform well at entailment, while others are reasonable at estimating the degree to which two texts are similar. These methods are very often compared using Pearson’s correlation; however, Pearson’s correlation is sensitive to outliers, which can affect the final correlation coefficient. As a result, Pearson’s correlation is inadequate for deciding which text similarity method is better in situations where data items are very similar or unrelated. This paper borrows the scaled Pearson correlation from the finance domain and builds a metric that can evaluate the performance of similarity methods over cross-sectional datasets. Results showed that the new metric is fine-grained with respect to the score range of the benchmark dataset, making it a promising alternative to Pearson’s correlation. Moreover, extrinsic results from applying the System Usability Scale (SUS) questionnaire to the scaled Pearson correlation revealed that the proposed metric is gaining attention from scholars, which implies its potential use in academia.
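The abstract does not give the formula, so the following is only a rough sketch of the general idea behind a scaled correlation: compute Pearson's r within short segments of the data and average the results, so that a few extreme pairs cannot dominate the figure. Ordering the pairs by human score before segmenting, dropping a trailing partial segment, and the names `pearson`, `scaled_pearson`, and `segment_size` are all assumptions made for illustration and may differ from the metric defined in the paper.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation; returns None if either side is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def scaled_pearson(human, system, segment_size):
    """Average of per-segment Pearson correlations (illustrative only).

    Assumption: pairs are ordered by human score so that each segment
    covers a narrow band of the benchmark score range. A trailing
    partial segment is dropped, and segments where a correlation is
    undefined (constant scores) are skipped.
    """
    pairs = sorted(zip(human, system))
    segment_rs = []
    for start in range(0, len(pairs) - segment_size + 1, segment_size):
        segment = pairs[start:start + segment_size]
        r = pearson([h for h, _ in segment], [s for _, s in segment])
        if r is not None:
            segment_rs.append(r)
    if not segment_rs:
        raise ValueError("no segment produced a defined correlation")
    return sum(segment_rs) / len(segment_rs)


# Hypothetical scores: the system tracks the low range but inverts the high range.
human_scores = [0, 1, 2, 3, 4, 5]
system_scores = [0, 1, 2, 5, 4, 3]
overall_r = pearson(human_scores, system_scores)
scaled_r = scaled_pearson(human_scores, system_scores, segment_size=3)
```

In this made-up example the overall correlation stays fairly high even though the system ranks the upper half of the range in reverse, while the segment-wise average exposes the disagreement. This is the kind of finer-grained behaviour the abstract attributes to the scaled metric.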
Date: 2019
Downloads: (external link)
https://ccsenet.org/journal/index.php/mas/article/download/0/0/40746/42038 (application/pdf)
https://ccsenet.org/journal/index.php/mas/article/view/0/40746 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:ibn:masjnl:v:13:y:2019:i:10:p:26
Journal: Modern Applied Science, from Canadian Center of Science and Education