Human assessments of document similarity

S.J. Westerman, T. Cribbin and J. Collins

Journal of the American Society for Information Science and Technology, 2010, vol. 61, issue 8, 1535-1542

Abstract: Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n‐gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n‐gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N‐gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views on human reliability, but that an optimal n‐gram solution can provide a good approximation of the average human assessment of document similarity, a result that has important implications for future development of document visualization systems.
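The n-gram automatic text analysis referred to in the abstract can be sketched in miniature as follows. The paper does not specify its exact algorithm here, so the choice of character trigrams, the count-vector representation, and the cosine similarity measure below are illustrative assumptions, not the authors' published method.

```python
# Illustrative sketch only: compares two documents by overlapping character
# n-gram counts. The trigram length (n=3) and cosine measure are assumptions;
# the study found the optimal n-gram length varied with the type of text.
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


doc1 = "Human assessments of document similarity vary between assessors."
doc2 = "Automatic text analysis can approximate human similarity ratings."
score = cosine_similarity(ngram_profile(doc1), ngram_profile(doc2))
```

Under this kind of scheme, a document pair's similarity is a single number in [0, 1], which is what makes correlation with averaged human ratings straightforward to compute.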

Date: 2010

Downloads: https://doi.org/10.1002/asi.21361



Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:61:y:2010:i:8:p:1535-1542

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890


More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:jamist:v:61:y:2010:i:8:p:1535-1542