New classification quality estimators for analysis of documentary information: Application to patent analysis and web mapping
Jean-Charles Lamirel,
Claire Francois,
Shadi Al Shehabi and
Martial Hoffmann
Additional contact information
Jean-Charles Lamirel: LORIA Vandoeuvre-lès-Nancy
Claire Francois: URI/INIST-CNRS Vandoeuvre-lès-Nancy
Shadi Al Shehabi: LORIA Vandoeuvre-lès-Nancy
Martial Hoffmann: URI/INIST-CNRS Vandoeuvre-lès-Nancy
Scientometrics, 2004, vol. 60, issue 3, No 16, 445-562
Abstract:
The information analysis process includes a cluster analysis or classification step associated with an expert validation of the results. In this paper, we propose new Recall/Precision measures for estimating the quality of cluster analysis. These measures derive both from Galois lattice theory and from the Information Retrieval (IR) domain. Unlike classical inertia measures, they have the advantage of being independent both of the classification method and of the difference between the intrinsic dimension of the data and that of the clusters. We present two experiments that use the MultiSOM model, an extension of Kohonen's SOM model, as the cluster analysis method. Our first experiment, on patent data, shows how our measures can be used to compare viewpoint-oriented classification methods, such as MultiSOM, with global cluster analysis methods, such as WebSOM. Our second experiment, which is part of the EICSTES EEC project, is an original Webometrics experiment that combines content and link classification starting from a large non-homogeneous set of web pages. This experiment highlights the fact that break-even points between our different Recall/Precision measures can be used to determine an optimal number of clusters for web data classification. The contents of the clusters obtained with different break-even points are compared to assess the quality of the resulting maps.
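The break-even idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that Recall and Precision values have already been computed for several candidate numbers of clusters; the abstract does not give the formulas for the measures themselves, so they are not reproduced here. The function name `break_even_cluster_count` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def break_even_cluster_count(
    cluster_counts: Sequence[int],
    recall: Sequence[float],
    precision: Sequence[float],
) -> int:
    """Return the candidate cluster count where recall and precision are closest.

    Hypothetical helper: it only locates the break-even point on
    precomputed curves and does not compute the measures themselves.
    """
    if not (len(cluster_counts) == len(recall) == len(precision)):
        raise ValueError("all three sequences must have the same length")
    if not cluster_counts:
        raise ValueError("at least one candidate cluster count is required")
    # Pick the index with the smallest absolute recall/precision gap;
    # ties are resolved in favour of the earliest candidate.
    best = min(
        range(len(cluster_counts)),
        key=lambda i: abs(recall[i] - precision[i]),
    )
    return cluster_counts[best]


# Made-up illustrative values, not results from the paper.
ks = [16, 36, 64, 100]
rec = [0.90, 0.70, 0.50, 0.25]
prec = [0.25, 0.50, 0.625, 0.75]
chosen = break_even_cluster_count(ks, rec, prec)
print(chosen)
```

With several pairs of measures, the same helper could be applied to each pair, giving one candidate break-even point per pair to compare.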
Date: 2004
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:60:y:2004:i:3:d:10.1023_b:scie.0000034386.05278.e8
DOI: 10.1023/B:SCIE.0000034386.05278.e8
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó