EconPapers    

Unsupervised Classification of Chemical Compounds

P. Gutiérrez Toscano and F. H. C. Marriott

Journal of the Royal Statistical Society Series C, 1999, vol. 48, issue 2, 153-163

Abstract: Clustering chemical compounds of similar structure is important in the pharmaceutical industry. One way of describing the structure is the chemical 'fingerprint'. The fingerprint is a string of binary digits, and typical data sets consist of very large numbers of fingerprints; a suitable clustering procedure must take account of the properties of this method of coding, and must be able to handle large data sets. This paper describes the analysis of a set of fingerprint data. The analysis was based on an appropriate distance measure derived from the fingerprints, followed by metric scaling into a low‐dimensional space. An approximation to metric scaling, suitable for very large data sets, was investigated. Cluster analysis using two programs, mclust and AutoClass‐C, was carried out on the scaled data.
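The abstract describes a two-stage pipeline: a distance computed between binary fingerprints, then metric scaling into a few dimensions before clustering. It does not say which distance measure or which large-data approximation the authors used, so what follows is only a minimal sketch under stated assumptions, not the paper's procedure. It assumes the Jaccard (Tanimoto) distance, a common choice for binary fingerprints, and uses exact classical (Torgerson) metric scaling via a full eigendecomposition, which costs O(n^3) and is practical only for modest n. The function names are hypothetical, and the clustering step (mclust, AutoClass-C) is not reproduced.

```python
import numpy as np


def jaccard_distance_matrix(fps: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard (Tanimoto) distances between rows of a 0/1 matrix.

    Assumed distance (not stated in the abstract):
        d(a, b) = 1 - |a AND b| / |a OR b|
    Two all-zero fingerprints are given distance 0 by convention.
    """
    x = np.asarray(fps).astype(np.int64)
    both = x @ x.T                                      # bits set in both fingerprints
    counts = x.sum(axis=1)                              # bits set in each fingerprint
    either = counts[:, None] + counts[None, :] - both   # bits set in at least one
    with np.errstate(invalid="ignore", divide="ignore"):
        sim = np.where(either > 0, both / either, 1.0)  # 0/0 case replaced by 1.0
    return 1.0 - sim


def classical_scaling(d: np.ndarray, k: int = 2) -> np.ndarray:
    """Exact classical metric scaling of a distance matrix into k dimensions."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n    # centring matrix
    b = -0.5 * j @ (d ** 2) @ j            # doubly centred squared distances
    vals, vecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]     # indices of the k largest eigenvalues
    # Jaccard distances need not be exactly Euclidean, so some eigenvalues
    # can be negative; clip them to zero instead of taking their square root.
    lam = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(lam)   # n-by-k coordinates
```

For example, calling `classical_scaling(jaccard_distance_matrix(fps), k=2)` on an n-by-p 0/1 array `fps` returns an n-by-2 array of coordinates that could then be passed to a clustering program. Clipping negative eigenvalues simply discards the non-Euclidean part of the distances rather than representing it.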

Date: 1999

Downloads: (external link)
https://doi.org/10.1111/1467-9876.00146

Persistent link: https://EconPapers.repec.org/RePEc:bla:jorssc:v:48:y:1999:i:2:p:153-163

Ordering information: This journal article can be ordered from
http://ordering.onli ... 1111/(ISSN)1467-9876

Journal of the Royal Statistical Society Series C is currently edited by R. Chandler and P. W. F. Smith

More articles in Journal of the Royal Statistical Society Series C from Royal Statistical Society.
Bibliographic data for series maintained by Wiley Content Delivery.

Page updated 2025-03-19
Handle: RePEc:bla:jorssc:v:48:y:1999:i:2:p:153-163