The Fundamental Clustering and Projection Suite (FCPS): A Dataset Collection to Test the Performance of Clustering and Data Projection Algorithms
Alfred Ultsch and Jörn Lötsch
Additional contact information
Alfred Ultsch: DataBionics Research Institute, University of Marburg, 35032 Marburg, Germany
Jörn Lötsch: Institute of Clinical Pharmacology, Goethe-University, 60590 Frankfurt am Main, Germany
Data, 2020, vol. 5, issue 1, 1-9
Abstract:
In the context of data science, data projection and clustering are common procedures. The chosen analysis method is crucial to avoid faulty pattern recognition. It is therefore necessary to know the properties and especially the limitations of projection and clustering algorithms. This report describes a collection of datasets that are grouped together in the Fundamental Clustering and Projection Suite (FCPS). It is designed to address specific problems of structure discovery in high-dimensional spaces. The FCPS contains 10 datasets with the names “Atom”, “Chainlink”, “EngyTime”, “Golfball”, “Hepta”, “Lsun”, “Target”, “Tetra”, “TwoDiamonds”, and “WingNut”. Common clustering methods occasionally identified non-existent clusters or assigned data points to the wrong clusters in the FCPS. Likewise, common data projection methods could only partially reproduce the data structure correctly on a two-dimensional plane. In conclusion, the FCPS dataset collection addresses general challenges for clustering and projection algorithms such as lack of linear separability, different or small inner class spacing, classes defined by data density rather than data spacing, no cluster structure at all, outliers, or classes that are in contact.
Keywords: clustering; data projection; performance tests; benchmark standards; high dimensional complex data
JEL-codes: C8 C80 C81 C82 C83
Date: 2020
Downloads:
https://www.mdpi.com/2306-5729/5/1/13/pdf (application/pdf)
https://www.mdpi.com/2306-5729/5/1/13/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:5:y:2020:i:1:p:13-:d:314642
Data is currently edited by Ms. Cecilia Yang
Bibliographic data for series maintained by MDPI Indexing Manager.