DETECCIÓN DE OUTLIERS USANDO MÉTRICAS DE DISTANCIA Y ANÁLISIS CLUSTER (Outlier Detection Using Distance Metrics and Cluster Analysis)
Santiago Cartagena Agudelo
No ckqng, OSF Preprints from Center for Open Science
Abstract:
Many data science and machine learning techniques need to measure the separation between records. Cluster analysis methods, for example, require a degree of similarity between records. This is done with distances or metrics, which treats the data as points in an n-dimensional space. Distance measures play an important role in grouping data points. Choosing the right distance measure for a given data set is not trivial and requires some prior knowledge. This work studies and implements several of today's best-known distance measures: the Mahalanobis distance, the Euclidean distance, the Manhattan distance, and the cosine distance, the last of which caught the authors' interest by its name despite being less well known than the other three. The analysis aims to observe how these distances apply in practice, using a real data set of companies' daily stock data taken from Yahoo Finance.
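As an illustration of the four distances named in the abstract, the following is a minimal sketch in plain Python. It is not the paper's implementation: the function names are hypothetical, and the Mahalanobis version assumes the caller supplies an already inverted covariance matrix (`inv_cov`) rather than estimating it from data.

```python
import math

def euclidean(x, y):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # City-block distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    # One minus the cosine of the angle between x and y.
    # Assumes neither vector is all zeros (otherwise the norm is 0).
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def mahalanobis(x, y, inv_cov):
    # sqrt(d^T * inv_cov * d), where d = x - y.
    # inv_cov is assumed to be the inverse covariance matrix,
    # given as a list of rows.
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

if __name__ == "__main__":
    p, origin = [3, 4], [0, 0]
    identity = [[1, 0], [0, 1]]
    print(euclidean(p, origin))
    print(manhattan(p, origin))
    print(cosine_distance([1, 0], [0, 1]))
    print(mahalanobis(p, origin, identity))
```

With the identity matrix as `inv_cov`, the Mahalanobis distance reduces to the Euclidean distance. The two differ only when the variables have unequal variances or are correlated, which is what makes the Mahalanobis distance a common choice for flagging outliers.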
Date: 2021-06-03
Download: https://osf.io/download/6220647dc0642702ccd911dd/
Persistent link: https://EconPapers.repec.org/RePEc:osf:osfxxx:ckqng
DOI: 10.31219/osf.io/ckqng