
Robust Learning from Bites for Data Mining

Andreas Christmann, Ingo Steinwart and Mia Hubert

No 2006,03, Technical Reports from Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen

Abstract: Some methods from statistical machine learning and from robust statistics have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets, say with millions of data points. Secondly, robust and non-parametric confidence intervals for the predictions according to the fitted models are often unknown. Here, we propose a simple but general method to overcome these problems in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed on several processors if available, and can help to reduce the computation time substantially. Our main focus is on robust general support vector machines (SVM) based on minimizing regularized risks. The method offers distribution-free confidence intervals for the median of the predictions. The approach can also be helpful to fit robust estimators in parametric models for huge data sets.
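The abstract does not spell out the procedure, so the following is only a minimal sketch under stated assumptions: the data are split into disjoint blocks ("bites"), a model is fitted to each block independently, the per-block predictions are combined by their median, and a distribution-free confidence interval for that median is read off the order statistics of the predictions using the Binomial(n, 1/2) distribution. The names `split_into_bites`, `median_ci_ranks`, and `predict_from_bites` are hypothetical, and `fit` stands in for whatever estimator is used (for example a robust SVM); none of them is taken from the report.

```python
import math
import random
import statistics


def split_into_bites(data, n_bites, seed=0):
    """Shuffle the data and split it into n_bites disjoint blocks (assumed scheme)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::n_bites] for i in range(n_bites)]


def median_ci_ranks(n, alpha=0.05):
    """Return 1-based ranks (r, n - r + 1) of the order statistics that bound
    a distribution-free confidence interval for the median with coverage of
    at least 1 - alpha, or None if n is too small for that level."""
    cdf = 0.0
    best = None
    for r in range(1, n // 2 + 1):
        # After this update, cdf equals P(Binomial(n, 1/2) <= r - 1).
        cdf += math.comb(n, r - 1) * 0.5 ** n
        if 2 * cdf <= alpha:
            best = r
        else:
            break
    return None if best is None else (best, n - best + 1)


def predict_from_bites(data, fit, x_new, n_bites=11, alpha=0.05):
    """Fit one model per bite and aggregate the predictions at x_new.

    `fit` is a hypothetical callable: it takes a list of observations and
    returns a prediction function. Each bite is fitted independently, so the
    loop could be distributed over several processors.
    """
    bites = split_into_bites(data, n_bites)
    preds = sorted(fit(bite)(x_new) for bite in bites)
    ranks = median_ci_ranks(n_bites, alpha)
    if ranks is None:
        raise ValueError("too few bites for the requested confidence level")
    lower, upper = ranks
    return statistics.median(preds), (preds[lower - 1], preds[upper - 1])
```

Only one bite needs to be held in memory at a time, which is the sense in which such a scheme scales to the memory of the computer. The number of bites limits the achievable confidence level: with 5 bites no two-sided 95% interval of this kind exists, while with 11 bites the second smallest and second largest predictions bound one.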

Keywords: Breakdown point; convex risk minimization; data mining; distributed computing; influence function; logistic regression; robustness; scalability
Date: 2006
Downloads: (external link)
https://www.econstor.eu/bitstream/10419/22651/1/tr03-06.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:zbw:sfb475:200603

Bibliographic data for series maintained by ZBW - Leibniz Information Centre for Economics.

 
Page updated 2025-03-20
Handle: RePEc:zbw:sfb475:200603