Data Science sous Python: Algorithme, Statistique, DataViz, DataMining et Machine-Learning
Data Science with Python: Algorithm, Statistics, DataViz, DataMining and Machine-Learning
Moussa Keita
MPRA Paper from University Library of Munich, Germany
Abstract:
Data Science is a technical discipline that combines statistical concepts with computer algorithms and computation to process and model large volumes of data derived from observed phenomena (economic, industrial, commercial, financial, managerial, social, etc.). In Business Intelligence, Data Science has become an indispensable decision-support tool for company managers, in that it allows them to exploit and add value to the company's internal and external information assets. In recent years, Python has rapidly become one of the programming languages most used by Data Scientists to exploit the growing potential of Big Data. The language's growing popularity is largely explained by the many possibilities offered by its powerful libraries, including those for numerical analysis and scientific computing (numpy, scipy, pandas), data visualization (matplotlib), and Machine Learning (scikit-learn). Written with a pedagogical approach, this manuscript revisits the concepts essential for mastering Data Science with Python. The work is organized into seven chapters. The first chapter presents the basics of programming in Python. The second chapter studies strings and regular expressions; its aim is to familiarize the reader with processing and using string values, which are the variable values commonly found in unstructured databases. The third chapter presents methods for file management and text processing; it builds on the previous chapter by presenting the methods commonly used to process unstructured data, which generally take the form of text files. The fourth chapter presents methods for processing and organizing data originally stored as data tables.
The fifth chapter presents classical statistical analysis methods (descriptive analyses, statistical tests, linear and logistic regression, ...). The sixth chapter presents data visualization methods (histograms, bar graphs, pie plots, box plots, scatter plots, trend curves, 3D graphs, ...). Finally, the seventh chapter presents data mining and machine-learning methods: dimensionality reduction methods (Principal Component Analysis, Factor Analysis, Multiple Correspondence Analysis) as well as classification methods (Hierarchical Classification, K-Means Clustering, Support Vector Machine, Random Forest).
Keywords: Programming; Python language; Data science; Data processing and analysis; data visualization.
JEL-codes: C8
Date: 2017-02
Downloads:
https://mpra.ub.uni-muenchen.de/76653/1/MPRA_paper_76653.pdf original version (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:pra:mprapa:76653