Analyzing Income Inequalities across Italian regions: Instrumental Variable Panel Data, K-Means Clustering and Machine Learning Algorithms

Antonicelli, Margareth; Drago, Carlo; Costantiello, Alberto; Leogrande, Angelo

Margareth Antonicelli, Carlo Drago, Alberto Costantiello and Angelo Leogrande
Additional contact information
Alberto Costantiello: LUM - Università LUM Giuseppe Degennaro = University Giuseppe Degennaro

Working Papers from HAL

Abstract: This study examines income inequality across Italian regions by combining instrumental variable panel data models, k-means clustering, and machine learning algorithms. The econometric models address endogeneity and identify causal relationships that drive regional disparities. K-means clustering, with the number of clusters chosen by the elbow method, groups Italian regions by their income inequality patterns, while machine learning models, including random forest, support vector machines, and decision tree regression, predict inequality trends and identify key determinants. Informal employment, temporary employment, and overeducation play a major role in shaping inequality. The clustering results confirm a persistent North-South economic divide, with Campania, Calabria, and Sicily the most disadvantaged regions. Among the machine learning models, Random Forest Regression predicts income disparities most accurately. The findings underline the need for education-focused and digital policies, together with labor market reforms, to foster economic convergence. The study shows how econometric and machine learning methods can be combined to analyze regional disparities and proposes a framework for policy-making aimed at reducing economic disparities in Italy.
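
The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data below are synthetic points standing in for regional indicators, and the helper names (`sq_dist`, `kmeans_inertia`, `elbow_k`) and the "largest drop in improvement" rule for locating the elbow are assumptions made for illustration only.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_inertia(points, k, iters=50, seed=0):
    """Run plain Lloyd's k-means once; return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise at k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Synthetic, hypothetical data: three loose groups of seven points each.
rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(7)
]

# Inertia for k = 1..6, taking the best of a few random restarts for each k.
inertias = {
    k: min(kmeans_inertia(points, k, seed=s) for s in range(5))
    for k in range(1, 7)
}

# Elbow heuristic (an assumption): pick the k where the gain from adding
# one more cluster falls off most sharply.
elbow_k = max(
    range(2, 6),
    key=lambda k: (inertias[k - 1] - inertias[k]) - (inertias[k] - inertias[k + 1]),
)

for k, value in inertias.items():
    print(k, round(value, 3))
print("elbow at k =", elbow_k)
```

In practice the elbow is often read off a plot of inertia against k rather than computed by a fixed rule; the second-difference rule here is only one simple way to automate that reading.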

Keywords: Income Inequality; Regional Disparities; Machine Learning; Labor Market; Digital Divide
JEL Codes: C23; C38; C45; O15; R11; R58
Estimation output: 'Between' variance = 0.480192; 'Within' variance = 0.363472; theta used for quasi-demeaning = 0.804265; joint test on named regressors, asymptotic test statistic: Chi-square(6) = 2324.38
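
The variance figures and the quasi-demeaning theta reported in this record can be checked against each other with a short calculation. The sketch below assumes the textbook random-effects formula theta = 1 - sqrt(s2_within / (s2_within + T * s2_between)), and assumes that the 'Between' variance is the estimated variance of the unit-specific effect; neither assumption is stated in the record. The panel length T is not given either, so the code backs it out from the reported numbers rather than taking it as known.

```python
import math

# Values copied from the record.
s2_within = 0.363472       # 'Within' variance
s2_between = 0.480192      # 'Between' variance (assumed: variance of the unit effect)
theta_reported = 0.804265  # theta used for quasi-demeaning

def quasi_demeaning_theta(s2_w, s2_b, T):
    """Textbook random-effects weight for a balanced panel with T periods."""
    return 1.0 - math.sqrt(s2_w / (s2_w + T * s2_b))

# Invert the formula to find the T implied by the reported theta.
T_implied = s2_within * (1.0 / (1.0 - theta_reported) ** 2 - 1.0) / s2_between

# Recompute theta at the nearest whole number of periods.
T_rounded = round(T_implied)
theta_check = quasi_demeaning_theta(s2_within, s2_between, T_rounded)

print("implied T:", round(T_implied, 4))
print("theta at T =", T_rounded, ":", round(theta_check, 6))
```

Under these assumptions the implied T comes out very close to a whole number, which suggests the reported figures are mutually consistent with a balanced panel of that length; this is an inference from the numbers, not a fact stated in the record.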
Date: 2025-05-31
Note: View the original document on HAL open archive server: https://hal.science/hal-05091404v1

Downloads: (external link)
https://hal.science/hal-05091404v1/document (application/pdf)

Related works:
Working Paper: Analyzing Income Inequalities across Italian regions: Instrumental Variable Panel Data, K-Means Clustering and Machine Learning Algorithms (2025)

Persistent link: https://EconPapers.repec.org/RePEc:hal:wpaper:hal-05091404