Bridging the Gap: Performance Evaluation of Classical and Modern Discretization Techniques in Real Estate Data
Anna Gdakowicz and Malgorzata Latuszynska
European Research Studies Journal, 2025, vol. XXVIII, issue 3, 1566-1586
Abstract:
Purpose: The aim of this study is to evaluate the effectiveness of selected discretization methods for continuous variables using the example of real estate area, one of the key attributes in property analysis. The study addresses the challenge of transforming continuous variables into discrete forms with minimal information loss, a process crucial for data mining, statistical modeling, and classification tasks.
Design/Methodology/Approach: Thirteen discretization methods were applied to a dataset of 3,732 residential real estate listings from the Szczecin housing market between 2017 and 2021. The methods include classical approaches with predefined class parameters (equal width and equal frequency), expert-driven methods, quantile-based techniques, clustering (k-means), and supervised learning approaches such as entropy minimization and 1R. The evaluation criteria included the deviation of grouped results from ungrouped data (arithmetic mean difference and loss function), and the number of classes, treated as a nominant. A linear ordering technique (Hellwig’s method) was used to rank the methods.
Findings: The method based on expert-defined class width (Method 4) showed the highest consistency with the original data, followed by Scott’s rule (Method 2) and the entropy-based supervised method (Method 11). Contrary to expectations, quantile-based methods and commonly used rules such as Freedman–Diaconis or square-root yielded unsatisfactory results, either due to oversimplification (too few intervals) or excessive granularity (too many classes).
Practical Implications: The results underline the importance of selecting discretization methods tailored to the characteristics of the variable and research context. In particular, they demonstrate the value of domain expertise in guiding discretization decisions in real estate analytics, improving data quality for downstream analysis such as classification, segmentation, or regression.
Originality/Value: This study is one of the first to systematically compare a broad spectrum of discretization methods in the context of real estate data. It introduces a comprehensive evaluation framework combining statistical accuracy and interpretability. The findings contribute to both methodological development in data preprocessing and practical decision-making in real estate market research.
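As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of three of the named bin-width rules (square-root, Scott, Freedman–Diaconis) and of one of the evaluation criteria, the difference between the mean computed from grouped data and the mean of the raw data. It is not the authors' implementation: the data are synthetic (a log-normal sample standing in for property areas, with the sample size borrowed from the abstract), the function names are hypothetical, and the paper's loss function and Hellwig ranking are not reproduced because the abstract does not specify them.

```python
import math
import random
import statistics


def bin_count_sqrt(values):
    """Square-root rule: k = ceil(sqrt(n))."""
    return math.ceil(math.sqrt(len(values)))


def bin_width_scott(values):
    """Scott's rule: h = 3.49 * s * n^(-1/3), s = sample standard deviation."""
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)


def bin_width_freedman_diaconis(values):
    """Freedman-Diaconis rule: h = 2 * IQR * n^(-1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) * len(values) ** (-1 / 3)


def equal_width_edges(values, width):
    """Class boundaries of equal width starting at the minimum value."""
    lo, hi = min(values), max(values)
    # Small tolerance so that rounding error does not add an empty class.
    k = max(1, math.ceil((hi - lo) / width - 1e-9))
    return [lo + i * width for i in range(k + 1)]


def grouped_mean(values, edges):
    """Mean estimated from class midpoints weighted by class counts."""
    width = edges[1] - edges[0]
    k = len(edges) - 1
    counts = [0] * k
    for v in values:
        # The maximum value is assigned to the last class.
        idx = min(int((v - edges[0]) / width), k - 1)
        counts[idx] += 1
    midpoints = [edges[i] + width / 2 for i in range(k)]
    return sum(c * m for c, m in zip(counts, midpoints)) / len(values)


# ASSUMPTION: synthetic "areas" in square metres, not the Szczecin dataset.
rng = random.Random(0)
areas = [rng.lognormvariate(4.0, 0.35) for _ in range(3732)]

value_range = max(areas) - min(areas)
widths = {
    "square-root": value_range / bin_count_sqrt(areas),
    "Scott": bin_width_scott(areas),
    "Freedman-Diaconis": bin_width_freedman_diaconis(areas),
}

raw_mean = statistics.fmean(areas)
results = {}
for name, width in widths.items():
    edges = equal_width_edges(areas, width)
    diff = abs(grouped_mean(areas, edges) - raw_mean)
    results[name] = (len(edges) - 1, width, diff)
    print(f"{name}: classes={len(edges) - 1}, width={width:.2f}, "
          f"|grouped mean - raw mean|={diff:.4f}")
```

Because every observation lies within half a class width of its class midpoint, the absolute mean difference is bounded by half the class width, which is one reason rules that produce many narrow classes score well on this criterion alone and why the study also treats the number of classes as a criterion.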
Keywords: Discretization methods; real estate market; property area; data preprocessing; variable transformation; grouped frequency distribution
JEL-codes: C38 C89 R31
Date: 2025
Downloads: https://ersj.eu/journal/4251/download (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:ers:journl:v:xxviii:y:2025:i:3:p:1566-1586
Bibliographic data for series maintained by Marios Agiomavritis.