Limitations of Influence-Based Dataset Compression for Waste Classification
Julian Aberger,
Lena Brensberger,
Gerald Koinig,
Benedikt Häcker,
Jesús Pestana and
Renato Sarc
Additional contact information
Julian Aberger: Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria
Lena Brensberger: Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria
Gerald Koinig: Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria
Benedikt Häcker: Siemens Aktiengesellschaft, 1210 Vienna, Austria
Jesús Pestana: Pro2Future GmbH, 8010 Graz, Austria
Renato Sarc: Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria
Data, 2025, vol. 10, issue 8, 1-16
Abstract:
Influence-based data selection methods, such as TracIn, aim to estimate the impact of individual training samples on model predictions and are increasingly used for dataset curation and reduction. This study investigates whether selecting the most positively influential training examples can be used to create compressed yet effective training datasets for transfer learning in plastic waste classification. Using a ResNet-18 model trained on a custom dataset of plastic waste images, TracIn was applied to compute influence scores across multiple training checkpoints. The top 50 influential samples per class were extracted and used to train a new model. Contrary to expectations, models trained on these highly influential subsets significantly underperformed compared to models trained on either the full dataset or an equally sized random sample. Further analysis revealed that many top-ranked influential images originated from different classes, indicating model biases and potential label confusion. These findings highlight the limitations of using influence scores for dataset compression. However, TracIn proved valuable for identifying problematic or ambiguous samples, class imbalance issues, and issues with fuzzy class boundaries. Based on the results, the utilized TracIn approach is recommended as a diagnostic instrument rather than for dataset curation.
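The selection step described in the abstract can be illustrated with a minimal sketch of checkpoint-based TracIn scoring. It is not the authors' implementation. It assumes per-sample loss gradients have already been computed and flattened into plain lists for each checkpoint, and that the "top k per class" subset is formed by summing each training sample's influence over the test samples of one class. The abstract does not spell out that aggregation. The function names `tracin_cp` and `top_k_for_class`, the toy gradients, and the labels `class_a` and `class_b` are hypothetical.

```python
def dot(u, v):
    """Dot product of two equally long gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def tracin_cp(train_grads, test_grads, learning_rates):
    """Checkpoint-based TracIn scores.

    train_grads[c][i] is the loss gradient of training sample i at checkpoint c,
    test_grads[c][j] is the loss gradient of test sample j at checkpoint c.
    Returns scores[i][j], the sum over checkpoints of
    learning_rate * <grad(train_i), grad(test_j)>.
    """
    n_train = len(train_grads[0])
    n_test = len(test_grads[0])
    scores = [[0.0] * n_test for _ in range(n_train)]
    for lr, g_train, g_test in zip(learning_rates, train_grads, test_grads):
        for i, g_i in enumerate(g_train):
            for j, g_j in enumerate(g_test):
                scores[i][j] += lr * dot(g_i, g_j)
    return scores


def top_k_for_class(scores, test_labels, target_class, k):
    """Indices of the k training samples with the highest summed influence
    on test samples of target_class (assumed aggregation, see lead-in)."""
    columns = [j for j, y in enumerate(test_labels) if y == target_class]
    totals = [sum(row[j] for j in columns) for row in scores]
    ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return ranked[:k]


# Toy example: 2 checkpoints, 3 training samples, 2 test samples, 2-d gradients.
train_grads = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # checkpoint 0
    [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # checkpoint 1
]
test_grads = [
    [[1.0, 0.0], [0.0, 1.0]],  # checkpoint 0
    [[1.0, 0.0], [0.0, 2.0]],  # checkpoint 1
]
learning_rates = [0.1, 0.1]
test_labels = ["class_a", "class_b"]  # hypothetical class names

scores = tracin_cp(train_grads, test_grads, learning_rates)
selected_a = top_k_for_class(scores, test_labels, "class_a", k=1)
selected_b = top_k_for_class(scores, test_labels, "class_b", k=1)
```

Because the ranking in this sketch ignores the training samples' own labels, a top-ranked sample for one class can carry a different label, which is the kind of cross-class influence the abstract reports. Negative scores, as for the third training sample here, mark samples whose gradients oppose those of the test samples.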
Keywords: waste management; waste classification; TracIn; influence-based compression; transfer learning
JEL-codes: C8 C80 C81 C82 C83
Date: 2025
Downloads:
https://www.mdpi.com/2306-5729/10/8/127/pdf (application/pdf)
https://www.mdpi.com/2306-5729/10/8/127/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:10:y:2025:i:8:p:127-:d:1719328
Data is currently edited by Ms. Cecilia Yang