USC-DCT: A Collection of Diverse Classification Tasks

Adam M. Jones, Gozde Sahin, Zachary W. Murdock, Yunhao Ge, Ao Xu, Yuecheng Li, Di Wu, Shuo Ni, Po-Hsuan Huang, Kiran Lekkala and Laurent Itti
Additional contact information
Adam M. Jones: Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90007, USA
Gozde Sahin: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Zachary W. Murdock: Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90007, USA
Yunhao Ge: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Ao Xu: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Yuecheng Li: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Di Wu: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Shuo Ni: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Po-Hsuan Huang: Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90007, USA
Kiran Lekkala: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Laurent Itti: Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90007, USA

Data, 2023, vol. 8, issue 10, 1-22

Abstract: Machine learning is a crucial tool for both academic and real-world applications. Classification problems are often the preferred showcase in this space, which has led to a wide variety of datasets being collected and used for a myriad of applications. Unfortunately, there is very little standardization in how these datasets are collected, processed, and disseminated. As new learning paradigms such as lifelong learning and meta-learning become more popular, the demand for merging tasks for at-scale evaluation of algorithms has also increased. This paper provides a methodology for processing and cleaning datasets that can be applied to existing or new classification tasks, and it implements these practices in a collection of diverse classification tasks called USC-DCT. Constructed from 107 classification tasks collected from the internet, this collection provides a transparent and standardized pipeline that can be useful for many different applications and frameworks. Although it currently contains 107 tasks, USC-DCT is designed to enable future growth. Additional discussion explains applications in machine learning paradigms such as transfer, lifelong, and meta-learning, how revisions to the collection will be handled, and further tips for curating and using classification tasks at this scale.

Keywords: machine learning; data sharing; classification; computer vision; visual classification; dataset collection; dataset organization; data cleaning
JEL-codes: C8 C80 C81 C82 C83
Date: 2023

Downloads: (external link)
https://www.mdpi.com/2306-5729/8/10/153/pdf (application/pdf)
https://www.mdpi.com/2306-5729/8/10/153/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:8:y:2023:i:10:p:153-:d:1258406

Data is currently edited by Ms. Cecilia Yang

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jdataj:v:8:y:2023:i:10:p:153-:d:1258406