USC-DCT: A Collection of Diverse Classification Tasks
Adam M. Jones, Gozde Sahin, Zachary W. Murdock, Yunhao Ge, Ao Xu, Yuecheng Li, Di Wu, Shuo Ni, Po-Hsuan Huang, Kiran Lekkala and Laurent Itti
Additional contact information
Adam M. Jones: Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90007, USA
Gozde Sahin: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Zachary W. Murdock: Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90007, USA
Yunhao Ge: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Ao Xu: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Yuecheng Li: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Di Wu: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Shuo Ni: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Po-Hsuan Huang: Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90007, USA
Kiran Lekkala: Department of Computer Science, University of Southern California, Los Angeles, CA 90007, USA
Laurent Itti: Neuroscience Graduate Program, University of Southern California, Los Angeles, CA 90007, USA
Data, 2023, vol. 8, issue 10, 1-22
Abstract:
Machine learning is a crucial tool for both academic and real-world applications. Classification problems are often the preferred showcase in this space, which has led to a wide variety of datasets being collected and used for a myriad of applications. Unfortunately, there is very little standardization in how these datasets are collected, processed, and disseminated. As new learning paradigms such as lifelong learning and meta-learning become more popular, the demand for merging tasks to evaluate algorithms at scale has also increased. This paper presents a methodology for processing and cleaning datasets that can be applied to existing or new classification tasks, and implements these practices in a collection of diverse classification tasks called USC-DCT. Constructed from 107 classification tasks collected from the internet, this collection provides a transparent and standardized pipeline that can be useful for many different applications and frameworks. Although it currently contains 107 tasks, USC-DCT is designed to accommodate future growth. Additional discussion explains applications in machine learning paradigms such as transfer, lifelong, and meta-learning; describes how revisions to the collection will be handled; and offers further tips for curating and using classification tasks at this scale.
Keywords: machine learning; data sharing; classification; computer vision; visual classification; dataset collection; dataset organization; data cleaning
JEL-codes: C8 C80 C81 C82 C83
Date: 2023
Downloads:
https://www.mdpi.com/2306-5729/8/10/153/pdf (application/pdf)
https://www.mdpi.com/2306-5729/8/10/153/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:8:y:2023:i:10:p:153-:d:1258406
Data is currently edited by Ms. Cecilia Yang
More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.