Data-Driven Approaches to Selecting Samples for Training Neural Networks
Murthy V. Devarakonda
Additional contact information
Murthy V. Devarakonda: AI Innovation Lab, Novartis
A chapter in System Dependability and Analytics, 2023, pp 327-345 from Springer
Abstract:
Modern neural networks, which are now commonly used for most natural language processing (NLP) tasks, contain many hidden units and parameters. There is considerable interest in developing strategies for selecting an optimal set of samples to train such large models for biomedical tasks, because developing training data is expensive and time-consuming in the biomedical space. The lack of sufficient training data is exacerbated by the fact that the ratio of negative to positive samples is highly skewed, i.e., there are too many negative samples and too few positive samples. Therefore, an important problem, especially in the biomedical space, is determining the optimal set of negative samples to use in creating an effective and balanced training set. Interestingly, the insights that may help decide the most effective sample selection can be found in the data itself (i.e., in the samples themselves). This chapter briefly reviews traditional approaches to selecting training samples and then presents the latest data-driven approaches for selecting samples to effectively train modern neural networks.
Date: 2023
Persistent link: https://EconPapers.repec.org/RePEc:spr:ssrchp:978-3-031-02063-6_18
Ordering information: This item can be ordered from
http://www.springer.com/9783031020636
DOI: 10.1007/978-3-031-02063-6_18
More chapters in Springer Series in Reliability Engineering from Springer