A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model

Li, Yan; Reyes, Kristofer G.; Vazquez-Anderson, Jorge; Wang, Yingfei; Contreras, Lydia M.; Powell, Warren B.

Additional contact information
Yan Li: Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544
Kristofer G. Reyes: Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York 14260
Jorge Vazquez-Anderson: Department of Chemical Engineering, University of Texas at Austin, Austin, Texas 78712
Yingfei Wang: Michael G. Foster School of Business, University of Washington, Seattle, Washington 98195
Lydia M. Contreras: Department of Chemical Engineering, University of Texas at Austin, Austin, Texas 78712
Warren B. Powell: Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544

INFORMS Journal on Computing, 2018, vol. 30, issue 4, 750-767

Abstract: We present a sparse knowledge gradient (SpKG) algorithm for adaptively selecting the targeted regions within a large RNA molecule to identify which regions are most amenable to interactions with other molecules. Experimentally, such regions can be inferred from fluorescence measurements obtained by binding a complementary probe with fluorescence markers to the targeted regions. We fit a regularized, sparse linear model with a log link function in which the marginal contribution of each nucleotide to the thermodynamic cycle is purely additive. The SpKG algorithm uniquely combines the Bayesian ranking and selection problem with the frequentist ℓ1-regularized regression approach (Lasso). We use this algorithm to identify the sparsity pattern of the linear model and to decide sequentially which regions are best to test before the experimental budget is exhausted. We also develop two new algorithms: batch SpKG and batch SpKG-LM. The first sequentially generates a batch of suggestions so that experiments can be run in parallel. The second dynamically adds new alternatives, in the form of types of probes, which are created by inserting, deleting, or mutating nucleotides within existing probes. In simulation, we demonstrate these algorithms on the Tetrahymena Group I intron (a midsize RNA molecule), showing that they efficiently learn the correct sparsity pattern, identify the most accessible region, and outperform several other policies.
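
As a rough illustration of the knowledge gradient idea that the abstract builds on, the following minimal Python sketch scores each alternative by the expected improvement in the best posterior mean after one more noisy measurement. It is not the SpKG algorithm of the paper: it assumes independent normal beliefs with known measurement noise, it omits the sparse additive model and the Lasso step entirely, and the names `knowledge_gradient` and `choose_next` are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(means, variances, noise_var):
    """Knowledge gradient score of each alternative.

    Assumes independent normal beliefs (an assumption of this sketch,
    not the belief model used in the paper) and at least two alternatives.
    """
    scores = []
    for i, (mu, var) in enumerate(zip(means, variances)):
        if var <= 0.0:
            # Nothing left to learn about an alternative known exactly.
            scores.append(0.0)
            continue
        # Standard deviation of the change in the posterior mean
        # caused by a single measurement of alternative i.
        sigma_tilde = var / math.sqrt(var + noise_var)
        best_other = max(m for j, m in enumerate(means) if j != i)
        zeta = -abs(mu - best_other) / sigma_tilde
        scores.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return scores


def choose_next(means, variances, noise_var):
    """Index of the alternative with the largest knowledge gradient score."""
    scores = knowledge_gradient(means, variances, noise_var)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Three hypothetical probe regions with equal prior means.
    prior_means = [0.0, 0.0, 0.0]
    prior_vars = [1.0, 4.0, 0.25]
    print(choose_next(prior_means, prior_vars, noise_var=1.0))
```

When the prior means are tied, as in the usage example, the score grows with the prior variance, so the sketch selects the alternative about which the least is known.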

Keywords: simulation: design of experiments; decision analysis: sequential; statistics: Bayesian
Date: 2018

Downloads:
https://doi.org/10.1287/ijoc.2017.0803 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:orijoc:v:30:y:2018:i:4:p:750-767