Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale
Daniel L Parton,
Patrick B Grinaway,
Sonya M Hanson,
Kyle A Beauchamp and
John D Chodera
PLOS Computational Biology, 2016, vol. 12, issue 6, 1-25
Abstract:
The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software infrastructure to enable simulations at this scale has not kept pace. It should now be possible to study protein dynamics across entire (super)families, exploiting both available structural biology data and conformational similarities across homologous proteins. Here, we present a new tool for enabling high-throughput simulation in the genomics era. Ensembler takes any set of sequences—from a single sequence to an entire superfamily—and shepherds them through various stages of modeling and refinement to produce simulation-ready structures. This includes comparative modeling to all relevant PDB structures (which may span multiple conformational states of interest), reconstruction of missing loops, addition of missing atoms, culling of nearly identical structures, assignment of appropriate protonation states, solvation in explicit solvent, and refinement and filtering with molecular simulation to ensure stable simulation. The output of this pipeline is an ensemble of structures ready for subsequent molecular simulations using computer clusters, supercomputers, or distributed computing projects like Folding@home. Ensembler thus automates much of the time-consuming process of preparing protein models suitable for simulation, while allowing scalability up to entire superfamilies. A particular advantage of this approach can be found in the construction of kinetic models of conformational dynamics—such as Markov state models (MSMs)—which benefit from a diverse array of initial configurations that span the accessible conformational states to aid sampling. 
We demonstrate the power of this approach by constructing models for all catalytic domains in the human tyrosine kinase family, using all available kinase catalytic domain structures from any organism as structural templates. Ensembler is free and open source software licensed under the GNU General Public License (GPL) v2. It is compatible with Linux and OS X. The latest release can be installed via the conda package manager, and the latest source can be downloaded from https://github.com/choderalab/ensembler.
Author Summary:
Proteins are the workhorses of the human body, and are involved in essentially every biological process. Many diseases are caused by proteins malfunctioning. To understand how a protein functions, it is necessary to know its physical properties. The field of structural biology provides many techniques for determining the three-dimensional structure of a protein. The dynamics of a protein, i.e. the way it moves, are of equal importance, but are more difficult to uncover with traditional experimental techniques. Computer simulations are an effective alternative method for understanding protein dynamics, but require experimental structural information as a starting point. While recent advances in genomics and experimental techniques have provided a wealth of such structural data, the appropriate software for using this data effectively has been lacking. To tackle this problem, we have developed a software package called Ensembler, which allows a user to automatically select appropriate experimentally derived structures for a given protein or family of proteins, and to use them to prepare a series of simulations. The resultant simulation data can then be used to investigate the dynamics of the protein(s) in question, and their involvement in disease.
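Illustrative sketch:
One stage named in the abstract is the culling of nearly identical structures. The Python sketch below shows one simple way such a step could work, a greedy filter on pairwise RMSD. It is not Ensembler's implementation: the function names `rmsd` and `cull_near_duplicates`, the toy coordinates, and the 0.5 cutoff are all assumptions made for illustration, and the structures are assumed to be already superimposed with matching atom order.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) tuples.

    Assumes both structures are already superimposed and list atoms in the same order.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def cull_near_duplicates(models, cutoff):
    """Greedy culling: keep a model only if its RMSD to every kept model is >= cutoff."""
    kept = []
    for name, coords in models:
        if all(rmsd(coords, kept_coords) >= cutoff for _, kept_coords in kept):
            kept.append((name, coords))
    return [name for name, _ in kept]


# Hypothetical two-atom "models", used only to exercise the filter.
models = [
    ("model_a", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("model_b", [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]),  # nearly identical to model_a
    ("model_c", [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]),  # clearly different
]
unique = cull_near_duplicates(models, cutoff=0.5)
print(unique)
```

A greedy filter like this depends on input order and always keeps the first model of each near-duplicate group. A production pipeline would more likely use a structure-aware RMSD with superposition and a proper clustering method.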
Date: 2016
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004728 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 04728&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1004728
DOI: 10.1371/journal.pcbi.1004728