An Introduction to Programming for Bioscientists: A Python-Based Primer
Berk Ekmekci, Charles E McAnany and Cameron Mura
PLOS Computational Biology, 2016, vol. 12, issue 6, 1-43
Abstract:
Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in molecular biology, biochemistry, and other biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language’s usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. 
Starting with basic concepts, such as that of a “variable,” the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.
Author Summary:
Contemporary biology has largely become computational biology, whether it involves applying physical principles to simulate the motion of each atom in a piece of DNA, or using machine learning algorithms to integrate and mine “omics” data across whole cells (or even entire ecosystems). The ability to design algorithms and program computers, even at a novice level, may be the most indispensable skill that a modern researcher can cultivate. As with human languages, computational fluency is developed actively, not passively. This self-contained text, structured as a hybrid primer/tutorial, introduces any biologist, from college freshman to established senior scientist, to basic computing principles (control-flow, recursion, regular expressions, etc.) and the practicalities of programming and software design. We use the Python language because it now pervades virtually every domain of the biosciences, from sequence-based bioinformatics and molecular evolution to phylogenomics, systems biology, structural biology, and beyond. To introduce both coding (in general) and Python (in particular), we guide the reader via concrete examples and exercises. We also supply, as Supplemental Chapters, a few thousand lines of heavily-annotated, freely distributed source code for personal study.
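The abstract names the Hamming distance between two DNA sequences as the computation that the primer builds toward. As a rough illustration only (this is not the authors' code, and the function name `hamming_distance` and the case-insensitive comparison are assumptions made here), the distance is the number of positions at which two equal-length sequences differ:

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ.

    Hypothetical helper written for illustration; it is not taken from
    the primer or its Supplemental Chapters.
    """
    # The Hamming distance is only defined for sequences of equal length.
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    # Assumption: treat upper- and lower-case bases as equivalent.
    pairs = zip(seq_a.upper(), seq_b.upper())
    # Each mismatch contributes True (counted as 1) to the sum.
    return sum(base_a != base_b for base_a, base_b in pairs)


print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```

The length check is a deliberate choice: `zip` silently stops at the shorter input, which would otherwise hide a length mismatch instead of reporting it.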
Date: 2016
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004867 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 04867&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1004867
DOI: 10.1371/journal.pcbi.1004867