Comparisons of classification methods for viral genomes and protein families using alignment-free vectorization

Huang Hsin-Hsiung, Hao Shuai, Alarcon Saul and Yang Jie
Additional contact information
Huang Hsin-Hsiung: Department of Statistics, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA
Yang Jie: Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL, USA

Statistical Applications in Genetics and Molecular Biology, 2018, vol. 17, issue 4, 12

Abstract: In this paper, we propose a statistical classification method based on discriminant analysis that uses the first and second moments of the positions of each nucleotide in the genome sequences as features, and we compare its performance with that of other classification methods as well as the natural vector for comparative genomic analysis. We examine the normality of the proposed features. The statistical classification models used include linear discriminant analysis, quadratic discriminant analysis, diagonal linear discriminant analysis, k-nearest-neighbor classifier, logistic regression, support vector machines, and classification trees. All these classifiers are tested on a viral genome dataset and a protein dataset for predicting viral Baltimore labels, viral family labels, and protein family labels.
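
The feature construction named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions or normalization, so the choices here are assumptions: positions are 1-based, the first moment is the mean position of each nucleotide, the second moment is the population central moment of those positions, and a nucleotide absent from the sequence contributes zeros. The function name position_moment_features is hypothetical and not taken from the paper.

```python
def position_moment_features(seq, alphabet="ACGT"):
    """Return [mean position, second central moment] for each letter.

    Assumed definitions (not confirmed by the abstract): 1-based positions,
    population central second moment, zeros for letters that never occur.
    """
    seq = seq.upper()
    features = []
    for letter in alphabet:
        # 1-based positions at which this letter occurs
        positions = [i + 1 for i, ch in enumerate(seq) if ch == letter]
        if not positions:
            features.extend([0.0, 0.0])
            continue
        mean = sum(positions) / len(positions)
        second = sum((p - mean) ** 2 for p in positions) / len(positions)
        features.extend([mean, second])
    return features


if __name__ == "__main__":
    # Toy sequence for illustration only, not data from the paper
    print(position_moment_features("ACGTA"))
```

Each sequence maps to a fixed-length vector (eight numbers for a four-letter alphabet) regardless of its length, which is what allows classifiers such as those listed in the abstract to be applied without aligning the sequences.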

Keywords: viral genomes; protein; family labels; Natural Vector; statistical classification models
Date: 2018

Downloads: (external link)
https://doi.org/10.1515/sagmb-2018-0004 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.

Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:17:y:2018:i:4:p:12:n:3

Ordering information: This journal article can be ordered from
https://www.degruyter.com/view/j/sagmb

DOI: 10.1515/sagmb-2018-0004

Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.

 
Page updated 2021-05-07
Handle: RePEc:bpj:sagmbi:v:17:y:2018:i:4:p:12:n:3