Comparisons of classification methods for viral genomes and protein families using alignment-free vectorization
Huang Hsin-Hsiung, Alarcon Saul and Yang Jie
Additional contact information
Huang Hsin-Hsiung: Department of Statistics, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA
Yang Jie: Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL, USA
Statistical Applications in Genetics and Molecular Biology, 2018, vol. 17, issue 4, 12
In this paper, we propose a statistical classification method based on discriminant analysis that uses the first and second moments of the positions of each nucleotide in the genome sequences as features, and compare its performance with that of other classification methods as well as the natural vector for comparative genomic analysis. We examine the normality of the proposed features. The statistical classification models used include linear discriminant analysis, quadratic discriminant analysis, diagonal linear discriminant analysis, k-nearest-neighbor classifier, logistic regression, support vector machines, and classification trees. All these classifiers are tested on a viral genome dataset and a protein dataset for predicting viral Baltimore labels, viral family labels, and protein family labels.
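The position-moment features described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the code below assumes the commonly used natural-vector form: for each letter, its count, the mean of its 1-based positions, and the second central moment of those positions divided by (count × sequence length). The function name `moment_features` and the zero-filling of absent letters are hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def moment_features(seq, alphabet="ACGT"):
    """Return [n_A, mu_A, D2_A, n_C, mu_C, D2_C, ...] for one sequence.

    Assumed definitions (not taken from the paper itself):
      n_k  = number of occurrences of letter k
      mu_k = mean 1-based position of letter k
      D2_k = sum((position - mu_k)^2) / (n_k * len(seq))
    """
    n = len(seq)
    positions = defaultdict(list)
    for i, ch in enumerate(seq.upper(), start=1):
        positions[ch].append(i)

    features = []
    for letter in alphabet:
        pos = positions.get(letter, [])
        n_k = len(pos)
        if n_k == 0:
            # Letter absent: fill with zeros (an illustrative convention).
            features.extend([0.0, 0.0, 0.0])
            continue
        mu_k = sum(pos) / n_k
        d2_k = sum((p - mu_k) ** 2 for p in pos) / (n_k * n)
        features.extend([float(n_k), mu_k, d2_k])
    return features


if __name__ == "__main__":
    print(moment_features("ACGA"))
```

Each sequence then maps to a fixed-length numeric vector (12 values for nucleotides under these assumptions), which is the kind of input the classifiers listed in the abstract operate on. Passing a 20-letter amino-acid alphabet as `alphabet` would give the analogous vector for protein sequences.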
Keywords: viral genomes; protein; family labels; Natural Vector; statistical classification models
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:17:y:2018:i:4:p:12:n:4
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.