Sequence similarity using composition method

Geetika Munjal, Pooja Sharma and Deepti Gaur

International Journal of Data Science, 2018, vol. 3, issue 1, 19-28

Abstract: Deoxyribonucleic acid (DNA) carries a large amount of important information in the form of character strings. Sequence analysis applies a wide range of methods to DNA sequences to understand the structure, features or evolution of these nucleotide strings. The analysis uses mathematical methods to convert the character strings to numerical values, which are then used to measure similarity between sequences. DNA sequences contain only four nucleotides (A, C, G and T), but sequence comparison is essential for extracting information from them. In this paper, various methods for analysing DNA sequences are explored, including the use of entropy, divergence and LZ complexity, and the role of hybridisation. A hybrid model based on the composition vector and distance methods is proposed to find dissimilarity between sequences, and this hybrid model is tested on sequences of species downloaded from the National Center for Biotechnology Information (NCBI).
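
As a rough illustration of the general idea in the abstract (turning nucleotide strings into numerical vectors and comparing them), the following minimal Python sketch builds a k-mer frequency vector, computes its Shannon entropy, and measures Euclidean distance between two vectors. It is not the hybrid model proposed in the paper, whose details are not given in this record. The function names, the choice of k, and the use of Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # the four nucleotides named in the abstract


def composition_vector(seq, k=2):
    """Normalised frequencies of all 4**k k-mers, in lexicographic order.

    Illustrative helper (hypothetical name); k-mers containing characters
    outside ACGT are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def shannon_entropy(vec):
    """Shannon entropy (in bits) of a frequency vector."""
    return -sum(p * math.log2(p) for p in vec if p > 0)


def euclidean_distance(u, v):
    """Euclidean distance, used here as one possible dissimilarity measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy sequences, not data from the paper.
    a = composition_vector("ACGTACGTAC", k=2)
    b = composition_vector("AAAACCCCGG", k=2)
    print("entropy a:", shannon_entropy(a))
    print("entropy b:", shannon_entropy(b))
    print("distance :", euclidean_distance(a, b))
```

A smaller distance between two composition vectors suggests more similar k-mer usage; other distance or divergence measures could be substituted for the Euclidean one without changing the rest of the sketch.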

Keywords: nucleotides; entropy; frequency vector
Date: 2018

Downloads: (external link)
http://www.inderscience.com/link.php?id=90626 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdsci:v:3:y:2018:i:1:p:19-28

More articles in International Journal of Data Science from Inderscience Enterprises Ltd
 
Page updated 2025-03-19
Handle: RePEc:ids:ijdsci:v:3:y:2018:i:1:p:19-28