EconPapers    
Economics at your fingertips  
 

An Approximation Algorithm for Alignment of Multiple Sequences using Motif Discovery

Laxmi Parida (), Aris Floratos and Isidore Rigoutsos
Additional contact information
Laxmi Parida: Computational Biology Center, IBM Thomas J. Watson Research Center
Aris Floratos: Computational Biology Center, IBM Thomas J. Watson Research Center
Isidore Rigoutsos: Computational Biology Center, IBM Thomas J. Watson Research Center

Journal of Combinatorial Optimization, 1999, vol. 3, issue 2, No 7, 247-275

Abstract: Abstract Given a set of N sequence, the Multiple Sequence Alignment problem is to align these N sequences, possibly with gaps, that brings out the best commonality of the N sequences. The quality of the alignment is usually measured by penalizing the mis-matches and gaps, and rewarding the matches with appropriate weight functions. However for larger values of N, additional constraints are required to give meaningful alignments. We identify a user-controlled parameter, an alignment number K (2 ≤ K ≤ N): this additional requirement constrains the alignment to have at least K sequences agree on a character, whenever possible, in the alignment. We identify a natural optimization problem for this approach called the K-MSA problem. We show that the problem is MAX SNP hard. We give a natural extension of this problem that incorporates “biological relevance” by using motifs (common patterns in the sequences) and give an approximation algorithm for this problem in terms of the motifs in the data. MUSCA is an implementation of this approach and our experimental results indicate that this approach is efficient, particularly on large numbers of long sequences, and gives good alignments when tested on biological data such as DNA and protein sequences.

Keywords: multiple sequence alignment; alignment number; protein sequences; motif discovery; MAX SNP hard; approximate algorithm; set covering problem (search for similar items in EconPapers)
Date: 1999
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://link.springer.com/10.1023/A:1009841927822 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:jcomop:v:3:y:1999:i:2:d:10.1023_a:1009841927822

Ordering information: This journal article can be ordered from
https://www.springer.com/journal/10878

DOI: 10.1023/A:1009841927822

Access Statistics for this article

Journal of Combinatorial Optimization is currently edited by Thai, My T.

More articles in Journal of Combinatorial Optimization from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:jcomop:v:3:y:1999:i:2:d:10.1023_a:1009841927822