Beam search decoder for enhancing sequence decoding speed in single-molecule peptide sequencing data

Kipen, Javier; Jaldén, Joakim

PLOS Computational Biology, 2023, vol. 19, issue 11, 1-21

Abstract: Next-generation single-molecule protein sequencing technologies have the potential to significantly accelerate biomedical research. These technologies offer sensitivity and scalability for proteomic analysis. One auspicious method is fluorosequencing, which involves cutting naturalized proteins into peptides, attaching fluorophores to specific amino acids, and observing variations in light intensity as one amino acid is removed at a time. The original peptide is classified from the sequence of light-intensity reads, and proteins can subsequently be recognized with this information. The stepwise removal of amino acids is achieved by attaching the peptides to a wall at the C-terminus and using a process called Edman degradation to remove one amino acid at a time from the N-terminus. Even though a framework (Whatprot) has been proposed for the peptide classification task, processing times remain restrictive due to the massively parallel data acquisition system. In this paper, we propose a new beam search decoder with a novel state formulation that obtains considerably lower processing times at the expense of only a slight accuracy drop compared to Whatprot. Furthermore, we explore how our novel state formulation may lead to even faster decoders in the future.

Author summary: Proteomic analyses frequently rely on mass spectrometry, a method characterized by its limited dynamic range, which can cause it to overlook low-abundance proteins. Single-molecule protein sequencing methods offer a way to address this limitation. Fluorosequencing is a cutting-edge single-molecule protein sequencing method that can distinguish peptides or protein molecules in a massively parallel fashion. This method has attracted interest from investors, as evidenced by the recent funding of Erisyon, a company developing this technology. The technique involves a challenging classification task: determining the original peptide sequence from light-intensity observations obtained after several Edman cycles. A classifier based on a combination of k Nearest Neighbors (kNN) with Hidden Markov Models (HMM) has been shown to have close-to-optimal accuracy with tractable complexity. In this paper, we propose a new algorithm that reduces computation time significantly at the expense of a slight reduction in accuracy compared to the state-of-the-art method.
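
Illustration: the measurement described in the abstract can be made concrete with a minimal, noise-free sketch. It is not the authors' decoder and not part of Whatprot. It only counts how many fluorophores would remain on a peptide after each Edman cycle in an idealized experiment, assuming a single dye color, perfect labeling, and no failed cycles or dye loss. The function name, the example peptide, and the choice of lysine (K) as the labeled residue are hypothetical.

```python
def ideal_fluorophore_counts(peptide, labeled, num_cycles):
    """Idealized fluorophore count before any cycle and after each Edman cycle.

    peptide    -- amino-acid string written from N-terminus to C-terminus
    labeled    -- set of residue letters that carry a fluorophore (assumption:
                  one dye color, every such residue is labeled)
    num_cycles -- number of Edman cycles; each removes one N-terminal residue
    """
    counts = []
    for removed in range(num_cycles + 1):
        remaining = peptide[removed:]  # residues still attached to the surface
        counts.append(sum(1 for aa in remaining if aa in labeled))
    return counts


if __name__ == "__main__":
    # Hypothetical peptide with lysines at positions 2 and 5.
    print(ideal_fluorophore_counts("AKCGKA", {"K"}, 5))  # [2, 2, 1, 1, 1, 0]
```

In this idealized picture, the cycles at which the count drops reveal the positions of the labeled residues. Real reads are noisy intensities rather than exact counts, which is why the classification task is framed probabilistically.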

Date: 2023

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011345 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 11345&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1011345

DOI: 10.1371/journal.pcbi.1011345

More articles in PLOS Computational Biology from Public Library of Science