Restoring flowcell type and basecaller configuration from FASTQ files of nanopore sequencing data

Jun Mencius, Wenjun Chen, Youqi Zheng, Tingyi An, Yongguo Yu, Kun Sun, Huijuan Feng and Zhixing Feng
Additional contact information
Jun Mencius: Fudan University
Wenjun Chen: Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine
Youqi Zheng: Fudan University
Tingyi An: Fudan University
Yongguo Yu: Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine
Kun Sun: Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine
Huijuan Feng: Fudan University
Zhixing Feng: Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine

Nature Communications, 2025, vol. 16, issue 1, 1-19

Abstract: As nanopore sequencing has been widely adopted, data accumulation has surged, resulting in over 700,000 public datasets. While these data hold immense potential for advancing genomic research, their utility is compromised by the absence of flowcell type and basecaller configuration in about 85% of the data and associated publications. These parameters are essential for many analysis algorithms, and their misapplication can lead to significant drops in performance. To address this issue, we present LongBow, designed to infer flowcell type and basecaller configuration directly from the base quality value patterns of FASTQ files. LongBow has been tested on 66 in-house basecalled FAST5/POD5 datasets and 1989 public FASTQ datasets, achieving accuracies of 95.33% and 91.45%, respectively. We demonstrate its utility by reanalyzing nanopore sequencing data from the COVID-19 Genomics UK (COG-UK) project. The results show that LongBow is essential for reproducing reported genomic variants and, through a LongBow-based analysis pipeline, we discovered substantially more functionally important variants while improving accuracy in lineage assignment. Overall, LongBow is poised to play a critical role in maximizing the utility of public nanopore sequencing data, while significantly enhancing the reproducibility of related research.
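
The abstract does not say how LongBow processes quality values, so the following is only a minimal sketch of the kind of input signal it refers to: a per-file distribution of base quality scores read from a FASTQ file. It assumes standard four-line FASTQ records and Phred+33 encoding; the function names `quality_histogram` and `normalize` are hypothetical and are not part of LongBow.

```python
from collections import Counter
from typing import Iterable, Iterator


def iter_quality_strings(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def quality_histogram(lines: Iterable[str], offset: int = 33) -> Counter:
    """Count Phred scores over all bases, assuming Phred+33 encoding."""
    counts: Counter = Counter()
    for qual in iter_quality_strings(lines):
        counts.update(ord(ch) - offset for ch in qual)
    return counts


def normalize(counts: Counter) -> dict:
    """Convert raw counts into a score -> fraction profile."""
    total = sum(counts.values())
    return {q: n / total for q, n in sorted(counts.items())} if total else {}


# Two toy reads; with Phred+33, '!' = 0, '+' = 10, '5' = 20, 'I' = 40.
example = [
    "@read1\n", "ACGT\n", "+\n", "!+5I\n",
    "@read2\n", "GGCA\n", "+\n", "5555\n",
]
hist = quality_histogram(example)
profile = normalize(hist)
print(profile)
```

For a real file, the same functions can be fed an open file handle, for example `quality_histogram(open(path))`. How such a profile would map to a flowcell type or basecaller configuration is the subject of the article and is not reproduced here.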

Date: 2025

Downloads: (external link)
https://www.nature.com/articles/s41467-025-59378-x Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59378-x

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-025-59378-x

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-05-04
Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-59378-x