ConNIS and labeling instability: New statistical methods for improving the detection of essential genes in TraDIS libraries

Moritz Hanke, Theresa Harten and Ronja Foraita

PLOS Computational Biology, 2026, vol. 22, issue 3, 1-19

Abstract: The identification of essential genes in Transposon Directed Insertion Site Sequencing (TraDIS) data relies on the assumption that transposon insertions occur randomly in non-essential regions, leaving essential genes largely insertion-free. While intragenic insertion-free sequences have been considered as a reliable indicator for gene essentiality, so far, no exact probability distribution for these sequences has been proposed. Further, many methods require setting thresholds or parameter values a priori without providing any statistical basis, limiting the comparability of results. Here, we introduce Consecutive Non-Insertion Sites (ConNIS), a novel method for gene essentiality determination. ConNIS provides an analytic solution for the probability of observing insertion-free sequences within genes of given length and considers variation in insertion density across the genome. Based on an extensive simulation study and different real-world scenarios, ConNIS was found to be superior to prevalent state-of-the-art methods, particularly when libraries had only a low or medium insertion density. In addition, our results showed that the precision of existing methods can be improved by incorporating a simple weighting factor for the genome-wide insertion density. To set methodically embedded parameter and threshold values of TraDIS methods, a subsample-based instability criterion was developed. Application of this criterion in real and synthetic data settings demonstrated its effectiveness in selecting well-suited parameter/threshold values across methods. A ready-to-use R package and an interactive web application are provided to facilitate application and reproducibility.

Author summary: Identifying essential genes in bacteria is key to understanding their ability to survive, which can, for example, be applied to the development of new treatments. One way to identify these genes is by creating libraries where small DNA fragments (“insertions”) are randomly placed in the genome: essential genes tend to remain insertion-free because insertions disrupt their function. The challenge is to determine whether a (long) uninterrupted sequence is due to chance or because the gene is truly essential. Here, we present Consecutive Non-Insertion Sites (ConNIS), a statistical method that calculates the probability of such insertion-free sequences. Extensive comparisons on simulated and real datasets show that ConNIS outperforms existing methods, especially when a library is rather sparse in terms of the total number of insertion sites. Since many analysis methods rely on parameter values that have to be set before the analysis and can heavily influence the final results, we also propose a data-driven approach to set these values, making results more comparable across studies. Our methods are freely available as an R package and all results are presented in a web app.
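
Illustration (not part of the original record): the question raised in the summary, whether a long insertion-free stretch could arise by chance, can be approximated by simulation. The sketch below is a Monte Carlo stand-in and is not the analytic ConNIS solution, which this record does not reproduce. It assumes that a fixed number of insertions falls uniformly at random on the sites of a gene; the function names and parameters are hypothetical.

```python
import random


def longest_gap(positions, gene_length):
    """Length of the longest run of consecutive insertion-free sites.

    positions: 0-based site indices that carry an insertion.
    gene_length: number of sites in the gene.
    """
    longest = 0
    previous = -1  # virtual insertion just before the first site
    for pos in sorted(positions):
        longest = max(longest, pos - previous - 1)
        previous = pos
    # run between the last insertion and the end of the gene
    return max(longest, gene_length - previous - 1)


def gap_p_value(gene_length, n_insertions, observed_gap, n_sim=2000, seed=0):
    """Monte Carlo estimate of P(longest insertion-free run >= observed_gap).

    Assumption (ours, for illustration): n_insertions distinct sites are
    drawn uniformly at random, e.g. the count expected from the
    genome-wide insertion density.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        positions = rng.sample(range(gene_length), n_insertions)
        if longest_gap(positions, gene_length) >= observed_gap:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # A 1000-site gene in a library where about 50 insertions would be expected.
    print(gap_p_value(gene_length=1000, n_insertions=50, observed_gap=200))
```

A small estimated probability suggests that the observed gap is unlikely under random insertion. With sparse libraries (small `n_insertions`), long gaps become common by chance, which is the setting the abstract singles out as difficult.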

Date: 2026

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013428 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 13428&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1013428

DOI: 10.1371/journal.pcbi.1013428

More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol.

 
Page updated 2026-04-05
Handle: RePEc:plo:pcbi00:1013428