A data science roadmap for open science organizations engaged in early-stage drug discovery

Kristina Edfeldt, Aled M. Edwards, Ola Engkvist, Judith Günther, Matthew Hartley, David G. Hulcoop, Andrew R. Leach, Brian D. Marsden, Amelie Menge, Leonie Misquitta, Susanne Müller, Dafydd R. Owen, Kristof T. Schütt, Nicholas Skelton, Andreas Steffen, Alexander Tropsha, Erik Vernet, Yanli Wang, James Wellnitz, Timothy M. Willson, Djork-Arné Clevert, Benjamin Haibe-Kains, Lovisa Holmberg Schiavone and Matthieu Schapira
Additional contact information
Kristina Edfeldt: Karolinska University Hospital and Karolinska Institutet
Aled M. Edwards: University of Toronto
Ola Engkvist: Chalmers University of Technology
Judith Günther: Computational Molecular Design
Matthew Hartley: Wellcome Genome Campus
David G. Hulcoop: Wellcome Genome Campus
Andrew R. Leach: Wellcome Genome Campus
Brian D. Marsden: University of Oxford
Amelie Menge: Johann Wolfgang Goethe University, Frankfurt am Main, 60438, Germany & Structural Genomics Consortium (SGC), Buchmann Institute for Life Sciences, Johann Wolfgang Goethe University
Leonie Misquitta: National Institutes of Health
Susanne Müller: Johann Wolfgang Goethe University, Frankfurt am Main, 60438, Germany & Structural Genomics Consortium (SGC), Buchmann Institute for Life Sciences, Johann Wolfgang Goethe University
Dafydd R. Owen: Development & Medical
Kristof T. Schütt: Machine Learning & Computational Sciences
Nicholas Skelton: Genentech, Inc.
Andreas Steffen: Machine Learning & Computational Sciences
Alexander Tropsha: University of North Carolina
Erik Vernet: Novo Nordisk A/S
Yanli Wang: National Institutes of Health
James Wellnitz: University of North Carolina
Timothy M. Willson: University of North Carolina at Chapel Hill
Djork-Arné Clevert: Machine Learning & Computational Sciences
Benjamin Haibe-Kains: University of Toronto
Lovisa Holmberg Schiavone: AstraZeneca
Matthieu Schapira: University of Toronto

Nature Communications, 2024, vol. 15, issue 1, 1-10

Abstract: The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, as many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data-sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows within design-make-test-analyze (DMTA) cycles openly, and at scale and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design.

Date: 2024

Downloads: (external link)
https://www.nature.com/articles/s41467-024-49777-x Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-49777-x

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-024-49777-x

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-49777-x