Automated extraction of chemical synthesis actions from experimental procedures

Alain C. Vaucher, Federico Zipoli, Joppe Geluykens, Vishnu H. Nair, Philippe Schwaller and Teodoro Laino
Additional contact information
Alain C. Vaucher: IBM Research Europe
Federico Zipoli: IBM Research Europe
Joppe Geluykens: IBM Research Europe
Vishnu H. Nair: IBM Research Europe
Philippe Schwaller: IBM Research Europe
Teodoro Laino: IBM Research Europe

Nature Communications, 2020, vol. 11, issue 1, 1-11

Abstract: Experimental procedures for chemical synthesis are commonly reported in prose in patents or in the scientific literature. Extracting the details needed to reproduce and validate a synthesis in a chemical laboratory is often a tedious task requiring extensive human intervention. We present a method to convert unstructured experimental procedures written in English to structured synthetic steps (action sequences) reflecting all the operations needed to successfully conduct the corresponding chemical reactions. To achieve this, we design a set of synthesis actions with predefined properties and a deep-learning sequence-to-sequence model, based on the transformer architecture, that converts experimental procedures to action sequences. The model is pretrained on vast amounts of data generated automatically with a custom rule-based natural language processing approach and refined on manually annotated samples. Predictions on our test set result in a perfect (100%) match of the action sequence for 60.8% of sentences, a 90% match for 71.3% of sentences, and a 75% match for 82.4% of sentences.
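
To make the reported figures concrete, the following is a minimal Python sketch of how an action sequence could be represented and how "fraction of sentences with at least an X% match" could be computed. It is not the authors' implementation: the abstract does not specify the action schema or the similarity measure, so the `Action` class, the action names ("Add", "Stir", "Filter"), the property keys, and the use of `difflib.SequenceMatcher.ratio()` as the match score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Action:
    """One synthesis step. Name and property keys are hypothetical."""
    name: str
    properties: tuple = ()  # e.g. (("material", "ethanol"),)


def similarity(predicted, reference):
    """Similarity in [0, 1] between two action sequences.

    Assumption: difflib's ratio stands in for whatever measure the
    paper actually uses to score partial matches.
    """
    return SequenceMatcher(None, predicted, reference, autojunk=False).ratio()


def fraction_at_least(pairs, threshold):
    """Fraction of (predicted, reference) pairs scoring >= threshold."""
    scores = [similarity(p, r) for p, r in pairs]
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical reference annotation for one sentence.
reference = [
    Action("Add", (("material", "ethanol"),)),
    Action("Stir", (("duration", "2 h"),)),
    Action("Filter"),
]
predicted_exact = list(reference)      # all three actions recovered
predicted_partial = reference[:2]      # the final "Filter" step is missed

pairs = [(predicted_exact, reference), (predicted_partial, reference)]
perfect = fraction_at_least(pairs, 1.0)
at_75 = fraction_at_least(pairs, 0.75)
print(perfect, at_75)
```

Under this stand-in measure, lowering the threshold can only keep or increase the fraction of sentences counted as matches, which is the same pattern as the reported 60.8%, 71.3% and 82.4% figures at the 100%, 90% and 75% levels.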

Date: 2020

Downloads: (external link)
https://www.nature.com/articles/s41467-020-17266-6 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-17266-6

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-020-17266-6

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-17266-6