EconPapers    
Economics at your fingertips  
 

Learning meaningful representations of protein sequences

Nicki Skafte Detlefsen, Søren Hauberg and Wouter Boomsma ()
Additional contact information
Nicki Skafte Detlefsen: Technical University of Denmark
Søren Hauberg: Technical University of Denmark
Wouter Boomsma: University of Copenhagen

Nature Communications, 2022, vol. 13, issue 1, 1-12

Abstract: Abstract How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence suggests that seemingly minor changes to these machine learning models yield drastically different data representations that result in different biological interpretations of data. This begs the question of what even constitutes the most meaningful representation. Here, we approach this question for representations of protein sequences, which have received considerable attention in the recent literature. We explore two key contexts in which representations naturally arise: transfer learning and interpretable learning. In the first context, we demonstrate that several contemporary practices yield suboptimal performance, and in the latter we demonstrate that taking representation geometry into account significantly improves interpretability and lets the models reveal biological information that is otherwise obscured.

Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://www.nature.com/articles/s41467-022-29443-w Abstract (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-29443-w

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-022-29443-w

Access Statistics for this article

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

More articles in Nature Communications from Nature
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-05-03
Handle: RePEc:nat:natcom:v:13:y:2022:i:1:d:10.1038_s41467-022-29443-w