A novel hybrid CNN and BiGRU-Attention based deep learning model for protein function prediction
Sharma Lavkush,
Deepak Akshay,
Ranjan Ashish and
Krishnasamy Gopalakrishnan
Additional contact information
Sharma Lavkush: Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India
Deepak Akshay: Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India
Ranjan Ashish: Department of Computer Science and Engineering, ITER, Siksha ‘O’ Anusandhan University (Deemed to be University), Bhubaneswar, Odisha, India
Krishnasamy Gopalakrishnan: Department of Mathematics and Computer Science, Central State University, Wilberforce, USA
Statistical Applications in Genetics and Molecular Biology, 2023, vol. 22, issue 1, 18
Abstract:
Proteins are the building blocks of all living things, and understanding the molecular mechanisms of life requires determining protein function. While CNNs are good at capturing short-range relationships, GRUs and LSTMs can capture long-range dependencies; a hybrid approach that combines the complementary strengths of these deep-learning models motivates our work. Protein language models, which use attention networks to extract meaningful information and build protein representations, have seen tremendous success in processing protein sequences in recent years. In this paper, we propose a hybrid CNN + BiGRU-Attention model with protein language model embeddings that effectively combines the output of the CNN with the output of the BiGRU-Attention module for predicting protein functions. We evaluated the proposed hybrid model on human and yeast datasets. On the human dataset, it improves the Fmax value over the state-of-the-art model SDN2GO by 1.9% for the cellular component, 3.8% for the molecular function and 0.6% for the biological process prediction task; on the yeast dataset, the corresponding improvements are 2.4%, 5.2% and 1.2%.
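To make the described architecture concrete, the sketch below shows one plausible way such a hybrid model could be wired up in PyTorch. It is an illustrative assumption, not the authors' implementation: the embedding dimension (assumed to come from a pre-trained protein language model), the layer sizes, the additive attention over BiGRU states, and the concatenation-based fusion of the two branches are hypothetical choices consistent only with the abstract's high-level description.

    # Minimal sketch (assumptions throughout), not the paper's actual model:
    # a CNN branch for short-range patterns, a BiGRU-Attention branch for
    # long-range dependencies, concatenated for multi-label GO term prediction.
    import torch
    import torch.nn as nn

    class HybridCNNBiGRUAttention(nn.Module):
        def __init__(self, embed_dim=1024, num_go_terms=500,
                     cnn_channels=128, gru_hidden=128):
            super().__init__()
            # CNN branch: captures short-range patterns along the sequence.
            self.conv = nn.Sequential(
                nn.Conv1d(embed_dim, cnn_channels, kernel_size=7, padding=3),
                nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),   # pool over the sequence length
            )
            # BiGRU branch: captures long-range dependencies.
            self.bigru = nn.GRU(embed_dim, gru_hidden, batch_first=True,
                                bidirectional=True)
            # Additive attention over BiGRU outputs.
            self.attn_score = nn.Linear(2 * gru_hidden, 1)
            # Classifier on the concatenated CNN and attention-pooled BiGRU features.
            self.classifier = nn.Linear(cnn_channels + 2 * gru_hidden, num_go_terms)

        def forward(self, x):
            # x: (batch, seq_len, embed_dim) per-residue language model embeddings
            cnn_feat = self.conv(x.transpose(1, 2)).squeeze(-1)       # (batch, cnn_channels)
            gru_out, _ = self.bigru(x)                                # (batch, seq_len, 2*gru_hidden)
            weights = torch.softmax(self.attn_score(gru_out), dim=1)  # (batch, seq_len, 1)
            attn_feat = (weights * gru_out).sum(dim=1)                # (batch, 2*gru_hidden)
            fused = torch.cat([cnn_feat, attn_feat], dim=-1)
            return self.classifier(fused)  # logits for multi-label GO prediction

    # Example: a batch of 4 sequences of length 300 with 1024-dim embeddings.
    model = HybridCNNBiGRUAttention()
    logits = model(torch.randn(4, 300, 1024))
    probs = torch.sigmoid(logits)  # per-GO-term probabilities

In this sketch the two branches are fused by simple concatenation before a sigmoid multi-label classifier; the actual fusion strategy, attention formulation and output head used in the paper may differ.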
Keywords: attention technique; CNN; gated recurrent unit; protein language models; protein sequence
Date: 2023
Downloads: https://doi.org/10.1515/sagmb-2022-0057 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:22:y:2023:i:1:p:18:n:1007
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html
DOI: 10.1515/sagmb-2022-0057
Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf
More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.