 

A fast (CNN + MCWS-transformer) based architecture for protein function prediction

Mahala Abhipsa, Ranjan Ashish, Priyadarshini Rojalina, Vikram Raj and Dansena Prabhat
Additional contact information
Mahala Abhipsa: Department of Computer Science & Engineering, C. V. Raman Global University, Bhubaneswar, Odisha, India
Ranjan Ashish: Department of Computer Science & Engineering, C. V. Raman Global University, Bhubaneswar, Odisha, India
Priyadarshini Rojalina: Department of Computer Science & Engineering, C. V. Raman Global University, Bhubaneswar, Odisha, India
Vikram Raj: Department of Computer Science & Engineering, C. V. Raman Global University, Bhubaneswar, Odisha, India
Dansena Prabhat: Department of Computer Science & Engineering, C. V. Raman Global University, Bhubaneswar, Odisha, India

Statistical Applications in Genetics and Molecular Biology, 2025, vol. 24, issue 1, 17

Abstract: The transformer model for sequence mining has brought a paradigmatic shift to many domains, including biological sequence mining. However, transformers suffer from quadratic complexity, i.e., O(l²), where l is the sequence length, which affects training and prediction time. The work herein therefore introduces a simple, generalized, and fast transformer architecture for improved protein function prediction (PFP). The proposed architecture uses a combination of CNN and global-average pooling to effectively shorten the protein sequences. This shortening reduces the quadratic complexity of the transformer to O((l/2)²). The architecture is used to develop a PFP solution at the sub-sequence level. Furthermore, focal loss is employed to ensure balanced training on hard-to-classify examples. The multi-sub-sequence solution using an average-pooling layer (with stride = 2) produced improvements of +2.50 % (BP) and +3.00 % (MF) over Global-ProtEnc Plus; the corresponding improvements over Lite-SeqCNN are +4.50 % (BP) and +2.30 % (MF).
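The complexity argument in the abstract can be sketched numerically: non-overlapping average pooling with stride 2 halves the sequence length before self-attention, so the O(l²) attention cost drops by a factor of four. The sketch below is illustrative only (the function names, embedding size, and plain-numpy pooling are assumptions, not the paper's implementation):

```python
import numpy as np

def avg_pool_1d(x, stride=2):
    """Non-overlapping average pooling along the sequence axis.

    x: array of shape (seq_len, d_model). Trailing positions that do
    not fill a full window are dropped, as in a strided pooling layer.
    """
    l, d = x.shape
    l_trim = (l // stride) * stride
    return x[:l_trim].reshape(l_trim // stride, stride, d).mean(axis=1)

def attention_cost(seq_len):
    # Pairwise attention scores scale quadratically with length.
    return seq_len ** 2

# A toy "protein sequence" of 100 residue embeddings of width 64.
x = np.random.rand(100, 64)
pooled = avg_pool_1d(x, stride=2)

print(pooled.shape)                      # sequence halved: (50, 64)
print(attention_cost(100), attention_cost(50))  # 10000 vs 2500, a 4x reduction
```

In the paper this shortening is done by a CNN followed by the pooling layer; the sketch isolates only the pooling step that yields the O((l/2)²) cost.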

Keywords: MCWS-transformer; fast transformer architecture; protein sequence; protein function prediction
Date: 2025

Downloads: (external link)
https://doi.org/10.1515/sagmb-2024-0027 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.



Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:24:y:2025:i:1:p:17:n:1001

Ordering information: This journal article can be ordered from
https://www.degruyte ... urnal/key/sagmb/html

DOI: 10.1515/sagmb-2024-0027


Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.

 
Page updated 2025-07-08
Handle: RePEc:bpj:sagmbi:v:24:y:2025:i:1:p:17:n:1001