Protein Prediction using Dictionary Based Regression Learning

Authors

  • T. Sudha Rani Department of Computer Science and Engineering, Aditya Engineering College, ADB Road, Aditya Nagar, Surampalem, Andhra Pradesh, India
  • A. Yesu Babu Department of Computer Science and Engineering, Sir C R Reddy College of Engineering, Eluru, Andhra Pradesh, India
  • D. Haritha Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Kakinada, India

DOI:

https://doi.org/10.13052/jmm1550-4646.1942

Keywords:

protein sequence, amino acid sequence, kernel matrix, fuzzy based genetic algorithm, dictionary based regression learning

Abstract

Research Objectives: Bioinformatics is the information technology used to manage molecular genetic data. A central concept in bioinformatics is the protein sequence. A protein is made up of amino acids joined by peptide bonds, and proteins are essential to life. Protein folding and structure prediction start from the primary sequence, that is, the sequence of amino acids.

Research Novelty: This manuscript proposes dictionary based regression learning combined with a fuzzy based genetic algorithm for protein prediction from structural analysis (DRL-FGA-PD-SA). The input data are taken from a Kaggle dataset. Protein features are extracted from the data through a Kernel Matrix (KM), which captures amino acid composition, dipeptide composition, pseudo-amino-acid composition, functional domain composition and distance-based features. The fuzzy based genetic algorithm (FGA) then updates the selected features for protein classification, and the features are clustered. Finally, dictionary based regression learning (DRL) predicts the protein class by converting the values to either 0 or 1.
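The abstract does not describe the internals of the KM, FGA or DRL stages. As a rough illustration only, the Python sketch below computes two of the listed features (amino acid composition and dipeptide composition), builds an RBF kernel matrix over them and then, as a plainly named stand-in for DRL, fits ordinary kernel ridge regression (the training samples act as the dictionary) and thresholds its scores to 0 or 1. The RBF kernel, the `gamma`, `lam` and `threshold` values, all function names, and the toy sequences and labels are assumptions, not the authors' method; the FGA feature-selection and clustering steps are omitted.

```python
from itertools import product

import numpy as np

# Hypothetical sketch: AAC + DPC features, an RBF kernel matrix, and kernel
# ridge regression thresholded to 0/1. This is NOT the authors' KM/FGA/DRL.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDE_INDEX = {
    "".join(pair): i for i, pair in enumerate(product(AMINO_ACIDS, repeat=2))
}


def amino_acid_composition(seq: str) -> np.ndarray:
    """Fraction of sequence positions occupied by each of the 20 residues."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    return counts / max(len(seq), 1)


def dipeptide_composition(seq: str) -> np.ndarray:
    """Fraction of adjacent residue pairs equal to each of the 400 dipeptides."""
    vec = np.zeros(len(DIPEPTIDE_INDEX))
    n_pairs = max(len(seq) - 1, 1)
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:  # skip pairs containing non-standard residues
            vec[idx] += 1
    return vec / n_pairs


def features(seq: str) -> np.ndarray:
    """420-dimensional feature vector (20 AAC values + 400 DPC values)."""
    seq = seq.upper()
    return np.concatenate([amino_acid_composition(seq), dipeptide_composition(seq)])


def rbf_kernel(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off


def fit_kernel_ridge(K: np.ndarray, y: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Solve (K + lam * I) alpha = y; the training samples serve as dictionary atoms."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)


def predict_labels(K_test: np.ndarray, alpha: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Convert real-valued regression scores into class labels 0 or 1."""
    return (K_test @ alpha >= threshold).astype(int)


# Toy usage with made-up sequences and labels (illustration only).
train_seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSGGSGGSG", "GGSGGAGGSG"]
y_train = np.array([1.0, 1.0, 0.0, 0.0])
X_train = np.stack([features(s) for s in train_seqs])
K_train = rbf_kernel(X_train, X_train, gamma=10.0)
alpha = fit_kernel_ridge(K_train, y_train)

X_test = np.stack([features(s) for s in ["MKTAYIAKQK", "GGSGGSGGAG"]])
preds = predict_labels(rbf_kernel(X_test, X_train, gamma=10.0), alpha)
print(preds)
```

Thresholding at 0.5 is simply the most direct way to turn a regression score into the 0/1 labels mentioned above; a real pipeline would tune the kernel width and regularization on held-out data.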

Research Conclusions: The proposed method is implemented in MATLAB. It is evaluated using sensitivity, precision, F-measure, specificity, accuracy and error rate. The proposed DRL-FGA-PD-SA method achieves 22.08%, 24.03% and 34.76% higher accuracy and 23.34%, 26.45% and 34.44% higher precision than the existing systems, namely deep learning methods in protein structure prediction (FFNN-RNN-PD-SA), deep learning techniques for protein structure prediction and protein design (DNN-PD-SA) and improved protein structure prediction using potentials from deep learning (DNN-SGDA-PD-SA), respectively.
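The abstract lists the evaluation metrics without their formulas. The sketch below uses the standard confusion-matrix definitions, assuming the classifier output is coded 1 for the positive class and 0 otherwise (as in the 0/1 conversion described above). The function names and the example labels are hypothetical.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for labels coded 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)  # recall, true-positive rate
    specificity = _ratio(tn, tn + fp)  # true-negative rate
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": _ratio(tp + tn, len(pairs)),
        "error_rate": _ratio(fp + fn, len(pairs)),  # equals 1 - accuracy
    }


# Toy example with made-up labels (illustration only).
m = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
print(m)
```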

Author Biographies

T. Sudha Rani, Department of Computer Science and Engineering, Aditya Engineering College, ADB Road, Aditya Nagar, Surampalem, Andhra Pradesh, India

T. Sudha Rani received a B.Tech degree in IT from R.VR & JC College of Engineering, affiliated to Nagarjuna University, Guntur, Andhra Pradesh, in 2005, and an M.Tech degree in CSE from JNTUA, Anantapur, Andhra Pradesh, in 2010. She is currently working as an Associate Professor in the Department of Computer Science and Engineering, Aditya Engineering College, ADB Road, Aditya Nagar, Surampalem, Andhra Pradesh, India, and is pursuing a Ph.D. at JNTUK, Kakinada. Her research areas are bioinformatics and data mining.

A. Yesu Babu, Department of Computer Science and Engineering, Sir C R Reddy College of Engineering, Eluru, Andhra Pradesh, India

A. Yesu Babu is currently working as a Professor in the Department of Computer Science and Engineering, Sir C R Reddy College of Engineering, Eluru, Andhra Pradesh, India. He has 31 years of academic, research and academic administration experience. He has published 43 research papers in international journals and 6 chapters. He reviews research publications for premier publishing groups such as Springer, Elsevier and Inderscience, and for a number of SCOPUS and SCI indexed journals.

D. Haritha, Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Kakinada, India

D. Haritha is working as an Associate Professor in the Computer Science and Engineering Department at Jawaharlal Nehru Technological University Kakinada, India. She has more than 17 years of experience. She has guided 50 M.Tech students and 15 MCA students in their projects. Her research interests are image processing, data structures, software engineering and networking. She has published 12 research papers in international journals and 11 research papers in international conferences.

References

T. Siebenmorgen, M. Zacharias, ‘Computational prediction of protein–protein binding affinities’, Wiley Interdisciplinary Reviews: Computational Molecular Science, vol. 10, no. 3, p. e1448, 2020.

M. AlQuraishi, ‘Machine learning in protein structure prediction’, Current Opinion in Chemical Biology, vol. 65, pp. 1–8, 2021.

M. Torrisi, G. Pollastri, Q. Le, ‘Deep learning methods in protein structure prediction’, Computational and Structural Biotechnology Journal, vol. 18, pp. 1301–1310, 2020.

J. Pereira, A.J. Simpkin, M.D. Hartmann, D.J. Rigden, R.M. Keegan, A.N. Lupas, ‘High-accuracy protein structure prediction in CASP14’, Proteins: Structure, Function and Bioinformatics, vol. 89, no. 12, pp. 1687–1699, 2021.

M. Zeng, F. Zhang, F.X. Wu, Y. Li, J. Wang, M. Li, ‘Protein–protein interaction site prediction through combining local and global features with deep neural networks’, Bioinformatics, vol. 36, no. 4, pp. 1114–1120, 2020.

C. Chen, Q. Zhang, B. Yu, Z. Yu, P.J. Lawrence, Q. Ma, Y. Zhang, ‘Improving protein-protein interactions prediction accuracy using XGBoost feature selection and stacked ensemble classifier’, Computers in Biology and Medicine, vol. 123, p. 103899, 2020.

X. Yang, S. Yang, Q. Li, S. Wuchty, Z. Zhang, ‘Prediction of human-virus protein-protein interactions through a sequence embedding-based machine learning method’, Computational and Structural Biotechnology Journal, vol. 18, pp. 153–161, 2020.

Y. Duan, R. Coreas, Y. Liu, D. Bitounis, Z. Zhang, D. Parviz, M. Strano, P. Demokritou, W. Zhong, ‘Prediction of protein corona on nanomaterials by machine learning using novel descriptors’, NanoImpact, vol. 17, p. 100207, 2020.

P. Rajesh, FH. Shajin, ‘Optimal allocation of EV charging spots and capacitors in distribution network improving voltage and power loss by Quantum-Behaved and Gaussian Mutational Dragonfly Algorithm (QGDA)’, Electric Power Systems Research, vol. 194, p. 107049, 2021.

P. Rajesh, FH. Shajin, BN. Kommula, ‘An efficient integration and control approach to increase the conversion efficiency of high-current low-voltage DC/DC converter’, Energy Systems, pp. 1–20, 2021.

FH. Shajin, P. Rajesh, MR. Raja, ‘An Efficient VLSI Architecture for Fast Motion Estimation Exploiting Zero Motion Prejudgment Technique and a New Quadrant-Based Search Algorithm in HEVC’, Circuits, Systems and Signal Processing, vol. 41, no. 3, pp. 1751–74, 2022.

FH. Shajin, P. Rajesh, S. Thilaha, ‘Bald eagle search optimization algorithm for cluster head selection with prolong lifetime in wireless sensor network’, Journal of Soft Computing and Engineering Applications, vol. 1, no. 1, p. 7, 2020.

S.C. Pakhrin, B. Shrestha, B. Adhikari, D.B. Kc, ‘Deep learning-based advances in protein structure prediction’, International Journal of Molecular Sciences, vol. 22, no. 11, p. 5553, 2021.

B. Niu, C. Liang, Y. Lu, M. Zhao, Q. Chen, Y. Zhang, L. Zheng, K.C. Chou, ‘Glioma stages prediction based on machine learning algorithm combined with protein-protein interaction networks’, Genomics, vol. 112, no. 1, pp. 837–847, 2020.

R. Chowdhury, N. Bouatta, S. Biswas, C. Floristean, A. Kharkar, K. Roy, C. Rochereau, G. Ahdritz, J. Zhang, GM. Church, PK. Sorger, ‘Single-sequence protein structure prediction using a language model and deep learning’, Nature Biotechnology, pp. 1–7, 2022.

M. Zeng, F. Zhang, F.X. Wu, Y. Li, J. Wang, M. Li, ‘Protein–protein interaction site prediction through combining local and global features with deep neural networks’, Bioinformatics, vol. 36, no. 4, pp. 1114–1120, 2020.

A.H. Mahmoud, M.R. Masters, Y. Yang, M.A. Lill, ‘Elucidating the multiple roles of hydration for accurate protein-ligand binding prediction via deep learning’, Communications Chemistry, vol. 3, no. 1, pp. 1–13, 2020.

K. Sato, M. Akiyama, Y. Sakakibara, ‘RNA secondary structure prediction using deep learning with thermodynamic integration’, Nature Communications, vol. 12, no. 1, pp. 1–9, 2021.

A.W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A.W. Nelson, A. Bridgland, H. Penedones, ‘Improved protein structure prediction using potentials from deep learning’, Nature, vol. 577, no. 7792, pp. 706–710, 2020.

https://www.kaggle.com/shahir/protein-data-set#pdb_data_seq.csv

P. Hajibabaee, F. Pourkamali-Anaraki, M.A. Hariri-Ardebili, ‘Kernel matrix approximation on class-imbalanced data with an application to scientific simulation’, IEEE Access, vol. 9, pp. 83579–83591, 2021.

A. Rain, M.E. Saritac, ‘HydroPower Plant Planning for Resilience Improvement of Power Systems using Fuzzy-Neural based Genetic Algorithm’, arXiv preprint arXiv:2106.12042, 2021.

P. Goyal, P. Benner, ‘Discovery of nonlinear dynamical systems using a Runge–Kutta inspired dictionary-based sparse regression approach’, Proceedings of the Royal Society A, vol. 478, no. 2262, p. 20210883, 2022.

M. Torrisi, G. Pollastri, Q. Le, ‘Deep learning methods in protein structure prediction’, Computational and Structural Biotechnology Journal, vol. 18, pp. 1301–1310, 2020.

R. Pearce, Y. Zhang, ‘Deep learning techniques have significantly impacted protein structure prediction and protein design’, Current Opinion in Structural Biology, vol. 68, pp. 194–207, 2021.

A.W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A.W. Nelson, A. Bridgland, H. Penedones, ‘Improved protein structure prediction using potentials from deep learning’, Nature, vol. 577, no. 7792, pp. 706–710, 2020.

A. Kryshtafovych, T. Schwede, M. Topf, K. Fidelis, J. Moult, ‘Critical assessment of methods of protein structure prediction (CASP)—Round XIV’, Proteins: Structure, Function and Bioinformatics, vol. 89, no. 12, pp. 1607–1617, 2021.

J. Yang, I. Anishchenko, H. Park, Z. Peng, S. Ovchinnikov, D. Baker, ‘Improved protein structure prediction using predicted interresidue orientations’, Proceedings of the National Academy of Sciences, vol. 117, no. 3, pp. 1496–1503, 2020.

Published

2023-05-04

How to Cite

Rani, T. S., Babu, A. Y., & Haritha, D. (2023). Protein Prediction using Dictionary Based Regression Learning. Journal of Mobile Multimedia, 19(04), 963–984. https://doi.org/10.13052/jmm1550-4646.1942

Section

Articles