Protein Prediction using Dictionary Based Regression Learning
DOI:
https://doi.org/10.13052/jmm1550-4646.1942
Keywords:
protein sequence, amino acid sequence, kernel matrix, fuzzy based genetic algorithm, dictionary based regression learning
Abstract
Research Objectives: Bioinformatics is the information technology used to manage molecular genetic data, and the protein sequence is one of its central concepts. A protein sequence consists of amino acids joined by peptide bonds, and proteins are essential to life. The primary amino acid sequence is the starting point for predicting how a protein folds and what structure it adopts.
Research Novelty: This manuscript proposes dictionary based regression learning combined with a fuzzy genetic algorithm for protein prediction from structural analysis (DRL-FGA-PD-SA). The input data are taken from a dataset hosted on Kaggle. Protein features are extracted from the data using a Kernel Matrix (KM), which captures amino acid composition, dipeptide composition, pseudo-amino-acid composition, functional domain composition and distance-based features. A fuzzy based genetic algorithm (FGA) then updates the selected features for protein classification, and the features are clustered. Finally, dictionary based regression learning (DRL) predicts the protein class, converting the output values to 0s or 1s.
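As an illustration of two of the feature families named above, amino acid composition and dipeptide composition can be sketched as plain frequency vectors. This is a minimal Python sketch under stated assumptions, not the authors' MATLAB implementation or their Kernel Matrix step: the function names, the 20-letter residue alphabet and the choice to skip non-standard residues are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard residues are counted; any other
# symbol (e.g. X, B, Z) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue, in AMINO_ACIDS order."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered residue pair, in DIPEPTIDES order."""
    seq = seq.upper()
    valid = set(DIPEPTIDES)
    # Overlapping windows of length 2; pairs with a non-standard residue are skipped.
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in valid)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


aac = amino_acid_composition("AAC")
dpc = dipeptide_composition("AAC")
```

Concatenating the two vectors gives a fixed-length (420-value) representation for sequences of any length, which is the usual reason composition features are used as input to a downstream selector or classifier.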
Research Conclusions: The proposed method is implemented in MATLAB and evaluated using sensitivity, precision, F-measure, specificity, accuracy and error rate. The proposed DRL-FGA-PD-SA method achieves 22.08%, 24.03% and 34.76% higher accuracy and 23.34%, 26.45% and 34.44% higher precision than three existing systems, respectively: deep learning methods in protein structure prediction (FFNN-RNN-PD-SA), deep learning technique for protein structure prediction and protein design (DNN-PD-SA), and improved protein structure prediction using potentials from deep learning (DNN-SGDA-PD-SA).
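The six evaluation metrics listed above all follow from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions and is written in Python for illustration; the function name `binary_metrics` and the toy labels are assumptions, and the code does not reproduce the authors' MATLAB evaluation or their reported figures.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn

    def ratio(num, den):
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, total)
    error_rate = ratio(fp + fn, total)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "error_rate": error_rate,
    }


# Toy labels, purely illustrative.
metrics = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Accuracy and error rate always sum to 1 for a non-empty sample, so reporting both adds no independent information; sensitivity and specificity are the pair that separates performance on the positive and negative classes.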