Research on End-to-end Voiceprint Recognition Model Based on Convolutional Neural Network

Authors

DOI:

https://doi.org/10.13052/jwe1540-9589.20511

Keywords:

Convolutional neural network; End-to-end voiceprint recognition; Voiceprint recognition model.

Abstract

A speech signal is time-varying and is strongly affected by both the individual speaker and the recording environment, so the raw signal must be preprocessed to improve the end-to-end voiceprint recognition rate. This paper proposes an end-to-end voiceprint recognition algorithm based on a convolutional neural network. The algorithm uses the convolution and down-sampling operations of the convolutional neural network to preprocess the speech signals. One-dimensional and two-dimensional convolution operations are then used to extract Mel-frequency cepstral coefficient (MFCC) feature parameters from the preprocessed signals, and the classical universal background model is used to build the voiceprint recognition model. The study first analyzes the principle of end-to-end voiceprint recognition and examines the recognition process, the recognition features, and the Res-FD-CNN network structure. It then constructs the convolutional neural network recognition model, preprocesses the data to form the frequency-domain convolutional layer, and tests the algorithm.
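The convolution and down-sampling steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' Res-FD-CNN implementation: the function names `conv1d_valid` and `max_pool1d`, the fixed averaging kernel, the use of max pooling, and the toy sine-wave input are all assumptions chosen only to show how the two operations shorten a one-dimensional signal.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding (cross-correlation, as in CNN layers)."""
    # 'valid' mode yields len(signal) - len(kernel) + 1 output samples
    return np.correlate(signal, kernel, mode="valid")

def max_pool1d(x, size):
    """Down-sample by keeping the maximum of each non-overlapping window."""
    usable = (len(x) // size) * size  # drop any incomplete trailing window
    return x[:usable].reshape(-1, size).max(axis=1)

# Toy stand-in for a speech frame: a 100 Hz tone sampled at 8 kHz for 20 ms
sample_rate = 8000
t = np.arange(160) / sample_rate
signal = np.sin(2 * np.pi * 100 * t)

# Hypothetical fixed smoothing filter; a trained network would learn these weights
kernel = np.ones(5) / 5.0

features = conv1d_valid(signal, kernel)  # 160 - 5 + 1 = 156 samples
pooled = max_pool1d(features, 4)         # 156 // 4 = 39 samples
print(features.shape, pooled.shape)
```

In a trained network the kernel weights are learned and many kernels run in parallel; the sketch uses a single fixed filter to keep the change in signal length easy to follow.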

Author Biographies

Hong Zhao, School of Computer Science, Lanzhou University of Technology, Gansu, Lanzhou, 730050, China

Zhao Hong (1971–), male, from Xihe, Gansu, Professor, Ph.D., Han nationality, received a bachelor’s degree from Northwest Normal University in 1993 and a doctorate from Xinjiang University in 2010. He joined the School of Computer Science, Lanzhou University of Technology, in 1993 and became a full professor in 2010. He has authored 4 academic works and more than 30 peer-reviewed papers. His current research interests include deep learning, embedded systems and natural language processing.

Lupeng Yue, School of Computer Science, Lanzhou University of Technology, Gansu, Lanzhou, 730050, China

Yue Lupeng (1995–), male, from Weihai, Shandong, received a bachelor’s degree from Lanzhou University of Technology in 2018. His current research interests include deep learning and speaker recognition.

Weijie Wang, School of Computer Science, Lanzhou University of Technology, Gansu, Lanzhou, 730050, China

Wang Weijie (1994–), female, from Qiqihar, Heilongjiang, received a bachelor’s degree from Harbin Finance University in 2016. Her current research interests include deep learning and speaker recognition.

Zeng Xiangyan, Department of Mathematics and Computer Science, Fort Valley State University, Fort Valley, GA 31030, USA

Zeng Xiangyan received a bachelor’s degree in computer science and information engineering and a master’s degree in computer applications from Hefei University of Technology, China, in 1987 and 1990, respectively, and a master’s degree in electrical and electronic engineering and a doctorate in computer science from the University of the Ryukyus, Japan, in 2001 and 2004, respectively. He is currently a professor in the Department of Mathematics and Computer Science at Fort Valley State University in the United States. He has written more than 40 refereed papers. His research interests include computer vision, image processing, pattern recognition and machine learning.

Published

2021-08-26

Issue

Section

Advanced Practice in Web Engineering