A Study on the Translation of Spoken English from Speech to Text
DOI: https://doi.org/10.13052/jicts2245-800X.1244

Keywords: speech recognition, spoken English, machine translation, speech-to-text

Abstract
Rapid translation of spoken English facilitates international communication. This paper briefly introduces a convolutional neural network (CNN) algorithm for converting English speech to text and a long short-term memory (LSTM) algorithm for machine translation of the resulting English text, and combines the two for spoken English translation. Simulation experiments then compared the speech recognition performance of the CNN algorithm against the hidden Markov model and the back-propagation neural network algorithm, and the machine translation performance of the LSTM algorithm against the recurrent neural network algorithm. The performance of spoken English translation pipelines built on the different recognition algorithms was also compared. The results showed that the CNN speech recognition algorithm, the LSTM machine translation algorithm, and their combination performed best and exhibited sufficient noise robustness. In conclusion, using a CNN to convert English speech to text and an LSTM to translate the converted text can effectively enhance the performance of spoken English translation.
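The two-stage pipeline described above can be sketched in miniature: a 1-D convolution over acoustic feature frames stands in for the CNN recognition stage, and a single LSTM cell stepped over the resulting features stands in for the LSTM translation stage. This is a minimal illustrative sketch, not the paper's implementation; all shapes, layer sizes, and the random inputs are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(frames, kernels, bias):
    """Valid 1-D convolution over time with ReLU.

    frames:  (T, F) acoustic feature frames (e.g. MFCC-like vectors)
    kernels: (K, W, F) K filters spanning W frames of F features each
    """
    K, W, F = kernels.shape
    T_out = frames.shape[0] - W + 1
    out = np.empty((T_out, K))
    for t in range(T_out):
        window = frames[t:t + W]  # (W, F)
        # Contract each (W, F) filter against the window -> K activations.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU

def lstm_step(x, h, c, Wx, Wh, b):
    """One LSTM step; gates stacked as [input, forget, cell, output]."""
    H = h.shape[0]
    z = Wx @ x + Wh @ h + b
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))   # forget gate
    g = np.tanh(z[2 * H:3 * H])             # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))    # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Hypothetical sizes: 20 frames of 13 features, 8 filters of width 5,
# LSTM hidden size 16, target vocabulary of 10 tokens.
T, F, W, K, H, V = 20, 13, 5, 8, 16, 10
frames = rng.standard_normal((T, F))        # stand-in for real audio features

# Stage 1 (recognition): CNN maps audio frames to a feature sequence.
feat = conv1d(frames, 0.1 * rng.standard_normal((K, W, F)), np.zeros(K))

# Stage 2 (translation): LSTM consumes the feature sequence step by step.
Wx = 0.1 * rng.standard_normal((4 * H, K))
Wh = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in feat:
    h, c = lstm_step(x, h, c, Wx, Wh, b)

# Final hidden state projected to target-token logits.
logits = 0.1 * rng.standard_normal((V, H)) @ h
```

In the paper's full system the LSTM would decode a target-language token sequence rather than a single logit vector, but the sketch shows the key hand-off: the CNN's per-frame features become the LSTM's input sequence.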