A Study on the Translation of Spoken English from Speech to Text

Authors

  • Ying Zhang School of Automotive Engineering, Henan Mechanical & Electrical Vocational College, Zhengzhou, Henan 451191, China

DOI:

https://doi.org/10.13052/jicts2245-800X.1244

Keywords:

Speech recognition, spoken English, machine translation, speech-to-text

Abstract

Rapid translation of spoken English is conducive to international communication. This paper briefly introduces a convolutional neural network (CNN) algorithm for converting English speech to text and a long short-term memory (LSTM) algorithm for machine translation of English text. The two algorithms were combined for spoken English translation. Simulation experiments were then performed, comparing the speech recognition performance of the CNN algorithm against the hidden Markov model and the back-propagation neural network algorithm, and comparing the machine translation performance of the LSTM algorithm against the recurrent neural network algorithm. Moreover, the performance of spoken English translation pipelines built from the different recognition algorithms was compared. The results showed that the CNN speech recognition algorithm, the LSTM machine translation algorithm, and the combined spoken English translation algorithm had the best performance and sufficient anti-noise ability. In conclusion, utilizing a CNN for converting English speech to text and an LSTM for machine translation of the converted English text can effectively enhance the performance of translating spoken English.
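The two-stage pipeline the abstract describes (a CNN acoustic front end feeding an LSTM translator) can be sketched in miniature. The kernel sizes, feature dimensions, and random weights below are illustrative assumptions, not the paper's actual configuration; the point is only to show how convolutional feature extraction over speech frames hands off to an LSTM decoder step.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(frames, kernels):
    """Slide each kernel over the MFCC-like frames (valid convolution) with ReLU."""
    n, d = frames.shape
    k = kernels[0].shape[0]
    out = np.empty((n - k + 1, len(kernels)))
    for j, K in enumerate(kernels):
        for i in range(n - k + 1):
            out[i, j] = np.sum(frames[i:i + k] * K)
    return np.maximum(out, 0.0)  # ReLU activation

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: input/forget/output gates and candidate from x and prior h."""
    H = h.shape[0]
    z = W @ x + U @ h + b  # stacked gate pre-activations, shape (4H,)
    i, f, o = (1.0 / (1.0 + np.exp(-z[s * H:(s + 1) * H])) for s in range(3))
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Toy run: 20 frames of 13-dim speech features -> CNN features -> LSTM state.
frames = rng.normal(size=(20, 13))
kernels = [rng.normal(size=(3, 13)) for _ in range(4)]
feats = conv1d(frames, kernels)  # shape (18, 4)

H = 8
W = rng.normal(size=(4 * H, 4))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in feats:  # feed each acoustic feature vector into the LSTM
    h, c = lstm_step(x, h, c, W, U, b)
```

In a full system, the CNN outputs would be decoded to text and the LSTM would run as an encoder–decoder over token embeddings; this sketch collapses that to a single recurrent pass to keep the data flow visible.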


Author Biography

Ying Zhang, School of Automotive Engineering, Henan Mechanical & Electrical Vocational College, Zhengzhou, Henan 451191, China

Ying Zhang, born in June 1984, graduated from Zhengzhou University with a Master's degree in June 2013. She works at Henan Mechanical & Electrical Vocational College as a lecturer. Her interests include English education and British and American literature.



Published

2025-02-19

How to Cite

Zhang, Y. (2025). A Study on the Translation of Spoken English from Speech to Text. Journal of ICT Standardization, 12(04), 429–442. https://doi.org/10.13052/jicts2245-800X.1244

Issue

Section

Articles