Data Analytics on Eco-Conditional Factors Affecting Speech Recognition Rate of Modern Interaction Systems

Authors

  • A. C. Kaladevi Department of Computer Science and Engineering, Sona College of Technology, Salem, India https://orcid.org/0000-0002-7890-7601
  • R. Saravanakumar Department of CSE, Dayananda Sagar Academy of Technology and Management, Bangalore 560082, India
  • K. Veena Department of Computer Science, J.K.K.Nataraja College of Arts & Science, Komarapalayam, Namakkal Dt., India
  • V. Muthukumaran Department of Mathematics, School of Applied Sciences, REVA University, Bengaluru, Karnataka
  • N. Thillaiarasu School of Computing and Information Technology, REVA University, Bengaluru, India
  • S. Satheesh Kumar Department of Computer Science, School of Applied Sciences, REVA University, Bengaluru

DOI:

https://doi.org/10.13052/jmm1550-4646.1849

Keywords:

Interaction system, eco-conditional factors, recognition rate, ambient noise, human noise, utterance speed, frequency

Abstract

Speech-based interaction systems belong to the growing class of contemporary interactive techniques (human-computer interaction systems) that have emerged rapidly in the last few years. Versatility, multi-channel synchronization, sensitivity, and timing are all notable characteristics of speech recognition, and several variables influence the accuracy of voice interaction recognition. However, few researchers have studied in depth the four eco-conditional variables that tend to affect the speech recognition rate (SRR): ambient noise, human noise, utterance speed, and frequency. The principal goal of this research is to analyze the influence of these four variables on SRR through multiple stages of experimentation on mixed-noise speech data. A sparse representation-based analysis technique is used to analyze the effects. The results show that speech recognition is not noticeably affected by a person's usual speaking pace. In addition, in noisy environments, high-frequency speech signals are recognized more easily (approximately 98.12%) than low-frequency speech signals. The test results may help in the design of distributed control and command systems.
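The abstract does not spell out how the mixed-noise speech data or the sparse representations were produced. The following Python sketch is a minimal illustration under stated assumptions, not the authors' implementation: it assumes purely additive noise scaled to a target signal-to-noise ratio (SNR) in decibels, and it swaps in orthogonal matching pursuit (OMP), the greedy sparse-approximation algorithm analyzed in J. A. Tropp's "Greed is good" (IEEE Transactions on Information Theory, 2004), as the sparse coder. The function names `mix_at_snr` and `omp`, the synthetic sinusoid, and the random dictionary are hypothetical placeholders; NumPy is the only dependency.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Assumption: purely additive noise; `noise` must be at least as long as `speech`.
    """
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Pick gain g so that p_speech / (g**2 * p_noise) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise


def omp(dictionary: np.ndarray, x: np.ndarray, n_nonzero: int, tol: float = 1e-10) -> np.ndarray:
    """Orthogonal matching pursuit: greedy sparse code of x over the dictionary columns.

    Assumption: the columns (atoms) of `dictionary` have unit L2 norm.
    """
    residual = x.astype(float).copy()
    support: list[int] = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < tol:
            break  # x is already (numerically) exactly represented
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(dictionary.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit all selected atoms jointly by least squares (the "orthogonal" step).
        sub = dictionary[:, support]
        sol, *_ = np.linalg.lstsq(sub, x, rcond=None)
        residual = x - sub @ sol
    coef = np.zeros(dictionary.shape[1])
    coef[np.array(support, dtype=int)] = sol
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16_000  # assumed sampling rate in Hz
    t = np.arange(fs) / fs
    speech = np.sin(2 * np.pi * 440 * t)  # synthetic stand-in for a clean utterance
    noisy = mix_at_snr(speech, rng.standard_normal(fs), snr_db=5.0)

    # Random overcomplete dictionary with unit-norm atoms (illustrative only).
    d = rng.standard_normal((64, 256))
    d /= np.linalg.norm(d, axis=0)
    frame = noisy[:64]
    code = omp(d, frame, n_nonzero=8)
    print("non-zero coefficients:", np.count_nonzero(code))
    print("relative residual:", np.linalg.norm(frame - d @ code) / np.linalg.norm(frame))
```

Sweeping `snr_db` over several values and re-running a recognizer on each mixture is one way an SNR-dependent recognition rate could be measured; in practice the dictionary would be learned from speech and noise data (for example, TIMIT utterances and NOISEX-92 noise) rather than drawn at random.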

Author Biographies

A. C. Kaladevi, Department of Computer Science and Engineering, Sona College of Technology, Salem, India

A. C. Kaladevi, a Professor in the Department of Computer Science and Engineering at Sona College of Technology, Salem, India, has more than 25 years of teaching experience. She obtained her B.Sc. degree in Computer Science from Cauvery College for Women, Tiruchirapalli, followed by an MCA at PSG College of Technology, Coimbatore. She completed an M.Phil. in Computer Science at Manonmaniam Sundaranar University, Tirunelveli, and an M.E. in Computer Science and Engineering at V.M.K.V. Engineering College, Salem, which was then affiliated to Anna University, Chennai. She was awarded a Ph.D. degree in Information and Communication Engineering by Anna University, Chennai, in 2014. Her research interests include Data Analytics, Cloud Computing and Image Processing. She has published 21 papers in various international journals and presented 32 papers at national and international conferences. She has co-authored 3 books in the computer science discipline. She has conducted 2 national workshops: one on “Big data and Cloud for bigger transformations”, funded by the Department of Science and Technology (DST), New Delhi, under the BDI Scheme, and the other on “Empowering the Tribal Women in and around Yercaud Hills, Salem by inculcating self-employment opportunities using innovative ICT based skill development techniques”, funded by the Tamil Nadu State Council for Science and Technology (TNSCST), Chennai, under the Dissemination of Innovative Technology Scheme. She has guided more than 25 PG and 40 UG projects, of which 3 UG projects were funded by TNSCST under the Students Project Scheme. As an enthusiastic student counselor, she has given strong moral support to students who now hold renowned positions in their careers.

R. Saravanakumar, Department of CSE, Dayananda Sagar Academy of Technology and Management, Bangalore 560082, India

R. Saravanakumar is currently working as an Associate Professor in the Department of Computer Science and Engineering, Dayananda Sagar Academy of Technology and Management, Bangalore 560082. He previously served as an Assistant Professor at Jayam College of Engineering and Technology, Dharmapuri, from June 2006 to December 2014. He obtained his B.E. in Computer Science and Engineering from Bharathiyar University in 2003 and received his M.E. in Computer Science and Engineering from Anna University, Chennai, in 2007. He received his Ph.D. degree from Anna University, Chennai, in 2015. He has published more than 20 research papers in refereed venues, including Springer and IEEE Xplore conferences. He has organized several workshops, summer internships, and expert lectures for students. He has worked as a session chair, conference steering committee member, editorial board member, and reviewer for Springer journals and IEEE conferences. His areas of interest include Machine Learning and Deep Learning.

K. Veena, Department of Computer Science, J.K.K.Nataraja College of Arts & Science, Komarapalayam, Namakkal Dt., India

K. Veena is currently working as an Assistant Professor in the Department of Computer Science, J.K.K.Nataraja College of Arts & Science, Namakkal (Dt.), where she has worked since 2005. She completed her M.Phil. in Computer Science in 2006 and is pursuing a Ph.D. in Computer Science at Periyar University, Salem. She has completed one Minor Research Project funded by the UGC. She has published 2 research papers in international journals. Her areas of interest include Data Analytics, Neural Networks, Machine Learning and Deep Learning.

V. Muthukumaran, Department of Mathematics, School of Applied Sciences, REVA University, Bengaluru, Karnataka

V. Muthukumaran was born in Vellore, Tamilnadu, India, in 1988. He received the B.Sc. degree in Mathematics from Thiruvalluvar University, Serkkadu, Vellore, India, in 2009, the M.Sc. degree in Mathematics from the same university in 2012, the M.Phil. degree in Mathematics from the same university in 2014, and the Ph.D. degree in Mathematics from the School of Advanced Sciences, Vellore Institute of Technology, Vellore, in 2019. He has 4 years of teaching experience and 8 years of research experience, and he has published research papers in high-quality journals from publishers such as Springer, Elsevier, IGI Global, Emerald, and River. At present, he is working as an Assistant Professor in the Department of Mathematics, REVA University, Bangalore, India. His current research interests include Algebraic Cryptography, Fuzzy Algebra, Fuzzy Image Processing, Machine Learning, and Data Mining. Dr. V. Muthukumaran is a Fellow of the International Association for Cryptologic Research (IACR), India, and a Life Member of the IEEE. He has published more than 40 research articles and 4 book chapters in peer-reviewed international journals. He has published 6 IPR patents in algebra with IoT applications. He has also presented 25 papers at national and international conferences. He has been a guest editor of several international journals, including the Journal of Intelligent Manufacturing (Springer), International Journal of Intelligent Computing and Cybernetics, International Journal of e-Collaboration (IJeC), International Journal of Pervasive Computing and Communications (IJPCC), International Journal of System Assurance Engineering and Management (IJSA), International Journal of Speech Technology (IJST, Springer), and Journal of Reliable Intelligent Environments (JRIE).

N. Thillaiarasu, School of Computing and Information Technology, REVA University, Bengaluru, India

N. Thillaiarasu is currently working as an Associate Professor in the School of Computing and Information Technology, REVA University, Bengaluru. He previously served as an Assistant Professor at Galgotias University, Greater Noida, from July 2019 to December 2020. He worked for 7.3 years as an Assistant Professor in the Department of Computer Science and Engineering, SNS College of Engineering, Coimbatore. He obtained his B.E. in Computer Science and Engineering from Selvam College of Technology in 2010 and received his M.E. in Software Engineering from Anna University Regional Centre, Coimbatore, in 2012. He received his Ph.D. degree from Anna University, Chennai, in 2019. He has published more than 22 research papers in refereed venues, including Springer and IEEE Xplore conferences. He has organized several workshops, summer internships, and expert lectures for students. He has worked as a session chair, conference steering committee member, editorial board member, and reviewer for Springer journals and IEEE conferences. He is an editorial board member for the edited book “Machine Learning Methods for Engineering Application Development” (Bentham Science). He is also working as an editor for the titles “Cyber Security for Modern Engineering Operations Management: Towards Intelligent Industry” and “Design Principle, Modernization and Techniques in Artificial Intelligence for IoT: Advance Technologies, Developments, and Challenges” (CRC Press, Taylor & Francis). His areas of interest include Cloud Computing, Security, IoT, and Machine Learning.

S. Satheesh Kumar, Department of Computer Science, School of Applied Sciences, REVA University, Bengaluru

S. Satheesh Kumar is currently working as an Assistant Professor & Coordinator in the Department of Computer Science, School of Applied Sciences, REVA University, Bangalore, and is pursuing a Ph.D. in Computer Applications at Visvesvaraya Technological University, Karnataka. He previously served as an Assistant Professor at Acharya Bangalore B-School, Bangalore, from January 2018 to July 2019. He worked for 3.5 years as an Assistant Professor in the Department of MCA, Dayananda Sagar Academy of Technology and Management, Bangalore, from August 2014 to January 2018, and for 4 years as an Assistant Professor in the Department of MCA, Sri Nandhanam College of Engineering & Technology, Tirupattur, Tamilnadu, from August 2009 to August 2013. He obtained his B.Sc. in Electronics from MGR Arts & Science College in 2005 and received his MCA from Priyadarshini Engineering College, Anna University, Chennai, in 2008. He has published more than 7 research papers and 3 book chapters in refereed venues, including Springer, Elsevier, IGI Global, and IEEE Xplore conferences. He is also a reviewer for Springer, Elsevier, Emerald Group Publishing, and IGI Global. He has organized several workshops, student development programs, and expert lectures for students. His areas of interest include Data Security, Cloud Computing, IoT, Machine Learning, and Network Security.

Published

2022-03-16

How to Cite

Kaladevi, A. C., Saravanakumar, R., Veena, K., Muthukumaran, V., Thillaiarasu, N., & Kumar, S. S. (2022). Data Analytics on Eco-Conditional Factors Affecting Speech Recognition Rate of Modern Interaction Systems. Journal of Mobile Multimedia, 18(04), 1153–1176. https://doi.org/10.13052/jmm1550-4646.1849

Section

Enabling AI Technologies Towards Multimedia Data Analytics for Smart Healthcare