FOREX Press I. J. of Electrical & Electronics Research

Research Article

A Systematic Approach of Advanced Dilated Convolution Network for Speaker Identification

Author(s): Hema Kumar Pentapati and Sridevi K

Publisher : FOREX Publication

Published : 05 February 2023

e-ISSN : 2347-470X

Page(s) : 25-30




Hema Kumar Pentapati*, Research Scholar, Department of EECE, GITAM School of Technology, Visakhapatnam, India; Email: hpentapa@gitam.in

Sridevi K, Associate Professor, Department of EECE, GITAM School of Technology, Visakhapatnam, India; Email: skataman@gitam.edu


Hema Kumar Pentapati and Sridevi K (2023), A Systematic Approach of Advanced Dilated Convolution Network for Speaker Identification. IJEER 11(1), 25-30. DOI: 10.37391/IJEER.110104.