A Robust Deep Learning-Based Speaker Identification System Using Hybrid Model on KUI Dataset

Subrat Kumar Nayak; Ajit Kumar Nayak; Suprava Ranjan Laha; Nrusingha Tripathy; Takialddin AI Smadi

doi:10.37391/ijeer.120446

Research Article

Author(s): Subrat Kumar Nayak, Ajit Kumar Nayak, Suprava Ranjan Laha^*, Nrusingha Tripathy, and Takialddin AI Smadi

Published In : International Journal of Electrical and Electronics Research (IJEER) Volume 12, Issue 4

Publisher : FOREX Publication

Published : 30 December 2024

e-ISSN : 2347-470X

Page(s) : 1502-1507

DOI: https://doi.org/10.37391/IJEER.120446

Abstract

Background: Speaker identification, the task of recognizing who is speaking from speech characteristics and acoustics, is essential in security, biometrics, IoT, and human-computer interaction (HCI). As technology advances, more capable software and more robust hardware strengthen these applications. This study evaluates feature extraction, pre-processing, and deep learning methods for speaker identification in natural settings. Methods: We compared deep learning algorithms, including Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and a proposed Hybrid model. Audio files were processed using different feature extraction and pre-processing techniques. Results: The proposed Hybrid model achieved the highest accuracy at 95%, surpassing the other models; LSTM followed with an accuracy of 93%. The models were evaluated using accuracy, recall, and F1 score. Conclusions: The study demonstrates that the Hybrid model is the most effective of the compared models for speaker identification in natural settings, highlighting its potential for improved human-computer interaction and security applications.

Keywords: Mel Frequency Cepstral Coefficients, CNN, LSTM, ANN, RNN, Speaker Identification.
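The abstract names accuracy, recall, and F1 score as the evaluation metrics but does not say how they were averaged across speakers. As a minimal sketch, assuming macro averaging over speaker classes (an assumption, not something the abstract states), the three metrics could be computed as follows. The function name `classification_metrics` and the speaker labels are hypothetical and used only for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for multi-class labels.

    Macro averaging is an assumption made for this sketch; the source
    does not specify the averaging scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count true positives, false positives, and false negatives per speaker.
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    recalls, f1_scores = [], []
    for label in labels:
        support = tp[label] + fn[label]      # utterances truly from this speaker
        predicted = tp[label] + fp[label]    # utterances assigned to this speaker
        recall = tp[label] / support if support else 0.0
        precision = tp[label] / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1_scores.append(f1)

    accuracy = sum(tp.values()) / len(y_true)
    macro_recall = sum(recalls) / len(labels)
    macro_f1 = sum(f1_scores) / len(labels)
    return accuracy, macro_recall, macro_f1


# Hypothetical labels for three speakers, for illustration only.
y_true = ["spk1", "spk1", "spk2", "spk2", "spk3", "spk3"]
y_pred = ["spk1", "spk2", "spk2", "spk2", "spk3", "spk1"]
accuracy, macro_recall, macro_f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} recall={macro_recall:.3f} f1={macro_f1:.3f}")
```

Macro averaging weights every speaker equally regardless of how many utterances each contributes. If the dataset is imbalanced across speakers, a weighted average would give different numbers, so the averaging choice matters when comparing models.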

Subrat Kumar Nayak, Siksha ‘O’ Anusandhan (deemed to be University), Bhubaneswar, India; Email: subratsilicon28@gmail.com

Ajit Kumar Nayak, Siksha ‘O’ Anusandhan (deemed to be University), Bhubaneswar, India; Email: ajitnayak@soa.ac.in

Suprava Ranjan Laha^*, Siksha ‘O’ Anusandhan (deemed to be University), Bhubaneswar, India; Brainware University, Barasat, Kolkata, India; Email: supravalaha@gmail.com

Nrusingha Tripathy, Siksha ‘O’ Anusandhan (deemed to be University), Bhubaneswar, India; Email: nrusinghatripathy654@gmail.com

Takialddin AI Smadi, Faculty of Engineering, Jerash University, Jordan

Subrat Kumar Nayak, Ajit Kumar Nayak, Suprava Ranjan Laha, Nrusingha Tripathy, and Takialddin AI Smadi (2024), A Robust Deep Learning-Based Speaker Identification System Using Hybrid Model on KUI Dataset. IJEER 12(4), 1502-1507. DOI: 10.37391/ijeer.120446.
