Research Article
A Robust Deep Learning-Based Speaker Identification System Using Hybrid Model on KUI Dataset
Author(s): Subrat Kumar Nayak, Ajit Kumar Nayak, Suprava Ranjan Laha*, Nrusingha Tripathy, and Takialddin Al Smadi
Published in: International Journal of Electrical and Electronics Research (IJEER), Volume 12, Issue 4
Publisher: FOREX Publication
Published: 30 December 2024
e-ISSN: 2347-470X
Pages: 1502-1507
Abstract
Background: Speaker identification, the task of recognizing who is speaking from the acoustic characteristics of their speech, is essential in security, biometrics, the Internet of Things (IoT), and human-computer interaction (HCI). Advances in software and hardware continue to broaden these applications. This study evaluates feature extraction, pre-processing, and deep learning methods for speaker identification in natural settings. Methods: We compared deep learning models including an Artificial Neural Network (ANN), a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a proposed Hybrid model. Audio files were processed using several feature extraction and pre-processing techniques. Results: The proposed Hybrid model achieved the highest accuracy, 95%, outperforming the other models; the LSTM followed at 93%. The models were evaluated with performance metrics including accuracy, recall, and F1 score. Conclusions: Among the models evaluated, the Hybrid model is the most effective for speaker identification in natural settings, highlighting its potential for human-computer interaction and security applications.
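The abstract reports accuracy, recall, and F1 score but does not state how per-speaker scores were combined into a single number. For a multi-class task such as speaker identification, one common choice is macro averaging, an unweighted mean over speakers. The sketch below assumes that choice; the function name `classification_metrics` and the speaker labels are hypothetical, and only the Python standard library is used. Precision is computed as well because F1 depends on it.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean over speakers) is an assumption;
    the source does not say how per-class scores were combined.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per speaker
    pred_count = Counter(y_pred)  # TP + FP per speaker
    true_count = Counter(y_true)  # TP + FN per speaker

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / pred_count[c] if pred_count[c] else 0.0
        rec = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with three hypothetical speakers.
metrics = classification_metrics(
    ["spk1", "spk1", "spk2", "spk3"],
    ["spk1", "spk2", "spk2", "spk3"],
)
print(metrics)
```

In this toy example three of four utterances are correct (accuracy 0.75), while macro F1 is 7/9 because the errors on `spk1` and `spk2` pull down their per-speaker scores equally. A weighted average would instead scale each speaker's score by its number of test utterances.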
Keywords: Mel Frequency Cepstral Coefficients, CNN, LSTM, ANN, RNN, Speaker Identification.
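The first keyword, Mel Frequency Cepstral Coefficients (MFCCs), names a standard speech feature. The abstract does not give the authors' extraction settings, so the sketch below uses common textbook defaults: 16 kHz audio, pre-emphasis 0.97, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, 26 mel filters, and 13 coefficients. All of these are assumptions. The sketch follows the usual pipeline (pre-emphasis, framing and windowing, power spectrum, mel filterbank, log, orthonormal DCT-II) using only NumPy. Libraries such as librosa provide `librosa.feature.mfcc`, which uses its own defaults.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the transform is orthonormal
    return x @ basis.T


def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10, n_fft=512,
         n_filters=26, n_coeffs=13, pre_emph=0.97):
    """Return an (n_frames, n_coeffs) MFCC matrix; all defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies: y[t] = x[t] - a * x[t-1].
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    frame_len = int(round(sr * frame_ms / 1000))
    hop = int(round(sr * hop_ms / 1000))
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    # Index matrix: row j holds the sample indices of frame j.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    # Floor at machine epsilon so silent frames do not produce log(0).
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    return dct_ii(log_energies, n_coeffs)


# One second of a synthetic 440 Hz tone stands in for a real recording.
t = np.arange(16000) / 16000
features = mfcc(np.sin(2 * np.pi * 440 * t))
print(features.shape)
```

Each row is one frame's coefficient vector. A frame-level classifier, or a sequence model such as an LSTM, would consume this frames-by-coefficients matrix, often after per-utterance mean normalization.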
Subrat Kumar Nayak, Siksha ‘O’ Anusandhan (deemed to be University), Bhubaneswar, India; Email: subratsilicon28@gmail.com
Ajit Kumar Nayak, Siksha ‘O’ Anusandhan (deemed to be University), Bhubaneswar, India; Email: ajitnayak@soa.ac.in
Suprava Ranjan Laha*, Siksha ‘O’ Anusandhan (deemed to be University), Bhubaneswar, India; Brainware University, Barasat, Kolkata, India; Email: supravalaha@gmail.com
Nrusingha Tripathy, Siksha ‘O’ Anusandhan (deemed to be University), Bhubaneswar, India; Email: nrusinghatripathy654@gmail.com
Takialddin Al Smadi, Faculty of Engineering, Jerash University, Jordan