Research Article
Fundamental Frequency Extraction by Utilizing the Combination of Spectrum in Noisy Speech
Author(s): Foujia Islam, Nargis Parvin, Moinur Rahman, Md. Tofael Ahmed, Dulal Chakraborty, and Md. Saifur Rahman*
Published In : International Journal of Electrical and Electronics Research (IJEER) Volume 13, Issue 4
Publisher : FOREX Publication
Published : 10 December 2025
e-ISSN : 2347-470X
Page(s) : 730-734
Abstract
Speech is the audible acoustic signal generated by the articulatory system (lungs, vocal folds, vocal tract, tongue, lips) to communicate language. The fundamental frequency (F_0) is the lowest frequency component of a voiced speech waveform; it corresponds to the vibration rate of the vocal folds and determines the pitch of the speaker’s voice. In speech signal processing, the acoustic waveform is captured and analyzed to extract information about what is being said, how it is being said, and who is saying it. Accurate extraction of F_0 is a vital task in speech processing applications such as voice synthesis, speaker recognition, and emotion analysis. However, the vocal tract’s formants can significantly alter the shape of the glottal waveform, making the true pitch difficult to identify. In addition, the performance of traditional pitch detection techniques often declines considerably in the presence of background noise. This work proposes a robust method for extracting the fundamental frequency from noisy speech by exploiting the complementary advantages of the power spectrum and the logarithmic spectrum. The power spectrum mitigates noise effects, while the logarithmic operation separates vocal tract characteristics from the source excitation. The proposed approach integrates autocorrelation-based power spectral analysis with cepstral techniques derived from the log spectrum to improve pitch estimation accuracy under adverse conditions. Experimental results on noisy speech datasets show that the proposed hybrid method achieves a lower gross pitch error and greater robustness than traditional methods such as BaNa, autocorrelation, and cepstral techniques.
Keywords: Speech Enhancement, Power Spectrum, Logarithm Spectrum, Hybrid Method.
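The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' hybrid method: the abstract does not say how the two spectra are combined, so the sketch shows only the textbook autocorrelation estimator (computed from the power spectrum) and the textbook cepstrum estimator (computed from the log spectrum). The function names, the Hann window, the 50 to 400 Hz search range, and the 20% tolerance in the gross pitch error are all assumptions made for illustration.

```python
import numpy as np


def _lag_range(fs, fmin, fmax):
    # Convert the F_0 search range (Hz) into a range of lags (samples).
    return int(fs / fmax), int(fs / fmin)


def f0_autocorr(frame, fs, fmin=50.0, fmax=400.0):
    """Autocorrelation pitch estimate obtained via the power spectrum."""
    n = len(frame)
    windowed = np.asarray(frame, dtype=float) * np.hanning(n)
    # Zero-pad to 2n so the inverse FFT gives the linear autocorrelation.
    power = np.abs(np.fft.rfft(windowed, 2 * n)) ** 2
    acf = np.fft.irfft(power)  # Wiener-Khinchin relation
    lo, hi = _lag_range(fs, fmin, fmax)
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return fs / lag


def f0_cepstrum(frame, fs, fmin=50.0, fmax=400.0):
    """Cepstrum pitch estimate obtained via the log power spectrum."""
    n = len(frame)
    windowed = np.asarray(frame, dtype=float) * np.hanning(n)
    power = np.abs(np.fft.rfft(windowed, 2 * n)) ** 2
    # The small constant avoids log(0); its value is an assumption.
    cepstrum = np.fft.irfft(np.log(power + 1e-12))
    lo, hi = _lag_range(fs, fmin, fmax)
    quefrency = lo + int(np.argmax(cepstrum[lo:hi + 1]))
    return fs / quefrency


def gross_pitch_error(f0_est, f0_ref, tol=0.2):
    """Percentage of voiced frames whose estimate is off by more than tol.

    Frames with f0_ref <= 0 are treated as unvoiced and ignored.
    The 20% default tolerance is an assumption, not taken from the paper.
    """
    f0_est = np.asarray(f0_est, dtype=float)
    f0_ref = np.asarray(f0_ref, dtype=float)
    voiced = f0_ref > 0
    gross = np.abs(f0_est[voiced] - f0_ref[voiced]) > tol * f0_ref[voiced]
    return 100.0 * float(np.mean(gross))
```

Both estimators take one frame of samples and the sampling rate, for example `f0_autocorr(frame, fs=8000)`, and return an estimate in Hz. In both cases the search is restricted to lags that correspond to plausible speech pitch, which keeps the low-quefrency vocal tract envelope out of the cepstral peak search.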
Foujia Islam, Department of ICT, Comilla University, Bangladesh; Email: foujiaislam4567@gmail.com
Nargis Parvin, Assistant Professor, Department of CSE, Bangladesh Army International University of Science and Technology, Bangladesh; Email: nargis.cse@baiust.ac.bd
Moinur Rahman, Lecturer, Department of ICT, Comilla University, Bangladesh; Email: moinur.rahman@cou.ac.bd
Md. Tofael Ahmed, Professor, Department of ICT, Comilla University, Bangladesh; Email: tofael@cou.ac.bd
Dulal Chakraborty, Associate Professor, Department of ICT, Comilla University, Bangladesh; Email: dulal.ict.cou@gmail.com
Md. Saifur Rahman*, Professor, Department of ICT, Comilla University, Bangladesh; Email: saifurice@cou.ac.bd
