Details

Title

Estimation and tracking of fundamental, 2nd and 3d harmonic frequencies for spectrogram normalization in speech recognition

Journal title

Bulletin of the Polish Academy of Sciences: Technical Sciences

Yearbook

2012

Issue

No 1 March

Publication authors

Divisions of PAS

Technical Sciences

Publisher

Polish Academy of Sciences

Date

2012

Identifier

ISSN 0239-7528, eISSN 2300-1917

References

Benesty J. (2008), Springer Handbook of Speech Processing, doi.org/10.1007/978-3-540-49127-9
Demenko G. (2010), Implementation of Polish speech synthesis for the BOSS system, Bull. Pol. Ac.: Tech, 58, 3, 371.
Goodwin M. (2008), Springer Handbook of Speech Processing, 229, doi.org/10.1007/978-3-540-49127-9_12
Glavitsch U. (2003), Speaker normalization with respect to F0: a perceptual approach, TIK-Report No. 185, Eidgenössische Technische Hochschule Zürich, Zürich.
O'Shaughnessy D. (2008), Springer Handbook of Speech Processing, 213, doi.org/10.1007/978-3-540-49127-9_11
Schafer R. (2008), Springer Handbook of Speech Processing, 161, doi.org/10.1007/978-3-540-49127-9_9
Hess W. (1992), Advances in Speech Signal Processing, 3.
de Cheveigné A. (2001), Comparative evaluation of F0 estimation algorithms, 1, 2451.
Unoki M. (2008), Estimation of fundamental frequency of reverberant speech by utilizing complex cepstrum analysis, J. Signal Processing, 12, 1, 31.
Kawahara H. (1999), Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity, 2781.
de Cheveigné A. (2002), Yin, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am, 111, 4, 1917, doi.org/10.1121/1.1458024
Miwa T. (1998), The pitch estimation of different musical instruments sounds using comb filters for transcription, IEICE Trans, D-2, 9, 1965.
Nakatani T. (2004), Robust and accurate fundamental frequency estimation based on dominant harmonic components, J. Acoust. Soc. Am, 116, 6, 3690, doi.org/10.1121/1.1787522
Ishimoto Y. (2001), A fundamental frequency estimation method for noisy speech based on instantaneous amplitude and frequency, 2439.
Atake Y. (2000), Robust estimation of fundamental frequency using instantaneous frequencies of harmonic components, IEICE Proc, D-2, 11, 2077.
Dubois C. (2007), Joint detection and tracking of time-varying harmonic components: a flexible bayesian approach, IEEE Trans. on Audio Speech and Language Processing, 15, 4, 1283, doi.org/10.1109/TASL.2007.894522
Kim S. (2008), Multiharmonic tracking using sigmapoint Kalman filter, IEEE EMBC, 8.
Nishi K. (1988), Multiple pitch tracking and harmonic segregation algorithm for auditory scene analysis, The Society of Instrument and Control Engineers, 34, 6, 483, doi.org/10.9746/sicetr1965.34.483
Hainsworth S. (2003), Beat tracking with particle filtering algorithms, 1, 91.
Tomoike S. (2008), Estimation of local peaks based on particle filter in advance environments, J. Signal Processing, 12, 4, 303.
Lee L. (1998), A frequency warping approach to speaker normalization, IEEE Trans. on Speech and Audio Processing, 6, 1, 49, doi.org/10.1109/89.650310
Dognin P. (2003), A bandpass transform for speaker normalization, Ph.D. Dissertation, University of Pittsburgh, Pittsburgh.
Traunmüller H. (1987), Perceptual relativity in identification of two-formant vowels, Speech Communication, 6, 143, doi.org/10.1016/0167-6393(87)90037-9
Eide E. (1996), A parametric approach to vocal tract length normalization, Proc. ICASSP, 1, 346.
Laroche J. (1999), New phase-vocoder techniques for real-time pitch shifting, chorusing, harmonizing, and other exotic audio modifications, J. Audio Eng. Soc, 47, 11, 928.
Rabiner L. (1997), On the use of autocorrelation analysis for pitch, IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-25, 1, 24.
Shimamura T. (2001), Weighted autocorrelation for pitch extraction of noisy speech, IEEE Trans. on Speech and Audio Processing, 9, 7, 727, doi.org/10.1109/89.952490
Ying G. (1994), A probabilistic approach to AMDF pitch detection, J. Acoust. Soc. Am, 95, 5, 2817, doi.org/10.1121/1.409712
Miyamoto T. (1983), A real time PARCOR analysis of speech by high-performance signal processors, IEICE, J66-A, 7, 625.
Sakai T. (1995), Improvement of pitch extraction method in noisy environment based on cepstrum, Electronics, Information, and Communication Engineers, 1, 299.
Haward D. (1989), Peak-picking fundamental period estimation for hearing prostheses, J. Acoust. Soc. Am, 86, 3, 902, doi.org/10.1121/1.398725
Ristic B. (2004), Beyond the Kalman Filter. Particle Filters for Tracking.
Medan Y. (1991), Super resolution pitch determination of speech, IEEE Trans. on Signal Processing, 39, 1, doi.org/10.1109/78.80763
Veprek P. (2002), Analysis, enhancement and evaluation of five pitch determination techniques, Speech Comm, 37, 249, doi.org/10.1016/S0167-6393(01)00017-6
Adamczyk B. (2000), Robot's vocabluary, IAiR Bulletin, 12.
Hu G.-N. (2004), Monaural speech segregation based on pitch tracking and amplitude modulation, IEEE Trans. on Neural Networks, 15, 5, 1135, doi.org/10.1109/TNN.2004.832812
Kasprzak W. (2010), Relaxing the WDO assumption in blind extraction of speakers from speech mixtures, J. Telecom. and Information Technology, 4, 50.
Okazaki F. (2005), A two-step approach to blind deconvolution of speech and sound sources in the time domain, Bull. Pol. Ac.: Tech, 53, 1, 49.

DOI

10.2478/v10175-012-0011-z
