This study sought to evaluate the effect of speech intensity on performance of the Callsign Acquisition Test (CAT) and Modified Rhyme Test (MRT) presented in noise. Fourteen normal-hearing listeners performed both tests in 65 dB(A) white background noise. Speech intensity varied while background noise remained constant, forming speech-to-noise ratios (SNRs) of -18, -15, -12, -9, and -6 dB. Results showed that CAT recognition scores were significantly higher than MRT scores at the same SNRs; however, the scores from both tests were highly correlated, and their relationship over the SNRs tested can be expressed by a simple linear function. The CAT concept can be readily ported to other languages for testing speech communication under adverse listening conditions.
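The fixed-noise, variable-speech setup described above amounts to scaling the speech signal so that its level sits a target number of dB relative to the noise. A minimal sketch, assuming an RMS-based level definition; the helper names are illustrative, not the authors' procedure:

```python
import math

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def scale_speech_to_snr(speech, noise, target_snr_db):
    """Return a scaled copy of `speech` whose RMS level lies
    `target_snr_db` dB relative to the RMS of the fixed `noise`."""
    current_snr_db = 20.0 * math.log10(rms(speech) / rms(noise))
    gain = 10.0 ** ((target_snr_db - current_snr_db) / 20.0)
    return [s * gain for s in speech]
```

Calling this with `target_snr_db=-12.0`, for example, would reproduce one of the five conditions while leaving the 65 dB(A) noise untouched.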
A phoneme segmentation method based on the analysis of discrete wavelet transform spectra is described. The localization of phoneme boundaries is particularly useful in speech recognition: it enables the use of more accurate acoustic models, since the lengths of phonemes provide additional information for parametrization. Our method relies on the values of the power envelopes and their first derivatives in six frequency subbands. Scenarios typical of phoneme boundaries are searched for; discrete times at which such events occur are noted and graded using a distribution-like event function, which represents the change of the energy distribution in the frequency domain. The exact definition of this function is given in the paper. The final decision on the localization of boundaries is taken by analysing the event function, so boundaries are extracted using information from all subbands. The method was developed on a small set of hand-segmented Polish words and tested on a larger corpus containing 16 425 utterances. A recall and precision measure specifically designed to assess the quality of speech segmentation was adapted using fuzzy sets, yielding an F-score of 72.49%.
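For the boundary-evaluation step, a crisp (non-fuzzy) precision/recall matcher conveys the idea; the tolerance value and function name here are illustrative assumptions, and the fuzzy-set weighting used in the paper is not reproduced:

```python
def segmentation_f_score(reference, detected, tol=0.02):
    """Crisp F-score for boundary detection: a detected boundary
    is a hit if it lies within `tol` seconds of a still-unmatched
    reference boundary (each reference matched at most once)."""
    unmatched = list(reference)
    hits = 0
    for b in detected:
        for r in unmatched:
            if abs(b - r) <= tol:
                unmatched.remove(r)
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

The fuzzy-set adaptation reported in the paper replaces this hard tolerance with a graded membership, but the precision/recall skeleton is the same.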
The performance of binaural processing may be disturbed in the presence of hearing loss, especially of the sensorineural type. To assess the impact of hearing loss on speech perception in noise with respect to binaural processing, a series of speech recognition measurements was carried out under controlled laboratory conditions. The spatial conditions were simulated using dummy-head recordings played back over headphones. The Intelligibility Level Difference (ILD) was determined by measuring the change in speech reception threshold (SRT) between two configurations of the masking source (N) and the speech source (S): the spatially separated S0N90 condition (where the numbers stand for azimuth angles in the horizontal plane) and the co-located condition (S0N0). To disentangle the head-shadow (better-ear) effect from binaural processing in the brain, the difference between the binaural and monaural S0N90 conditions, the so-called Binaural Intelligibility Level Difference (BILD), was calculated.
Measurements were performed with a control group of normal-hearing listeners and a group of sensorineural hearing-impaired subjects. In all conditions the performance of the hearing-impaired listeners was significantly poorer than that of the normal-hearing listeners, resulting in higher SRT values (a 3 dB difference in the S0N0 configuration, 7.6 dB in S0N90, and 5 dB in monaural S0N90). The SRT improvement due to the spatial separation of the target and masking signals (ILD) was also larger in the control group (8.1 dB) than in the hearing-impaired listeners (3.5 dB). Moreover, a significant deterioration of binaural processing, as described by the BILD, was found in people with sensorineural deficits. For normal-hearing listeners this parameter ranged from 3 to 6 dB (4.6 dB on average), whereas in the hearing-impaired group it fell by more than half, to 1.9 dB on average (with a deviation of 1.4 dB). These findings could not be explained by the individual average hearing threshold (the standard measure in audiological diagnostics) alone. The outcomes indicate a contribution of suprathreshold deficits, and it may be useful to supplement pure-tone audiometry with binaural SRT measurements in noise, leading to better diagnostics and hearing aid fitting.
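The two quantities above reduce to simple SRT differences. A minimal sketch; the SRT values in the usage assertions are hypothetical, chosen only so their differences match the group means reported in the abstract (which gives the differences, not the underlying SRTs):

```python
def ild(srt_colocated, srt_separated_binaural):
    """Intelligibility Level Difference: the SRT gain obtained by
    spatially separating speech (S0) from noise (N90)."""
    return srt_colocated - srt_separated_binaural

def bild(srt_separated_monaural, srt_separated_binaural):
    """Binaural Intelligibility Level Difference: the part of the
    spatial gain attributable to binaural processing, with the
    head-shadow (better-ear) effect factored out."""
    return srt_separated_monaural - srt_separated_binaural
```

With hypothetical control-group SRTs of -6.0 dB (S0N0), -14.1 dB (binaural S0N90), and -9.5 dB (monaural S0N90), these give ILD = 8.1 dB and BILD = 4.6 dB, matching the reported averages.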
The main goal of the research was to obtain data on speech-in-noise recognition ability using a Polish word test (New Articulation Lists, NAL-93) with two different masking signals. An attempt was also made to standardise the background noise for Polish speech tests by creating a babble noise for NAL-93. Two types of background noise were used with the Polish word test: babble noise and speech noise. The short method was chosen for the study, as it provides results similar to the method of constant stimuli while using less word material. Both maskers were presented to 10 listeners with normal hearing.
The mean SRT values for NAL-93 were −3.4 dB SNR for speech noise and 3.0 dB SNR for babble noise; in this respect, babble noise was the more effective masker. However, the SRT for speech noise was closer to the values obtained for other Polish speech tests. The measurement of speech recognition using the Polish word test is thus possible with both types of masking signal presented in the study. Which type of noise would be better in the practice of hearing aid fitting remains an open question.
This study examined whether differences in reverberation time (RT) between typical sound field test rooms used in audiology clinics have an effect on speech recognition in multi-talker environments. Separate groups of participants listened to target speech sentences presented simultaneously with 0 to 3 competing sentences through four spatially separated loudspeakers in two sound field test rooms having RT = 0.6 s (Site 1: N = 16) and RT = 0.4 s (Site 2: N = 12). Speech recognition scores (SRSs) for the Synchronized Sentence Set (S3) test and subjective estimates of perceived task difficulty were recorded. The results indicate that the change in room RT from 0.4 to 0.6 s did not significantly influence SRSs in quiet or in the presence of one competing sentence. However, this small change in RT affected SRSs when 2 and 3 competing sentences were present, yielding mean SRSs about 8-10% better in the room with RT = 0.4 s. Perceived task difficulty ratings increased as the complexity of the task increased, with average ratings similar across test sites for each level of sentence competition. These results suggest that site-specific normative data must be collected for sound field rooms if clinicians wish to use two or more directional speech maskers during routine sound field testing.
Although emotions and emotion-based learning are individual-specific, their main features are consistent across people. Depending on a person's emotional state, various physical and physiological changes can be observed in pulse and breathing, blood flow velocity, hormonal balance, voice properties, facial expression, and hand movements. The diversity, magnitude, and degree of these changes are shaped by different emotional states. Acoustic analysis, an objective evaluation method, can be used to determine a person's emotional state from voice characteristics. In this study, the reflection of anxiety disorder in people's voices was investigated through acoustic parameters. The study is a cross-sectional case-control study. Voice recordings were obtained from healthy people and from patients, and 122 acoustic parameters were extracted from these recordings by acoustic analysis. The relation of these parameters to the anxious state was investigated statistically. According to the results obtained, 42 acoustic parameters vary in the anxious state: the subglottic pressure increases and the vocalization of the vowels decreases. The change in the MFCC parameter in the anxious state indicates that listeners can perceive this condition while listening to speech. It was also shown that text reading is effective in eliciting emotions. These findings show that the voice changes in the anxious state and that the acoustic parameters are influenced by it. Acoustic analysis can therefore be used as an expert decision-support system for the diagnosis of anxiety.
Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNN) to laughter detection, as this technology is nowadays considered state-of-the-art in similar tasks such as phoneme identification. We carry out our experiments using two corpora containing spontaneous speech in two languages (Hungarian and English). Also, as we find it reasonable that not all frequency regions are required for efficient laughter detection, we perform feature selection to find a sufficient feature subset.
Although various speech enhancement techniques have been developed for different applications, existing methods remain limited in noisy environments with high ambient noise levels. Speech presence probability (SPP) estimation is a speech enhancement technique that reduces speech distortions, especially in low signal-to-noise ratio (SNR) scenarios. In this paper, we propose a new SPP estimator for speech enhancement in the time-frequency (T-F) domain, improved with two-dimensional (2D) Teager energy operators (TEOs). The wavelet packet transform (WPT), a multiband decomposition technique, is used to concentrate the energy distribution of the speech components. A minimum mean-square error (MMSE) estimator is derived from a generalized gamma distribution speech model in the WPT domain. In addition, speech samples corrupted by environmental and occupational noises (machine shop, factory, and station) at different input SNRs are used to validate the proposed algorithm. Results suggest that the proposed method achieves a significant improvement in perceptual quality compared with four conventional speech enhancement algorithms (MMSE-84, MMSE-04, Wiener-96, and BTW).
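The one-dimensional discrete Teager energy operator underlying the 2D version used above is standard: Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). A minimal sketch of this operator only; the 2D T-F extension and its integration into the SPP estimator are not reproduced:

```python
import math

def teager_energy(x):
    """Discrete Teager energy operator:
    psi[n] = x[n]**2 - x[n-1] * x[n+1], for 1 <= n <= len(x) - 2.
    For a unit-amplitude sinusoid of digital frequency w, the output
    is the constant sin(w)**2, so the operator tracks amplitude and
    frequency jointly, which is what makes it useful for emphasizing
    speech energy in individual subbands."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1]
            for n in range(1, len(x) - 1)]
```

Applying this per WPT subband before SPP estimation is one plausible reading of the pipeline; the paper's exact 2D formulation should be consulted for details.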