Search results

Number of results: 87

Abstract

To determine speech intelligibility using the test suggested by Ozimek et al. (2009), the subject composed sentences from words presented on a computer screen. However, the number and type of these words were chosen arbitrarily: the subject was always presented with 18 similarly sounding words. The aim of this study was therefore to determine whether the number and type of alternative words used by Ozimek et al. (2009) had a significant influence on speech intelligibility. A further aim was to determine an optimal number of alternative words, i.e., a number that did not affect the speech reception threshold (SRT) and did not unduly lengthen the duration of the test. The study, conducted on a group of 10 subjects with normal hearing, showed that an increase in the number of words to choose from, from 12 to 30, increased the speech intelligibility by about 0.3 dB per 6 words. The use of paronyms as alternative words, as opposed to random words, led to an increase in the speech intelligibility of about 0.6 dB, which is equivalent to a decrease in intelligibility by 15 percentage points. Enlarging the number of words to choose from, and switching alternative words to paronyms, increased the response time from approximately 11 to 16 s. It seems that using paronyms as alternative words, with 12 or 18 words to choose from, is the best choice when using the Polish Sentence Test (PST).

Authors and Affiliations

Magdalena Krenz
Andrzej Wicher
Aleksander Sęk

Abstract

This study sought to evaluate the effect of speech intensity on performance of the Callsign Acquisition Test (CAT) and the Modified Rhyme Test (MRT) presented in noise. Fourteen normal-hearing listeners performed both tests in 65 dB(A) white background noise. Speech intensity varied while background noise remained constant, forming speech-to-noise ratios (SNRs) of -18, -15, -12, -9, and -6 dB. Results showed that CAT recognition scores were significantly higher than MRT scores at the same SNRs; however, the scores from both tests were highly correlated, and their relationship for the SNRs tested can be expressed by a simple linear function. The concept of the CAT can easily be ported to other languages for testing speech communication under adverse listening conditions.
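The linear relation between the two tests' scores reported above can be sketched as an ordinary least-squares fit; the score pairs below are illustrative placeholders, not the study's data.

```python
# Least-squares fit of a linear relation MRT = a * CAT + b across SNRs.
def linear_fit(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    return a, b

cat = [20, 35, 50, 70, 85]   # hypothetical CAT scores (%) at five SNRs
mrt = [10, 24, 38, 57, 71]   # hypothetical MRT scores (%)
a, b = linear_fit(cat, mrt)
print(round(a, 3), round(b, 3))
```

Given such a fit, a score on one test can be mapped to an expected score on the other at the same SNR.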


Authors and Affiliations

Misty Blue-Terry
Maranda McBride
Tomasz Letowski

Abstract

An analysis of a low-level feature space for emotion recognition from speech is presented. The main goal was to determine how statistical properties computed from the contours of low-level features influence emotion recognition from speech signals. We conducted several experiments to reduce and tune our initial feature set and to configure the classification stage. In the analysis of the audio feature space, we employed univariate feature selection using the chi-squared test. Then, in the first stage of classification, a default set of parameters was selected for every classifier. For the classifier that obtained the best results with the default settings, hyperparameter tuning using cross-validation was applied. Finally, we compared the classification results for two different languages to find out the difference between emotional states expressed in spoken sentences. The results show that, from an initial feature set containing 3198 attributes, we obtained a dimensionality reduction of about 80% using the feature selection algorithm. The most dominant attributes selected at this stage were based on the mel and Bark frequency scale filterbanks, with their variability described mainly by the variance, median absolute deviation, and standard and average deviations. Finally, the classification accuracy using the tuned SVM classifier was 72.5% and 88.27% for emotional spoken sentences in the Polish and German languages, respectively.
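The univariate chi-squared feature selection mentioned above scores each (discretized) feature against the class labels; a minimal stdlib sketch with toy data follows.

```python
# Chi-squared statistic of a categorical feature vs. class labels; higher
# scores mean the feature is more informative about the class.
from collections import Counter

def chi2_score(feature, labels):
    n = len(labels)
    obs = Counter(zip(feature, labels))
    f_tot = Counter(feature)
    l_tot = Counter(labels)
    stat = 0.0
    for f in f_tot:
        for l in l_tot:
            expected = f_tot[f] * l_tot[l] / n
            stat += (obs[(f, l)] - expected) ** 2 / expected
    return stat

# Toy example: feature A separates the classes, feature B does not.
labels = ["pol", "pol", "pol", "ger", "ger", "ger"]
feat_a = [0, 0, 0, 1, 1, 1]        # perfectly informative
feat_b = [0, 1, 0, 1, 0, 1]        # uninformative
scores = {"A": chi2_score(feat_a, labels), "B": chi2_score(feat_b, labels)}
print(scores)
```

Feature selection then keeps the highest-scoring attributes, which is how a feature set can shrink by about 80% as reported.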

Authors and Affiliations

Lukasz Smietanka 1
Tomasz Maka 1

  1. Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Szczecin, Poland

Abstract

A phoneme segmentation method based on the analysis of discrete wavelet transform spectra is described. The localization of phoneme boundaries is particularly useful in speech recognition: it enables the use of more accurate acoustic models, since the lengths of phonemes provide more information for parametrization. Our method relies on the values of power envelopes and their first derivatives in six frequency subbands. Specific scenarios that are typical for phoneme boundaries are searched for. Discrete times with such events are noted and graded using a distribution-like event function, which represents the change of the energy distribution in the frequency domain. The exact definition of this method is given in the paper. The final decision on the localization of boundaries is made by analyzing the event function; boundaries are therefore extracted using information from all subbands. The method was developed on a small set of Polish hand-segmented words and tested on another, large corpus containing 16 425 utterances. Recall and precision measures specifically designed to assess the quality of speech segmentation were adapted using fuzzy sets. From this, an F-score of 72.49% was obtained.
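A recall/precision F-score for boundary detection can be sketched with a crisp tolerance window (a simple stand-in for the fuzzy-set variant the abstract describes); the boundary times below are illustrative, in milliseconds.

```python
# F-score for phoneme boundary detection: a detected boundary counts as a
# hit when it falls within a tolerance window of an unmatched reference
# boundary.
def boundary_f_score(reference, detected, tol=20.0):
    matched = set()
    hits = 0
    for d in detected:
        for i, r in enumerate(reference):
            if i not in matched and abs(d - r) <= tol:
                matched.add(i)
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

ref = [100, 250, 400, 620]           # hand-labelled boundaries (ms)
det = [95, 260, 500, 615, 700]       # detector output (ms)
print(round(boundary_f_score(ref, det), 3))
```

The fuzzy-set version replaces the hard hit/miss decision with graded membership, but the precision/recall bookkeeping is the same.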


Authors and Affiliations

Bartosz Ziółko
Mariusz Ziółko
Suresh Manandhar
Richard Wilson

Abstract

The performance of binaural processing may be disturbed in the presence of hearing loss, especially of the sensorineural type. To assess the impact of hearing loss on speech perception in noise with regard to binaural processing, a series of speech recognition measurements was carried out under controlled laboratory conditions. The spatial conditions were simulated using dummy head recordings played back over headphones. The Intelligibility Level Difference (ILD) was determined by measuring the change in the speech reception threshold (SRT) between two configurations of a masking signal source (N) and a speech source (S), namely the S0N90 condition (where the numbers stand for angles in the horizontal plane) and the co-located condition (S0N0). To disentangle the head shadow effect (better-ear effect) from binaural processing in the brain, the difference between the binaural and monaural S0N90 conditions (the so-called Binaural Intelligibility Level Difference, BILD) was calculated.

Measurements were performed with a control group of normal-hearing listeners and a group of subjects with sensorineural hearing impairment. In all conditions the performance of the hearing-impaired listeners was significantly lower than that of the normal-hearing ones, resulting in higher SRT values (a 3 dB difference in the S0N0 configuration, 7.6 dB in S0N90, and 5 dB in monaural S0N90). The SRT improvement due to the spatial separation of the target and masking signal (ILD) was also higher in the control group (8.1 dB) than in the hearing-impaired listeners (3.5 dB). Moreover, a significant deterioration of the binaural processing described by the BILD was found in people with sensorineural deficits. This parameter reached a value of 3 to 6 dB (4.6 dB on average) for normal-hearing listeners and decreased by more than a factor of two in the hearing-impaired group, to 1.9 dB on average (with a deviation of 1.4 dB). These findings could not be explained by the individual average hearing threshold (the standard measure in audiological diagnostics) alone. The outcomes indicate a contribution of suprathreshold deficits, and it may be useful to consider binaural SRT measurements in noise in addition to pure tone audiometry, resulting in better diagnostics and hearing aid fitting.
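The ILD and BILD definitions above are simple SRT differences; the sketch below uses hypothetical SRT values chosen so the differences match the control-group means quoted in the abstract (8.1 dB and 4.6 dB).

```python
# ILD and BILD as differences of speech reception thresholds (SRTs).
def ild(srt_s0n0, srt_s0n90_binaural):
    # spatial release from masking: co-located minus separated condition
    return srt_s0n0 - srt_s0n90_binaural

def bild(srt_s0n90_monaural, srt_s0n90_binaural):
    # binaural gain beyond the better-ear (head shadow) effect
    return srt_s0n90_monaural - srt_s0n90_binaural

srt_colocated = -7.0        # dB SNR, S0N0 (illustrative value)
srt_separated = -15.1       # dB SNR, S0N90 binaural (illustrative value)
srt_monaural = -10.5        # dB SNR, S0N90 monaural (illustrative value)
print(round(ild(srt_colocated, srt_separated), 1))   # 8.1 dB
print(round(bild(srt_monaural, srt_separated), 1))   # 4.6 dB
```

Lower (more negative) SRTs mean better performance, so a positive ILD or BILD is an intelligibility benefit.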


Authors and Affiliations

Anna Pastusiak
Dawid Niemiec
Jędrzej Kociński
Anna Warzybok

Abstract

The paper presents the results of sentence and logatome speech intelligibility measured in rooms with an induction loop for hearing aid users. Two rooms with different acoustic parameters were chosen. Twenty-two subjects with mild, moderate, and severe hearing impairment using hearing aids took part in the experiment. The intelligibility tests, composed of sentences or logatomes, were presented to the subjects at fixed measurement points of an enclosure. It was shown that a sentence test is a more useful tool for speech intelligibility measurements in a room than a logatome test. It was also shown that the induction loop is a very efficient system for improving speech intelligibility. Additionally, the questionnaire data showed that the induction loop, apart from improving speech intelligibility, increased the subjects' general satisfaction with speech perception.

Authors and Affiliations

Jędrzej Kociński
Edward Ozimek

Abstract

The paper analyzes the estimation of the fundamental frequency from a real speech signal obtained by recording the speaker in a real acoustic environment, modeled by the MP3 method. The estimation was performed by the Picking-Peaks algorithm with implemented parametric cubic convolution (PCC) interpolation. The efficiency of PCC was tested for the Catmull-Rom, Greville, and Greville two-parametric kernels. Based on the MSE, a window that gives optimal results was chosen.

Authors and Affiliations

Zoran N. Milivojević
Darko Brodić

Abstract

Speakers' emotional states are recognized from speech signals with additive white Gaussian noise (AWGN). The influence of white noise on a typical emotion recognition system is studied. The emotion classifier is implemented with a Gaussian mixture model (GMM). A Chinese speech emotion database is used for training and testing, which includes nine emotion classes (happiness, sadness, anger, surprise, fear, anxiety, hesitation, confidence, and the neutral state). Two speech enhancement algorithms are introduced for improved emotion classification. In the experiments, the Gaussian mixture model is trained on clean speech data, while tested under AWGN at various signal-to-noise ratios (SNRs). Both the emotion class model and the dimension space model are adopted for the evaluation of the emotion recognition system. Under the emotion class model, the nine emotion classes are classified. Under the dimension space model, the arousal dimension and the valence dimension are classified into positive or negative regions. The experimental results show that the speech enhancement algorithms consistently improve the performance of our emotion recognition system under various SNRs, and that positive emotions are more likely to be misclassified as negative emotions in a white noise environment.
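The AWGN test condition above can be reproduced by scaling Gaussian noise to a prescribed SNR before adding it to the clean signal; the "speech" here is a synthetic sine, purely for illustration.

```python
# Add white Gaussian noise to a signal at a target SNR in dB.
import math
import random

def add_awgn(signal, snr_db, rng):
    power = sum(s * s for s in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
noisy = add_awgn(clean, 0.0, rng)  # 0 dB SNR: equal speech and noise power
# empirical SNR of the generated mixture should be close to 0 dB
p_sig = sum(s * s for s in clean) / len(clean)
p_noise = sum((a - b) ** 2 for a, b in zip(noisy, clean)) / len(clean)
print(round(10 * math.log10(p_sig / p_noise), 1))
```

Repeating this for each SNR in the tested range (-18 to -6 dB here, for example) yields a matched set of degraded test signals.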

Authors and Affiliations

Chengwei Huang
Guoming Chen
Hua Yu
Yongqiang Bao
Li Zhao

Abstract

Reverberation is a common problem for many speech technologies, such as automatic speech recognition (ASR) systems. This paper investigates the novel combination of precedence, binaural and statistical independence cues for enhancing reverberant speech, prior to ASR, under these adverse acoustical conditions when two microphone signals are available. Results of the enhancement are evaluated in terms of relevant signal measures and accuracy for both English and Polish ASR tasks. These show inconsistencies between the signal and recognition measures, although in recognition the proposed method consistently outperforms all other combinations and the spectral-subtraction baseline.

Authors and Affiliations

Mikolaj Kundegorski
Philip J.B. Jackson
Bartosz Ziółko

Abstract

The aim of this work was to measure subjective speech intelligibility in an enclosure with a long reverberation time and to compare these results with objective parameters. Impulse Responses (IRs) were first determined with a dummy head at different measurement points of the enclosure. The following objective parameters were calculated with Dirac 4.1 software: Reverberation Time (RT), Early Decay Time (EDT), weighted Clarity (C50), and Speech Transmission Index (STI). For the chosen measurement points, the IRs were convolved with the Polish Sentence Test (PST) and logatome tests. The PST was presented against a background of babble noise, and the speech reception threshold, SRT (i.e., the SNR yielding 50% speech intelligibility), was evaluated for those points. The relationship of sentence and logatome recognition vs. STI was determined. It was found that the final SRT data are well correlated with the STI and can be expressed by a psychometric function. The difference between the SRT determined without reverberation and in reverberant conditions appeared to be a good measure of the effect of reverberation on speech intelligibility in a room. In addition, speech intelligibility with and without the sound amplification system installed in the enclosure was compared.
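A psychometric function of the kind mentioned above maps STI (or SNR) to proportion correct; a common choice is a logistic curve whose midpoint is the 50%-intelligibility point. The midpoint and slope below are made-up illustration values, not fitted to the study's data.

```python
# Logistic psychometric function: proportion correct as a function of x
# (STI or SNR), with 50% intelligibility at the midpoint by construction.
import math

def psychometric(x, midpoint, slope):
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

srt = 0.45  # hypothetical STI at which 50% of sentences are recognized
print(psychometric(srt, midpoint=srt, slope=20.0))         # 0.5 at midpoint
print(round(psychometric(0.60, midpoint=srt, slope=20.0), 3))
```

Fitting `midpoint` and `slope` to measured recognition scores at several STI values gives the kind of SRT-vs-STI relationship the abstract reports.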

Authors and Affiliations

Jędrzej Kociński
Edward Ozimek

Abstract

The present research investigated the effects of short-term musical training on speech recognition in adverse listening conditions in older adults. A total of 30 Kannada-speaking participants with no history of gross otologic, neurologic, or cognitive problems were divided equally into experimental (M = 63 years) and control (M = 65 years) groups. Baseline and follow-up assessments of speech in noise (SNR50) and in reverberation were carried out for both groups. The participants in the experimental group underwent Carnatic classical music training, which lasted for seven days. The Bayesian likelihood estimates revealed no difference in SNR50 or in speech recognition scores in reverberation between the baseline and follow-up assessments for the control group. In the experimental group, however, the SNR50 decreased and speech recognition scores improved following musical training, suggesting a positive impact of music training. The improved performance on speech recognition suggests that short-term musical training using Carnatic music can be used as a potential tool to improve speech recognition abilities in adverse listening conditions in older adults.

Authors and Affiliations

Akhila R. Nandakumar 1
Haralakatta Shivananjappa Somashekara 1
Vibha Kanagokar 1
Arivudai Nambi Pitchaimuthu 1

  1. Department of Audiology and Speech-Language Pathology, Kasturba Medical College, Mangalore Manipal Academy of Higher Education

Abstract

The newest book by the renowned Polish linguist Leszek Bednarczuk summarizes his ideas in the fields of comparative, areal, and typological linguistics and presents some of his original solutions.

Bibliography

Bednarczuk L., 2020, Sporne problemy językoznawstwa porównawczego, Kraków: Lexis.
Boček V., 2010, Studie k nejstarším romanismům ve slovanských jazycích, Praha: Nakladatelství Lidové noviny. – (Studia etymologica Brunensia ; 9).
Boček V., 2012, On the Relationship between Gemination and Palatalization in Early Romance Loanwords in Common Slavic, “Journal of Slavic Linguistics”, vol. 20/2, pp. 151–170.
Boček V., 2014, Praslovanština a jazykový kontakt, Praha: Nakladatelství Lidové noviny. – (Studia etymologica Brunensia ; 17).
Boček V., 2019, Common Slavic in the light of language contact and areal linguistics: Issues of methodology and the history of research, [In:] Slavic on the Language Map of Europe. Historical and Areal‑Typological Dimensions, eds. A. Danylenko, N. Motoki, Berlin: De Gruyter, pp. 63–86.

Authors and Affiliations

Václav Blažek 1

  1. Masaryk University, Brno

Abstract

Non-invasive techniques for the assessment of respiratory disorders have gained increased importance in recent years due to the complexity of conventional methods. Machine learning may play a very essential role in the assessment of respiratory disorders. Respiratory disorders lead to variation in the production of speech, as the two go hand in hand; thus, speech analysis can be a useful means for the pre-diagnosis of respiratory disorders. This article aims to develop a machine learning approach to differentiate healthy speech from speech corresponding to different respiratory disorders (affected speech). In the present work, a set of 15 relevant and efficient features was extracted from the acquired data, and classification of healthy and affected speech was done using different classifiers. To assess the performance of the classifiers, accuracy, specificity (Sp), sensitivity (Se), and the area under the receiver operating characteristic curve (AUC) were used, applying both multi-fold cross-validation (5-fold and 10-fold) and the holdout method. Out of the studied classifiers, the decision tree, support vector machine (SVM), and k-nearest neighbor (KNN) were found more appropriate for providing a clinically correct assessment while considering 15 features as well as three significant features (Se > 89%, Sp > 89%, AUC > 82%, and accuracy > 99%). The conclusion was that the proposed classifiers may provide an aid for the simple assessment of respiratory disorders utilising speech parameters with high efficiency. In the future, the proposed approach can be evaluated for the detection of specific respiratory disorders such as asthma, COPD, etc.
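The sensitivity, specificity, and accuracy figures quoted above all come from a binary confusion matrix; a minimal sketch with invented counts follows.

```python
# Sensitivity, specificity, and accuracy from binary confusion-matrix
# counts: tp/fn for the "affected" class, fp/tn for the "healthy" class.
def clf_metrics(tp, fn, fp, tn):
    se = tp / (tp + fn)            # sensitivity: recall on "affected"
    sp = tn / (tn + fp)            # specificity: recall on "healthy"
    acc = (tp + tn) / (tp + fn + fp + tn)
    return se, sp, acc

# Hypothetical counts for 100 test utterances.
se, sp, acc = clf_metrics(tp=45, fn=5, fp=4, tn=46)
print(round(se, 2), round(sp, 2), round(acc, 2))
```

In a cross-validation setup these counts are accumulated over the folds before the metrics are computed.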

Authors and Affiliations

Poonam Shrivastava 1
Neeta Tripathi 1
Bikesh Kumar Singh 2
Bhupesh Kumar Dewangan 3

  1. Department of Electronics and Telecommunication, SSTC Bhilai, India
  2. Department of Biomedical Engineering, National Institute of Technology, Raipur, India
  3. Department of Computer Science and Engineering, School of Engineering, OP Jindal University, Raigarh, India

Abstract

The main goal of the research was to obtain a set of data on the ability to recognize speech in noise using a Polish word test (New Articulation Lists, NAL-93) with two different masking signals. An attempt was also made to standardise the background noise for Polish speech tests by creating a babble noise for NAL-93. Two types of background noise were used for the Polish word test: babble noise and speech noise. The short method was chosen for the study, as it provides results similar to the constant stimuli method using less word material. The experiment with both maskers was presented to 10 listeners with normal hearing.

The mean SRT values for NAL-93 were −3.4 dB SNR for speech noise and 3.0 dB SNR for babble noise. In this regard, babble noise provided more efficient results. However, the SRT for speech noise was more similar to the values obtained for other Polish speech tests. The measurement of speech recognition using the Polish word test is possible for both types of masking signals presented in the study. The decision as to which type of noise would be better in the practice of hearing aid fitting remains an open question.


Authors and Affiliations

Anna Schelenz
Ewa Skrodzka

Abstract

This study examined whether differences in reverberation time (RT) between typical sound field test rooms used in audiology clinics have an effect on speech recognition in multi-talker environments. Separate groups of participants listened to target speech sentences presented simultaneously with 0 to 3 competing sentences through four spatially separated loudspeakers in two sound field test rooms having RT = 0.6 s (Site 1: N = 16) and RT = 0.4 s (Site 2: N = 12). Speech recognition scores (SRSs) for the Synchronized Sentence Set (S3) test and subjective estimates of perceived task difficulty were recorded. The obtained results indicate that the change in room RT from 0.4 to 0.6 s did not significantly influence SRSs in quiet or in the presence of one competing sentence. However, this small change in RT affected SRSs when 2 and 3 competing sentences were present, resulting in mean SRSs that were about 8-10% better in the room with RT = 0.4 s. Perceived task difficulty ratings increased as the complexity of the task increased, with average ratings similar across test sites for each level of sentence competition. These results suggest that site-specific normative data must be collected for sound field rooms if clinicians would like to use two or more directional speech maskers during routine sound field testing.


Authors and Affiliations

Kim Abouchacra
Janet Koehnke
Joan Besing
Tomasz Letowski

Abstract

Although emotions and learning based on emotional reactions are individual-specific, the main features are consistent among all people. Depending on a person's emotional state, various physical and physiological changes can be observed in pulse and breathing, blood flow velocity, hormonal balance, voice properties, facial expression, and hand movements. The diversity, size, and grade of these changes are shaped by different emotional states. Acoustic analysis, an objective evaluation method, is used to determine the emotional state from people's voice characteristics. In this study, the reflection of anxiety disorder in people's voices was investigated through acoustic parameters. The study is a cross-sectional case-control study. Voice recordings were obtained from healthy people and patients. With acoustic analysis, 122 acoustic parameters were obtained from these voice recordings. The relation of these parameters to the anxious state was investigated statistically. According to the results obtained, 42 acoustic parameters vary in the anxious state. In the anxious state, the subglottic pressure increases and the vocalization of the vowels decreases. The MFCC parameter, which changes in the anxious state, indicates that people can perceive this situation while listening to speech. It was also shown that text reading is effective in triggering emotions. These findings show that there is a change in the voice in the anxious state and that the acoustic parameters are influenced by it. For this reason, acoustic analysis can be used as an expert decision support system for the diagnosis of anxiety.


Authors and Affiliations

Turgut Özseven
Muharrem Düğenci
Ali Doruk
Hilal İ. Kahraman

Abstract

Speech emotion recognition is an important part of human-machine interaction studies. The acoustic analysis method is used for emotion recognition through speech. An emotion does not cause changes in all acoustic parameters; rather, the acoustic parameters affected by an emotion vary depending on the emotion type. In this context, the emotion-based variability of acoustic parameters is still a current field of study. The purpose of this study is to investigate which acoustic parameters fear affects and the extent of their influence. For this purpose, various acoustic parameters were obtained from speech recordings containing fear and neutral emotions. The change of these parameters according to the emotional state was analyzed using statistical methods, and the parameters affected by the fear emotion, and the degree of influence, were determined. According to the results obtained, the majority of the acoustic parameters that fear affects vary according to the data used. However, it was demonstrated that formant frequencies, mel-frequency cepstral coefficients, and jitter parameters can define the fear emotion independently of the data used.

Authors and Affiliations

Turgut Özseven

Abstract

The aim of this study was to create a single-language counterpart of the International Speech Test Signal (ISTS) and to compare the two with respect to their acoustical characteristics. The development procedure of the Polish Speech Test Signal (PSTS) was analogous to that of the ISTS; the main difference was that, instead of multi-lingual recordings, speech recordings of five Polish speakers were used. The recordings were cut into 100–600 ms long segments and composed into a one-minute-long signal, obeying a set of composition rules imposed mainly to preserve the natural, speech-like features of the signal. Analyses revealed some differences between the ISTS and the PSTS. The latter has about twice as high a proportion of voiceless fragments of speech. The PSTS's sound pressure levels in 1/3-octave bands resemble the shape of the Polish long-term average female speech spectrum, having distinctive maxima at 3–4 and 8–10 kHz which the ISTS lacks. As the PSTS is representative of the Polish language and contains input from multiple speakers, it can potentially find application as a standardized signal used during the fitting of hearing aids for patients who use Polish as their main language.

Authors and Affiliations

Dorota Habasińska
Ewa Skrodzka
Edyta Bogusz-Witczak

Abstract

Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as some paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNNs) to laughter detection, as this technology is nowadays considered state-of-the-art in similar tasks such as phoneme identification. We carry out our experiments on two corpora containing spontaneous speech in two languages (Hungarian and English). Also, as we find it reasonable that not all frequency regions are required for efficient laughter detection, we perform feature selection to find a sufficient feature subset.


Authors and Affiliations

Gábor Gosztolya
András Beke
Tilda Neuberger
László Tóth

Abstract

Chinese word identification and sentence intelligibility were evaluated by grade 3 and grade 5 students in classrooms with different reverberation times (RTs) from three primary schools under different signal-to-noise ratios (SNRs). The relationships between the subjective word identification and sentence intelligibility scores and the speech transmission index (STI) are analyzed. The results show that both the Chinese word identification and sentence intelligibility scores for grade 3 and grade 5 students in the classroom increased with increasing SNR (and STI), increased with the age of the students, and decreased with increasing RT. To achieve a 99% sentence intelligibility score, the STIs required for grade 3 students, grade 5 students, and adults are 0.71, 0.61, and 0.51, respectively. The required objective acoustical index determined by a certain threshold of the word identification test might be underestimated for younger children (grade 3 students) in the classroom but overestimated for adults. A method based on the sentence test is more useful for speech intelligibility evaluation in classrooms than one based on the word test for different age groups. Younger children need a more favorable classroom acoustical environment, with a higher STI, than older children and adults to achieve optimum speech communication in the classroom.

Authors and Affiliations

Jianxin Peng
Peng Jiang

Abstract

Although various speech enhancement techniques have been developed for different applications, existing methods are limited in noisy environments with high ambient noise levels. Speech presence probability (SPP) estimation is a speech enhancement technique that reduces speech distortions, especially in low signal-to-noise ratio (SNR) scenarios. In this paper, we propose a new two-dimensional (2D) Teager-energy-operator (TEO) improved SPP estimator for speech enhancement in the time-frequency (T-F) domain. The wavelet packet transform (WPT), a multiband decomposition technique, is used to concentrate the energy distribution of the speech components. A minimum mean-square error (MMSE) estimator is obtained based on the generalized gamma distribution speech model in the WPT domain. In addition, speech samples corrupted by environmental and occupational noises (machine shop, factory, and station) at different input SNRs are used to validate the proposed algorithm. Results suggest that the proposed method achieves a significant enhancement in perceptual quality compared with four conventional speech enhancement algorithms (MMSE-84, MMSE-04, Wiener-96, and BTW).
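The building block behind the 2D variant above is the classic discrete Teager energy operator, psi[x](n) = x(n)^2 - x(n-1) * x(n+1); a one-dimensional sketch follows. For a pure sine x(n) = A*sin(omega*n), the operator is exactly constant and equal to A^2 * sin(omega)^2, which tracks both amplitude and frequency.

```python
# Discrete Teager energy operator applied to a sampled signal.
import math

def teager(x):
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

omega = 0.2  # radians per sample
x = [math.sin(omega * n) for n in range(100)]
psi = teager(x)
# identity check: for sin(omega*n), psi equals sin(omega)^2 everywhere
print(round(psi[0], 6), round(math.sin(omega) ** 2, 6))
```

Applying such an operator per T-F bin (e.g., per WPT subband over time) is one way to build the two-dimensional energy features the abstract refers to.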


Authors and Affiliations

Pengfei Sun
Jun Qin

Abstract

The article addresses the prosodic characteristics of a new intonation contour which can be observed in spontaneous speech in contemporary Russian. The study focuses on identifying the most relevant criteria of this new intonation contour and on explaining the reasons underlying its occurrence.

Authors and Affiliations

Татьяна Зиновьева
