Enhancing Speech Recognition in Adverse Listening Environments: The Impact of Brief Musical Training on Older Adults

The present research investigated the effects of short-term musical training on speech recognition in adverse listening conditions in older adults. A total of 30 Kannada-speaking participants with no history of gross otologic, neurologic, or cognitive problems were divided equally into experimental (M = 63 years) and control (M = 65 years) groups. Baseline and follow-up assessments of speech recognition in noise (SNR50) and in reverberation were carried out for both groups. The participants in the experimental group received Carnatic classical music training, which lasted for seven days. The Bayesian likelihood estimates revealed no difference in SNR50 or in speech recognition scores in reverberation between the baseline and follow-up assessments for the control group. In the experimental group, by contrast, the SNR50 decreased and speech recognition scores improved following musical training, suggesting a positive impact of music training. The improved performance suggests that short-term musical training using Carnatic music is a potential tool for improving speech recognition in adverse listening conditions in older adults.


Introduction
Several anatomical and physiological changes occur in the auditory system of older adults as part of the aging process (Chisolm et al., 2003). Aging causes alterations in the metabolic activity of the cochlea, leading to a decrease in endo-cochlear potentials (EP) (Wangemann, 2002). This reduction in EP impairs the functioning of the cochlear amplifier and raises the neural threshold (Schmiedt et al., 2002). Additionally, the aging process disrupts the precise timing of neuronal firing, resulting in inaccuracies in the phase locking of auditory neurons (Moser et al., 2006). Animal models of aging have demonstrated a decrease in the size of spiral ganglion cells in Rosenthal's canal and a reduction of approximately 15 to 25% of cells throughout the cochlear duct (Mills et al., 2006). The progressive degeneration of cells within the auditory system leads to various auditory perceptual deficits (Tun et al., 2012), including reduced audibility (Schuknecht, Gacek, 1993) and deterioration in suprathreshold auditory spectral processing (Nambi et al., 2016), temporal processing (He et al., 2008), and cognitive abilities (Verhaeghen, Cerella, 2002). These deficits can contribute to difficulties in speech perception in noisy and reverberant environments (Helfer, Wilber, 1990; Nambi et al., 2016). The most common complaint among older adults is difficulty in comprehending speech in adverse listening conditions. This difficulty stems from their diminished auditory processing abilities, which hinder their ability to separate target speech from background noise, resulting in reduced speech perception in noisy environments (Schoof, Rosen, 2014). Due to the decline in auditory processing abilities in older adults, their passive and effortless speech processing in noisy environments is compromised (Rabbitt, 1990). As a compensatory mechanism, older adults rely on active and conscious signal processing, which in turn depends on intact cognitive functioning. However, the aging process also impacts cognitive abilities, which can contribute to difficulties in speech perception in noisy situations (Tun et al., 2002).
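Methods to overcome communication difficulties in older adults have been a topic of interest among researchers. One preventive measure often recommended to counter the effects of aging is engaging in physical exercise (Alessio et al., 2002; Curhan et al., 2013). Physical exercise may help in preventing age-related auditory disorders, although it remains unclear whether it can reverse hearing changes that have already occurred due to aging. Other studies have demonstrated the benefits of auditory training using different stimuli, such as monosyllables (Burk et al., 2006) and consonant-vowel transitions at the syllable, word, sentence, and context levels (Anderson et al., 2013). These training methods have improved neural timing, processing speed, and speech perception in noisy environments. In a broader sense, musical training can be considered a form of auditory training, and long-term musical training has been found to have positive effects on auditory and cognitive abilities (Parbery-Clark et al., 2009a; 2009b; 2012; 2013; Kraus, Chandrasekaran, 2010; Patel, 2011). Older adult musicians with long-term expertise retain neurophysiological advantages due to music, which may improve their speech coding abilities. Anderson and Kraus (2010) found that these musicians outperformed their non-musician counterparts in tasks involving auditory, spectral, temporal, and cognitive processing. These promising findings suggest that musical training could serve as an effective strategy to mitigate speech perception deficits in older adults (Kraus, White-Schwoch, 2014). Therefore, it would be interesting to investigate whether musical training can be employed as an auditory training method to overcome speech recognition deficits in challenging listening conditions.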
To the best of our knowledge, only Jain et al. (2015) investigated the effect of short-term musical training on speech recognition in noise among young adults, reporting enhancements in speech recognition. However, the impact of short-term musical training on speech recognition abilities in older adults remains unexplored. Hence, the present study aims to investigate the effects of short-term musical training on speech recognition in adverse listening conditions in older adults.

Method
The Institutional Ethics Committee (IEC) at Kasturba Medical College (KMC), Mangaluru, approved the research protocol. A total of 30 participants were selected using convenience sampling and were evenly divided into experimental and control groups. All participants were native Kannada speakers with no prior musical training experience and no significant ear, neurological, or cognitive issues. Informed consent was obtained from all individuals before the study was conducted. Table 1 presents the mean age of the groups along with their average pure-tone thresholds at 500 Hz, 1 kHz, and 2 kHz (PTA1), as well as at 1, 2, and 4 kHz (PTA2). An independent t-test revealed no statistically significant difference (p > 0.05) between the two groups in PTA1 (t(28) = 1.619, p = 0.117) or PTA2 (t(28) = 1.337, p = 0.192).
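As a small illustration of how the two pure-tone averages are formed, the sketch below computes PTA1 and PTA2 from a single audiogram. The threshold values and the helper name `pure_tone_average` are hypothetical and are not taken from the study's data.

```python
from statistics import mean

# Frequencies (Hz) entering each average, as defined in the text.
PTA1_FREQUENCIES = (500, 1000, 2000)
PTA2_FREQUENCIES = (1000, 2000, 4000)


def pure_tone_average(thresholds_db, frequencies_hz):
    """Mean of the hearing thresholds (dB HL) at the given frequencies."""
    return mean(thresholds_db[f] for f in frequencies_hz)


# Hypothetical audiogram for one ear: frequency (Hz) -> threshold (dB HL).
audiogram = {500: 20, 1000: 25, 2000: 30, 4000: 45}

pta1 = pure_tone_average(audiogram, PTA1_FREQUENCIES)
pta2 = pure_tone_average(audiogram, PTA2_FREQUENCIES)
print(f"PTA1 = {pta1:.1f} dB HL, PTA2 = {pta2:.1f} dB HL")
```

Because PTA2 includes 4 kHz, it is the more sensitive of the two measures to the high-frequency loss that typically accompanies aging.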

Procedure
The research was conducted in three distinct phases. During the initial phase, participants from both groups underwent testing to evaluate their speech recognition ability in noisy and reverberant conditions. In the subsequent phase, participants in the experimental group received music training. Finally, in the last phase, speech recognition ability in noise and reverberation was reassessed for all participants in both groups. Stimuli for the speech recognition tests were played from a personal laptop, routed through a Creative Soundblaster X-Fi USB sound card, and presented through Sennheiser HD 280 Pro headphones. All stimuli were digitized at a sampling rate of 44 100 Hz. The signal processing for speech recognition in noise and the music training paradigm were implemented on the MATLAB version 7.10.0 platform. The signal processing for speech recognition in reverberation was performed using Adobe Audition Version 3 software.

Assessment of speech recognition in noise
The standard QuickSIN protocol was employed to estimate speech recognition in noise. Two lists of standard Kannada QuickSIN sentences (Methi et al., 2009), spoken by female speakers, were used as the targets, and four-talker speech babble was used as the background noise. Each list consisted of seven sentences, with the first sentence presented at a signal-to-noise ratio (SNR) of 20 dB. The SNR was then decreased in 5 dB steps until it reached −10 dB for the final sentence. The sentences were presented at the participant's most comfortable level (MCL). For each sentence, the number of keywords correctly identified by the participant was counted and converted into the proportion of correct responses. The SNR required to achieve a 50% correct recognition score (SNR50) was then estimated by fitting a cumulative Gaussian psychometric function to the proportion of correct responses at each SNR level. SNR50 was calculated as the midpoint of the psychometric function, separately for each list, and then averaged across lists. In total, four sentence lists were employed to assess SNR50, with two lists used for the pre-training evaluation and the remaining two for the post-training assessment.
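The SNR50 estimate described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' MATLAB implementation: the fitting criterion (least squares over a coarse grid), the grid ranges, the figure of five keywords per sentence, and all function names are assumptions, and the keyword counts are invented.

```python
import math

# SNRs of the seven sentences in one list: 20 dB down to -10 dB in 5 dB steps.
SNRS_DB = [20, 15, 10, 5, 0, -5, -10]
KEYWORDS_PER_SENTENCE = 5  # assumption, not stated in this section


def cumulative_gaussian(snr_db, mu, sigma):
    """Predicted proportion correct at a given SNR (mu is the 50% point)."""
    return 0.5 * (1.0 + math.erf((snr_db - mu) / (sigma * math.sqrt(2.0))))


def fit_snr50(proportions, snrs_db=SNRS_DB):
    """Least-squares grid search for the psychometric function; returns (mu, sigma)."""
    best_err, best_mu, best_sigma = float("inf"), None, None
    for mu_step in range(-100, 201):      # mu: -10.0 to 20.0 dB in 0.1 dB steps
        mu = mu_step / 10.0
        for sigma_step in range(1, 101):  # sigma: 0.2 to 20.0 dB in 0.2 dB steps
            sigma = sigma_step / 5.0
            err = sum((p - cumulative_gaussian(s, mu, sigma)) ** 2
                      for s, p in zip(snrs_db, proportions))
            if err < best_err:
                best_err, best_mu, best_sigma = err, mu, sigma
    return best_mu, best_sigma


def snr50_for_session(keyword_counts_per_list):
    """Fit each list separately, then average the SNR50 estimates."""
    estimates = []
    for counts in keyword_counts_per_list:
        proportions = [c / KEYWORDS_PER_SENTENCE for c in counts]
        mu, _ = fit_snr50(proportions)
        estimates.append(mu)
    return sum(estimates) / len(estimates)


# Hypothetical keyword counts for the two lists of one session.
session = [[5, 5, 4, 3, 2, 0, 0], [5, 5, 5, 3, 1, 1, 0]]
print(round(snr50_for_session(session), 2))
```

A lower SNR50 means the listener reaches 50% keyword recognition at a less favourable signal-to-noise ratio, so a decrease after training indicates improvement.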

Assessment of speech recognition in reverberation
A single list of sentences from the QuickSIN test was convolved with binaural room impulse responses (BRIRs) to simulate speech recognition in a reverberant environment. This BRIR was generated to simulate a standard rectangular auditorium with an average reverberation time of 0.6 seconds. The reverberant material was presented at the MCL set by each participant. The total count of accurately identified keywords was tallied, with a maximum achievable score of 35. For assessment purposes, two sets of sentences were utilized, one for the pre-training evaluation and another for the post-training assessment.
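The study applied reverberation in Adobe Audition; the sketch below only illustrates the underlying operation, which is a convolution of the dry sentence with the left-ear and right-ear impulse responses. The tiny signal and impulse responses are invented for illustration, and the function names are assumptions.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def apply_brir(signal, brir_left, brir_right):
    """Return the (left, right) reverberant signals for a mono input."""
    return convolve(signal, brir_left), convolve(signal, brir_right)


# Hypothetical data: a 3-sample "sentence" and two short impulse responses,
# each consisting of a direct path followed by one attenuated reflection.
dry = [1.0, 2.0, 3.0]
left, right = apply_brir(dry, [1.0, 0.0, 0.5], [0.8, 0.4])
print(left)
print(right)
```

Direct-form convolution is quadratic in the signal length, so for sentence-length recordings at 44 100 Hz an FFT-based routine from a numerical library would be the practical choice.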

Music training
The participants in the experimental group received short-term musical training spanning approximately seven days. The training material initially consisted of ten Sampoorna ragas of Carnatic classical music. The ascending and descending patterns (Arohana and Avarohana) of all ten Sampoorna ragas were recorded on the violin, veena, and flute, played by three professional artists with over ten years of experience. These ten ragas were divided into two lists, each containing five ragas. Subsequently, based on a pilot study, only one list, comprising the ragas Mayamalavagowla, Kalyani, Thodi, Natabhairavi, and Charukeshi, was selected for the musical training. A custom training module with a graphical user interface (GUI) was developed, incorporating a training component and an assessment module for raga identification.
During the initial training session, the participants were familiarized with all five ragas by listening to violin samples. The unique characteristics of each raga were explained to them. Gradually, they were taught to identify and discriminate the ragas based on their ascending and descending patterns. Throughout the training, multiple rehearsals and feedback were provided. At the end of each session, the participant's ability to identify each raga was assessed by randomly presenting each raga ten times. The training sessions continued until the participant achieved a 100% correct score.
Once the training with the violin samples was completed, a similar process was followed using veena samples. In the final phase of the training, the participant's ability to transfer the knowledge of ragas acquired from the violin and veena to the flute was tested. In this stage, the participants underwent a raga identification test in which each raga played on the flute was randomly presented ten times. The training was considered finished when the participant achieved a score of 100%. Participants who failed to attain 100% were taken back to the previous stage, where they received further training with veena samples. Once they achieved a perfect score for the veena samples, they progressed again to the raga identification test with flute samples. This process continued until all participants obtained scores of 100% for the flute samples.
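The progression through the three instruments can be summarized as a small state machine. The sketch below is not the authors' MATLAB module: the function names, the `max_blocks` safety limit, and the scripted scores are hypothetical, and only the stage order, the ten randomized presentations per raga, and the 100% criterion come from the description above.

```python
import random

RAGAS = ["Mayamalavagowla", "Kalyani", "Thodi", "Natabhairavi", "Charukeshi"]


def build_trials(ragas=RAGAS, repeats=10, rng=None):
    """Assessment block: every raga presented `repeats` times in random order."""
    rng = rng or random.Random()
    trials = [raga for raga in ragas for _ in range(repeats)]
    rng.shuffle(trials)
    return trials


def score_block(trials, respond):
    """Proportion of trials on which the response names the presented raga."""
    correct = sum(1 for raga in trials if respond(raga) == raga)
    return correct / len(trials)


def run_training(assess, max_blocks=100):
    """Run assessment blocks until the flute test is passed at 100%.

    `assess(stage)` returns the proportion correct for one block on the given
    instrument. A failed flute test sends the participant back to the veena.
    """
    history = []
    stage = "violin"
    for _ in range(max_blocks):
        score = assess(stage)
        history.append((stage, score))
        passed = score == 1.0
        if stage == "violin" and passed:
            stage = "veena"
        elif stage == "veena" and passed:
            stage = "flute"
        elif stage == "flute":
            if passed:
                return history
            stage = "veena"
    return history


# Scripted scores for one hypothetical participant.
scores = iter([0.9, 1.0, 1.0, 0.8, 1.0, 1.0])
for stage, score in run_training(lambda stage: next(scores)):
    print(stage, score)
```

Training on two instruments and testing on a third means that a perfect flute score reflects recognition of the raga's pitch pattern rather than of one instrument's timbre.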

Results
The statistical analyses were performed using the JASP version 1.17.1.0 software. JASP is a statistical package that offers a wide range of tools for data analysis, including Bayesian and frequentist methods, through an intuitive interface. In the current study, a series of Bayesian paired-samples t-tests was employed to investigate the main effect of music training on speech recognition outcomes in noise and reverberation. A series of Bayesian independent-samples t-tests was performed to examine the difference in speech recognition performance in noise and reverberation between the control and experimental groups.

Speech recognition in noise
The statistical analysis revealed that the SNR50 of participants in the experimental group was significantly different in the post-training session compared to the pre-training session (BF10 = 9.20). Music training had a positive influence by reducing the SNR50 in the experimental group. On the other hand, there was no significant difference in SNR50 between the baseline and follow-up sessions in the control group (BF10 = 0.44).
There was no significant difference in SNR50 between the control group and the experimental group in the pre-training session (BF10 = 0.35). After the experimental group completed musical training, however, the SNR50 was re-estimated in both groups, and the SNR50 of the experimental group was better (lower) than that of the control group (BF10 = 141.5). The mean and standard deviation of SNR50 in the baseline and follow-up sessions for both the control and experimental groups are depicted in Fig. 1.
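Bayes factors such as those reported in this subsection are often read against verbal evidence categories. The sketch below applies one widely used Jeffreys-style convention (boundaries at 3, 10, 30, and 100) to the four BF10 values; the function name and the choice of scheme are assumptions, since the text does not state which interpretation scale was used.

```python
def describe_bf10(bf10):
    """Verbal label for a Bayes factor BF10 (evidence for H1 over H0)."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence for either hypothesis"
    favoured = "H1" if bf10 > 1 else "H0"
    # Values below 1 favour H0; invert them so the same boundaries apply.
    strength = bf10 if bf10 > 1 else 1.0 / bf10
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return f"{label} evidence for {favoured}"


reported = {
    "experimental, pre vs post": 9.20,
    "control, baseline vs follow-up": 0.44,
    "between groups, pre-training": 0.35,
    "between groups, post-training": 141.5,
}
for comparison, bf10 in reported.items():
    print(f"{comparison}: BF10 = {bf10} -> {describe_bf10(bf10)}")
```

Under this convention, a BF10 between 1/3 and 1 counts only as anecdotal support for the null hypothesis, which is worth bearing in mind when reading the comparisons that report no difference.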

Fig. 1. Mean and standard deviation of SNR50 [dB] in baseline (pre-test) and follow-up (post-test) sessions for both the control and experimental groups.

Speech recognition in reverberation
The main effect of music training on speech recognition scores in reverberation was evaluated by comparing the scores obtained in the pre-training and post-training sessions. The total correct speech recognition scores of the participants in the experimental group were significantly larger in the post-training session than in the pre-training session (BF10 = 7.57). This result suggests that music training improved speech recognition ability in reverberation. In contrast, there was no significant difference between the baseline and follow-up speech recognition scores of the control group (BF10 = 0.45).
Speech recognition scores measured at the pre-training session did not differ between the control and experimental groups (BF10 = 0.26). The speech recognition scores in reverberation measured in the experimental group following the music training were higher than those of the control group (BF10 = 11.68). The means and standard deviations of the correct scores are depicted in Fig. 2.

Fig. 2. Mean and standard deviation of speech recognition scores in baseline (pre-test) and follow-up (post-test) sessions for both the control and experimental groups.

Discussion
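The present study suggests a positive impact of short-term musical training on speech recognition abilities in older adults in adverse listening conditions. This is evident from the improved SNR50 and speech recognition scores under reverberant conditions. Previous research has consistently shown that musicians tend to exhibit enhanced auditory abilities compared to non-musicians, as demonstrated in various studies (Kraus, Chandrasekaran, 2010; Kraus, White-Schwoch, 2014; Musacchia et al., 2007; Parbery-Clark et al., 2012; Rammsayer, Altenmüller, 2006; Slater et al., 2015; Strait, Kraus, 2011). Furthermore, even older adults with musical experience performed better than their non-musician counterparts in speech-in-noise tasks (Anderson et al., 2013; Kraus, White-Schwoch, 2014; White-Schwoch et al., 2013). Electrophysiological studies have indicated that long-term musical training can influence neural encoding by altering the responsiveness of subcortical and cortical neurons, thereby enhancing auditory processing ability. Recent electrophysiological studies focussing on short-term musical training lasting eight days, conducted on young non-musicians, observed changes primarily in cortical responses rather than subcortical responses (Devi et al., 2015; Jain et al., 2014). These findings provide evidence that even brief musical training can lead to improvements in the neural encoding process. Thus, it can be inferred that the improvements observed in the current study may also be attributed to enhanced neural encoding mechanisms associated with musical training. Parbery-Clark et al. (2009) investigated subcortical speech coding in musicians and non-musicians using speech-evoked auditory brainstem responses. They found that the peaks of the waveform, which carry crucial temporal cues, were better preserved in musicians than in non-musicians, both in quiet and in the presence of background noise. Musicians also exhibited enhanced phase-locking abilities compared to non-musicians. The process of learning music enables the auditory system to adapt and extract essential cues from complex signals, resulting in improved neural representation within the auditory system. It permits better coding of the temporal and spectral aspects of the signal and also helps in concurrent stream segregation, which is essential for perceiving speech in adverse listening conditions (Zendel, Alain, 2009).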
Exposure to music can strengthen the neural responses to stimuli and facilitate bottom-up processing. The auditory efferent system, known for suppressing irrelevant background noise, can enhance the perception of target speech (Luo et al., 2008; Zhang et al., 1997). Through prolonged musical training, top-down processing may modulate neural responses and magnify the cues that are important for stimulus identification. The formation of an auditory template plays a vital role in speech perception and speaker identification (Best et al., 2008). Template formation is possible when good timbre perception produces a high-quality harmonic representation of the complex stimulus, and such timbre perception was observed in musicians compared to non-musicians. Additionally, musicians exhibited heightened sensitivity to subtle harmonic changes (Musacchia et al., 2008; Zendel, Alain, 2009). These factors could also have influenced our study, potentially contributing to improved speech performance in older adults following musical training.
Various hypotheses have been proposed to explain the musical training-dependent changes in auditory processing abilities. Patel (2011) introduced the OPERA (overlap, precision, emotions, repetition, and attention) hypothesis, which offers potential explanations for the changes observed in auditory processing abilities resulting from musical training. The overlap hypothesis suggests anatomical overlap in the brain networks responsible for processing music and speech. According to the precision theory, the heightened precision required for music processing can also be beneficial for speech processing. The emotion theory proposes that the positive emotions evoked by music activate the brain's reward centres, leading to neural plasticity. Additionally, the brain's networks are frequently exposed to musical stimuli, leading to the repetition effect. Lastly, the attention theory states that the networks engaged in music processing are linked to focused attention, which is also crucial for recognizing speech in noisy environments. Therefore, the OPERA hypothesis provides a framework for understanding the improvements in auditory processing and speech recognition abilities in challenging listening conditions associated with music training.
Musical training also presents challenges to short-term memory and attention. Throughout the training, the participants were required to listen attentively to the ragas being played, placing a cognitive load on their memory as they aimed to recognize the raga based on the notes. Patel (2011) hypothesized that focused attention on the intricate details of the musical sounds promotes plasticity. Studies on animals have also demonstrated that training-induced plasticity is enhanced when active listening is involved (Fritz et al., 2005). Kraus and White-Schwoch (2014) believed that music training enhances auditory processing regardless of duration and intensity. The findings of the present study align with their viewpoint, indicating that short-term musical training improves auditory processing and speech recognition abilities in older adults. Consequently, short-term music training holds potential as a way to alleviate auditory processing and speech recognition deficits in older adults. However, further investigation is necessary to determine the minimum duration of training required to maintain the generalized benefits. This aspect presents a promising avenue for future research in this field.
One notable finding in the present study is the extent of improvement observed in SNR50. Specifically, the magnitude of improvement observed in our study slightly exceeds that observed in long-term trained musicians who are native English speakers (Parbery-Clark et al., 2009b). Jain et al. (2015), on the other hand, reported a similar magnitude of improvement following short-term music training in young native Kannada speakers. These observations lead to the speculation that Carnatic music may be more effective in enhancing speech understanding abilities than other genres of music. Additionally, the favourable phonetic characteristics of the Kannada language may contribute to the manifestation of the effects of music training on speech understanding. However, further exploration is necessary to investigate these speculations. Mishra and Panda (2014) also reported a positive effect of Carnatic music, observing improved auditory perceptual abilities in Carnatic musicians compared to non-musicians. Each Carnatic music raga has unique ascending and descending musical patterns, distinguished by variations in the pitch of the notes. The ragas chosen for this study included all seven notes of music; such ragas are known as Sampoorna ragas in Carnatic music. Indian classical music experts recognize the distinct properties of each raga, such as the tonic frequency, Swaras, Arohana (ascending notes), Avarohana (descending notes), Vaadi (primary note), Samvaadi (secondary note), and more. Each raga follows specific rules that define its characteristics and set it apart from others. While some ragas may share the same set of notes or Swaras, their combinations differ. The fundamental basis of differentiation lies in the frequency and corresponding pitch. Unlike Western music, Indian musical notes do not adhere to standardized frequencies. Instead, artists choose a convenient frequency as a reference, which serves as the base for the entire raga. The ragas selected for this study differed from one another by one or two notes, with these differing notes falling at frequencies close to each other. Due to these unique qualities, Carnatic music ragas possess a higher potential as effective tools for auditory training.
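The idea that a raga is defined relative to a freely chosen reference pitch can be illustrated with a short sketch. Everything specific in it is an assumption rather than something stated in the text: the semitone positions used for Mayamalavagowla (S R1 G3 M1 P D1 N3), the 12-tone equal-tempered approximation (Carnatic intonation is not equal-tempered in practice), and the two tonic frequencies.

```python
# Semitone offsets from the tonic (Sa) for the ascending pattern of
# Mayamalavagowla, ending on the upper Sa. Assumed, not taken from the text.
MAYAMALAVAGOWLA_AROHANA = [0, 1, 4, 5, 7, 8, 11, 12]


def scale_frequencies(tonic_hz, semitone_offsets):
    """Frequencies of the notes under an equal-tempered approximation."""
    return [tonic_hz * 2 ** (n / 12) for n in semitone_offsets]


def avarohana(arohana_offsets):
    """Descending pattern, taken here as the ascending pattern reversed."""
    return list(reversed(arohana_offsets))


# Two hypothetical performers choosing different reference pitches.
for tonic in (200.0, 240.0):
    ascending = scale_frequencies(tonic, MAYAMALAVAGOWLA_AROHANA)
    print(tonic, [round(f, 1) for f in ascending])
```

Changing the tonic shifts every note by the same ratio, so the interval pattern, which is what the listener learns to recognize, stays the same.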

Conclusion
The ability to perceive and distinguish important cues such as timbre, pitch, and timing is critical in processing complex signals like speech and music. Developing precise auditory discrimination skills is vital for effectively extracting these cues. Musical training plays a crucial role in refining these skills and strengthening the neural representation of the auditory system, thereby enhancing speech perception. The present findings suggest that even a short period of musical training can improve the speech perception abilities of older adults, especially in challenging listening conditions. The enjoyable nature of music further underscores its potential as a valuable tool for enhancing speech perception skills in adverse listening situations for older adults. However, the long-term sustainability of the training effect cannot be determined from the current study alone, which calls for further research on the long-term maintenance of short-term training outcomes.
al., 2002; Curhan et al., 2013). Physical exercise may help in preventing age-related auditory disorders, although it remains unclear whether it can reverse hearing changes that have already occurred due to aging. Other studies have demonstrated the benefits of auditory training using different stimuli, such as monosyllables (Burk et al., 2006) and consonant-vowel transitions at the syllable, word, sentence, and context levels (Anderson et al., 2013). These training methods have improved neural timing, processing speed, and speech perception in noisy environments. In a broader sense, musical training can be considered a form of auditory training, and long-term musical training has been found to have positive effects on auditory and cognitive abilities (Parbery-Clark et al., 2009a; 2009b; 2012; 2013; Kraus, Chandrasekaran, 2010; Patel, 2011).
(Kraus, Chandrasekaran, 2010; Kraus, White-Schwoch, 2014; Musacchia et al., 2007; Parbery-Clark et al., 2012; Rammsayer, Altenmüller, 2006; Slater et al., 2015; Strait, Kraus, 2011). Furthermore, even older adults with musical experience performed better than their non-musician counterparts in speech-in-noise tasks (Anderson et al., 2013; Kraus, White-Schwoch, 2014; White-Schwoch et al., 2013). Electrophysiological studies have indicated that long-term musical training can influence neural encoding by altering the responsiveness of subcortical and cortical neurons, thereby enhancing auditory processing ability. Recent electrophysiological studies focussing on short-term musical training lasting eight days, conducted on young non-musicians, observed changes primarily in cortical responses rather than subcortical responses (Devi et al., 2015; Jain et al., 2014). These findings provide evidence that even brief musical training can lead to improvements in the neural encoding process. Thus, it can be inferred that the improvements observed in the current study may also be attributed to enhanced neural encoding mechanisms associated with musical training. Parbery-Clark et al. (2009) investigated subcortical speech coding in musicians and non-musicians using speech-evoked auditory brainstem responses. They
Anderson et al. (2013) expressed a similar viewpoint, emphasizing the importance of cognitive involvement in auditory training programs. The cognitive demand on memory leads to an increased reliance on perceptual cues mediated by the prefrontal cortex. Consequently, the perceptual demands and memory interacted during the training program to strengthen the neural representation of speech perception in the presence of background noise. Therefore, the heightened cognitive load experienced during the training is likely to positively impact the auditory processing and speech recognition abilities of older adults.

Table 1. Age and hearing thresholds of all the participants in the experimental and control groups.