Prof. Hanna Bogucka, head of the Department of Wireless Communications at the Poznań University of Technology, discusses unnecessary inhibitions, the usefulness of microphones, and the links between people and technology.
This paper presents and compares methods for the simultaneous calibration of small electret microphones in a waveguide. The microphones are calibrated simultaneously against a reference microphone in both amplitude and phase. The calibration procedure is formulated on the basis of the damped plane-wave propagation equation, from which the acoustic field along the waveguide is predicted using several reference measurements. Different calibration models are presented, and the methods are found to be sensitive to the formulation as well as to the number of free parameters used during reconstruction of the wave field. The waveguide model based on five free parameters was found to be the preferred method for this type of calibration procedure.
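The reconstruction step described above can be sketched numerically: with the wavenumber and damping fixed, the forward- and backward-travelling wave amplitudes enter the damped plane-wave model linearly and can be fitted by least squares. The following minimal Python sketch uses fabricated positions and values; the paper's actual five-parameter model and fitting procedure may differ.

```python
import numpy as np

# Hypothetical sketch: fit a damped plane-wave model
#   p(x) = A * exp(-(alpha + 1j*k) * x) + B * exp((alpha + 1j*k) * x)
# to complex pressure measurements along a waveguide, then predict the
# field at a microphone position. All positions and values are illustrative.

def predict_field(x, A, B, k, alpha):
    """Forward and backward damped plane waves at positions x (metres)."""
    gamma = alpha + 1j * k            # complex propagation constant
    return A * np.exp(-gamma * x) + B * np.exp(gamma * x)

# Reference measurements at known positions (fabricated example values)
x_ref = np.array([0.00, 0.05, 0.10, 0.15])
k, alpha = 2 * np.pi * 1000 / 343.0, 0.1   # wavenumber at 1 kHz, damping
A_true, B_true = 1.0 + 0.2j, 0.3 - 0.1j
p_ref = predict_field(x_ref, A_true, B_true, k, alpha)

# With k and alpha fixed, A and B enter linearly: solve by least squares
M = np.column_stack([np.exp(-(alpha + 1j * k) * x_ref),
                     np.exp((alpha + 1j * k) * x_ref)])
(A_fit, B_fit), *_ = np.linalg.lstsq(M, p_ref, rcond=None)

# Calibration idea: compare a microphone's measured signal against the
# predicted field at its position to obtain its complex sensitivity.
x_mic = 0.07
p_pred = predict_field(x_mic, A_fit, B_fit, k, alpha)
```

In the full five-parameter formulation the wavenumber and damping would be fitted as well, which makes the problem nonlinear; the linear sub-step above is the core of the field reconstruction.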
The development of digital microphones and loudspeakers opens new and interesting application possibilities in fields ranging from industrial and medical markets to consumer audio. One of the rapidly growing fields of application is mobile multimedia: mobile phones, digital cameras, laptop and desktop PCs, etc. Advances have also been made in digital audio, particularly in direct digital transduction, so it is now possible to create all-digital audio recording and reproduction chains that potentially have several advantages over existing analog systems.
Whenever a recording engineer uses stereo microphone techniques, he or she has to consider, besides other acoustic factors, the recording angle resulting from the positioning of the microphones relative to the sound sources. The recording angle, the width of the captured acoustic scene, and the properties of a particular microphone technique are closely related. We propose a decision-support method based on mapping the actual position of a sound source to its position in the reproduced acoustic scene. This research resulted in a set of localisation curves characterising the four most popular stereo microphone techniques. The curves were obtained by two methods: calculation based on appropriate engineering formulae, and an experiment consisting of recording sources and estimating their perceived positions in listening tests. Analysis of the curves yields several conclusions important in recording practice.
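A localisation curve of the calculated kind can be illustrated with standard engineering formulae: for a coincident pair, the inter-channel level difference follows from the capsule directivity, and the perceived angle can be estimated with the stereophonic tangent law. The sketch below assumes a hypothetical XY pair of cardioids angled ±45° and a ±30° loudspeaker base; it is not the paper's measured data, and the four techniques studied there may use different formulae.

```python
import math

# Illustrative localisation curve for a coincident XY pair of cardioids.
# Convention: angles are positive towards the left for both the source
# and the reproduced image.

def cardioid_gain(theta_deg):
    """Cardioid sensitivity 0.5 * (1 + cos(theta))."""
    return 0.5 * (1.0 + math.cos(math.radians(theta_deg)))

def perceived_angle(source_deg, mic_axis_deg=45.0, speaker_base_deg=30.0):
    """Map a source angle to its estimated angle in the reproduced scene."""
    gl = cardioid_gain(source_deg - mic_axis_deg)   # left capsule at +45 deg
    gr = cardioid_gain(source_deg + mic_axis_deg)   # right capsule at -45 deg
    # Tangent panning law: tan(phi) = (gl - gr) / (gl + gr) * tan(phi0)
    t = (gl - gr) / (gl + gr) * math.tan(math.radians(speaker_base_deg))
    return math.degrees(math.atan(t))

# A few points of the localisation curve (source angle -> image angle)
curve = {src: round(perceived_angle(src), 1) for src in (-90, -45, 0, 45, 90)}
print(curve)
```

Plotting `perceived_angle` over the source angle gives exactly the kind of localisation curve the abstract describes for the calculated case; the experimental curves come from listening tests instead.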
Passive noise reduction measures are commonly used in industry but, unfortunately, their effectiveness is poor in the low-frequency range. By applying active structural acoustic control (ASAC) to the enclosure walls, a significant improvement of the insulating properties in this frequency range can be achieved. In this paper, a model of a double-panel structure with ASAC is presented. The structure consists of two aluminium plates separated by an air gap. Two inertial magnetoelectric actuators and two piezoceramic MFC sensors were used to control the structure. A multichannel FxLMS algorithm with the virtual error microphone technique is used as the control algorithm; the signal of the virtual error microphone is estimated from the signals of the MFC sensors. The performance of the actively controlled structure for tonal signals at selected frequencies is presented in the article. During the study, the double-panel structure was mounted on one wall of a sound-insulating enclosure located in an acoustic chamber, and both local and global reduction of the noise test signal was investigated.
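The core of the control algorithm named above can be sketched in its simplest form. The following single-channel FxLMS loop is a minimal illustration, not the authors' multichannel implementation: the "error" signal stands in for the virtual microphone signal, the secondary path is a toy FIR filter, and a perfect path model is assumed.

```python
import numpy as np

# Minimal single-channel FxLMS sketch for a tonal disturbance.
# All paths and signals are fabricated for illustration.

fs, n = 1000, 4000
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 100 * t)                  # tonal reference (100 Hz)
s = np.array([0.0, 0.5, 0.25])                   # true secondary path (toy FIR)
s_hat = s.copy()                                 # assume a perfect path model
d = np.convolve(x, [0.8, 0.3], mode="full")[:n]  # primary disturbance

L, mu = 8, 0.01
w = np.zeros(L)                                  # adaptive control filter
xbuf = np.zeros(L)                               # reference history
fx = np.convolve(x, s_hat, mode="full")[:n]      # filtered-x signal
fxbuf = np.zeros(L)
ybuf = np.zeros(len(s))                          # control output history
errs = []
for i in range(n):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[i]
    y = w @ xbuf                                 # control signal
    ybuf = np.roll(ybuf, 1); ybuf[0] = y
    e = d[i] + s @ ybuf                          # (virtual) error signal
    fxbuf = np.roll(fxbuf, 1); fxbuf[0] = fx[i]
    w -= mu * e * fxbuf                          # FxLMS weight update
    errs.append(e)
```

After adaptation the residual error power drops by orders of magnitude for a tonal signal, which is why the abstract reports results at selected tonal frequencies; the multichannel case replaces the scalar signals with vectors and the paths with matrices of transfer functions.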
The aim of this paper is to describe the process of choosing the best surround microphone technique for recording a choir with an instrumental ensemble. First, examples of multichannel microphone techniques, including those used in the recording, are described. Then the assumptions and details of the music recording in the Radio Gdansk Studio are provided, as well as the process of mixing the multichannel recording. Extensive subjective tests were performed with a group of sound engineers and students in order to find the most preferred recording techniques. Because the final recording is based on a mix of the "direct/ambient" and "direct-sound all-around" approaches, a subjective quality evaluation was conducted, and on this basis the best-rated multichannel techniques were chosen. The results show that listeners may consider different factors when choosing the best-rated multichannel techniques in separate tasks, as different systems were chosen in the two tests.
Research on the design of robust multimodal speech recognition systems that exploit acoustic and visual cues, extracted with relatively noise-robust alternate speech sensors, has been gaining interest in the speech processing community. The primary objective of this work is to study the exclusive influence of the Lombard effect on the automatic recognition of the confusable syllabic consonant-vowel (CV) units of Hindi, as a step towards building robust multimodal ASR systems for adverse environments in the context of Indian languages, which are syllabic in nature. The dataset comprises the 145 confusable CV syllabic units of Hindi, recorded simultaneously with three modalities that capture the acoustic and visual speech cues: a normal acoustic microphone (NM), a throat microphone (TM), and a camera that captures the associated lip movements. The Lombard effect is induced by feeding crowd noise into the speaker's headphones during recording. Convolutional Neural Network (CNN) models are built to categorise the CV units by their place of articulation (POA), manner of articulation (MOA), and vowel, under clean and Lombard conditions. For validation, corresponding Hidden Markov Models (HMMs) are also built and tested. Unimodal Automatic Speech Recognition (ASR) systems built from each of the three speech cues of Lombard speech show a loss in recognition of MOA and vowels, while POA recognition improves in all systems due to the Lombard effect. Combining the three complementary speech cues into bimodal and trimodal ASR systems reduces the Lombard-induced recognition loss for MOA and vowels compared with the unimodal systems, while POA recognition still benefits from the Lombard effect. A bimodal system using only the alternate acoustic and visual cues is proposed, which discriminates place and manner of articulation better than even the standard ASR system.
Among the multimodal ASR systems studied, the proposed trimodal system based on Lombard speech gives the best recognition accuracies of 98%, 95%, and 76% for vowels, MOA, and POA, respectively, with an average improvement of 36% over the unimodal ASR systems and of 9% over the bimodal ASR systems.
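One common way to build the bimodal and trimodal systems described above is late (score-level) fusion, where each modality's classifier posteriors are combined by weighted averaging. The sketch below illustrates that idea with fabricated probability vectors; the paper's actual CNN/HMM architectures and fusion scheme may differ.

```python
import numpy as np

# Hedged sketch of score-level fusion of per-modality classifiers into a
# trimodal decision. All posterior values are fabricated for illustration.

def fuse(posteriors, weights=None):
    """Weighted average of per-modality class posteriors, renormalised."""
    p = np.asarray(posteriors, dtype=float)
    w = (np.ones(len(p)) / len(p)) if weights is None else np.asarray(weights, float)
    fused = np.tensordot(w, p, axes=1)
    return fused / fused.sum()

# Toy posteriors over three MOA classes from each modality
p_nm = [0.6, 0.3, 0.1]    # normal acoustic microphone (NM)
p_tm = [0.5, 0.2, 0.3]    # throat microphone (TM)
p_vis = [0.2, 0.7, 0.1]   # lip-movement video
trimodal = fuse([p_nm, p_tm, p_vis])
```

Here the visual stream alone would pick class 1, but the two acoustic streams outvote it, so the fused trimodal posterior favours class 0; this complementarity between cues is what drives the reported gains over the unimodal systems.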