Search results


Abstract

In this article, an analysis of an innovative system for filtering signals in the audible range (16 Hz - 20 kHz) on programmable logic devices using finite impulse response (FIR) filters is presented. The system combines a software and a hardware platform; in the software layer, multiple programming languages, including VHDL, JavaScript, Matlab and HTML, were used to create a complete, usable application. The coefficients of the filter polynomials were determined with the Matlab Filter Design & Analysis Tool. The developed graphical layer provides a user-friendly interface that makes it easy to transfer the required coefficients from the computer to the executive system. The practical implementation on the FPGA platform, specifically the Altera DE2-115 development kit with a Cyclone IV FPGA, was compared with a simulation of the FIR filters in Matlab. The research confirms the effectiveness of real-time filtering with filters of up to 128th order for both audio channels simultaneously in the FPGA-based system.
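As a rough illustration of this kind of filtering, the Python sketch below designs a low-pass FIR filter by the windowed-sinc method (a stand-in for the Matlab tool used in the article) and applies it in direct form with coefficients rounded to signed fixed-point integers, as an FPGA datapath typically would. The 48 kHz sampling rate, 4 kHz cutoff and 16-bit coefficient width are assumptions, not values from the article. Note that a 128th-order filter has 129 taps.

```python
import numpy as np

def design_lowpass_fir(order, cutoff_hz, fs_hz):
    """Windowed-sinc low-pass FIR design; an order-M filter has M + 1 taps."""
    n = np.arange(order + 1) - order / 2.0
    fc = cutoff_hz / fs_hz                      # normalized cutoff (cycles/sample)
    h = 2 * fc * np.sinc(2 * fc * n)            # ideal low-pass impulse response
    h *= np.hamming(order + 1)                  # taper to reduce passband/stopband ripple
    return h / h.sum()                          # unity gain at DC

def quantize(h, bits=16):
    """Round coefficients to signed fixed-point integers with (bits - 1) fractional bits."""
    scale = 2 ** (bits - 1)
    taps = np.clip(np.round(h * scale), -scale, scale - 1).astype(np.int64)
    return taps, scale

def fir_direct_form(x, taps_int, scale):
    """Direct-form FIR on integer samples: integer multiply-accumulate, then rescale."""
    delay = np.zeros(len(taps_int), dtype=np.int64)   # tapped delay line
    y = np.empty(len(x))
    for i, sample in enumerate(x):
        delay = np.roll(delay, 1)                     # shift the delay line by one sample
        delay[0] = sample
        y[i] = int(np.dot(delay, taps_int)) / scale
    return y
```

Each audio channel would simply be passed through `fir_direct_form` separately with the same coefficients; the symmetric taps produced by this design give a linear-phase response.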

Authors and Affiliations

Adrian Lipowski (1)
Paweł Majewski (1)
Sławomir Pluta (1)

  1. Opole University of Technology, Opole, Poland

Abstract

In October 2018, a local digital radio service was launched covering the Wroclaw agglomeration. Its implementation required many tests, including qualitative tests of both music and speech. This paper presents the results of subjective tests evaluating the speech quality of signals recorded at various points in Wroclaw. Measurements were carried out in accordance with the recommendations of the International Telecommunication Union as well as under ordinary acoustic conditions in listeners' flats. Ratings were made for male and female voices. The most important conclusion is that the test conditions do not influence the results of speech quality assessment. The experiment also confirmed that the reception location of the DAB+ signal within the Single-Frequency Network does not affect the perceived voice quality.
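Subjective ratings of this kind are usually summarized as a mean opinion score (MOS) with a confidence interval. The sketch below is an illustration, not the authors' analysis: it uses a normal-approximation 95% interval and a crude overlap check, and the two rating lists are invented placeholders for one recording point under two test conditions.

```python
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and half-width of an approximate 95% confidence interval.

    Uses the normal approximation (z = 1.96); for small listener panels a
    Student-t critical value would be more appropriate.
    """
    mos = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

def intervals_overlap(a, b):
    """Crude check (not a significance test): do two (mos, half_width) intervals overlap?"""
    return abs(a[0] - b[0]) <= a[1] + b[1]

# Hypothetical 5-point ratings of the same recording under two listening conditions.
itu_room = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4]
home_flat = [4, 3, 4, 4, 5, 4, 4, 3, 4, 4]
print(mos_with_ci(itu_room), mos_with_ci(home_flat))
print(intervals_overlap(mos_with_ci(itu_room), mos_with_ci(home_flat)))
```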

Authors and Affiliations

Stefan Brachmański (1)
Maurycy Kin (1)
Patrycja Zemankiewicz (1)

  1. Wroclaw University of Science and Technology, Poland

Abstract

This study investigates listeners' perceptual responses in audio-visual interactions involving binaural spatial audio. Audio stimuli are presented to listeners with or without accompanying visual cues. Participants in the subjective test are asked to indicate the direction of the incoming sound while listening to the stimulus via loudspeakers or via headphones with a head-related transfer function (HRTF) plugin. First, the methodological assumptions and the experimental setup are described. Then, the results are presented and analysed using statistical methods. The results indicate that the headphone trials produced much higher perceptual ambiguity for listeners than loudspeaker playback. The visual modality dominates the audio-visual evaluation when loudspeaker playback is employed. Moreover, when the visual stimulus is present, the pattern of behavior in headphone playback does not always correspond to that in loudspeaker playback.
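Ambiguity in headphone localization is often quantified through front-back confusions. The sketch below is an assumed scoring scheme, not the one used in the study: it computes the wrapped azimuth error between target and response and flags a response as a front-back reversal when it lies closer to the target mirrored about the interaural axis. Azimuths are in degrees with 0° straight ahead and 90° to the left; the 30° tolerance is an arbitrary illustrative choice.

```python
def wrap_deg(angle):
    """Wrap an angle to the interval (-180, 180]."""
    a = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if a == -180.0 else a

def angular_error(target, response):
    """Absolute azimuth error in degrees, accounting for wrap-around."""
    return abs(wrap_deg(response - target))

def mirror_front_back(azimuth):
    """Reflect an azimuth about the interaural (left-right) axis: 30 -> 150, -45 -> -135."""
    return wrap_deg(180.0 - azimuth)

def classify(target, response, tolerance=30.0):
    """Label a single localization response (hypothetical scoring rule)."""
    direct = angular_error(target, response)
    mirrored = angular_error(mirror_front_back(target), response)
    if direct <= tolerance:
        return "correct"
    if mirrored <= tolerance and mirrored < direct:
        return "front-back reversal"
    return "other error"
```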

Authors and Affiliations

Bartłomiej Mróz (1, 2)
Bożena Kostek (2)

  1. Multimedia Systems Department, Gdansk, Poland
  2. Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk, Poland

Abstract

This article presents key events in the founding of the Polish Section of the Audio Engineering Society. In addition, the history of the International Symposia on Sound Engineering and Mastering is outlined, and the papers contained in this issue are briefly reviewed.


Authors and Affiliations

Bożena Kostek
Marianna Sankiewicz

Abstract

The paper presents a comparative study of music features derived from audio recordings, i.e. the same music pieces representing different music genres, excerpts performed by different musicians, and songs by a musician whose style evolved over time. First, the origin and background of the division into music genres are briefly presented. Then, several objective audio signal parameters that are easy to interpret in terms of perceptual relevance are recalled. In the study, parameter values were extracted from music excerpts, gathered and compared to determine to what extent they are similar within the songs of the same performer or among samples representing the same piece.
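The abstract does not list the parameters used, so the sketch below assumes three common, easily interpreted descriptors: RMS level (a loudness correlate), zero-crossing rate (a noisiness correlate) and spectral centroid (a brightness correlate). Computing one feature vector per excerpt makes the "same performer" or "same piece" comparisons a matter of comparing vectors.

```python
import numpy as np

def rms(x):
    """Root-mean-square level, a rough correlate of loudness."""
    return float(np.sqrt(np.mean(x ** 2)))

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose sign differs."""
    signs = np.signbit(x)
    return float(np.mean(signs[1:] != signs[:-1]))

def spectral_centroid(x, fs):
    """Magnitude-weighted mean frequency in Hz, a common correlate of perceived brightness."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(freqs * mag) / np.sum(mag))

def feature_vector(x, fs):
    """[RMS, ZCR, centroid] for one excerpt."""
    return np.array([rms(x), zero_crossing_rate(x), spectral_centroid(x, fs)])
```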


Authors and Affiliations

Aleksandra Dorochowicz
Bożena Kostek

Abstract

The 16th International Symposium on Sound Engineering and Tonmeistering (ISSET), organized by the Institute of Radioelectronics and Multimedia Technology (Warsaw University of Technology), the Department of Sound Engineering (Fryderyk Chopin University of Music) and the Polish Radio, under the auspices of the Polish Section of the Audio Engineering Society, was held in Warsaw on 8-10 October 2015. The main topics of the Symposium covered almost all domains of audio engineering, i.e. musical acoustics, noise control, signal processing, room acoustics, radio and television, multimedia, sound engineering and tonmeistering, perception and quality assessment, and many others. Particular attention was paid to the loudness of audio programmes in radio and TV broadcasting. Over 60 people from different branches of audio technology participated in the Symposium and shared their knowledge and experience during the paper sessions, technical tours, workshops and special presentations. A selection of abstracts of the papers presented at ISSET'2015 is included below.

Abstract

Recently, rapid advances in the IT industry have brought significant changes to audio-system configurations; in particular, audio over internet protocol (AoIP), a network-based audio-transmission technology, has been favourably received in this field. Applying AoIP to sections that would otherwise require multiple cables is advantageous because the installation cost is lower than that of existing systems, and the original sound is transmitted without any distortion. Existing AoIP-based technology, however, cannot control the audio-signal characteristics of each device and can only transmit multiple audio signals over a network. The Audio Network & Control Hierarchy Over peer-to-peer (Anchor) system proposed in this paper enables all audio equipment to send and receive signals via a data network, and the receiving device can mix signals from different IPs. This simplifies the audio-system configuration and thereby improves the flexibility of system applications. The results confirmed that audio signals from different IPs were received, mixed and output without errors. It is expected that Anchor will become a standard for audio-network protocols.
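The abstract does not describe Anchor's packet format or API, so the sketch below only illustrates the receive-and-mix idea under assumed conditions: each source IP delivers time-aligned frames of 16-bit PCM, the receiver applies a per-source gain, sums the frames and saturates the result to the 16-bit range. The `IpMixer` class and its methods are hypothetical.

```python
import numpy as np

class IpMixer:
    """Hypothetical receiver-side mixer keyed by source IP address."""

    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.gains = {}       # source IP -> linear gain (default 1.0)
        self.pending = {}     # source IP -> most recently received frame

    def set_gain(self, ip, gain):
        self.gains[ip] = gain

    def receive(self, ip, samples):
        frame = np.asarray(samples, dtype=np.int16)
        if frame.shape != (self.frame_len,):
            raise ValueError(f"frame from {ip} has wrong length")
        self.pending[ip] = frame

    def mix(self):
        """Sum pending frames with their gains, saturate to int16, then clear the buffer."""
        acc = np.zeros(self.frame_len, dtype=np.float64)
        for ip, frame in self.pending.items():
            acc += self.gains.get(ip, 1.0) * frame
        self.pending.clear()
        return np.clip(np.round(acc), -32768, 32767).astype(np.int16)
```

Saturating rather than wrapping on overflow mirrors what hardware mixers usually do; a real system would also need jitter buffering and clock alignment between sources, which this sketch assumes away.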


Authors and Affiliations

Jaeho Lee
Hyoungjoon Jeon
Pyungho Choi
Soonchul Kwon
Seunghyun Lee

Abstract

In the early days, multimedia content related to audio could only be consumed in a stationary manner. The music player was located at home and required a physical drive. The alternative was to attend a live performance in a concert hall or host a private concert at home. In short, audio-visual experiences were reserved for a narrow group of recipients. Today, thanks to portable players, vision and sound are available to everyone. Moreover, thanks to multimedia streaming platforms, every music piece or video, e.g. by one's favourite artist or band, can be viewed anytime and anywhere. The background or status of an individual is no longer an issue: everyone connected to the global network has access to the same resources. This paper focuses on the consumption of multimedia content using mobile devices. It describes a year-to-year user case study carried out between 2015 and 2019 and traces the development of current trends related to the expectations of modern users. The goal of this study is to aid policymakers, as well as providers, in designing and evaluating systems and services.


Authors and Affiliations

Przemysław Falkowski-Gilski

Abstract

The paper presents the results of research on, and analysis of, voice data transmission quality in IP packet networks. It analyses mechanisms for assessing the quality of packet telephony data transmission. Possible transmission quality levels and appropriate quality metrics used in the recommendations of standardisation organisations are indicated and discussed, together with suggested limit values for acceptable voice data transmission quality. A packet network model was designed and tested, taking into account a VoIP architecture supporting various audio codecs used for voice compression. Transmission mechanisms based on the G.711, G.723, G.726, G.728 and G.729 audio codecs were investigated. It was shown that for delay-sensitive traffic that fluctuates beyond its nominal rate, selected codecs have an advantage over others and allow better transmission quality of VoIP traffic with guaranteed bandwidth and delay.
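To make the codec comparison concrete, the per-call IP bandwidth of a constant-bit-rate voice codec can be estimated from its bit rate, the packetization interval and the 40 bytes of IPv4 (20), UDP (8) and RTP (12) headers per packet. The sketch below uses nominal codec rates and assumed packetization intervals (20 ms, or 30 ms for G.723.1, whose frames are 30 ms long); it ignores link-layer overhead, header compression and silence suppression, so real figures will differ.

```python
import math

IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no header compression

# Nominal codec bit rates (bit/s) and assumed packetization intervals (s).
CODECS = {
    "G.711":   (64_000, 0.020),
    "G.723.1": (6_300, 0.030),
    "G.726":   (32_000, 0.020),   # G.726 also has 16/24/40 kbit/s modes
    "G.728":   (16_000, 0.020),
    "G.729":   (8_000, 0.020),
}

def ip_bandwidth_bps(bitrate_bps, ptime_s, header_bytes=IP_UDP_RTP_HEADER_BYTES):
    """One-direction IP bandwidth of a single voice stream."""
    # The small tolerance keeps floating-point error from rounding an exact byte count up.
    payload_bytes = math.ceil(bitrate_bps * ptime_s / 8 - 1e-9)
    packets_per_s = 1.0 / ptime_s
    return (payload_bytes + header_bytes) * 8 * packets_per_s

for name, (rate, ptime) in CODECS.items():
    print(f"{name:8s} {ip_bandwidth_bps(rate, ptime) / 1000:6.1f} kbit/s")
```

The header overhead explains why low-rate codecs such as G.729 save less bandwidth than their nominal rates suggest: at 20 ms packetization the 40 header bytes are twice the 20-byte voice payload.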

Authors and Affiliations

Dariusz Strzęciwilk (1)

  1. Institute of Information Technology, University of Life Sciences, Warsaw, Poland

Abstract

Audio data compression is used to reduce the transmission bandwidth and storage requirements of audio data. It is the second stage in the audio mastering process, audio equalization being the first. Compression algorithms such as BSAC, MP3 and AAC are used as reference standards in this paper. The challenge in audio compression is compressing the signal at low bit rates: previous algorithms that work well at low bit rates do not perform as well at higher bit rates, and vice versa. This paper proposes a modified vector quantization algorithm that produces a scalable bit stream with a number of fine layers of audio fidelity. The modified algorithm is used to build a scalable perceptual audio coder in which the quantization and encoding stages, responsible for the psychoacoustic and arithmetic terminations, are separated; practically all the data removed during the prediction phases at the encoder side is added back to the audio signal at the decoder stage. Thus, it is the quantization stage that is modified to produce a scalable bit stream. The modified algorithm works well at both lower and higher bit rates. Subjective evaluations were carried out by audio professionals using the MUSHRA test, and the mean normalized scores at various bit rates were recorded and compared with those of the previous algorithms.
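The abstract does not give the modified algorithm itself. As a generic illustration of how vector quantization can yield a scalable bit stream, the sketch below uses multistage (residual) VQ, a swapped-in textbook technique: each stage quantizes the error left by the previous stages, and each stage index forms one layer, so a decoder that receives only the first k layers still reconstructs an approximation. The tiny hand-written codebooks are placeholders; a real coder would train them and add psychoacoustic weighting and entropy coding.

```python
import numpy as np

def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return int(np.argmin(np.sum((codebook - v) ** 2, axis=1)))

def msvq_encode(v, codebooks):
    """Return one index per stage; stage k quantizes the residual left by earlier stages."""
    residual = np.asarray(v, dtype=float)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = residual - cb[i]
    return indices

def msvq_decode(indices, codebooks):
    """Decode with however many layers were received (a prefix of the index list)."""
    out = np.zeros(codebooks[0].shape[1])
    for i, cb in zip(indices, codebooks):
        out += cb[i]
    return out

# Placeholder 2-D codebooks: one coarse stage and two refinement stages.
# Each refinement codebook contains the zero vector, so adding a layer never increases the error.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.25, 0.0], [-0.25, 0.0], [0.0, 0.25], [0.0, -0.25]]),
    np.array([[0.0, 0.0], [0.06, 0.06], [-0.06, -0.06], [0.06, -0.06], [-0.06, 0.06]]),
]
```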

Authors and Affiliations

Shajin Prince (1)
Bini D (1)
A Alfred Kirubaraj (1)
J Samson Immanuel (1)
Surya M (1)

  1. Karunya Institute of Technology and Sciences, Coimbatore, India

Abstract

This article presents an efficient method of modelling acoustic phenomena for real-time applications such as computer games. Simplified models of reflection, transmission and medium attenuation are described, along with assessments conducted by a professional sound designer. The article introduces a representation of sound phenomena using digital filters for further digital audio processing.
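A minimal sketch of a filter-based representation in this spirit, built on assumptions not taken from the article: distance attenuation is a 1/r gain, medium (air) absorption is a one-pole low-pass whose cutoff falls with distance, and transmission through a wall is a broadband gain plus a lower cutoff. The cutoff formula and the wall parameters are invented for illustration and are not the authors' models.

```python
import numpy as np

def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass: y[n] = (1 - a) x[n] + a y[n-1], unity gain at DC."""
    a = np.exp(-2.0 * np.pi * cutoff_hz / fs)
    y = np.empty(len(x))
    prev = 0.0
    for n, sample in enumerate(x):
        prev = (1.0 - a) * sample + a * prev
        y[n] = prev
    return y

def propagate(x, distance_m, fs, wall_gain=None, wall_cutoff_hz=None):
    """Hypothetical propagation model for one direct path in a game engine."""
    gain = 1.0 / max(distance_m, 1.0)                  # spherical spreading, clamped at 1 m
    air_cutoff = 20_000.0 / (1.0 + distance_m / 50.0)  # assumed: farther sources sound duller
    cutoff = air_cutoff if wall_cutoff_hz is None else min(air_cutoff, wall_cutoff_hz)
    y = gain * one_pole_lowpass(x, cutoff, fs)
    if wall_gain is not None:
        y *= wall_gain                                 # broadband transmission loss
    return y
```

One-pole filters are a common choice in real-time audio because each costs a single multiply-add per sample, which matters when many sources are rendered at once.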

Authors and Affiliations

Bartłomiej Miga
Bartosz Ziółko

Abstract

In this paper, a new audio watermarking algorithm in the lifting wavelet domain, based on the statistical characteristics of sub-band coefficients, is proposed. First, the original audio signal was segmented and each segment was divided into two sections. Then, a Barker code was used for synchronization, the lifting wavelet transform (LWT) was performed on each section, and a synchronization code and a watermark were embedded into the first and second sections, respectively, by modifying the statistical mean of the sub-band coefficients. The embedding strength was determined adaptively according to the auditory masking property. Experiments show that the embedded watermark is more robust against common signal processing attacks than existing LWT-based algorithms and, in particular, can resist random cropping.
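The sketch below illustrates only the embedding principle named in the abstract, under simplifying assumptions: a one-level Haar lifting wavelet transform (split, predict, update), one bit per section carried by the sign of the mean of the approximation sub-band, and a fixed strength `delta` instead of the adaptive, masking-based strength. Barker-code synchronization is omitted, and section lengths must be even.

```python
import numpy as np

def haar_lwt(x):
    """One-level Haar lifting: split into even/odd samples, predict, update."""
    s, d = x[0::2].astype(float), x[1::2].astype(float)
    d = d - s            # predict: detail = odd - even
    s = s + d / 2.0      # update: approximation = pairwise mean
    return s, d

def haar_ilwt(s, d):
    """Exact inverse of haar_lwt: undo update, undo predict, merge."""
    s = s - d / 2.0
    odd = d + s
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = s, odd
    return x

def embed_bit(section, bit, delta):
    """Shift the approximation sub-band so its mean is >= +delta (bit 1) or <= -delta (bit 0)."""
    s, d = haar_lwt(section)
    m = s.mean()
    if bit and m < delta:
        s = s + (delta - m)
    elif not bit and m > -delta:
        s = s - (m + delta)
    return haar_ilwt(s, d)

def extract_bit(section):
    """Blind extraction: the sign of the approximation-band mean carries the bit."""
    s, _ = haar_lwt(section)
    return int(s.mean() > 0.0)
```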


Authors and Affiliations

Zhi Tao
He-ming Zhao
Jun Wu
Ji-hua Gu
Yi-shen Xu
Di Wu

Abstract

In building speech recognition applications, robustness to different noisy background conditions is an important challenge. In this paper, a bimodal approach is proposed to improve the robustness of a Hindi speech recognition system. The importance of different types of visual features for an audio-visual automatic speech recognition (AVASR) system under diverse noisy audio conditions is also studied. Four sets of visual features are reported, based on the Two-Dimensional Discrete Cosine Transform (2D-DCT), Principal Component Analysis (PCA), the Two-Dimensional Discrete Wavelet Transform followed by DCT (2D-DWT-DCT) and the Two-Dimensional Discrete Wavelet Transform followed by PCA (2D-DWT-PCA). The audio features are Mel Frequency Cepstral Coefficients (MFCC) with static and dynamic components. Overall, 48 features, i.e. 39 audio features and 9 visual features, are used to measure the performance of the AVASR system. The performance of the AVASR system is also evaluated on noisy speech generated with the NOISEX database for different Signal-to-Noise Ratios (SNR: 30 dB to −10 dB), using the Aligarh Muslim University Audio Visual (AMUAV) Hindi corpus. The AMUAV corpus is a high-quality audio-visual database of continuous Hindi speech, containing sentences spoken by different subjects.
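The 48-dimensional vector can be sketched as follows, assuming the common split of 39 audio features into 13 static MFCCs plus 13 delta and 13 delta-delta coefficients, and assuming the 9 visual features have already been resampled to the audio frame rate (neither detail is stated in the abstract). The helper for mixing noise at a target SNR shows how noisy test conditions like the 30 dB to −10 dB range are typically produced.

```python
import numpy as np

def deltas(feats, N=2):
    """Regression-based dynamic features over +/-N frames (edge frames repeated).

    d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2)
    """
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def audio_visual_features(mfcc13, visual9):
    """Per frame: 13 static + 13 delta + 13 delta-delta audio features + 9 visual = 48."""
    d1 = deltas(mfcc13)
    d2 = deltas(d1)
    return np.hstack([mfcc13, d1, d2, visual9])

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    noise = noise * np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return speech + noise
```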

Authors and Affiliations

Prashant Upadhyaya
Omar Farooq
M.R. Abidi
Priyanka Varshney

Abstract

Biography and scientific achievements of Professors Marianna Sankiewicz-Budzyński and Gustaw K.E. Budzyński, founders of Polish audio engineering.


Authors and Affiliations

Andrzej Czyżewski
Bożena Kostek

Abstract

The MDCT and IntMDCT algorithms are widely used in audio coding. The Integer MDCT (IntMDCT) is derived from the Modified Discrete Cosine Transform (MDCT) using a lifting scheme or rounding operations. It inherits the properties of the MDCT and offers excellent invertibility and good spectral representation. In this paper, audio codecs such as AAC and FLAC are discussed in connection with the MDCT and IntMDCT algorithms, in order to find which algorithm yields a better Compression Ratio (CR). The scope of this work is to hybridize lossy and lossless audio codecs to obtain a reduced bit rate with finer sound quality. The audio quality is assessed by subjective and objective testing, using MOS (Mean Opinion Score) and ABX tests as well as objective methods such as PEAQ (Perceptual Evaluation of Audio Quality) and ODG (Objective Difference Grade). The performance measures, namely the Compression Ratio (CR) and the Sound Pressure Level (SPL), are estimated.
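For reference, the floating-point MDCT that the IntMDCT approximates can be written directly from its definition. The sketch below uses 2N-sample frames with hop N and a sine window, which satisfies the Princen-Bradley condition, so time-domain aliasing cancels in the overlap-add and the interior of the signal is reconstructed exactly. It is a plain MDCT, not the lifting-based IntMDCT, and the frame size is an arbitrary choice.

```python
import numpy as np

def mdct_basis(N):
    """(N, 2N) matrix of cos[pi/N (n + 1/2 + N/2)(k + 1/2)]."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))

def sine_window(N):
    """Sine window; w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition)."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct_frames(x, N):
    """Windowed MDCT of 2N-sample frames with hop N; returns (frames, N) coefficients."""
    C, w = mdct_basis(N), sine_window(N)
    starts = range(0, len(x) - 2 * N + 1, N)
    return np.array([C @ (w * x[s:s + 2 * N]) for s in starts])

def imdct_overlap_add(X, N, length):
    """Windowed inverse MDCT of each frame, overlap-added; aliasing cancels between frames."""
    C, w = mdct_basis(N), sine_window(N)
    y = np.zeros(length)
    for m, coeffs in enumerate(X):
        y[m * N:m * N + 2 * N] += w * (2.0 / N) * (C.T @ coeffs)
    return y
```

The first and last N samples are covered by only one frame and are therefore not reconstructed; a codec handles this by padding the signal. The lossless IntMDCT variant replaces the rotations inside this transform with integer lifting steps.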


Authors and Affiliations

M. Davidson Kamala Dhas
R. Priyadharsini

Abstract

Field programmable analog arrays (FPAAs), thanks to their flexibility and reconfigurability, give designers new possibilities in analog circuit design. The number of academic projects on FPAAs and of applications of commercially available programmable devices is still growing. This paper explores the properties and parameters of the two most popular FPAA circuits from the Anadigm company: the AnadigmVortex AN221E04 and the AnadigmApex AN231E04. The research conducted by the authors led to the discovery of some undocumented features of these devices. Several audio processing applications were built and tested. The results show that these circuits can be used in moderately demanding audio applications. Thanks to dynamic reconfigurability, they also make it possible to build a universal analog audio signal processor. These circuits can also serve as a versatile platform for rapid prototyping and educational purposes.


Authors and Affiliations

Piotr Falkowski
Andrzej Malcher

Abstract

The aim of the study was to examine how the wording of a question about audio, visual and audio-visual stimuli can affect the assessment of the environment. Participants in psychophysical experiments were asked to rate, on a numerical scale, audio and visual information both separately and together, combined into mixes. The same set of questions was used for all investigated audio, visual and audio-visual stimuli. Participants were asked about the comfort or discomfort caused by the perceived stimuli, presented at three different sound levels.
The results show no statistically significant differences between the assessments of comfort and discomfort for visual samples. In fact, the comfort and discomfort ratings are equivalent to the extent that a discomfort rating can be represented as the opposite of the comfort rating, i.e. the discomfort rating equals 10 minus the comfort rating.
In general, the results obtained for audio and audio-visual samples were the same, with only a few exceptions that depended on sound level. No statistically significant differences were found for the loudest stimuli, but there were some exceptions for the softer ones. Based on the results, we show that the two scales are fully interchangeable only for visual stimuli. When presenting audio and audio-visual samples, only one scale should be applied, either discomfort or comfort, depending on the context and the character of the stimuli.
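The equivalence stated above, discomfort = 10 − comfort, can be checked per participant through the paired residuals r = D − (10 − C), which should average zero if the two scales mirror each other. The sketch below computes the residuals and a one-sample t statistic for their mean; it is an illustrative check with invented ratings, not the statistical procedure used in the study. The resulting t would be compared with a Student-t critical value for n − 1 degrees of freedom (about 2.26 for n = 10 at α = 0.05, two-sided).

```python
import math
import statistics

def complement_residuals(comfort, discomfort, scale_max=10):
    """Per-participant residuals r_i = D_i - (scale_max - C_i); zero if the scales mirror exactly."""
    if len(comfort) != len(discomfort):
        raise ValueError("ratings must be paired")
    return [d - (scale_max - c) for c, d in zip(comfort, discomfort)]

def paired_t(residuals):
    """One-sample t statistic for H0: mean residual = 0 (df = n - 1)."""
    n = len(residuals)
    sd = statistics.stdev(residuals)
    return statistics.fmean(residuals) / (sd / math.sqrt(n))

# Hypothetical ratings of one visual sample by ten participants.
comfort = [7, 6, 8, 5, 7, 6, 9, 4, 6, 7]
discomfort = [3, 4, 2, 5, 3, 5, 1, 6, 4, 3]
print(paired_t(complement_residuals(comfort, discomfort)))
```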

Authors and Affiliations

Jan Felcyn (1)
Anna Preis (1)
Marcin Praszkowski (1)
Małgorzata Wrzosek (2)

  1. Department of Acoustics, Faculty of Physics, Adam Mickiewicz University, Poznan, Poland
  2. Institute of Philosophy, Szczecin University, Szczecin, Poland

Abstract

As the virtual reality (VR) market grows at a fast pace, numerous users and producers are emerging in the hope of steering VR towards mainstream adoption. Although most solutions focus on providing high-resolution, high-quality video, acoustics in VR is as important as visual cues for maintaining consistency with the natural world. We therefore investigate one of the most important audio solutions for VR applications: ambisonics. Several VR producers, such as Google, HTC and Facebook, support the ambisonic audio format. Binaural ambisonics builds a virtual loudspeaker array over a VR headset, providing immersive sound. The configuration of the virtual loudspeakers influences listening perception, as has been widely discussed in the literature. However, few studies have investigated the influence of the orientation of the virtual loudspeaker array; the same loudspeaker array with different orientations can produce different spatial effects. This paper introduces a VR audio technique with an optimal design and proposes a dual-mode audio solution. Both an objective measurement and a subjective listening test show that the proposed solution effectively enhances spatial audio quality.
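To illustrate why the orientation of a virtual loudspeaker array matters, the sketch below encodes a horizontal source into first-order ambisonics (W, X, Y, with W left unscaled, an assumed normalization) and decodes it with a basic sampling decoder to a regular ring of virtual loudspeakers that can be rotated as a whole. In a binaural renderer, each loudspeaker feed would then be convolved with the HRIR pair for that direction (not shown). The decoder is a simple textbook choice, not the dual-mode design proposed in the paper.

```python
import numpy as np

def encode_foa(signal, azimuth_rad):
    """Horizontal first-order ambisonic encoding (W, X, Y), W left unscaled."""
    return signal, signal * np.cos(azimuth_rad), signal * np.sin(azimuth_rad)

def ring_azimuths(num_speakers, rotation_rad=0.0):
    """Regular ring of virtual loudspeakers, optionally rotated as a whole."""
    return rotation_rad + 2 * np.pi * np.arange(num_speakers) / num_speakers

def decode_sampling(W, X, Y, speaker_az):
    """Basic 2-D sampling decoder: g_i = (W + 2 (X cos phi_i + Y sin phi_i)) / L."""
    L = len(speaker_az)
    return [(W + 2 * (X * np.cos(phi) + Y * np.sin(phi))) / L for phi in speaker_az]
```

For a unit source at 45°, an axis-aligned square splits the signal between two loudspeakers (gain (1 + √2)/4 each), whereas a square rotated by 45° places the source on one loudspeaker (gain 0.75); the total gain is 1 in both cases, but the spatial distribution, and hence the binaural cues, differ.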

Authors and Affiliations

Shu-Nung Yao (1)

  1. Department of Electrical Engineering, National Taipei University, No. 151, University Rd., Sanxia Dist., New Taipei City 237303, Taiwan

Abstract

The paper examines the use of a Convolutional Bidirectional Recurrent Neural Network (CBRNN) for quality measurement of music content. The key contribution of this approach, compared with existing research, is that the model is evaluated on detecting acoustic anomalies without requiring a reference (clean) signal. Since real music content may include various instrumental sounds, speech and singing voice, or different audio effects, it is more complex to analyze than clean speech or artificial signals, especially without comparison to known reference content. The presented results may be treated as a proof of concept, since only specific types of artefacts are covered in this paper (quantization defects, missing sound, distortion of gain characteristics and extra noise). However, the described model can easily be extended to detect other impairments or used as a pre-trained model for other transfer learning processes. Several experiments were performed and reported to examine the model's efficiency. The raw audio samples were transformed into Mel-scaled spectrograms and fed to the model, first on their own and then together with additional features (Zero Crossing Rate, Spectral Contrast). According to the results, overall accuracy increases significantly (by 10.1%) when Spectral Contrast information is provided together with the Mel-scaled spectrograms. The paper also examines the influence of recurrent layers on the effectiveness of the artefact classification task.


Authors and Affiliations

Kamila Organiściak
Józef Borkowski

Abstract

Independent Component Analysis (ICA) can be used for single-channel audio separation if the mixed signal is transformed into the time-frequency domain and the resulting matrix of magnitude coefficients is processed by ICA. Previous works used only frequency (spectral) vectors and the Kullback-Leibler distance measure for this task. New decomposition bases are proposed: time vectors and time-frequency components. The applicability of several measures of distance between components is analysed, and an algorithm for clustering the components is presented. It was tested on mixes of two and three sounds. The perceptual quality of separation obtained with the proposed distance measures was evaluated in listening tests, which indicated the "beta" and "correlation" measures as the most appropriate. The "Euclidean" distance is shown to be appropriate for sounds with varying amplitudes. The perceptual effect of the amount of variance used was also evaluated.
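The clustering step can be sketched as follows, under stated assumptions: three of the distance measures named in the abstract (symmetric Kullback-Leibler on magnitude vectors normalized to unit sum, Euclidean, and a correlation distance 1 − |r| whose absolute value absorbs the sign ambiguity of ICA components) feed an average-linkage agglomerative clustering that groups components into one cluster per source. The "beta" measure is not defined in the abstract and is omitted; the clustering procedure itself is an assumed stand-in, not the authors' algorithm.

```python
import numpy as np

def sym_kl(a, b, eps=1e-12):
    """Symmetric Kullback-Leibler distance between |a| and |b| normalized to unit sum."""
    p = np.abs(a) + eps
    p = p / p.sum()
    q = np.abs(b) + eps
    q = q / q.sum()
    return float(np.sum((p - q) * np.log(p / q)))

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def correlation_distance(a, b):
    """1 - |Pearson r|; the absolute value absorbs the sign ambiguity of ICA components."""
    return 1.0 - abs(float(np.corrcoef(a, b)[0, 1]))

def cluster_components(components, n_sources, dist=correlation_distance):
    """Average-linkage agglomerative clustering of component vectors into n_sources groups."""
    clusters = [[i] for i in range(len(components))]
    D = np.array([[dist(a, b) for b in components] for a in components])
    while len(clusters) > n_sources:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = D[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)     # j > i, so index i is unaffected by the pop
    return sorted(sorted(c) for c in clusters)
```

Each resulting group of components would then be resynthesized into one separated source.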


Authors and Affiliations

Dariusz Mika
Piotr Kleczkowski

Abstract

This paper reviews parametric audio coders and discusses novel technologies introduced in a low-complexity, low-power audio decoder and music synthesizer platform developed by the authors. The decoder uses a parametric coding scheme based on the MPEG-4 Parametric Audio standard. To keep complexity low, most of the processing is performed in the parametric domain. This parametric processing includes pitch and tempo shifting, volume adjustment, selection of psychoacoustically relevant components for synthesis, and stereo image creation. The decoder allows good-quality 44.1 kHz stereo audio streaming at 24 kbps. The synthesizer matches the audio quality of industry-standard sample-based synthesizers while using a soundbank with a twenty times smaller memory footprint. The presented decoder/synthesizer is designed for low-power mobile platforms and supports music streaming, ringtone synthesis, gaming and remixing applications.
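A sketch of why processing in the parametric domain is cheap: if a frame is described by sinusoidal parameters (frequency, amplitude, phase), pitch shifting is a multiplication of the frequencies, volume adjustment a multiplication of the amplitudes, and component selection simply drops entries before synthesis. The keep-the-K-loudest rule below is a crude placeholder for psychoacoustic relevance selection, the oscillator-bank synthesis is an assumption, and tempo shifting (which changes frame durations) is not shown; the MPEG-4 parametric coder's actual tools are considerably richer.

```python
import numpy as np

def process_params(freqs, amps, phases, pitch_ratio=1.0, gain=1.0, keep=None, fs=44_100):
    """Parametric-domain edits: pitch shift, volume change and component selection."""
    freqs = np.asarray(freqs, float) * pitch_ratio
    amps = np.asarray(amps, float) * gain
    phases = np.asarray(phases, float)
    audible = freqs < fs / 2                       # drop partials shifted above Nyquist
    freqs, amps, phases = freqs[audible], amps[audible], phases[audible]
    if keep is not None:
        order = np.argsort(amps)[::-1][:keep]      # placeholder for psychoacoustic selection
        freqs, amps, phases = freqs[order], amps[order], phases[order]
    return freqs, amps, phases

def synthesize(freqs, amps, phases, num_samples, fs=44_100):
    """Oscillator-bank synthesis of one frame from sinusoidal parameters."""
    t = np.arange(num_samples) / fs
    return np.sum(amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None]), axis=0)
```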

Authors and Affiliations

Marek Szczerba
Werner Oomen
Dieter Therssen

Abstract

This paper addresses the problem of tampering detection and discusses methods used for authenticity analysis of digital audio recordings. The presented approach is based on frame offset measurement in audio files compressed and decoded with perceptual audio coding algorithms that employ the modified discrete cosine transform (MDCT). The total number of active MDCT coefficients reaches its minimum for frame shifts equal to multiples of the applied window length. Any modification of an audio file, including cutting out or pasting in part of a recording, disturbs this regularity. In this study, an algorithm based on frame offset checking, previously described in the literature, is extended to use each of the four types of analysis windows commonly applied in the majority of MDCT-based encoders. To enhance the robustness of the method, an additional histogram analysis is performed by detecting the presence of small-valued spectral components. Moreover, the maximum values of nonzero spectral coefficients are computed, which creates a gating function for the results of the previous algorithm. This solution greatly reduces the number of false detections of forgeries. The influence of the compression algorithms' parameters on forgery detection is presented using the AAC and Ogg Vorbis encoders as examples. The effectiveness of the tampering detection algorithms proposed in this paper is tested on a predefined music database and compared graphically using ROC-like curves.
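
The frame-offset principle can be reproduced in a toy experiment: a signal is synthesized from sparse MDCT coefficients (standing in for the coarsely quantized spectrum of a decoded perceptual-codec file), and the number of active coefficients is counted for every analysis offset within one hop. This is a minimal sketch under assumptions not taken from the paper: a single sine window with an orthonormal MDCT, a fixed activity threshold, and no histogram analysis or gating. All function names are hypothetical.

```python
import math
import random


def mdct_basis(n_hop):
    """Orthonormal MDCT (MLT) basis with a sine window: 2*n_hop samples -> n_hop coefficients."""
    two_n = 2 * n_hop
    win = [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]
    scale = math.sqrt(2.0 / n_hop)
    return [
        [
            scale * win[n] * math.cos(math.pi / n_hop * (n + 0.5 + n_hop / 2) * (k + 0.5))
            for n in range(two_n)
        ]
        for k in range(n_hop)
    ]


def synthesize(frames, n_hop, basis):
    """Inverse MDCT with overlap-add: frame m covers samples [m*n_hop, m*n_hop + 2*n_hop)."""
    out = [0.0] * ((len(frames) + 1) * n_hop)
    for m, coeffs in enumerate(frames):
        for row, c in zip(basis, coeffs):
            if c:
                for n, b in enumerate(row):
                    out[m * n_hop + n] += c * b
    return out


def active_count(signal, n_hop, basis, offset, threshold=1e-6):
    """Count coefficients with |value| > threshold over all full frames starting at offset."""
    count = 0
    start = offset
    while start + 2 * n_hop <= len(signal):
        frame = signal[start:start + 2 * n_hop]
        count += sum(
            1 for row in basis if abs(sum(b * s for b, s in zip(row, frame))) > threshold
        )
        start += n_hop
    return count


def offset_profile(signal, n_hop, basis):
    """Active-coefficient count for every analysis offset 0 .. n_hop-1."""
    return [active_count(signal, n_hop, basis, d) for d in range(n_hop)]


def sparse_frames(n_frames, n_hop, n_active, seed=0):
    """Random frames with n_active nonzero coefficients each (mimics a coarsely quantized spectrum)."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n_frames):
        coeffs = [0.0] * n_hop
        for k in rng.sample(range(n_hop), n_active):
            coeffs[k] = rng.choice((-1.0, 1.0)) * rng.uniform(0.5, 1.0)
        frames.append(coeffs)
    return frames
```

In this setting the count is minimal at the original frame grid. If samples are cut out, the grid of everything after the cut moves to a different offset unless the cut length is a multiple of the hop, which is the kind of disturbance the detection relies on.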

Authors and Affiliations

Rafał Korycki

Abstract

In this paper, a robust and perceptually transparent single-level and multi-level blind audio watermarking scheme using wavelets is proposed. A randomly generated binary sequence is used as the watermark, and wavelet function coding is used to embed the watermark sequence in audio signals. Multi-level watermarking is used to increase payload capacity and can provide different levels of security. The robustness of the scheme is evaluated by applying various attacks: filtering, sampling rate alteration, compression, noise addition, amplitude scaling, and cropping. The simulation results show that the proposed watermarking scheme is resilient to the applied attacks, with the exception of cropping. The perceptual transparency of the watermark is measured with the basic model of Perceptual Evaluation of Audio Quality (PEAQ, ITU-R BS.1387) on the Sound Quality Assessment Material (SQAM) published by the European Broadcasting Union (EBU). The average Objective Difference Grade (ODG) measured for this method is -0.067 for single-level and -0.080 for multi-level watermarked audio signals. In the proposed single-level digital audio watermarking scheme, the payload capacity is increased by 19.05% compared with the single-level Chirp-Based Digital Audio Watermarking (CB-DAWM) scheme.
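
To make blind extraction concrete, here is a minimal sketch of a different and simpler wavelet-domain technique: quantization index modulation (QIM) of one-level Haar approximation coefficients. It is not the paper's wavelet function coding; the Haar wavelet, the step size `step`, and all function names are assumptions chosen for illustration. Extraction is blind in the sense that it needs only the watermarked signal and `step`, not the original audio.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level Haar DWT: returns (approximation, detail) coefficient lists."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    a = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return a, d


def haar_inverse(a, d):
    """Inverse of haar_forward."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / SQRT2)
        x.append((ai - di) / SQRT2)
    return x


def qim(value, bit, step):
    """Quantize value onto the lattice for `bit`: multiples of step shifted by bit*step/2."""
    offset = bit * step / 2
    return step * round((value - offset) / step) + offset


def embed(signal, bits, step=0.05):
    """Embed one bit per approximation coefficient, starting at the first coefficient."""
    a, d = haar_forward(signal)
    if len(bits) > len(a):
        raise ValueError("too many bits for this signal length")
    for i, b in enumerate(bits):
        a[i] = qim(a[i], b, step)
    return haar_inverse(a, d)


def extract(signal, n_bits, step=0.05):
    """Blind extraction: pick the lattice (0 or 1) nearest to each coefficient."""
    a, _ = haar_forward(signal)
    out = []
    for v in a[:n_bits]:
        err0 = abs(v - qim(v, 0, step))
        err1 = abs(v - qim(v, 1, step))
        out.append(0 if err0 <= err1 else 1)
    return out
```

Each bit survives additive noise whose effect on an approximation coefficient stays below `step / 4`; a larger `step` buys robustness at the cost of audibility, which is the trade-off that ODG measurements quantify. Unlike the scheme described above, this toy is not robust to amplitude scaling, because scaling moves coefficients off the quantization lattice.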

Authors and Affiliations

Farooq Husain
Omar Farooq
Ekram Khan

Abstract

In the age of digital media, delivering broadcast content to customers at an acceptable level of quality is one of the most challenging tasks. The most important factor is the efficient use of available resources, including bandwidth. Appropriate management of the digital multiplex is essential for both economic and technical reasons. In this paper, we describe transmission quality measurements in the DAB+ broadcast system. We provide a methodology for analysing parameters and factors related to the efficiency and reliability of a digital radio link, and we describe a laboratory stand that can be used for transmission quality assessment at regional and national levels.
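
Multiplex management is largely a capacity-budgeting exercise, and a short calculation shows the trade-off between sub-channel bit rate, protection level, and the number of services. The sketch below assumes figures from ETSI EN 300 401 for DAB transmission mode I: a Common Interleaved Frame of 864 capacity units (CU), and Equal Error Protection profiles EEP-1A to EEP-4A occupying 12, 8, 6 and 4 CU per 8 kbit/s of sub-channel bit rate. These values and the helper names are assumptions to be checked against the standard; they are not taken from the paper.

```python
# Assumed values from ETSI EN 300 401 (verify against the standard before use).
CU_PER_CIF = 864  # capacity units per Common Interleaved Frame (24 ms)
EEP_A_CU_PER_8_KBPS = {"1-A": 12, "2-A": 8, "3-A": 6, "4-A": 4}


def subchannel_cu(bitrate_kbps, profile="3-A"):
    """Capacity units used by one EEP-A sub-channel."""
    if bitrate_kbps <= 0 or bitrate_kbps % 8:
        raise ValueError("EEP-A bit rates are positive multiples of 8 kbit/s")
    return bitrate_kbps // 8 * EEP_A_CU_PER_8_KBPS[profile]


def free_cu(services):
    """Capacity left in the multiplex for a list of (bitrate_kbps, profile) pairs."""
    used = sum(subchannel_cu(rate, prof) for rate, prof in services)
    if used > CU_PER_CIF:
        raise ValueError(f"multiplex over capacity: {used} > {CU_PER_CIF} CU")
    return CU_PER_CIF - used


def max_services(bitrate_kbps, profile="3-A"):
    """How many identical sub-channels fit into one multiplex."""
    return CU_PER_CIF // subchannel_cu(bitrate_kbps, profile)
```

Under these assumptions, a multiplex filled with 96 kbit/s services at EEP-3A holds 12 of them, whereas 64 kbit/s at EEP-3A allows 18; moving those 64 kbit/s services to the stronger EEP-2A protection reduces the count to 13.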

Authors and Affiliations

Przemysław Gilski
Jacek Stefański
