Search for: [Authors = "Pondel-Sycz, Karolina"]

Search results

Number of results: 2

A system dedicated to Polish automatic speech recognition – overview of solutions

Bulletin of the Polish Academy of Sciences Technical Sciences | 2024 | vol. 72 | No 4 | e149818 | DOI: 10.24425/bpasts.2024.149818

Keywords: automatic speech recognition, deep neural networks, transformer, conformer

Abstract

The paper presents an analysis of modern Artificial Intelligence algorithms for an automated system that supports people during conversations in Polish. The system's task is to perform Automatic Speech Recognition (ASR) and to process the recognized text further, for instance to fill in a computer-based form or to apply Natural Language Processing (NLP) to assign the conversation to one of several predefined categories. A state-of-the-art review is needed to select the best set of tools for processing speech in difficult conditions that degrade ASR accuracy. The paper presents the top-level architecture of a system suited to this task and discusses the characteristics of the Polish language. Next, existing ASR solutions and architectures based on End-To-End (E2E) deep neural network (DNN) models are presented in detail. The differences between Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN) and Transformers in the context of ASR technology are also discussed.

Authors and Affiliations

Karolina Pondel-Sycz

Piotr Bilski

Faculty of Electronics and Information Technology, Warsaw University of Technology, Nowowiejska 15/19 Av., 00-665 Warsaw, Poland

End-To-End deep neural models for Automatic Speech Recognition for Polish Language

Karolina Pondel-Sycz, Agnieszka Paula Pietrzak, Julia Szymla

International Journal of Electronics and Telecommunications | 2024 | vol. 70 | No 2 | 315-321 | DOI: 10.24425/ijet.2024.149547

Keywords: Automatic Speech Recognition, Deep Neural Networks, End-To-End, Polish Language

Abstract

This article concerns research on deep neural network (DNN) models used for automatic speech recognition (ASR). In such systems, recognition is based on Mel Frequency Cepstral Coefficients (MFCC) acoustic features and spectrograms. The latest ASR technologies are based on convolutional neural networks (CNNs), recurrent neural networks (RNNs) and Transformers. The article presents an analysis of modern artificial intelligence algorithms adapted for automatic recognition of the Polish language. The differences between conventional architectures and End-To-End (E2E) DNN ASR models are discussed. Preliminary tests of five selected models (QuartzNet, FastConformer, Wav2Vec 2.0 XLSR, Whisper and ESPnet Model Zoo) on the Mozilla Common Voice, Multilingual LibriSpeech and VoxPopuli databases are presented. Tests were conducted on clean audio, on bandwidth-limited audio and on degraded audio. The tested models were evaluated on the basis of Word Error Rate (WER).
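
For readers unfamiliar with the metric named in the abstract, WER is conventionally defined as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words needed to turn the reference transcript into the recognized one, and N is the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation code used in the article; the function name `word_error_rate` and the plain whitespace tokenization are assumptions made for illustration, and the abstract does not say what text normalization (case, punctuation) the authors applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper: tokenizes on whitespace only and applies no
    case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("ala ma kota", "ala ma psa"))
```

Because insertions count as errors while N is fixed by the reference, WER can exceed 1.0 when the recognizer outputs many more words than were spoken.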

Authors and Affiliations

Karolina Pondel-Sycz

Agnieszka Paula Pietrzak

Julia Szymla

Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland
