Search phrase: [Keywords = "language modelling"]

System for Automatic Transcription of Sessions of the Polish Senate

Krzysztof Marasek Danijel Koržinek Łukasz Brocki

Archives of Acoustics | 2014 | vol. 39 | No 4 | 501-509 | DOI: 10.2478/aoa-2014-0054

Keywords: large vocabulary speech recognition language modelling transcription transliteration subtitles

Abstract

This paper describes research behind a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for the transcription of Senate speeches for the Polish language. The system utilizes several components: a phonetic transcription system, language and acoustic model training systems, a Voice Activity Detector (VAD), an LVCSR decoder, and a subtitle generator and presentation system. Some of the modules relied on already available tools and some had to be built from scratch, but the authors ensured that they used the most advanced techniques available to them at the time. Finally, several experiments were performed to compare the performance of more modern and more conventional technologies.


Authors and Affiliations

Krzysztof Marasek

Danijel Koržinek

Łukasz Brocki

Rapid Text Entry Using Mobile and Auxiliary Devices for People with Speech Disorders Communication

Iurii V. Krak Olexander V. Barmak Ruslan O. Bahrii Waldemar Wójcik Saule Rakhmetullina Saltanat Amirgaliyeva

International Journal of Electronics and Telecommunications | 2020 | vol. 66 | No 2 | 273-279 | DOI: 10.24425/ijet.2020.131874

Keywords: information technology alternative communication ambiguous virtual keyboard text prediction statistical language model N-gram


Abstract

The article considers information technology that enables human communication using residual human capabilities by organizing text entry with mobile and auxiliary devices. The components of the proposed technology are described in detail: a method for entering text with a limited number of controls, and a method for predicting the words that most often follow the words already entered in the sentence. A generalized representation of the text entry process is described using an ambiguous virtual keyboard, together with the representation of control signals for selecting control elements. Approaches to finding the optimal distribution of the alphabet characters for different numbers of control signals are given. The word prediction method is generalized and improved, a statistical language model with "back-off" is used, and an approach to building a training corpus of the spoken Ukrainian language is proposed.
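The idea of predicting the next word with a "back-off" statistical language model can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `train` and `predict`, the trigram limit, and the toy English corpus are all assumptions made for illustration. The predictor looks up the longest available context first and falls back to shorter contexts when the longer one was never seen.

```python
from collections import Counter, defaultdict


def train(sentences, max_order=3):
    """Count next-word frequencies for every context of length 0..max_order-1.

    Hypothetical helper: a plain count table, with no smoothing or discounting.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for n in range(max_order):
                if i - n < 0:
                    break
                context = tuple(words[i - n:i])  # the n words preceding `word`
                counts[context][word] += 1
    return counts


def predict(counts, history, k=3, max_order=3):
    """Return up to k likely next words, backing off to shorter contexts."""
    words = history.lower().split()
    for n in range(max_order - 1, -1, -1):
        context = tuple(words[len(words) - n:]) if n else ()
        # Skip contexts longer than the history or never seen in training.
        if len(context) == n and context in counts:
            return [w for w, _ in counts[context].most_common(k)]
    return []


# Toy corpus (made up for this example).
corpus = ["i want to eat", "i want to sleep", "i want to eat", "you need to rest"]
counts = train(corpus)
print(predict(counts, "i want to", k=1))   # uses the two-word context
print(predict(counts, "we like to", k=1))  # backs off to the one-word context
```

A production model would normally also discount counts and weight the backed-off estimates; the sketch only shows the fallback order.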


Authors and Affiliations

Iurii V. Krak

Olexander V. Barmak

Ruslan O. Bahrii

Waldemar Wójcik

Saule Rakhmetullina

Saltanat Amirgaliyeva

Implementation of language models within an infrastructure designed for Natural Language Processing

Bartosz Walkowiak Tomasz Walkowiak

International Journal of Electronics and Telecommunications | 2024 | vol. 70 | No 1 | 153–159 | DOI: 10.24425/ijet.2024.149525

Keywords: language model deployment quantization Llama-2 E5 model ONNX llama.cpp CLARIN-PL


Abstract

This paper explores cost-effective alternatives for resource-constrained environments in the context of language models by investigating methods such as quantization and CPU-based model implementations. The study addresses the computational efficiency of language models during inference and the development of infrastructure for text document processing. The paper discusses related technologies, the CLARIN-PL infrastructure architecture, and implementations of small and large language models. The emphasis is on model formats, data precision, and runtime environments (GPU and CPU). It identifies optimal solutions through extensive experimentation. In addition, the paper advocates for a more comprehensive performance evaluation approach. Instead of reporting only average token throughput, it suggests considering the curve’s shape, which can vary from constant to monotonically increasing or decreasing functions. Evaluating token throughput at various curve points, especially for different output token counts, provides a more informative perspective.
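The suggestion to look at the shape of the throughput curve rather than a single average can be sketched as follows. This is an illustrative assumption, not the paper's evaluation code: the helper names `throughput_curve` and `curve_shape`, the 5% tolerance, and the timing numbers are all hypothetical.

```python
def throughput_curve(measurements):
    """Turn (output_tokens, seconds) pairs into (output_tokens, tokens_per_second).

    Points are sorted by output token count so the curve can be read left to right.
    """
    return [(n, n / t) for n, t in sorted(measurements)]


def curve_shape(curve, tolerance=0.05):
    """Classify a curve as 'constant', 'increasing', 'decreasing' or 'mixed'.

    Relative changes between neighbouring points that stay within `tolerance`
    are treated as flat (an assumption made for this sketch).
    """
    ups = downs = 0
    for (_, prev), (_, cur) in zip(curve, curve[1:]):
        change = (cur - prev) / prev
        if change > tolerance:
            ups += 1
        elif change < -tolerance:
            downs += 1
    if ups and downs:
        return "mixed"
    if ups:
        return "increasing"
    if downs:
        return "decreasing"
    return "constant"


# Made-up timings: (number of output tokens, wall-clock seconds).
measurements = [(64, 4.0), (256, 18.0), (1024, 90.0)]
curve = throughput_curve(measurements)
average = sum(tps for _, tps in curve) / len(curve)
print(curve_shape(curve), round(average, 2))
```

In this made-up case the average hides the fact that throughput drops as the output grows, which is the kind of information the curve shape adds.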


Authors and Affiliations

Bartosz Walkowiak (1)

Tomasz Walkowiak (1)

(1) Faculty of Information and Communication Technology, Wroclaw University of Science and Technology, Wroclaw, Poland

Procedurally generated AI compound media for expanding audial creations, broadening immersion and perception experience

Grzegorz Samson

International Journal of Electronics and Telecommunications | 2024 | vol. 70 | No 2 | 341-348 | DOI: 10.24425/ijet.2024.149550

Keywords: procedural generation generative media multimodal art audiovisual perception text-to-image transformers large language models latent diffusion models


Abstract

Recently, the world has been gaining rapidly increasing access to ever more advanced artificial intelligence tools. This phenomenon does not bypass the world of sound and visual art, and both of these worlds can benefit in ways yet unexplored, drawing them closer to one another. Recent breakthroughs open possibilities to utilize AI-driven tools for creating generative art and using it as a compound of other multimedia. The aim of this paper is to present an original concept of using AI to create a visual compound material for an existing audio source. This is a way of broadening accessibility, appealing to different human senses by using the source media and expanding its initial form. This research utilizes a novel method of enhancing fundamental material consisting of text audio or text source (script) and a sound layer (audio play) by adding an extra layer of multimedia experience – a visual one, generated procedurally. A set of images generated by AI tools creates a story-telling animation as a new way to immerse oneself in the experience of sound perception and focus on the initial audial material. The main idea of the paper consists of creating a pipeline, a form of blueprint for the process of procedural image generation based on the source context (audial or textual) transformed into text prompts, and providing tools to automate it by programming a set of code instructions. This process allows the creation of coherent and cohesive (to a certain extent) visual cues accompanying the audial experience, elevating it to a multimodal piece of art. Using present-day technologies, creators can enhance audial forms procedurally, providing them with visual context. The paper refers to current possibilities, use cases, limitations and biases given the presented tools and solutions.


Authors and Affiliations

Grzegorz Samson (1)

(1) Feliks Nowowiejski Academy of Music in Bydgoszcz, Poland
