This paper presents a comprehensive study on machine listening for localisation of snore sound excitation. We investigate the effects of varied frame sizes and overlaps of the analysed audio chunks on the extraction of low-level descriptors. In addition, we explore the performance of each kind of feature when fed into varied classifier models, including support vector machines, k-nearest neighbours, linear discriminant analysis, random forests, extreme learning machines, kernel-based extreme learning machines, multilayer perceptrons, and deep neural networks. Experimental results demonstrate that wavelet packet transform energy can outperform most other features. A deep neural network trained with subband energy ratios reaches the highest performance, achieving an unweighted average recall of 72.8% across four types of snoring.
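As an illustration of the frame-size/overlap choices discussed above, the sketch below (not the authors' code; the frame length, hop, number of bands, and test tone are arbitrary choices) frames a signal with 50% overlap and computes per-frame subband energy ratios, one simple low-level descriptor of the kind mentioned:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (frame_len samples, hop-sample step)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def subband_energy_ratios(frame, n_bands=4):
    """Energy in n_bands equal-width FFT subbands, normalised to sum to 1."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    energies = np.array([b.sum() for b in np.array_split(spec, n_bands)])
    return energies / energies.sum()

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)                    # synthetic 1-s tone standing in for audio
frames = frame_signal(x, frame_len=512, hop=256)   # 50% overlap
feats = np.array([subband_energy_ratios(f) for f in frames])
print(frames.shape, feats.shape)                   # -> (61, 512) (61, 4)
```

Shorter frames with more overlap yield more (but noisier) descriptor vectors per recording, which is exactly the trade-off the study varies.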
This paper presents an alternative approach to sequential data classification, based on traditional machine learning algorithms (neural networks, principal component analysis, a multivariate Gaussian anomaly detector) and finding the shortest path in a directed acyclic graph using the A* algorithm with a regression-based heuristic. Palm gestures were used as an example of sequential data, and a quadrocopter was the controlled object. The study includes the creation of a conceptual model and the practical construction of a system using the GPU to ensure real-time operation. The results present the classification accuracy of the chosen gestures and a comparison of computation time between the CPU- and GPU-based solutions.
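The shortest-path component can be sketched minimally as follows; this assumes nothing about the paper's actual graph or heuristic: the regression-based heuristic is replaced by a hand-written admissible one, and the DAG is a toy example:

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: graph maps node -> [(neighbour, edge_cost)], h(node) is the heuristic."""
    open_set = [(h(start), 0, start, [start])]   # (f = g + h, g, node, path)
    best_g = {start: 0}
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return g, path
        for nbr, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nbr, float("inf")):
                best_g[nbr] = new_g
                heapq.heappush(open_set, (new_g + h(nbr), new_g, nbr, path + [nbr]))
    return None

# Toy DAG; in the paper's setting nodes would correspond to stages of a gesture sequence
dag = {"s": [("a", 2), ("b", 5)], "a": [("b", 1), ("g", 6)], "b": [("g", 2)]}
h = lambda n: {"s": 4, "a": 3, "b": 2, "g": 0}[n]   # stand-in for the regression heuristic
print(a_star(dag, h, "s", "g"))   # -> (5, ['s', 'a', 'b', 'g'])
```

With an admissible heuristic (here checked by hand), A* returns the same shortest path as Dijkstra while typically expanding fewer nodes, which is what makes a learned heuristic attractive for real-time operation.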
Affective computing studies and develops systems capable of detecting human affect. The search for universal, well-performing features for speech-based emotion recognition is ongoing. In this paper, a small set of features with support vector machines as the classifier is evaluated on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database, and the Serbian emotional speech database. It is shown that a set of 87 features can offer results on par with the state of the art, yielding average emotion recognition rates of 80.21%, 88.6%, 75.42%, and 93.41%, respectively. In addition, an experiment is conducted to explore the significance of gender in emotion recognition using random forests. Two models, trained on the first and second database, respectively, and four speakers were used to determine the effects. It is seen that the feature set used in this work performs well for both male and female speakers, yielding approximately 27% average emotion recognition in both models. In addition, the emotions of female speakers were recognized 18% of the time in the first model and 29% in the second. A similar effect is seen with male speakers: the first model yields a 36%, the second a 28% average emotion recognition rate. This illustrates the relationship between the constitution of training data and emotion recognition accuracy.
A variety of algorithms allow gesture recognition in video sequences. Such algorithms are of interest to hearing-impaired people, since they allow a great degree of self-sufficiency in communicating intent to non-signers without the need for interpreters. The state of the art in this domain is capable of either real-time recognition of sign language in low-resolution videos or non-real-time recognition in high-resolution videos. This paper proposes a novel approach to real-time recognition of fingerspelling alphabet letters of American Sign Language (ASL) in ultra-high-definition (UHD) video sequences. The proposed approach is based on adaptive Laplacian of Gaussian (LoG) filtering, with local extrema detected using the Features from Accelerated Segment Test (FAST) algorithm and classified by a Convolutional Neural Network (CNN). The recognition rate of our algorithm was verified on real-life data.
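The LoG-plus-local-extrema stage can be sketched roughly as follows. This is a simplified stand-in, not the paper's method: it uses a plain (non-adaptive) LoG with a 3x3 strict-maximum test instead of FAST, and a synthetic image; kernel size, sigma, and threshold are arbitrary choices:

```python
import numpy as np

def log_kernel(sigma, size):
    """Analytic Laplacian-of-Gaussian kernel of shape (size, size)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    return ((r2 / sigma ** 4 - 2 / sigma ** 2)
            * np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2))

def convolve_same(img, ker):
    """'Same'-size 2-D linear convolution via zero-padded FFTs."""
    H, W = img.shape
    kh, kw = ker.shape
    s = (H + kh - 1, W + kw - 1)
    full = np.fft.irfft2(np.fft.rfft2(img, s=s) * np.fft.rfft2(ker, s=s), s=s)
    return full[kh // 2: kh // 2 + H, kw // 2: kw // 2 + W]

def log_keypoints(img, sigma=2.0, thresh=0.05):
    """Pixels that are strict 3x3 local maxima of the |LoG| response above thresh."""
    mag = np.abs(convolve_same(img.astype(float), log_kernel(sigma, 13)))
    pad = np.pad(mag, 1, constant_values=-np.inf)
    neigh = np.max([np.roll(np.roll(pad, dy, 0), dx, 1)[1:-1, 1:-1]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)], axis=0)
    return np.argwhere((mag > neigh) & (mag > thresh))

img = np.zeros((32, 32))
img[15:18, 15:18] = 1.0          # a small bright blob stands in for a hand feature
print(log_keypoints(img))        # the blob centre is detected as a keypoint
```

The appeal of this pipeline for UHD input is that filtering and extrema tests are cheap, data-parallel operations, leaving the expensive CNN to run only on the detected patches.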
Churn prevention represents an important step for mobile communication companies aiming at increasing customer loyalty. From a machine learning perspective, Customer Value Management departments require automated methods and processes to create marketing campaigns able to identify the most appropriate churn prevention approach. Moving towards a big-data-driven environment, a deeper understanding of the data provided by churn processes and client operations is needed. In this context, a procedure aiming at reducing the number of churners by planning a customized marketing campaign is deployed through a data-driven approach. Decision Tree methodology is applied to draw up a list of clients with churn propensity: in this way, customer analysis is detailed, as well as the development of a marketing campaign, integrating the individual churn model with a viral churn perspective. The first step of the proposed procedure requires the evaluation of the churn probability of each customer, based on the influence of their social links. Then, customer profiling is performed considering (a) individual variables, (b) variables describing customer-company interactions, and (c) external variables. The main contribution of this work is the development of a versatile procedure for viral churn prevention, applying Decision Tree techniques in the telecommunication sector and integrating a direct campaign from the Customer Value Management marketing department to each customer with significant churn risk. A case study of a mobile communication company is also presented to explain the proposed procedure, as well as to analyze its real performance and results.
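The core of the Decision Tree step, greedily choosing the split that minimises class impurity, can be sketched on hypothetical churn data; the feature names, values, and labels below are invented for illustration and are not from the case study:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Exhaustively search (feature, threshold) pairs; return the tuple
    (weighted_gini, feature_index, threshold) with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

# Invented toy churn table: (monthly_spend, support_calls) -> churned (1) / stayed (0)
X = [(20, 5), (25, 4), (80, 0), (90, 1), (30, 6), (85, 1)]
y = [1, 1, 0, 0, 1, 0]
print(best_split(X, y))   # -> (0.0, 0, 30): monthly_spend <= 30 separates the classes
```

A full tree applies this search recursively to each side of the split; the resulting thresholds are directly readable as campaign-targeting rules, which is one reason decision trees suit marketing workflows.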
This is a modest endeavour, written from an engineering perspective by a non-philosopher, to set things straight, if somewhat roughly: What does artificial intelligence boil down to? What are its merits, and why may some dangers stem from its development in this time of confusion when, to quote Rémi Brague, “From the point of view of technology, man appears as outdated, or at least superfluous”?
The availability of cheap and widely applicable person identification techniques is essential due to the widespread usage of online services. The dynamics of typing is characteristic of particular users, and users are hardly able to mimic the typing dynamics of others. State-of-the-art solutions for person identification from the dynamics of typing are based on machine learning. The presence of hubs, i.e., a few instances that appear as nearest neighbours of surprisingly many other instances, has been observed in various domains recently, and hubness-aware machine learning approaches have been shown to work well in those domains. However, hubness has not yet been studied in the context of person identification, and hubness-aware techniques have not been applied to this task. In this paper, we examine hubness in typing data and propose to use ECkNN, a recent hubness-aware regression technique, together with dynamic time warping for person identification. We collected time-series data describing the dynamics of typing and used it to evaluate our approach. Experimental results show that hubness-aware techniques outperform state-of-the-art time-series classifiers.
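The dynamic time warping distance used here as the dissimilarity measure can be sketched as follows; this is a textbook DTW implementation on invented inter-keystroke timings, not the authors' code, and ECkNN itself is not shown:

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    e.g. inter-keystroke time series of two typing sessions."""
    INF = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# Two "sessions": same rhythm, one locally stretched in time
s1 = [0.10, 0.12, 0.30, 0.11]
s2 = [0.10, 0.12, 0.12, 0.30, 0.11]
print(dtw(s1, s2))   # 0.0: the warping absorbs the repeated 0.12
```

Because DTW tolerates local tempo variation, two typing samples of the same user remain close even when one keystroke is hesitated over, which is why it pairs naturally with nearest-neighbour style classifiers such as ECkNN.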
The goal of this research is to find a set of acoustic parameters related to the differences between Polish and Lithuanian consonants. In order to identify these differences, an acoustic analysis is performed, and the phoneme sounds are described as vectors of acoustic parameters. Parameters known from the speech domain, as well as those from the music information retrieval area, are employed. These parameters are time- and frequency-domain descriptors. English is used as an auxiliary language in the experiments. In the first part of the experiments, an analysis of Lithuanian and Polish language samples is carried out, features are extracted, and the most discriminating ones are determined. In the second part, automatic classification of Lithuanian/English, Polish/English, and Lithuanian/Polish phonemes is performed.
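Two descriptors of the kind used in such analyses, zero-crossing rate (time domain) and spectral centroid (frequency domain), can be sketched as follows. The synthetic tones and sampling rate are illustrative only, not the study's data:

```python
import numpy as np

def zcr(x):
    """Zero-crossing rate: fraction of consecutive sample pairs that change sign."""
    return float(np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])))

def spectral_centroid(x, sr):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

sr = 16000
t = np.arange(2048) / sr
low = np.sin(2 * np.pi * 200 * t)    # stand-in for a voiced, low-frequency phoneme
high = np.sin(2 * np.pi * 4000 * t)  # stand-in for a fricative-like, high-frequency one
print(zcr(low), zcr(high))                                   # high tone crosses zero more
print(spectral_centroid(low, sr), spectral_centroid(high, sr))  # and has a higher centroid
```

Descriptors like these, computed per phoneme and compared across languages, are exactly the kind of vector components the analysis above ranks by discriminative power.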