Search results

Number of results: 12

Abstract

Speech emotion recognition is an important part of human-machine interaction studies. Acoustic analysis is used to recognize emotion from speech. An emotion does not cause changes in all acoustic parameters; rather, the parameters affected vary depending on the emotion type. In this context, the emotion-dependent variability of acoustic parameters remains an active field of study. The purpose of this study is to investigate which acoustic parameters fear affects and the extent of its influence. To this end, various acoustic parameters were obtained from speech recordings containing fear and neutral emotions. The variation of these parameters across emotional states was analyzed using statistical methods, and the parameters affected by fear and the degree of their influence were determined. According to the results, the majority of the acoustic parameters affected by fear vary with the data used. However, it was demonstrated that formant frequencies, mel-frequency cepstral coefficients, and jitter can characterize fear independently of the data used.
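
A minimal sketch of the kind of statistical comparison described above, assuming acoustic parameters (e.g. jitter, mean MFCCs, formants) have already been extracted into per-recording arrays; the feature name, placeholder data, and significance threshold are illustrative, not taken from the paper:

import numpy as np
from scipy import stats

def compare_parameter(fear_values, neutral_values, name, alpha=0.05):
    """Test whether one acoustic parameter differs between fear and neutral speech."""
    fear = np.asarray(fear_values, dtype=float)
    neutral = np.asarray(neutral_values, dtype=float)
    # Welch's t-test: no equal-variance assumption between the two groups
    t, p = stats.ttest_ind(fear, neutral, equal_var=False)
    # Cohen's d as a rough measure of how strongly the emotion affects the parameter
    pooled_sd = np.sqrt((fear.var(ddof=1) + neutral.var(ddof=1)) / 2.0)
    d = (fear.mean() - neutral.mean()) / pooled_sd
    print(f"{name:>12}: t={t:6.2f}  p={p:.4f}  d={d:+.2f}  "
          f"{'affected' if p < alpha else 'not affected'}")

# Hypothetical per-recording values of one parameter (e.g. jitter) for each emotion
rng = np.random.default_rng(0)
compare_parameter(rng.normal(1.4, 0.3, 50), rng.normal(1.0, 0.3, 50), "jitter")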

Authors and Affiliations

Turgut Özseven

Abstract

Speakers' emotional states are recognized from speech signals corrupted by additive white Gaussian noise (AWGN). The influence of white noise on a typical emotion recognition system is studied. The emotion classifier is implemented with a Gaussian mixture model (GMM). A Chinese speech emotion database covering nine emotion classes (happiness, sadness, anger, surprise, fear, anxiety, hesitation, confidence and the neutral state) is used for training and testing. Two speech enhancement algorithms are introduced to improve emotion classification. In the experiments, the Gaussian mixture model is trained on clean speech data and tested under AWGN at various signal-to-noise ratios (SNRs). Both the emotion class model and the dimension space model are adopted to evaluate the emotion recognition system. In the emotion class model, the nine emotion classes are classified; in the dimension space model, the arousal and valence dimensions are classified into positive or negative regions. The experimental results show that the speech enhancement algorithms consistently improve the performance of the emotion recognition system across SNRs, and that positive emotions are more likely to be misclassified as negative emotions in a white-noise environment.
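
A rough sketch of the two pieces described above, noise injection at a target SNR and a per-class GMM classifier, assuming frame-level features (e.g. MFCCs) are extracted elsewhere; the component count and diagonal covariance are illustrative choices, not the paper's settings:

import numpy as np
from sklearn.mixture import GaussianMixture

def add_awgn(signal, snr_db):
    """Corrupt a clean signal with white Gaussian noise at a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

class GMMEmotionClassifier:
    """One GMM per emotion class; prediction picks the class with the highest likelihood."""
    def __init__(self, n_components=8):
        self.n_components = n_components
        self.models = {}

    def fit(self, features_by_class):
        # features_by_class: dict mapping emotion label -> (n_frames, n_features) array
        for label, feats in features_by_class.items():
            gmm = GaussianMixture(n_components=self.n_components, covariance_type="diag")
            self.models[label] = gmm.fit(feats)

    def predict(self, feats):
        # Average log-likelihood of the utterance's frames under each class model
        scores = {label: m.score(feats) for label, m in self.models.items()}
        return max(scores, key=scores.get)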

Authors and Affiliations

Chengwei Huang
Guoming Chen
Hua Yu
Yongqiang Bao
Li Zhao

Abstract

This paper concerns measurement procedures at an emotion monitoring stand designed for tracking human emotions from physiological signals during human-computer interaction. It addresses the key problem of physiological measurements being disturbed by motions typical of human-computer interaction, such as keyboard typing or mouse movements. An original experiment is described that aimed at a practical evaluation of the measurement procedures performed at the emotion monitoring stand constructed at GUT. Different sensor locations were considered and evaluated for suitability and measurement precision in human-computer interaction monitoring. Alternative locations (ear lobes and forearms) for skin conductance, blood volume pulse and temperature sensors were proposed and verified. The alternative locations showed correlation with the traditional ones as well as lower sensitivity to movements such as typing or mouse use, and can therefore provide a better solution for monitoring human-computer interaction.
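
A minimal sketch of how readings from a traditional and an alternative sensor placement could be compared, assuming two synchronized signals and a boolean mask marking typing episodes are already available; the variance-ratio "motion sensitivity" measure is illustrative, not the paper's protocol:

import numpy as np
from scipy.stats import pearsonr

def compare_placements(traditional, alternative, typing_mask):
    """Correlate two synchronized sensor streams and compare their variability
    during typing episodes (a crude proxy for motion sensitivity)."""
    r, p = pearsonr(traditional, alternative)

    def motion_ratio(x):
        # Variance while typing vs. at rest, for one placement
        return np.var(x[typing_mask]) / np.var(x[~typing_mask])

    return {"pearson_r": r, "p_value": p,
            "traditional_motion_ratio": motion_ratio(traditional),
            "alternative_motion_ratio": motion_ratio(alternative)}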


Authors and Affiliations

Agnieszka Landowska

Abstract

Affective computing studies and develops systems capable of detecting human affects. The search for universal, well-performing features for speech-based emotion recognition is ongoing. In this paper, a small set of features with support vector machines as the classifier is evaluated on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database and the Serbian emotional speech database. It is shown that a set of 87 features can offer results on par with the state of the art, yielding average emotion recognition rates of 80.21%, 88.6%, 75.42% and 93.41%, respectively. In addition, an experiment is conducted to explore the significance of gender in emotion recognition using random forests. Two models, trained on the first and second database respectively, and four speakers were used to determine the effects. The feature set used in this work performs well for both male and female speakers, yielding approximately 27% average emotion recognition in both models. The emotions of female speakers were recognized 18% of the time in the first model and 29% in the second. A similar effect is seen with male speakers: the first model yields a 36% and the second a 28% average emotion recognition rate. This illustrates the relationship between the constitution of the training data and emotion recognition accuracy.
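
A minimal sketch of the classifier setup described above (a support vector machine over a fixed feature set with cross-validation), assuming the 87-dimensional feature vectors are already extracted; the placeholder data, kernel and C value are illustrative:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_utterances, 87) feature matrix, y: emotion labels -- placeholders here
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 87))
y = rng.integers(0, 7, size=200)

# Scaling + RBF-kernel SVM; C and gamma would normally be tuned per database
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"average recognition rate: {scores.mean():.2%}")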


Authors and Affiliations

J. Hook
F. Noroozi
O. Toygar
G. Anbarjafari

Abstract

The Covid-19 pandemic is having a severe impact worldwide. A line of research has warned that facial occlusion may impair facial emotion recognition, while prior research has highlighted the role of Trait Emotional Intelligence in the recognition of non-verbal social stimuli. The sample consisted of 102 emerging adults aged 18-24 (M = 20.76; SD = 2.10; 84% female, 16% male) who were asked to recognize four different emotions (happiness, fear, anger, and sadness) in fully visible faces and in faces wearing a mask, and to complete a questionnaire assessing Trait Emotional Intelligence. The results highlight that individuals were less accurate in detecting happiness and fear in covered faces and also gave more incorrect answers overall. Participants provided more correct answers when the photos showed people without a mask than when they were wearing one, and gave more wrong answers for masked faces than for unmasked ones. In addition, participants provided more correct answers regarding happiness and sadness when the subjects in the photos were not wearing a mask than when they were. Implications are discussed.

Authors and Affiliations

Marco Cannavò
1
Nadia Barberis
1
Rosalba Larcan
2
Francesca Cuzzocrea
1

  1. Università degli studi Magna Graecia Catanzaro, Italy
  2. Università degli studi di Messina, Messina, Italy

Abstract

Today's human-computer interaction systems have a broad variety of applications in which automatic human emotion recognition is of great interest. The literature contains many different, more or less successful, forms of such systems. This work emerged as an attempt to clarify which speech features are the most informative, which classification structure is the most convenient for this type of task, and the degree to which the results are influenced by database size, quality, and the cultural characteristics of a language. The research is presented as a case study on Slavic languages.


Authors and Affiliations

Željko Nedeljković
Milana Milošević
Željko Đurović

Abstract

Speech emotion recognition is deemed a meaningful yet intractable issue across a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach in depth. We use sparse partial least squares regression to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. By exploiting the SPLSR method, the coefficients of redundant and uninformative speech emotion features are shrunk to zero, while useful and informative features are retained and passed to the subsequent classification step. A number of tests on the Berlin database reveal that the recognition rate of the SPLSR method reaches up to 79.23% and is superior to the other dimensionality reduction methods compared.
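
Sparse PLS regression is not available in scikit-learn, so as a rough illustration of the shrink-to-zero idea described above, the sketch below uses an L1-penalized linear model as a stand-in that drives the coefficients of uninformative features exactly to zero before classification; the penalty strength, placeholder data and feature counts are illustrative:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))          # placeholder speech-emotion features
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=300)  # only 5 features matter

X_std = StandardScaler().fit_transform(X)
# The L1 penalty drives coefficients of redundant features exactly to zero
model = Lasso(alpha=0.1).fit(X_std, y)
selected = np.flatnonzero(model.coef_)
print(f"{selected.size} of {X.shape[1]} features kept:", selected)
# X_std[:, selected] would then be passed on to the classification step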

Authors and Affiliations

Jingjie Yan
Xiaolan Wang
Weiyi Gu
LiLi Ma

Abstract

The study investigates the use of the speech signal to recognise speakers' emotional states. The introduction covers the definition and categorization of emotions, including facial expressions, speech and physiological signals. For the purpose of this work, a proprietary resource of emotionally-marked speech recordings was created. The collected recordings come from the media, including live journalistic broadcasts, which show spontaneous emotional reactions to real-time stimuli. For speech signal analysis, a dedicated script was written in Python. Its algorithm includes the parameterization of the speech recordings and the determination of features correlated with the emotional content of speech. After parametrization, data clustering was performed to group the speakers' feature vectors into larger collections that imitate specific emotional states. Using the Student's t-test for dependent samples, descriptors were identified that show significant differences in feature values between emotional states. Potential applications of this research are proposed, as well as directions for future studies of the topic.
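
A minimal sketch of the kind of pipeline outlined above (parameterization, clustering, paired t-test), assuming the recordings are available as WAV files; the chosen features, cluster count, file names and libraries (librosa, scikit-learn, SciPy) are illustrative and not taken from the authors' script:

import numpy as np
import librosa
from sklearn.cluster import KMeans
from scipy.stats import ttest_rel

def parameterize(path):
    # Per-recording feature vector: F0 statistics plus mean MFCCs
    y, sr = librosa.load(path, sr=None)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([[np.nanmean(f0), np.nanstd(f0)], mfcc.mean(axis=1)])

# Hypothetical file names for the emotionally-marked recordings
paths = ["rec_001.wav", "rec_002.wav", "rec_003.wav"]
X = np.vstack([parameterize(p) for p in paths])

# Group feature vectors into clusters imitating emotional states
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Paired (dependent-samples) t-test on one descriptor measured for the same
# speakers in two emotional states (placeholder values, aligned per speaker)
state_a = np.array([182.0, 195.0, 171.0])   # e.g. mean F0 in state A
state_b = np.array([214.0, 232.0, 208.0])   # same speakers in state B
t, p = ttest_rel(state_a, state_b)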

Authors and Affiliations

Zuzanna Piątek
1
Maciej Kłaczyński
1

  1. AGH University of Science and Technology, Faculty of Mechanical Engineering and Robotics, Department of Mechanics and Vibroacoustics, Cracow, Poland

Abstract

The human voice is one of the basic means of communication, through which one can also easily convey one's emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. The AGH Emotional Speech Corpus was used. This database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We explored phrases of all the emotions, both all together and in various combinations. The Fast Fourier Transform and magnitude spectrum analysis were applied to extract the fundamental tone from the speech audio samples. After extracting several statistical features of the fundamental frequency, we studied whether they carry information about the emotional state of the speaker, applying different AI methods. Analysis of the outcome data was conducted with the following classifiers: K-Nearest Neighbours with local induction, Random Forest, Bagging, JRip, and the Random Subspace Method, from the WEKA data mining toolkit. The results show that the fundamental frequency is a promising choice for further experiments.
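
A minimal sketch of FFT-based fundamental-frequency estimation and the statistical features built on it, assuming a mono signal array and its sample rate; the frame length, hop size and F0 search band are illustrative:

import numpy as np

def estimate_f0(frame, sr, fmin=65.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame from its magnitude spectrum."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spectrum[band])]

def f0_statistics(signal, sr, frame_len=2048, hop=512):
    """Frame the signal, estimate F0 per frame, and summarize it with simple statistics."""
    f0 = np.array([estimate_f0(signal[i:i + frame_len], sr)
                   for i in range(0, len(signal) - frame_len, hop)])
    return {"mean": f0.mean(), "std": f0.std(),
            "min": f0.min(), "max": f0.max(), "range": np.ptp(f0)}

sr = 16000
t = np.arange(sr) / sr
demo = np.sin(2 * np.pi * 150.0 * t)       # synthetic 150 Hz tone as a stand-in for speech
print(f0_statistics(demo, sr))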


Authors and Affiliations

Teodora Dimitrova-Grekow
Aneta Klis
Magdalena Igras-Cybulska

Abstract

Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates emotional states. The FFT is commonly applied to the raw signal when extracting low-level descriptor features such as short-time energy, fundamental frequency, formants, MFCCs (mel-frequency cepstral coefficients) and so on. However, these features are built in the frequency domain and ignore information from the temporal domain. In this paper, we propose a novel framework that combines a multi-layer wavelet sequence set obtained from wavelet packet reconstruction (WPR) with a conventional feature set to form a mixed feature set for emotion recognition with recurrent neural networks (RNN) based on an attention mechanism. In addition, since silent frames have a detrimental effect on SER, we adopt autocorrelation-based voice activity detection to eliminate emotionally irrelevant frames. We show that the proposed algorithm significantly outperforms a traditional feature set in the prediction of spontaneous emotional states on the IEMOCAP corpus and the EMODB database, and that it achieves better classification in both speaker-independent and speaker-dependent experiments. Notably, we obtain accuracies of 62.52% and 77.57% in the speaker-independent (SI) setting, and 66.90% and 82.26% in the speaker-dependent (SD) setting.
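
A minimal PyTorch sketch of the attention-pooled recurrent classifier idea described above; the GRU, layer sizes and class count are illustrative assumptions, and the wavelet-packet and conventional features are assumed to be extracted beforehand:

import torch
import torch.nn as nn

class AttentionRNN(nn.Module):
    """GRU over frame-level features, followed by attention pooling and a class head."""
    def __init__(self, n_features, hidden=128, n_classes=4):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)       # one attention score per frame
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                          # x: (batch, frames, n_features)
        h, _ = self.rnn(x)                         # (batch, frames, 2*hidden)
        w = torch.softmax(self.att(h).squeeze(-1), dim=1)   # frame weights
        context = (w.unsqueeze(-1) * h).sum(dim=1)          # weighted sum over frames
        return self.out(context)

model = AttentionRNN(n_features=40)
logits = model(torch.randn(8, 300, 40))           # 8 utterances, 300 frames each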

Authors and Affiliations

Hao Meng
1
Tianhao Yan
1
Hongwei Wei
1
Xun Ji
2

  1. Key laboratory of Intelligent Technology and Application of Marine Equipment (Harbin Engineering University), Ministry of Education, Harbin, 150001, China
  2. College of Marine Electrical Engineering, Dalian Maritime University, Dalian, 116026, China

Abstract

In the domain of affective computing, different emotional expressions play an important role. Facial expressions and other visual cues are the primary means of conveying the human affective state, and they do so more convincingly than other cues. With the advancement of deep learning techniques, convolutional neural networks (CNN) can be used to automatically extract features from visual cues; however, variable-sized and biased datasets are a major challenge in deploying deep models, and the dataset used for training plays a significant role in the results obtained. In this paper, we propose a multi-model hybrid ensemble with weighted adaptive decision-level fusion for personalized affect recognition based on visual cues. We use a CNN and a pre-trained ResNet-50 model for transfer learning; the VGGFace model's weights are used to initialize the ResNet-50 weights for fine-tuning. The proposed system shows a significant improvement in test accuracy for affective state recognition compared with a single CNN trained from scratch or a transfer-learned model alone. The proposed methodology is validated on the Karolinska Directed Emotional Faces (KDEF) dataset with 77.85% accuracy. The obtained results are promising compared with existing state-of-the-art methods.
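
A minimal sketch of weighted decision-level fusion of class probabilities, the fusion step named above; the model weights, class count and the softmax outputs (which would come from the CNN and the fine-tuned ResNet-50) are illustrative:

import numpy as np

def weighted_decision_fusion(prob_sets, weights):
    """Fuse per-model class probabilities with per-model weights and pick the argmax class."""
    prob_sets = np.asarray(prob_sets, dtype=float)     # (n_models, n_samples, n_classes)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                  # normalize so weights sum to 1
    fused = np.tensordot(weights, prob_sets, axes=1)   # (n_samples, n_classes)
    return fused.argmax(axis=1), fused

# Hypothetical softmax outputs of two models on three test images, 4 emotion classes
cnn_probs    = np.array([[0.6, 0.2, 0.1, 0.1], [0.3, 0.4, 0.2, 0.1], [0.1, 0.1, 0.2, 0.6]])
resnet_probs = np.array([[0.5, 0.3, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1], [0.2, 0.1, 0.1, 0.6]])
preds, fused = weighted_decision_fusion([cnn_probs, resnet_probs], weights=[0.4, 0.6])
print(preds)   # fused class index per image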

Authors and Affiliations

Nagesh Jadhav
1
Rekha Sugandhi
1

  1. MIT ADT University, Pune, Maharashtra, 412201, India

Abstract

Due to the increasing amount of music being made available in digital form on the Internet, automatic organization of music is sought. The paper presents an approach to the graphical representation of the mood of songs based on Self-Organizing Maps. Parameters describing the mood of music are proposed and calculated, and then analyzed by correlating them with mood dimensions obtained through Multidimensional Scaling. A map is created in which music excerpts with similar mood are placed next to each other on a two-dimensional display.
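
A minimal sketch of building such a two-dimensional mood map from precomputed mood descriptors, using the third-party MiniSom library as one possible SOM implementation; the grid size, feature count and training length are illustrative, not the paper's settings:

import numpy as np
from minisom import MiniSom   # third-party SOM implementation (pip install minisom)

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 6))   # placeholder mood descriptors per music excerpt

som = MiniSom(10, 10, features.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(features)
som.train_random(features, 1000)       # unsupervised training on the mood descriptors

# Each excerpt is mapped to its best-matching unit; similar moods land close together
positions = np.array([som.winner(x) for x in features])
print(positions[:5])                   # (row, col) on the 2-D map for the first excerpts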

Authors and Affiliations

Magdalena Plewa
Bożena Kostek
