Search results

Number of results: 2

Abstract

In the domain of affective computing, emotional expressions play an important role. Facial expressions and other visual cues are the primary means of conveying a person's emotional state, and they do so more convincingly than any other cue. With advances in deep learning, convolutional neural networks (CNNs) can automatically extract features from visual cues; however, variable-sized and biased datasets remain a major challenge when deploying deep models, and the dataset used for training strongly influences the results obtained. In this paper, we propose a multi-model hybrid ensemble with weighted adaptive decision-level fusion for personalized affect recognition from visual cues. We use a CNN trained from scratch together with a pre-trained ResNet-50 model for transfer learning, where the ResNet-50 weights are initialized from the VGGFace model before fine-tuning. The proposed system shows a significant improvement in test accuracy for affective state recognition compared with either the singleton CNN trained from scratch or the transfer-learned model alone. The proposed methodology is validated on the Karolinska Directed Emotional Faces (KDEF) dataset, achieving 77.85% accuracy. The obtained results are promising compared with existing state-of-the-art methods.
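
As an illustration of the decision-level fusion described above, the following Python/NumPy sketch weights each model's softmax output by its validation accuracy and picks the class with the highest fused score. The function name fuse_predictions, the accuracy-proportional weighting, and the seven-class toy values are assumptions for illustration, not the authors' implementation.

import numpy as np

def fuse_predictions(probs_cnn, probs_resnet, acc_cnn, acc_resnet):
    # Decision-level fusion: weight each model's class probabilities by its
    # validation accuracy, then return the index of the highest fused score.
    w_cnn = acc_cnn / (acc_cnn + acc_resnet)
    w_resnet = acc_resnet / (acc_cnn + acc_resnet)
    fused = w_cnn * probs_cnn + w_resnet * probs_resnet
    return int(np.argmax(fused))

# Toy example with 7 emotion classes (e.g. the KDEF expressions); the
# probability vectors and accuracies below are made up for illustration.
p_cnn = np.array([0.05, 0.10, 0.05, 0.55, 0.10, 0.10, 0.05])     # scratch-trained CNN
p_resnet = np.array([0.02, 0.08, 0.05, 0.70, 0.05, 0.05, 0.05])  # fine-tuned ResNet-50
print(fuse_predictions(p_cnn, p_resnet, acc_cnn=0.71, acc_resnet=0.76))  # -> 3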

Bibliography

  1.  W. Łosiak and J. Siedlecka, “Recognition of facial expressions of emotions in schizophrenia,” Pol. Psychol. Bull., vol. 44, no. 2, pp. 232–238, 2013, doi: 10.2478/ppb-2013-0026.
  2.  I.M. Revina and W.R.S. Emmanuel, “A Survey on human face expression recognition techniques,” J. King Saud Univ. Comput. Inf. Sci., vol. 33, no. 6, pp. 619–628, 2021, doi: 10.1016/j.jksuci.2018.09.002.
  3.  I.J. Goodfellow et al., “Challenges in representation learning: A report on three machine learning contests,” Neural Networks, vol. 64, pp. 59‒63, 2015, doi: 10.1016/j.neunet.2014.09.005.
  4.  M. Mohammadpour, H. Khaliliardali, S.M.R. Hashemi, and M.M. AlyanNezhadi, “Facial emotion recognition using deep convolutional networks,” in Proc. IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), Tehran, 2017, pp. 0017–0021.
  5.  D.V. Sang, N. Van Dat, and D.P. Thuan, “Facial expression recognition using deep convolutional neural networks,” in Proc. 9th International Conference on Knowledge and Systems Engineering (KSE), Hue, 2017, pp. 130‒135.
  6.  C. Pramerdorfer and M. Kampel, “Facial expression recognition using convolutional neural networks: state of the art,” ArXiv, abs/1612.02903.
  7.  J. Yan et al., “Multi-cue fusion for emotion recognition in the wild,” Neurocomputing, vol. 309, pp. 27–35, 2018, doi: 10.1016/j.neucom.2018.03.068.
  8.  T.A. Rashid, “Convolutional neural networks based method for improving facial expression recognition,” in Advances in Intelligent Systems and Computing, Intelligent Systems Technologies, and Applications 2016. ISTA 2016, J.C. Rodriguez, S. Mitra, S. Thampi, and E.S. El-Alfy (Eds.), vol. 530, 2016, Springer, Cham.
  9.  A. Ruiz-Garcia, M. Elshaw, A. Altahhan, and V. Palade, “Deep learning for emotion recognition in faces,” in Artificial Neural Networks and Machine Learning – ICANN 2016, A.E.P. Villa, P. Masulli, and A.J.P. Rivero (Eds.), vol. 9887, 2016, Switzerland: Springer Verlag, pp. 38‒46, doi: 10.1007/978-3-319-44781-0_5.
  10.  M.S. Hossain and G. Muhammad, “Emotion recognition using deep learning approach from audio-visual emotional big data,” Information Fusion, vol. 49, pp. 69‒78, 2019, doi: 10.1016/j.inffus.2018.09.008.
  11.  A.S. Vyas, H.B. Prajapati, and V.K. Dabhi, “Survey on face expression recognition using CNN,” in Proc. 5th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 2019, pp. 102‒106.
  12.  M.M. Taghi Zadeh, M. Imani, and B. Majid, “Fast facial emotion recognition using convolutional neural networks and Gabor filters,” in Proc. 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), Tehran, Iran, 2019, pp. 577–581.
  13.  A. Renda, M. Barsacchi, A. Bechini, and F. Marcelloni, “Comparing ensemble strategies for deep learning: An application to facial expression recognition,” Expert Syst. Appl., vol. 136, pp. 1‒11, 2019, doi: 10.1016/j.eswa.2019.06.025.
  14.  H. Ding, S. Zhou, and R. Chellappa, “FaceNet2ExpNet: Regularizing a deep face recognition net for expression recognition,” in Proc. 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), Washington, USA, 2017, pp. 118‒126. doi: 10.1109/FG.2017.23.
  15.  J. Li et al., “Facial Expression Recognition by Transfer Learning for Small Datasets,” in Security with Intelligent Computing and Big-data Services. SICBS 2018. Advances in Intelligent Systems and Computing, C.N. Yang, S.L. Peng, and L. Jain (Eds.), vol. 895, Springer, Cham, 2018.
  16.  Y. Wang, C. Wang, L. Luo, and Z. Zhou, “Image classification based on transfer learning of convolutional neural network,” in Proc. Chinese Control Conference (CCC), Guangzhou, China, 2019, pp. 7506‒7510.
  17.  I. Lee, H. Jung, C. H. Ahn, J. Seo, J. Kim, and O. Kwon, “Real-time personalized facial expression recognition system based on deep learning,” in Proc. 2016 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, USA, 2016, pp. 267‒268.
  18.  J. Chen, X. Liu, P. Tu, and A. Aragones, “Person-specific expression recognition with transfer learning,” in Proc. 19th IEEE International Conference on Image Processing, Orlando, USA, 2012, pp. 2621‒2624.
  19.  Y. Fan, J.C.K. Lam, and V.O.K. Li, “Multi-Region Ensemble Convolutional Neural Network for Facial Expression Recognition,” arXiv, 2018, cs.CV, https://arxiv.org/abs/1807.10575v1.
  20.  K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016, pp. 770‒778.
  21.  J. Chmielińska and J. Jakubowski, “Detection of driver fatigue symptoms using transfer learning,” Bull. Pol. Acad. Sci. Tech. Sci., vol. 66, pp. 869‒874, 2018, doi: 10.24425/bpas.2018.125934.
  22.  E. Lukasik et al., “Recognition of handwritten Latin characters with diacritics using CNN,” Bull. Pol. Acad. Sci. Tech. Sci., vol. 69, no. 1, 2021, article number: e136210, doi: 10.24425/bpasts.2020.136210.
  23.  H. Zhang, A. Jolfaei, and M. Alazab, “A Face Emotion Recognition Method Using Convolutional Neural Network and Image Edge Computing,” IEEE Access, vol. 7, pp. 159081‒159089, 2019, doi: 10.1109/ACCESS.2019.2949741.
  24.  HackerEarth, “Transfer Learning Introduction Tutorials and Notes: Machine Learning,” [Online]. Available: https://www.hackerearth.com/practice/machine-learning/transfer-learning/transfer-learning-intro/tutorial/
  25.  S. Minaee, M. Minaei, and A. Abdolrashidi, “Deep-emotion: Facial expression recognition using attentional convolutional network,” Sensors, vol. 21, no. 9, p. 3046, 2021, doi: 10.3390/s21093046.
  26.  M.J. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, “Coding facial expressions with Gabor wavelets,” in Proc. 3rd IEEE International Conference on Automatic Face and Gesture Recognition, 1998, pp. 200‒205, doi: 10.1109/AFGR.1998.670949.
  27.  P. Lucey, J.F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, “The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression,” in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition – Workshops, San Francisco, USA, 2010, pp. 94‒101, doi: 10.1109/CVPRW.2010.5543262.
  28.  M.F.H. Siddiqui and A.Y. Javaid, “A multimodal facial emotion recognition framework through the fusion of speech with visible and infrared images,” Multimodal Technol. Interact., vol. 4, no. 3, p. 46, 2020, doi: 10.3390/mti4030046.
  29.  M.S. Zia, M. Hussain, and M.A. Jaffar, “A novel spontaneous facial expression recognition using dynamically weighted majority voting based ensemble classifier,” Multimed. Tools Appl., vol. 77, pp. 25537–25567, 2018.
  30.  D. Lundqvist, A. Flykt, and A. Öhman, “The Karolinska Directed Emotional Faces – KDEF,” CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet, 1998.

Authors and Affiliations

Nagesh Jadhav 1
Rekha Sugandhi 1

  1. MIT ADT University, Pune, Maharashtra, 412201, India

Abstract

Research on the design of robust multimodal speech recognition systems that use acoustic and visual cues, extracted with relatively noise-robust alternative speech sensors, has been gaining interest among the speech processing research community. The primary objective of this work is to study the exclusive influence of the Lombard effect on the automatic recognition of the confusable syllabic consonant-vowel (CV) units of the Hindi language, as a step towards building robust multimodal ASR systems for adverse environments in the context of Indian languages, which are syllabic in nature. The dataset for this work comprises the 145 confusable CV syllabic units of Hindi, recorded simultaneously using three modalities that capture acoustic and visual speech cues: a normal acoustic microphone (NM), a throat microphone (TM), and a camera that captures the associated lip movements. The Lombard effect is induced by feeding crowd noise into the speaker's headphones during recording. Convolutional neural network (CNN) models are built to categorize the CV units by their place of articulation (POA), manner of articulation (MOA), and vowel, under both clean and Lombard conditions. For validation, corresponding hidden Markov models (HMMs) are also built and tested. Unimodal automatic speech recognition (ASR) systems built from each of the three speech cues under Lombard speech show a loss in recognition of MOA and vowels, while POA recognition improves in all systems due to the Lombard effect. Combining the three complementary speech cues into bimodal and trimodal ASR systems reduces the Lombard-induced recognition loss for MOA and vowels compared with the unimodal systems, while POA recognition still benefits from the Lombard effect. A bimodal system using only the alternative acoustic and visual cues is proposed, which discriminates place and manner of articulation better than even the standard ASR system. Among the multimodal ASR systems studied, the proposed trimodal system based on Lombard speech gives the best recognition accuracies of 98%, 95%, and 76% for the vowels, MOA, and POA, respectively, with an average improvement of 36% over the unimodal ASR systems and 9% over the bimodal ASR systems.
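
As an illustration of the cue-combination idea, the Python (TensorFlow/Keras) sketch below builds a toy trimodal network with one convolutional branch per cue (NM spectrogram, TM spectrogram, lip images), fuses the branch features, and attaches separate softmax heads for MOA, POA, and vowel classification. The input shapes, layer sizes, class counts, and the use of feature-level fusion are placeholder assumptions, not the authors' architecture.

from tensorflow.keras import layers, models

def branch(shape, name):
    # One small convolutional feature extractor per modality.
    inp = layers.Input(shape=shape, name=name)
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return inp, x

nm_in, nm_feat = branch((64, 64, 1), "nm_spectrogram")   # normal microphone cue
tm_in, tm_feat = branch((64, 64, 1), "tm_spectrogram")   # throat microphone cue
lip_in, lip_feat = branch((64, 64, 1), "lip_frames")     # visual (lip) cue

# Fuse the three cues and add one softmax head per classification task.
fused = layers.concatenate([nm_feat, tm_feat, lip_feat])
fused = layers.Dense(128, activation="relu")(fused)
moa_out = layers.Dense(6, activation="softmax", name="moa")(fused)      # placeholder class counts
poa_out = layers.Dense(6, activation="softmax", name="poa")(fused)
vowel_out = layers.Dense(6, activation="softmax", name="vowel")(fused)

model = models.Model([nm_in, tm_in, lip_in], [moa_out, poa_out, vowel_out])
model.compile(optimizer="adam", loss="categorical_crossentropy")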


Authors and Affiliations

Sadasivam Uma Maheswari
A. Shahina
Ramesh Rishickesh
A. Nayeemulla Khan
