Consensual Collaborative Training And Knowledge Distillation Based Facial Expression Recognition Under Noisy Annotations
How to Cite?
Darshan Gera, S. Balasubramanian, "Consensual Collaborative Training And Knowledge Distillation Based Facial Expression Recognition Under Noisy Annotations," International Journal of Engineering Trends and Technology, vol. 69, no. 7, pp. 244-254, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I7P231
Abstract
The presence of noisy labels in large-scale facial expression datasets has been a key challenge for Facial Expression Recognition (FER) in the wild. During the early learning stage, deep networks fit the clean data; eventually, owing to their memorization ability, they begin to overfit the noisy labels, which limits FER performance. This work proposes an effective training strategy for learning in the presence of noisy labels, called the Consensual Collaborative Training (CCT) framework. CCT co-trains three networks jointly using a convex combination of a supervision loss and a consistency loss, without making any assumption about the noise distribution. A dynamic transition mechanism shifts the emphasis from the supervision loss in the early learning stage to the consistency loss, which enforces consensus among the networks' predictions, in the later stage. Inference is performed with a single network obtained through a simple knowledge distillation scheme. The effectiveness of the proposed framework is demonstrated on both synthetic and real noisy FER datasets. In addition, a large test subset of around 5K images from the FEC dataset is annotated using the crowd wisdom of 16 different annotators, and reliable labels are inferred; CCT is also validated on this subset. State-of-the-art performance is reported on the benchmark FER datasets RAF-DB (90.84%), FERPlus (89.99%), and AffectNet (66%).
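To make the training objective concrete, below is a minimal PyTorch-style sketch of the loss the abstract describes: a convex combination of a supervision loss and a consistency loss across jointly co-trained networks, with a dynamic weight that moves emphasis from supervision to consensus over training. The convex combination and the use of three networks follow the abstract; the exact consistency term (pairwise KL divergence between network predictions) and the ramp shape of the dynamic weight are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cct_loss(logits_list, labels, lam):
    """Sketch of the CCT objective: (1 - lam) * supervision + lam * consistency.

    logits_list: list of logit tensors, one per co-trained network (three in CCT)
    labels:      ground-truth (possibly noisy) class labels
    lam:         dynamic weight in [0, 1]; small early in training
                 (trust the labels), large later (trust network consensus)
    """
    # Supervision: average cross-entropy of each network against the labels.
    sup = sum(F.cross_entropy(z, labels) for z in logits_list) / len(logits_list)

    # Consistency: pairwise agreement between network predictions.
    # (The exact form of this term is an assumption in this sketch.)
    probs = [F.softmax(z, dim=1) for z in logits_list]
    logp = [F.log_softmax(z, dim=1) for z in logits_list]
    cons, pairs = 0.0, 0
    for i in range(len(logits_list)):
        for j in range(len(logits_list)):
            if i != j:
                cons = cons + F.kl_div(logp[i], probs[j], reduction="batchmean")
                pairs += 1
    cons = cons / pairs

    return (1.0 - lam) * sup + lam * cons

def dynamic_lambda(epoch, ramp_epochs, lam_max=0.9):
    """Dynamic transition from supervision to consistency loss.
    The quadratic ramp and lam_max value are illustrative assumptions."""
    return lam_max * min(1.0, epoch / ramp_epochs) ** 2
```

In this sketch, a small lam early in training lets each network fit the easy, mostly clean samples via cross-entropy, while a growing lam later penalizes disagreement among the networks, discouraging any single network from memorizing noisy labels. At inference, per the abstract, only a single network is used, obtained from the co-trained ensemble via a simple knowledge distillation scheme.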
Keywords
Collaborative training, Crowd-sourcing, Knowledge distillation, Facial expression recognition, Noisy annotation.