Video Face Detection Based on Improved SSD Model and Target Tracking Algorithm

Authors

  • Yilin Liu, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China
  • Ruian Liu, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China
  • Shengxiong Wang, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China
  • Da Yan, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China
  • Bo Peng, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China
  • Tong Zhang, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China

DOI:

https://doi.org/10.13052/jwe1540-9589.21218

Keywords:

Deep neural network, SSD target detection, continuous frame target tracking, kernel correlation filtering

Abstract

Video face detection has a wide range of applications, including video surveillance, image retrieval, and human-computer interaction. However, face detection in video sequences is subject to uncontrollable interference, such as changes in lighting, complex backgrounds, variation in face scale, and occlusion. This paper therefore applies deep learning and exploits the continuity of video sequences to study video face detection algorithms. First, a residual network is used as the base network of the Single Shot MultiBox Detector (SSD) target detection model, and a Rest-SSD face detection model is trained to detect faces. Experimental results show that the method achieves real-time detection and improves the accuracy of video face detection, both of which are required for face detection in video. Second, building on the trained Rest-SSD model and the continuity of video sequences, the paper proposes a video face detection method. Starting from the detection results, the method uses kernel correlation filtering to track faces over n consecutive frames, assigns weights to the confidences of the n tracking results, and computes the best tracking result as their weighted average. It then assigns appropriate weights to the confidence of the best tracking result and to the confidence of the current frame's detection result and fuses the two, thereby improving video face detection accuracy.
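
The confidence fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example weights, and the default detection weight of 0.5 are assumptions made for illustration, and the tracking confidences are assumed to be supplied by a separate kernel correlation filter tracker that is not shown here.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted average of values (weights need not sum to 1)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def fuse_confidence(
    track_confs: Sequence[float],
    track_weights: Sequence[float],
    det_conf: float,
    det_weight: float = 0.5,  # assumed value, not taken from the paper
) -> float:
    """Fuse the tracking confidences of n frames with the detector confidence.

    Step 1: weighted average over the n tracking confidences.
    Step 2: convex combination of that average with the confidence of the
            current frame's detection result.
    """
    if not 0.0 <= det_weight <= 1.0:
        raise ValueError("det_weight must lie in [0, 1]")
    best_track_conf = weighted_average(track_confs, track_weights)
    return det_weight * det_conf + (1.0 - det_weight) * best_track_conf


if __name__ == "__main__":
    # Hypothetical confidences from three tracked frames and one detection.
    fused = fuse_confidence([0.8, 0.6, 0.4], [1.0, 2.0, 1.0], det_conf=1.0)
    print(f"fused confidence: {fused:.3f}")
```

How the weights are chosen (for example, favoring more recent frames) is left open in the abstract, so the sketch takes them as plain parameters.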

Author Biographies

Yilin Liu, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China

Yilin Liu received her bachelor’s degree in Electronic Information Engineering from Huaiyin Institute of Technology in 2018. She is currently studying intelligent science and technology at Tianjin Normal University and will receive her master’s degree in 2022. Her research interests include face recognition, facial expression recognition, and target detection.

Ruian Liu, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China

Ruian Liu, Professor, received a Ph.D. degree in Precision Instrument and Opto-electronics Engineering from Tianjin University. He is currently Head of the School of Electronics in the College of Electronic and Communication Engineering at Tianjin Normal University. His current research interests include image processing, deep learning, and artificial intelligence.

Shengxiong Wang, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China

Shengxiong Wang received his bachelor’s degree in Electronic Information Engineering from Tianjin University of Science and Technology in 2020. He is currently studying for a master’s degree in Information and Communication Engineering at Tianjin Normal University and will receive it in 2023. His research interests are computer vision, object detection, and image processing.

Da Yan, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China

Da Yan received a Bachelor of Engineering degree in communication engineering from Yantai University, Yantai, Shandong Province. He is now studying intelligent science and technology at Tianjin Normal University. His research fields include handling occlusion in pedestrian re-identification and pedestrian re-identification based on local features.

Bo Peng, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China

Bo Peng received her bachelor’s degree in Communication Engineering from Tianjin University of Commerce in 2019. She is currently studying Electronic Message at Tianjin Normal University and will receive her master’s degree in 2022. Her research interests include image enhancement and image denoising.

Tong Zhang, College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, 300387, China

Tong Zhang is studying for a master’s degree in electronic information at Tianjin Normal University in China. His research fields include face recognition and high-precision detection of small targets in complex environments.

Published

2022-01-22

Section

Articles