Video Face Detection Based on Improved SSD Model and Target Tracking Algorithm
DOI:
https://doi.org/10.13052/jwe1540-9589.21218
Keywords:
Deep neural network, SSD target detection, continuous frame target tracking, kernel correlation filtering
Abstract
Video face detection has a wide range of applications, such as video surveillance, image retrieval, and human-computer interaction. However, face detection in video sequences is subject to uncontrollable interference factors, such as changes in lighting, complex backgrounds, variations in face scale, and occlusion. This paper therefore applies deep learning and exploits the continuity of video sequences to study deep-learning-based video face detection algorithms. First, the algorithm uses a residual network as the base network of the Single Shot MultiBox Detector (SSD) target detection model and trains a Rest-SSD face detection model to detect faces. Experimental results show that the method achieves the real-time detection that video face detection requires and improves detection accuracy. Second, based on the continuity of video sequences, this paper proposes a video face detection method built on the trained Rest-SSD face detection model. The method first uses kernel correlation filtering to track faces over n consecutive frames, starting from the detection results. It then assigns weights to the confidences of the n tracking results and computes the best tracking result as their weighted average. Finally, it fuses the confidence of the best tracking result with the confidence of the current frame's detection result using appropriate weights, thereby improving video face detection accuracy.
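The weighted-average and fusion steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Track`, `weighted_average_track`, and `fuse_confidence`, the weight values, and the choice to average the box coordinates along with the confidences are all assumptions made for illustration; the abstract does not specify them. The tracking results themselves are taken as given (for example, from a kernel correlation filter tracker).

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Track:
    """One tracking result for a single frame (hypothetical container)."""
    box: Box
    confidence: float


def weighted_average_track(tracks: Sequence[Track],
                           weights: Sequence[float]) -> Track:
    """Combine n per-frame tracking results into one 'best' result.

    Assumption: both the box and the confidence are averaged with the
    same weights; the abstract only states that a weighted average is used.
    """
    if not tracks or len(tracks) != len(weights):
        raise ValueError("tracks and weights must be non-empty and equal in length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    box = tuple(
        sum(w * t.box[i] for t, w in zip(tracks, weights)) / total
        for i in range(4)
    )
    confidence = sum(w * t.confidence for t, w in zip(tracks, weights)) / total
    return Track(box=box, confidence=confidence)


def fuse_confidence(track_confidence: float,
                    detection_confidence: float,
                    track_weight: float = 0.5) -> float:
    """Fuse tracking and detection confidences as a convex combination.

    Assumption: track_weight lies in [0, 1]; the value 0.5 is a placeholder.
    """
    if not 0.0 <= track_weight <= 1.0:
        raise ValueError("track_weight must lie in [0, 1]")
    return track_weight * track_confidence + (1.0 - track_weight) * detection_confidence


# Example: two tracked frames, the more recent one weighted more heavily.
history = [Track((0.0, 0.0, 10.0, 10.0), 0.6), Track((2.0, 2.0, 10.0, 10.0), 0.8)]
best = weighted_average_track(history, weights=[1.0, 3.0])
fused = fuse_confidence(best.confidence, detection_confidence=0.9, track_weight=0.4)
```

Weighting recent frames more heavily is one plausible choice, since a face usually moves little between adjacent frames. The actual weights would need to be tuned on validation data.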