Feature-level Fusion vs. Score-level Fusion for Image Retrieval Based on Pre-trained Deep Neural Networks
DOI: https://doi.org/10.13052/jmm1550-4646.2041

Keywords: Content-based image retrieval system (CBIR), late fusion, early fusion, deep convolutional neural networks (DCNNs), ResNet152, GoogLeNet, InceptionV3

Abstract
Today’s complex multimedia content has made retrieving images similar to a user’s query from a database a challenging task. The performance of a Content-Based Image Retrieval (CBIR) system depends heavily on the image representation, in the form of low-level features, and on the similarity measurement. Traditional visual descriptors, which encode little prior domain knowledge, can lead to poor retrieval performance. Deep Convolutional Neural Networks (DCNNs), on the other hand, have recently achieved remarkable success as methods for image classification in various domains. DCNNs pre-trained on thousands of classes can extract accurate and representative features that, beyond classification, can also be used successfully in image retrieval systems. ResNet152, GoogLeNet, and InceptionV3 are effective examples of pre-trained DCNNs recently applied to computer vision tasks such as object recognition, clustering, and classification. In this paper, two fusion approaches for a CBIR system, namely early fusion and late fusion, are presented and compared. Early fusion concatenates the features extracted by each possible pair of DCNNs, that is, ResNet152-GoogLeNet, ResNet152-InceptionV3, and GoogLeNet-InceptionV3, while late fusion applies the CombSUM method with Z-score standardization to combine the score lists produced by each DCNN of the aforementioned pairs. Experiments on the popular WANG dataset show that the late fusion approach slightly outperforms the early fusion approach. The best result in our experiments, in terms of Average Precision (AP) for the top 20 retrieved images, reaches 96.82%.
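To make the two fusion strategies concrete, below is a minimal NumPy sketch of both; it is a hypothetical illustration, not the authors’ code. Early fusion L2-normalizes and concatenates the per-image feature vectors of two networks, while late fusion Z-score standardizes each network’s similarity scores and sums them (CombSUM). The L2 normalization step and the toy data are assumptions; the paper’s exact preprocessing may differ.

import numpy as np

def early_fusion(feats_a, feats_b):
    # Feature-level (early) fusion: L2-normalize each network's
    # descriptors (an assumed preprocessing step), then concatenate
    # them into one joint vector per image.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return np.concatenate([a, b], axis=1)

def combsum_zscore(scores_a, scores_b):
    # Score-level (late) fusion: Z-score standardize each score list,
    # then sum the standardized scores (CombSUM).
    z = lambda s: (s - s.mean()) / s.std()
    return z(scores_a) + z(scores_b)

# Toy usage with random data: 100 database images, 2048-D ResNet152
# features and 1024-D GoogLeNet features (the output dimensions of
# those architectures' final pooling layers).
rng = np.random.default_rng(0)
f_resnet = rng.normal(size=(100, 2048))
f_googlenet = rng.normal(size=(100, 1024))
fused = early_fusion(f_resnet, f_googlenet)  # shape (100, 3072)

s_resnet = rng.random(100)     # similarity of each image to the query
s_googlenet = rng.random(100)  # under each network, assumed precomputed
ranking = np.argsort(-combsum_zscore(s_resnet, s_googlenet))  # best first

Either ranking, by similarity over the fused feature vectors or by the fused score lists, can then be evaluated with Average Precision over the top 20 results, as in the paper.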
References
A.C. Valente, F.V.M. Perez, G.A.S. Megeto, M.H. Cascone, O. Gomes, T.S. Paula and Q. Lin, “Comparison of Texture Retrieval Techniques Using Deep Convolutional Features,” in Proc. IS&T Int’l Symp. on Electronic Imaging: Imaging and Multimedia Analytics in a Web and Mobile World, vol. 8, pp. 406-1–406-7, 2019.
A. Ahmed, “Pre-trained CNNs Models for Content Based Image Retrieval,” International Journal of Advanced Computer Science and Applications, 2021.
A. Alzu’bi, A. Amira and N. Ramzan, “Content-Based Image Retrieval with Compact Deep Convolutional Features,” Neurocomputing, vol. 249, pp. 95–105, 2017.
L.D. Nguyen, D. Lin, Z. Lin and J. Cao, “Deep CNNs for Microscopic Image Classification by Exploiting Transfer Learning and Feature Concatenation,” in Proc. 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, pp. 1–5, May 2018.
R. Rajkumar and M.V. Sudhamani, “Image Retrieval System Using Residual Neural Network in a Distributed Environment,” International Journal of Recent Technology and Engineering, vol. 8, no. 6, pp. 2277–3878, 2020.
S. Kumar, M.K. Singh and M.K. Mishra, “Improved Content-Based Image Retrieval Using Deep Learning Model,” Journal of Physics: Conference Series, vol. 2327, 012028, 2022.
O. Mohamed, E.A. Khalid, O. Mohammed and A. Brahim, “Content-Based Image Retrieval Using Convolutional Neural Networks,” in Proc. First International Conference on Real-Time Intelligent Systems, Springer, Cham, Switzerland, pp. 463–476, 2019.
A. Ahmed and S. Mohamed, “Implementation of Early and Late Fusion Methods for Content-Based Image Retrieval,” Int. J. Adv. Appl. Sci., vol. 8, no. 7, pp. 97–105, 2022.
J.Z. Wang, J. Li and G. Wiederhold, “SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture Libraries,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 9, pp. 947–963, 2001.
K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 770–778, 2016.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, “Going Deeper with Convolutions,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1–9, 2015.
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 2818–2826, 2016.