Feature-level Fusion vs. Score-level Fusion for Image Retrieval Based on Pre-trained Deep Neural Networks

Authors

  • Nikolay Neshov Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria
  • Krasimir Tonchev Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria
  • Agata Manolova Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria
  • Vladimir Poulkov Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria
  • Georgi Balabanov Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria

DOI:

https://doi.org/10.13052/jmm1550-4646.2041

Keywords:

Content-based image retrieval system (CBIR), late fusion, early fusion, deep convolutional neural networks (DCNNs), ResNet152, GoogLeNet, InceptionV3

Abstract

Today’s complex multimedia content makes retrieving images similar to a user’s query from a database a challenging task. The performance of a Content-Based Image Retrieval (CBIR) system depends heavily on the image representation in the form of low-level features and on the similarity measure. Traditional visual descriptors that do not encode good prior domain knowledge can lead to poor retrieval results. On the other hand, Deep Convolutional Neural Networks (DCNNs) have recently achieved remarkable success in image classification across various domains. DCNNs pre-trained on thousands of classes can extract accurate and representative features which, in addition to classification, can also be used successfully in image retrieval systems. ResNet152, GoogLeNet, and InceptionV3 are effective examples of pre-trained DCNNs recently applied to computer vision tasks such as object recognition, clustering, and classification. In this paper, two fusion approaches for a CBIR system, namely early fusion and late fusion, are presented and compared. Early fusion concatenates the features extracted by each possible pair of DCNNs, that is, ResNet152-GoogLeNet, ResNet152-InceptionV3, and GoogLeNet-InceptionV3, while late fusion applies the CombSUM method with Z-score standardization to combine the score results produced by each DCNN of the aforementioned pairs. Experiments on the popular WANG dataset show that the late fusion approach slightly outperforms the early fusion approach. The best result in terms of Average Precision (AP) over the top 20 retrieved images reaches 96.82%.
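The two fusion schemes described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: cosine similarity is assumed as the matching measure, and the feature vectors are assumed to have already been extracted from two pre-trained DCNNs (e.g., ResNet152 and GoogLeNet).

```python
import numpy as np

def zscore(s):
    """Standardize a vector of scores to zero mean and unit variance."""
    s = np.asarray(s, dtype=float)
    std = s.std()
    return (s - s.mean()) / std if std > 0 else s - s.mean()

def cosine_scores(query_feat, db_feats):
    """Cosine similarity between one query vector and each database vector."""
    q = query_feat / np.linalg.norm(query_feat)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    return d @ q

def early_fusion_scores(q_a, db_a, q_b, db_b):
    """Feature-level (early) fusion: concatenate the descriptors of the
    two networks into one vector per image, then rank once."""
    q = np.concatenate([q_a, q_b])
    db = np.concatenate([db_a, db_b], axis=1)
    return cosine_scores(q, db)

def late_fusion_scores(q_a, db_a, q_b, db_b):
    """Score-level (late) fusion: rank with each network separately,
    Z-score-standardize the two score lists, then CombSUM (add) them."""
    s_a = zscore(cosine_scores(q_a, db_a))
    s_b = zscore(cosine_scores(q_b, db_b))
    return s_a + s_b
```

In both cases the retrieval list is obtained by sorting the database images by the returned scores in descending order; the Z-score step in late fusion makes the scores of the two networks comparable before summation.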


Author Biographies

Nikolay Neshov, Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria

Nikolay Neshov holds a PhD in Communication and Computer Technology from Technical University of Sofia. His doctoral research was concentrated in optimization of Content-Based Image Retrieval (CBIR) systems based on probabilistic models. Currently, his research interest covers computer vision, machine learning, decision analysis, video and image indexing and retrieval, text mining, and facial analysis.

Krasimir Tonchev, Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria

Krasimir Tonchev is a senior researcher leading research activities at the “TeleInfrastructure Lab”, Faculty of Telecommunications, TU-Sofia. His research interests include, on the theoretical side, large-scale kernel machines, modelling of dynamical behaviour, and Bayesian modelling, and, on the application side, 2D and 3D facial analysis for soft biometrics, affective computing, and general scene understanding from video. He is an IEEE member.

Agata Manolova, Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria

Agata Manolova is an associate professor with the Faculty of Telecommunications at the Technical University of Sofia (TU-Sofia), Bulgaria, and head of the research laboratory “Electronic systems for visual information”. Her domains of interest are machine learning, pattern recognition, computer vision, image and video processing, biometrics, and augmented and virtual reality. She received her PhD from Université de Grenoble, France. She is a Fulbright scholarship laureate and an IEEE member.

Vladimir Poulkov, Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria

Vladimir Poulkov is a full professor at the Faculty of Telecommunications at TU-Sofia. His expertise is in the fields of information transmission theory, modulation and coding, interference suppression, power control, and resource management for next-generation telecommunication networks and cyber-physical systems. Currently he is Head of the “TeleInfrastructure” and “Electromagnetic Compatibility of Communication Systems” R&D Laboratories, Chairman of the Bulgarian Cluster Telecommunications, Vice-Chairman of the European Telecommunications Standards Institute (ETSI) General Assembly, and a Senior IEEE Member.

Georgi Balabanov, Faculty of Telecommunications, Technical University of Sofia, bul. Kl. Ohridski 8, Sofia 1000, Bulgaria

Georgi Balabanov is an associate professor with the Faculty of Telecommunications at the Technical University of Sofia (TU-Sofia), Bulgaria. He received his PhD degree in Communication Networks and Systems from TU-Sofia. He is an affiliate researcher at the TeleInfrastructure R&D Lab. Dr. Balabanov has participated in several scientific projects, both national and international. His research interests include embedded systems, teletraffic engineering, QoS, Internet of Things, and Ambient Assisted Living systems. He is an IEEE member.

References

A.C. Valente, F.V.M. Perez, G.A.S. Megeto, M.H. Cascone, O. Gomes, T.S. Paula and Q. Lin, “Comparison of texture retrieval techniques using deep convolutional features”, in Proc. IS&T Int’l. Symp. on Electronic Imaging: Imaging and Multimedia Analytics in a Web and Mobile World, 8, pp. 406-1–406-7, 2019.

A. Ahmed, “Pre-trained CNNs Models for Content Based Image Retrieval,” International Journal of Advanced Computer Science and Applications, 2021.

A. Alzu’bi, A. Amira and N. Ramzan, “Content-Based Image Retrieval with Compact Deep Convolutional Features,” Neurocomputing, 249, pp. 95–105, 2017.

L.D. Nguyen, D. Lin, Z. Lin and J. Cao, “Deep CNNs for Microscopic Image Classification by Exploiting Transfer Learning and Feature Concatenation,” Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, Seville, Spain, pp. 1–5, May, 2018.

R. Rajkumar and M.V. Sudhamani, “Image Retrieval System Using Residual Neural Network in a Distributed Environment,” International Journal of Recent Technology and Engineering, vol. 8, no. 6, pp. 2277–3878, 2020.

S. Kumar, M.K. Singh and M.K. Mishra, “Improved Content-Based Image Retrieval Using Deep Learning Model,” Journal of Physics: Conference Series, vol. 2327, 012028, 2022.

O. Mohamed, E.A. Khalid, O. Mohammed and A. Brahim, “Content-Based Image Retrieval Using Convolutional Neural Networks,” In First International Conference on Real-Time Intelligent Systems, Springer, Cham, Switzerland, pp. 463–476, 2019.

A. Ahmed and S. Mohamed, “Implementation of Early and Late Fusion Methods for Content-Based Image Retrieval,” Int. J. Adv. Appl. Sci., vol. 8, no. 7, pp. 97–105, 2022.

J.Z. Wang, J. Li and G. Wiederhold, “SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture Libraries,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 9, pp. 947–963, 2001.

K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 770–778, 2016.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, “Going Deeper with Convolutions,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1–9, 2015.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 2818–2826, 2016.

Published

2024-10-01

How to Cite

Neshov, N., Tonchev, K., Manolova, A., Poulkov, V., & Balabanov, G. (2024). Feature-level Fusion vs. Score-level Fusion for Image Retrieval Based on Pre-trained Deep Neural Networks. Journal of Mobile Multimedia, 20(04), 769–784. https://doi.org/10.13052/jmm1550-4646.2041

Section

Articles
