Efficient Position Estimation Based on GPU-Accelerated Content-based Image Retrieval


  • Yuta Kusamura, Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, 305-8573, Japan
  • Toshiyuki Amagasa, Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, 305-8573, Japan
  • Hiroyuki Kitagawa, Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, 305-8573, Japan
  • Yusuke Kozawa, Artificial Intelligence Research Center, AIST, Koto-ku, Tokyo, 130-0064, Japan




GPU, LSH, SIFT, Content-based Image Retrieval, Position Estimation


We propose an efficient position estimation method based on GPU-accelerated content-based image retrieval (CBIR). The idea is to use first-person-vision videos annotated with geographical position information as the database. When a user sends an image of their current first-person view, the system estimates the user's position using CBIR. Because the features extracted from images are generally high-dimensional vectors, and thousands of such vectors are extracted even from a single image, the processing cost is high. GPUs (graphics processing units), although originally designed for graphics processing, have been used to accelerate a wide variety of computations. We therefore use a GPU to accelerate CBIR with appropriate data structures and algorithms. Moreover, the proposed method takes the spatial locality of pedestrians into account in order to improve accuracy in position estimation applications. We demonstrate the efficiency and accuracy of the proposed method through experiments on a video dataset.
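The retrieval-and-voting idea summarized above can be illustrated with a minimal CPU-only sketch. It is not the authors' implementation: it assumes sign-of-random-projection LSH (one of several LSH families), plain Python lists in place of real image descriptors such as SIFT vectors, and a simple distance cutoff as a stand-in for the spatial-locality constraint. The names `FrameIndex`, `add_frame`, `estimate`, `last_position`, and `radius` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def make_planes(dim, n_bits, rng):
    """Draw n_bits random hyperplanes; each one contributes one sign bit."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(planes, vec):
    """Sign-of-projection LSH key: nearby vectors tend to share a key."""
    return tuple(
        sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes
    )


class FrameIndex:
    """Hypothetical index mapping descriptor buckets to geotagged frames."""

    def __init__(self, dim, n_tables=8, n_bits=6, seed=0):
        rng = random.Random(seed)
        # Several independent hash tables raise the chance of a collision
        # between a query descriptor and its true near neighbours.
        self.tables = [
            (make_planes(dim, n_bits, rng), defaultdict(list))
            for _ in range(n_tables)
        ]
        self.positions = {}  # frame_id -> (x, y)

    def add_frame(self, frame_id, position, descriptors):
        """Register one video frame with its position and descriptors."""
        self.positions[frame_id] = position
        for d in descriptors:
            for planes, buckets in self.tables:
                buckets[hash_key(planes, d)].append(frame_id)

    def estimate(self, descriptors, last_position=None, radius=None):
        """Return the position of the frame with the most descriptor votes.

        If last_position and radius are given, frames farther than radius
        from last_position are ignored (assumed spatial-locality filter).
        """
        votes = Counter()
        for d in descriptors:
            candidates = set()
            for planes, buckets in self.tables:
                candidates.update(buckets.get(hash_key(planes, d), ()))
            for fid in candidates:
                if last_position is not None and radius is not None:
                    if math.dist(self.positions[fid], last_position) > radius:
                        continue
                votes[fid] += 1  # at most one vote per descriptor and frame
        if not votes:
            return None
        best_frame, _ = votes.most_common(1)[0]
        return self.positions[best_frame]
```

In this sketch each query descriptor casts at most one vote per database frame, and the optional `radius` filter discards frames that are implausibly far from the previous estimate. A GPU version would parallelize the hashing and bucket lookups across descriptors, which this sketch does not attempt.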




