Energy Interpolated Template Coding for Video Compression in Traffic Surveillance Application

  • Shivprasad P. Patil Department of Business Development and Technology, Aarhus University, Herning, Denmark
  • Rajarshi Sanyal Belgacom International Carrier Services, Brussels, Belgium
  • Ramjee Prasad Department of Business Development and Technology, Aarhus University, Herning, Denmark
Keywords: Video compression, Template coding, Energy interpolation, Traffic surveillance


In video coding, exploiting the temporal correlation between frames is an important step in reducing redundant data across successive video frames. However, the dynamic nature of video content makes this temporal correlation difficult to find. In this paper we propose a novel template coding approach for compressing traffic surveillance video that addresses this difficulty. The conventional approach to template coding, in which only two successive frames are considered, is improved by a ‘dynamic model’ of the template. The dynamism of template selection is achieved through energy interpolation of successive frame data over a time period, rather than over only two successive frames. A coherent histogram model is developed to build an accurate template and thereby improve compression. The proposed template matching approach predicts the exact template, which reduces processing overhead and processing time. The simulation results show that the proposed approach localizes the template accurately, improving both coding accuracy and coding speed compared with conventional template-based compression approaches.
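The contrast between a two-frame template and one built over a longer time window can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the energy measure or the interpolation rule, so the sketch assumes that "energy" is the squared pixel difference between successive frames and that "interpolation" is a plain average over all successive pairs in the window. The function names and the threshold are hypothetical.

```python
def frame_energy(prev, curr):
    """Per-pixel motion energy, assumed here to be the squared frame difference."""
    return [[(c - p) ** 2 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def windowed_energy(frames):
    """Average the pairwise motion energy over every successive pair in the window.

    Averaging is an assumed stand-in for the paper's energy interpolation.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    pairs = list(zip(frames, frames[1:]))
    height, width = len(frames[0]), len(frames[0][0])
    total = [[0.0] * width for _ in range(height)]
    for prev, curr in pairs:
        energy = frame_energy(prev, curr)
        for y in range(height):
            for x in range(width):
                total[y][x] += energy[y][x]
    return [[value / len(pairs) for value in row] for row in total]


def template_mask(energy, threshold):
    """Mark pixels whose energy exceeds a (hypothetical) threshold as template."""
    return [[1 if value > threshold else 0 for value in row] for row in energy]


# A bright pixel moving one step to the right in each 1x4 frame.
frames = [
    [[9, 0, 0, 0]],
    [[0, 9, 0, 0]],
    [[0, 0, 9, 0]],
    [[0, 0, 0, 9]],
]

# Conventional case: only the last two frames contribute.
two_frame_mask = template_mask(frame_energy(frames[2], frames[3]), threshold=10)

# Windowed case: all successive pairs in the window contribute.
window_mask = template_mask(windowed_energy(frames), threshold=10)
```

In this toy input the two-frame mask covers only the last two positions of the moving pixel, while the windowed mask covers its whole path, which is the kind of difference a template built over a time period is meant to capture.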




