STAFF: Automated Signature Generation for Fine-Grained Function Traffic Identification


  • Yazhe Tang Xi’an Jiaotong University, Shaanxi, China
  • Xun Li Xi’an Jiaotong University, Shaanxi, China
  • Lishui Chen The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, China


automated signature generation, application function signature, application traffic identification


Identifying which application function a user is operating reveals user behavior and can help improve the user experience, which makes it a focus of practical big data analytics. Unlike Coarse-grained Traffic Identification (CTI), which only identifies the application or protocol that a packet belongs to, Fine-grained Function Traffic Identification (FFTI) maps a traffic packet to a meaningful user operation or application function. This paper focuses on identifying fine-grained function signatures. We propose an automatic and stable signature generation method, called STAFF, to identify different application functions. STAFF treats data packets as long strings and aims to find all string fragments whose length exceeds a prescribed length and whose occurrence frequency exceeds a prescribed frequency. The final signature is presented as pairs of string fragments and their corresponding occurrence frequencies. Experimental results show that STAFF can automatically generate fine-grained function signatures for different applications with an average identification accuracy of 93.65%, and that the method is insensitive to noise.
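The fragment search described above can be illustrated with a minimal brute-force sketch. This is not the STAFF algorithm itself, whose details the abstract does not give. The function name `frequent_fragments`, the `max_len` cap, the inclusive thresholds, and the definition of frequency as the fraction of packets containing a fragment are all assumptions made for illustration.

```python
from collections import Counter


def frequent_fragments(packets, min_len=4, max_len=16, min_freq=0.8):
    """Return {fragment: frequency} for byte fragments shared by many packets.

    Assumptions (not taken from the paper):
    - frequency = fraction of packets that contain the fragment at least once;
    - fragment lengths are capped at max_len to keep enumeration tractable;
    - both thresholds are treated as inclusive.
    """
    if not packets:
        return {}

    counts = Counter()
    for payload in packets:
        seen = set()  # count each fragment at most once per packet
        for length in range(min_len, max_len + 1):
            # range() is empty when the payload is shorter than `length`
            for start in range(len(payload) - length + 1):
                seen.add(payload[start:start + length])
        counts.update(seen)

    total = len(packets)
    # The signature is a set of (fragment, frequency) pairs, stored as a dict.
    return {
        frag: n / total
        for frag, n in counts.items()
        if n / total >= min_freq
    }


# Hypothetical payloads for two "login" requests and one "upload" request.
example_packets = [
    b"GET /login?u=1",
    b"GET /login?u=22",
    b"POST /upload",
]
signature = frequent_fragments(example_packets, min_len=5, min_freq=0.6)
```

In this toy input, fragments such as `b"/login"` appear in two of three packets and so pass a 0.6 threshold, while fragments unique to the upload request do not. Exhaustive enumeration grows quickly with payload size, so a practical implementation would likely need a more efficient index, such as a suffix-based structure.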




