STAFF: Automated Signature Generation for Fine-Grained Function Traffic Identification
Keywords:
automated signature generation, application function signature, application traffic identification
Abstract
Identifying which application function a user is operating reveals user behavior and can help improve the user experience, which makes it a practical concern for big data analytics. Unlike Coarse-grained Traffic Identification (CTI), which only identifies the application or protocol that a packet belongs to, Fine-grained Function Traffic Identification (FFTI) maps a traffic packet to a meaningful user operation or application function. This paper focuses on identifying fine-grained function signatures. We propose an automatic and stable signature generation method, called STAFF, to identify different application functions. STAFF treats data packets as long strings and aims to find all string fragments that are longer than a prescribed length and occur more often than a prescribed frequency. The final signature is presented as pairs of string fragments and their corresponding occurrence frequencies. Experimental results show that STAFF can automatically generate fine-grained function signatures for different applications with an average identification accuracy of 93.65%, and that the method is insensitive to noise.
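The fragment search described in the abstract can be illustrated with a naive sketch. This is not the authors' STAFF algorithm, only a brute-force illustration of the stated goal. The function name `frequent_fragments` and the parameters `min_len`, `min_freq`, and `max_len` are hypothetical. The sketch assumes that "occurrence frequency" means the share of packets containing a fragment, that both thresholds are inclusive, and that a fragment may be dropped when a longer fragment occurs in the same number of packets.

```python
from collections import Counter


def frequent_fragments(packets, min_len=4, min_freq=0.8, max_len=32):
    """Return {fragment: frequency} for byte fragments of at least min_len
    bytes that occur in at least a min_freq share of the packets.

    Brute-force illustration only; all names and thresholds are assumptions.
    """
    if not packets:
        return {}

    counts = Counter()
    for payload in packets:
        seen = set()  # count each fragment at most once per packet
        for length in range(min_len, min(max_len, len(payload)) + 1):
            for start in range(len(payload) - length + 1):
                seen.add(payload[start:start + length])
        counts.update(seen)

    total = len(packets)
    frequent = {frag: n for frag, n in counts.items() if n / total >= min_freq}

    # Assumed simplification: drop a fragment when a longer frequent fragment
    # contains it and occurs in exactly the same number of packets.
    signature = {}
    for frag, n in frequent.items():
        redundant = any(
            frag != other and frag in other and frequent[other] == n
            for other in frequent
        )
        if not redundant:
            signature[frag] = n / total
    return signature


# Made-up payloads: three packets of one function plus one unrelated packet.
packets = [
    b"GET /like?id=1",
    b"GET /like?id=2",
    b"GET /like?id=3",
    b"POST /x",
]
print(frequent_fragments(packets, min_len=4, min_freq=0.75))
# prints {b'GET /like?id=': 0.75}
```

Enumerating every substring is quadratic in packet length, so this form only suits short payloads; it is meant to make the "fragment and frequency" signature shape concrete, not to be efficient.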