TEXT-MINING AND PATTERN-MATCHING BASED PREDICTION MODELS FOR DETECTING VULNERABLE FILES IN WEB APPLICATIONS

Authors

  • MUKESH KUMAR GUPTA, Department of Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India
  • MAHESH CHANDRA GOVIL, Department of Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India
  • GIRDHARI SINGH, Department of Computer Science and Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India

Keywords:

Cross-Site Scripting vulnerability, Web Security, Vulnerability Detection, Machine Learning

Abstract

The proliferation of technology has empowered web applications. At the same time, the presence of Cross-Site Scripting (XSS) vulnerabilities in web applications has become a major concern. Despite the many existing detection and prevention approaches, attackers continue to exploit XSS vulnerabilities and cause significant harm to web users. In this paper, we formulate the detection of XSS vulnerabilities as a classification problem solved with prediction models. A novel approach based on text-mining and pattern-matching techniques is proposed to extract a set of features from source code files. The extracted features are used to build prediction models that discriminate vulnerable code files from benign ones. The performance of the developed models is evaluated on a publicly available dataset of 9,408 PHP source code files, each labeled as safe or unsafe. The experimental results show that the proposed approach outperforms existing ones.
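The abstract does not spell out the feature set, so the following is only a minimal sketch of the general idea, assuming that "text-mining" features are token counts over a PHP file and that "pattern-matching" features count occurrences of known input sources, output sinks and sanitizers. The regex tables (`SOURCE_PATTERNS`, `SINK_PATTERNS`, `SANITIZER_PATTERNS`), the token rule and every function name are hypothetical. The nearest-centroid classifier is a deliberately simple stand-in for a real learner and is not the prediction model evaluated in the paper.

```python
import re
from collections import Counter

# Hypothetical pattern tables: a few PHP input sources, output sinks and
# sanitizers chosen for illustration only, not the paper's feature set.
SOURCE_PATTERNS = {
    "src_get": r"\$_GET\b",
    "src_post": r"\$_POST\b",
    "src_request": r"\$_REQUEST\b",
    "src_cookie": r"\$_COOKIE\b",
}
SINK_PATTERNS = {
    "sink_echo": r"\becho\b",
    "sink_print": r"\bprint\b",
    "sink_printf": r"\bprintf\s*\(",
}
SANITIZER_PATTERNS = {
    "san_htmlspecialchars": r"\bhtmlspecialchars\s*\(",
    "san_htmlentities": r"\bhtmlentities\s*\(",
    "san_intval": r"\bintval\s*\(",
}

# Identifier-like tokens, optionally prefixed by "$" (PHP variables).
TOKEN_RE = re.compile(r"\$?[A-Za-z_][A-Za-z0-9_]*")


def text_mining_features(source: str) -> Counter:
    """Bag-of-words token counts (string literals are not stripped)."""
    return Counter("tok_" + tok for tok in TOKEN_RE.findall(source))


def pattern_features(source: str) -> dict:
    """Count matches of each source, sink and sanitizer pattern."""
    feats = {}
    for table in (SOURCE_PATTERNS, SINK_PATTERNS, SANITIZER_PATTERNS):
        for name, pattern in table.items():
            feats[name] = len(re.findall(pattern, source))
    return feats


def extract_features(source: str) -> dict:
    """Merge text-mining and pattern-matching features for one file."""
    feats = dict(text_mining_features(source))
    feats.update(pattern_features(source))
    return feats


def to_vector(feats: dict, vocabulary: list) -> list:
    """Project a sparse feature dict onto a fixed, ordered vocabulary."""
    return [float(feats.get(name, 0)) for name in vocabulary]


def train_centroids(vectors: list, labels: list) -> dict:
    """Stand-in model: the mean feature vector of each label."""
    sums = {}
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: dict, vec: list) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    unsafe = "<?php $name = $_GET['name']; echo 'Hello ' . $name; ?>"
    safe = "<?php $name = htmlspecialchars($_GET['name'], ENT_QUOTES); echo 'Hello ' . $name; ?>"
    feature_dicts = [extract_features(unsafe), extract_features(safe)]
    vocabulary = sorted({name for feats in feature_dicts for name in feats})
    vectors = [to_vector(f, vocabulary) for f in feature_dicts]
    model = train_centroids(vectors, ["unsafe", "safe"])
    print(predict(model, vectors[0]), predict(model, vectors[1]))
```

Run as a script, the toy example prints `unsafe safe`. Because it classifies its own two training files, this only checks that the pipeline is wired together. A meaningful evaluation would train on one part of a labeled corpus and test on held-out files. Keeping the source/sink/sanitizer counts alongside the raw token counts lets a learner pick up on the absence of a sanitizer, which plain token frequencies capture only indirectly.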
