TEXT-MINING AND PATTERN-MATCHING BASED PREDICTION MODELS FOR DETECTING VULNERABLE FILES IN WEB APPLICATIONS
Keywords:
Cross-Site Scripting vulnerability, Web Security, Vulnerability Detection, Machine Learning
Abstract
The proliferation of technology has made web applications more capable. At the same time, the presence of Cross-Site Scripting (XSS) vulnerabilities in web applications has become a major concern. Despite the many existing detection and prevention approaches, attackers continue to exploit XSS vulnerabilities and cause significant harm to web users. In this paper, we formulate the detection of XSS vulnerabilities as a classification problem solved with prediction models. A novel approach based on text-mining and pattern-matching techniques is proposed to extract a set of features from source code files. The extracted features are used to build prediction models that discriminate vulnerable code files from benign ones. The performance of the developed models is evaluated on a publicly available dataset of 9408 PHP source code files, each labeled as safe or unsafe. The experimental results indicate that the proposed approach outperforms existing ones.
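As a rough illustration of the feature-extraction idea described in the abstract, the sketch below turns a PHP source string into a dictionary that combines text-mining features (token counts) with pattern-matching features (counts of regular-expression hits). It is not the authors' implementation: the pattern list, the feature names, and the function extract_features are all hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# Hypothetical patterns (an assumption, not taken from the paper):
# user-controlled input sources, output sinks, and common sanitizers in PHP.
PATTERNS = {
    "input_source": re.compile(r"\$_(GET|POST|COOKIE|REQUEST)\b"),
    "output_sink": re.compile(r"\b(echo|print)\b"),
    "sanitizer": re.compile(r"\b(htmlspecialchars|htmlentities|strip_tags)\s*\("),
}

# Identifier-like tokens serve as the text-mining vocabulary.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_features(source: str) -> dict:
    """Return token counts ("tok:*") and pattern-match counts ("pat:*")."""
    features = Counter("tok:" + tok.lower() for tok in TOKEN_RE.findall(source))
    for name, pattern in PATTERNS.items():
        features["pat:" + name] = len(pattern.findall(source))
    return dict(features)


# Two toy files: one echoes raw input, the other sanitizes it first.
unsafe_file = "<?php $name = $_GET['name']; echo $name; ?>"
safe_file = "<?php $name = htmlspecialchars($_GET['name']); echo $name; ?>"

unsafe_features = extract_features(unsafe_file)
safe_features = extract_features(safe_file)
```

In a full pipeline, dictionaries like these would be converted to fixed-length vectors and passed, together with the safe/unsafe labels, to a standard classifier. The choice of classifier and of the actual patterns is left open here.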