MODEL-BASED RICH INTERNET APPLICATIONS CRAWLING: "MENU" AND "PROBABILITY" MODELS

  • SURYAKANT CHOUDHARY, EECS, Faculty of Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
  • EMRE DINCTURK, EECS, Faculty of Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
  • SEYED MIRTAHERI, EECS, Faculty of Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
  • GREGOR v. BOCHMANN, EECS, Faculty of Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
  • GUY-VINCENT JOURDAN, EECS, Faculty of Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
  • IOSIF VIOREL ONUT, IBM Canada Software Lab Canada, Research and Development, IBM® Security AppScan® Enterprise, Ottawa, Ontario, Canada
Keywords: Crawling, RIAs, AJAX, Modelling

Abstract

Strategies for efficiently "crawling" Web sites were described more than a decade ago. Since then, Web applications have come a long way, both in their adoption as a means of providing information and services and in the technologies used to develop them. With the emergence of richer and more advanced technologies such as AJAX, "Rich Internet Applications" (RIAs) have become more interactive, more responsive and generally more user friendly. Unfortunately, we have also lost our ability to crawl them. Building models of applications automatically is important not only for indexing content, but also for automated testing, automated security assessment, automated accessibility assessment and, more generally, the use of software engineering tools. We must regain our ability to construct models of these RIAs efficiently. In this paper, we present two methods based on "Model-Based Crawling" (MBC), first introduced in [1]: the "menu" model and the "probability" model. These two methods are shown to be more effective at extracting models than previously published methods, and are much simpler to implement than previous models for MBC. A distributed implementation of the probability model is also discussed. We compare these methods and others on a set of experimental and "real" RIAs, showing that, in our experiments, these methods find the set of client states faster than other approaches, and often finish the crawl faster as well.

References

Kamara Benjamin, Gregor von Bochmann, Mustafa Emre Dincturk, Guy-Vincent Jourdan, and Iosif Viorel Onut. A strategy for efficient crawling of rich internet applications. In Proceedings of the 11th international conference on Web engineering, ICWE'11, pages 74–89, 2011.

Christopher Olston and Marc Najork. Web crawling. Found. Trends Inf. Retr., 4(3):175–246, March 2010.

Seyed M Mirtaheri, Mustafa Emre Dincturk, Salman Hooshmand, Gregor V Bochmann, Guy-Vincent Jourdan, and Iosif-Viorel Onut. A brief history of web crawlers. In CASCON, 2013.

World Wide Web Consortium (W3C). Document Object Model (DOM). http://www.w3.org/DOM/, 2005. [Online].

Jesse James Garrett. Ajax: A new approach to web applications. http://www.adaptivepath.com/publications/essays/archives/000385.php, 2005. [Online].

Suryakant Choudhary, Mustafa Emre Dincturk, Seyed M. Mirtaheri, Ali Moosavi, Gregor von Bochmann, Guy-Vincent Jourdan, and Iosif Viorel Onut. Crawling rich internet applications: the state of the art. In Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, CASCON '12, pages 146–160, 2012.

Suryakant Choudhary, Mustafa Emre Dincturk, Gregor V. Bochmann, Guy-Vincent Jourdan, Iosif Viorel Onut, and Paul Ionescu. Solving some modeling challenges when testing rich internet applications for security. Software Testing, Verification, and Validation, 2012 International Conference on, pages 850–857, 2012.

Mustafa Emre Dincturk, Guy-Vincent Jourdan, Gregor von Bochmann, and Iosif Viorel Onut. A model-based approach for crawling rich internet applications. ACM Transactions on the WEB, page to appear, 2014.

Suryakant Choudhary, Mustafa Emre Dincturk, Seyed M. Mirtaheri, Guy-Vincent Jourdan, Gregor v. Bochmann, and Iosif Viorel Onut. Building rich internet applications models: Example of a better strategy. In Florian Daniel, Peter Dolog, and Qing Li, editors, Web Engineering, volume of Lecture Notes in Computer Science, pages 291–305. Springer, 2013.

Mustafa Emre Dincturk, Suryakant Choudhary, Gregor von Bochmann, Guy-Vincent Jourdan, and Iosif Viorel Onut. A statistical approach for efficient crawling of rich internet applications. In Proceedings of the 12th international conference on Web engineering, ICWE'12, pages 74–89,

Mustafa Emre Dincturk. Model-based Crawling - An Approach to Design Efficient Crawling Strategies for Rich Internet Applications. PhD thesis, EECS - University of Ottawa, 2013. http://ssrg.site.uottawa.ca/docs/Dincturk_MustafaEmre_2013_thesis.pdf.

H. A. Eiselt, Michel Gendreau, and Gilbert Laporte. Arc routing problems, part II: The rural postman problem. Operations Research, 43(3):399–414, 1995.

Suryakant Choudhary. M-crawler: Crawling rich internet applications using menu meta-model. Master's thesis, EECS - University of Ottawa, 2012. http://ssrg.site.uottawa.ca/docs/Surya-Thesis.pdf.

Sandy L. Zabell. The rule of succession. Erkenntnis, 31:283–321, 1989.

Seyed M. Mirtaheri, Di Zou, Gregor V. Bochmann, Guy-Vincent Jourdan, and Iosif Viorel Onut. Dist-ria crawler: A distributed crawler for rich internet applications. In Proc. 8th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 2013.

Zhaomeng Peng, Nengqiang He, Chunxiao Jiang, Zhihua Li, Lei Xu, Yipeng Li, and Yong Ren. Graph-based ajax crawl: Mining data from rich internet applications. In Computer Science and Electronics Engineering (ICCSEE), 2012 International Conference on, volume 3, pages 590–594, March 2012.

G. Carpaneto, M. Dell'Amico, and P. Toth. Exact solution of large-scale, asymmetric traveling salesman problems. ACM Trans. Math. Softw., 21(4):394–409, December 1995.

Kamara Benjamin, Gregor v. Bochmann, Guy-Vincent Jourdan, and Iosif-Viorel Onut. Some modeling challenges when testing rich internet applications for security. In Proceedings of the Third International Conference on Software Testing, Verification, and Validation Workshops, ICSTW '10, pages 403–409, Washington, DC, USA, 2010. IEEE Computer Society.

Cristian Duda, Gianni Frey, Donald Kossmann, and Chong Zhou. Ajaxsearch: crawling, indexing and searching web 2.0 applications. Proc. VLDB Endow., 1(2):1440–1443, August 2008.

Cristian Duda, Gianni Frey, Donald Kossmann, Reto Matter, and Chong Zhou. Ajax crawl: Making ajax applications searchable. In Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE '09, pages 78–89, Washington, DC, USA, 2009. IEEE Computer Society.

Gianni Frey. Indexing ajax web applications. Master's thesis, ETH Zurich, 2007. http://e-collection.library.ethz.ch/eserv/eth:30111/eth-30111-01.pdf.

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web, 1998. Stanford University, Technical Report.

Guoshi Wu and Fanfan Liu. Web crawler for event-driven crawling of ajax-based web applications. In Emerging Technologies for Information Systems, Computing, and Management, pages 191–200. Springer, 2013.

Ali Moosavi. Component-based crawling of complex rich internet applications. Master's thesis, EECS - University of Ottawa, 2014. http://ssrg.site.uottawa.ca/docs/Ali-Moosavi-Thesis.pdf.

Danny Roest, Ali Mesbah, and Arie van Deursen. Regression testing ajax applications: Coping with dynamism. In ICST, pages 127–136. IEEE Computer Society, 2010.

Cor-Paul Bezemer, Ali Mesbah, and Arie van Deursen. Automated security testing of web widget interactions. In Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, ESEC/FSE '09, pages 81–90, New York, NY, USA, 2009. ACM.

Ali Mesbah and Arie van Deursen. Invariant-based automatic testing of ajax user interfaces. In Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on, pages 210–220, May 2009.

Ali Mesbah, Engin Bozdag, and Arie van Deursen. Crawling ajax by inferring user interface state changes. In Proceedings of the 2008 Eighth International Conference on Web Engineering, ICWE '08, pages 122–134, Washington, DC, USA, 2008. IEEE Computer Society.

Ali Mesbah, Arie van Deursen, and Stefan Lenselink. Crawling ajax-based web applications through dynamic analysis of user interface state changes. ACM Transactions on the WEB, 6(1):3,

V. I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady, 10:707, 1966.

Alessandro Marchetto, Paolo Tonella, and Filippo Ricca. State-based testing of ajax web applications. In Proceedings of the 2008 International Conference on Software Testing, Verification, and Validation, ICST '08, pages 121–130, Washington, DC, USA, 2008. IEEE Computer Society.

Shay Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller, and Frank Tip. A framework for automated testing of JavaScript web applications. In Proc. 33rd International Conference on Software Engineering (ICSE), May 2011.

Shabnam Mirshokraie, Ali Mesbah, and Karthik Pattabiraman. Pythia: Generating test cases with oracles for javascript applications. In Proceedings of the ACM/IEEE International Conference on Automated Software Engineering (ASE), New Ideas Track, pages 610–615. IEEE Computer Society, 2013.

Amin Milani Fard and Ali Mesbah. Feedback-directed exploration of web applications to derive test models. In Proceedings of the 24th IEEE International Symposium on Software Reliability Engineering (ISSRE), 10 pages. IEEE Computer Society, 2013.

Domenico Amalfitano, Anna Rita Fasolino, and Porfirio Tramontana. Reverse engineering finite state machines from rich internet applications. In Proceedings of the 2008 15th Working Conference on Reverse Engineering, WCRE '08, pages 69–73, Washington, DC, USA, 2008. IEEE Computer Society.

Domenico Amalfitano, Anna Rita Fasolino, and Porfirio Tramontana. Rich internet application testing using execution trace data. In Proceedings of the 2010 Third International Conference on Software Testing, Verification, and Validation Workshops, ICSTW '10, pages 274–283, Washington, DC, USA, 2010. IEEE Computer Society.

Creria. http://wpage.unina.it/ptramont/downloads.htm.

Domenico Amalfitano, Anna Rita Fasolino, Armando Polcaro, and Porfirio Tramontana. Dynaria: A tool for ajax web application comprehension. In ICPC, pages 46–47. IEEE Computer Society,

Atif M. Memon, Ishan Banerjee, and Adithya Nagarajan. GUI ripping: Reverse engineering of graphical user interfaces for testing. In Proceedings of The 10th Working Conference on Reverse Engineering, November 2003.

Domenico Amalfitano, Anna Rita Fasolino, and Porfirio Tramontana. A gui crawling-based technique for android mobile application testing. In Proceedings of the 2011 IEEE Fourth International Conference on Software Testing, Verification and Validation Workshops, ICSTW '11, pages 252–, Washington, DC, USA, 2011. IEEE Computer Society.

Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, Salvatore De Carmine, and Atif M. Memon. Using gui ripping for automated testing of android applications. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ASE, pages 258–261, New York, NY, USA, 2012. ACM.

M. Erfani and A. Mesbah. Reverse engineering ios mobile applications. In 19th Working Conference on Reverse Engineering, (WCRE'12), 2012.

James Lo, Eric Wohlstadter, and Ali Mesbah. Imagen: Runtime migration of browser sessions for javascript web applications. In Proceedings of the International World Wide Web Conference (WWW), pages 815–825. ACM, 2013.

Published
2014-04-30
Section
Articles