MODEL-BASED RICH INTERNET APPLICATIONS CRAWLING: "MENU" AND "PROBABILITY" MODELS
Keywords:
Crawling, RIAs, AJAX, Modelling
Abstract
Strategies for "crawling" Web sites efficiently were described more than a decade ago. Since then, Web applications have come a long way, both in their adoption as a means of providing information and services and in the technologies used to develop them. With the emergence of richer and more advanced technologies such as AJAX, "Rich Internet Applications" (RIAs) have become more interactive, more responsive and generally more user friendly. Unfortunately, we have also lost our ability to crawl them. Building models of applications automatically is important not only for indexing content, but also for automated testing, automated security assessment, automated accessibility assessment and, in general, for the use of software engineering tools. We must regain our ability to efficiently construct models for these RIAs. In this paper, we present two methods based on "Model-Based Crawling" (MBC), first introduced in [1]: the "menu" model and the "probability" model. These two methods are shown to be more effective at extracting models than previously published methods, and are much simpler to implement than previous models for MBC. A distributed implementation of the probability model is also discussed. We compare these methods and others against a set of experimental and "real" RIAs, showing that in our experiments, these methods find the set of client states faster than other approaches, and often finish the crawl faster as well.
References
Kamara Benjamin, Gregor von Bochmann, Mustafa Emre Dincturk, Guy-Vincent Jourdan, and
Iosif Viorel Onut. A strategy for efficient crawling of rich internet applications. In Proceedings of
the 11th international conference on Web engineering, ICWE'11, pages 74-89, 2011.
Christopher Olston and Marc Najork. Web crawling. Found. Trends Inf. Retr., 4(3):175-246,
March 2010.
Seyed M Mirtaheri, Mustafa Emre Dincturk, Salman Hooshmand, Gregor V Bochmann,
Guy-Vincent Jourdan, and Iosif-Viorel Onut. A brief history of web crawlers. In CASCON, 2013.
World Wide Web Consortium (W3C). Document Object Model (DOM).
http://www.w3.org/DOM/, 2005. [Online].
Jesse James Garrett. Ajax: A new approach to web applications.
http://www.adaptivepath.com/publications/essays/archives/000385.php, 2005. [Online].
Suryakant Choudhary, Mustafa Emre Dincturk, Seyed M. Mirtaheri, Ali Moosavi, Gregor von
Bochmann, Guy-Vincent Jourdan, and Iosif Viorel Onut. Crawling rich internet applications: the
state of the art. In Proceedings of the 2012 Conference of the Center for Advanced Studies on
Collaborative Research, CASCON '12, pages 146-160, 2012.
Suryakant Choudhary, Mustafa Emre Dincturk, Gregor V. Bochmann, Guy-Vincent Jourdan,
Iosif Viorel Onut, and Paul Ionescu. Solving some modeling challenges when testing rich
internet applications for security. Software Testing, Verification, and Validation, 2012 International
Conference on, pages 850-857, 2012.
Mustafa Emre Dincturk, Guy-Vincent Jourdan, Gregor von Bochmann, and Iosif Viorel Onut. A
model-based approach for crawling rich internet applications. ACM Transactions on the Web,
to appear, 2014.
Suryakant Choudhary, Mustafa Emre Dincturk, Seyed M. Mirtaheri, Guy-Vincent Jourdan, Gregor
v. Bochmann, and Iosif Viorel Onut. Building rich internet applications models: Example of a
better strategy. In Florian Daniel, Peter Dolog, and Qing Li, editors, Web Engineering, volume
of Lecture Notes in Computer Science, pages 291-305. Springer, 2013.
Mustafa Emre Dincturk, Suryakant Choudhary, Gregor von Bochmann, Guy-Vincent Jourdan,
and Iosif Viorel Onut. A statistical approach for efficient crawling of rich internet applications.
In Proceedings of the 12th international conference on Web engineering, ICWE'12, pages 74-89,
Mustafa Emre Dincturk. Model-based Crawling - An Approach to Design Efficient Crawling
Strategies for Rich Internet Applications. PhD thesis, EECS - University of Ottawa, 2013.
http://ssrg.site.uottawa.ca/docs/Dincturk_MustafaEmre_2013_thesis.pdf.
H. A. Eiselt, Michel Gendreau, and Gilbert Laporte. Arc routing problems, part II: The rural
postman problem. Operations Research, 43(3):399-414, 1995.
Suryakant Choudhary. M-crawler: Crawling rich internet applications using menu meta-model.
Master's thesis, EECS - University of Ottawa, 2012.
http://ssrg.site.uottawa.ca/docs/Surya-Thesis.pdf.
Sandy L. Zabell. The rule of succession. Erkenntnis, 31:283-321, 1989.
Seyed M. Mirtaheri, Di Zou, Gregor V. Bochmann, Guy-Vincent Jourdan, and Iosif Viorel Onut.
Dist-RIA crawler: A distributed crawler for rich internet applications. In Proc. 8th International
Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 2013.
Zhaomeng Peng, Nengqiang He, Chunxiao Jiang, Zhihua Li, Lei Xu, Yipeng Li, and Yong Ren.
Graph-based ajax crawl: Mining data from rich internet applications. In Computer Science and
Electronics Engineering (ICCSEE), 2012 International Conference on, volume 3, pages 590-594,
March 2012.
G. Carpaneto, M. Dell'Amico, and P. Toth. Exact solution of large-scale, asymmetric traveling
salesman problems. ACM Trans. Math. Softw., 21(4):394-409, December 1995.
Kamara Benjamin, Gregor v. Bochmann, Guy-Vincent Jourdan, and Iosif-Viorel Onut. Some
modeling challenges when testing rich internet applications for security. In Proceedings of the
Third International Conference on Software Testing, Verification, and Validation Workshops,
ICSTW '10, pages 403-409, Washington, DC, USA, 2010. IEEE Computer Society.
Cristian Duda, Gianni Frey, Donald Kossmann, and Chong Zhou. Ajaxsearch: crawling, indexing
and searching web 2.0 applications. Proc. VLDB Endow., 1(2):1440-1443, August 2008.
Cristian Duda, Gianni Frey, Donald Kossmann, Reto Matter, and Chong Zhou. Ajax crawl: Making
ajax applications searchable. In Proceedings of the 2009 IEEE International Conference on Data
Engineering, ICDE '09, pages 78-89, Washington, DC, USA, 2009. IEEE Computer Society.
Gianni Frey. Indexing ajax web applications. Master's thesis, ETH Zurich, 2007.
http://e-collection.library.ethz.ch/eserv/eth:30111/eth-30111-01.pdf.
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation
ranking: Bringing order to the web, 1998. Stanford University, Technical Report.
Guoshi Wu and Fanfan Liu. Web crawler for event-driven crawling of ajax-based web applications.
In Emerging Technologies for Information Systems, Computing, and Management, pages 191-200.
Springer, 2013.
Ali Moosavi. Component-based crawling of complex rich internet applications. Master's thesis,
EECS - University of Ottawa, 2014.
http://ssrg.site.uottawa.ca/docs/Ali-Moosavi-Thesis.pdf.
Danny Roest, Ali Mesbah, and Arie van Deursen. Regression testing ajax applications: Coping
with dynamism. In ICST, pages 127-136. IEEE Computer Society, 2010.
Cor-Paul Bezemer, Ali Mesbah, and Arie van Deursen. Automated security testing of web
widget interactions. In Proceedings of the 7th joint meeting of the European software
engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering,
ESEC/FSE '09, pages 81-90, New York, NY, USA, 2009. ACM.
Ali Mesbah and Arie van Deursen. Invariant-based automatic testing of ajax user interfaces. In
Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on, pages 210-220,
May 2009.
Ali Mesbah, Engin Bozdag, and Arie van Deursen. Crawling ajax by inferring user interface state
changes. In Proceedings of the 2008 Eighth International Conference on Web Engineering, ICWE
'08, pages 122-134, Washington, DC, USA, 2008. IEEE Computer Society.
Ali Mesbah, Arie van Deursen, and Stefan Lenselink. Crawling ajax-based web applications
through dynamic analysis of user interface state changes. ACM Transactions on the Web, 6(1):3,
V. I. Levenshtein. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet
Physics Doklady, 10:707, 1966.
Alessandro Marchetto, Paolo Tonella, and Filippo Ricca. State-based testing of ajax web
applications. In Proceedings of the 2008 International Conference on Software Testing, Verification, and
Validation, ICST '08, pages 121-130, Washington, DC, USA, 2008. IEEE Computer Society.
Shay Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller, and Frank Tip. A framework for
automated testing of JavaScript web applications. In Proc. 33rd International Conference on
Software Engineering (ICSE), May 2011.
Shabnam Mirshokraie, Ali Mesbah, and Karthik Pattabiraman. Pythia: Generating test cases with
oracles for javascript applications. In Proceedings of the ACM/IEEE International Conference
on Automated Software Engineering (ASE), New Ideas Track, pages 610-615. IEEE Computer
Society, 2013.
Amin Milani Fard and Ali Mesbah. Feedback-directed exploration of web applications to derive
test models. In Proceedings of the 24th IEEE International Symposium on Software Reliability
Engineering (ISSRE), 10 pages. IEEE Computer Society, 2013.
Domenico Amalfitano, Anna Rita Fasolino, and Porfirio Tramontana. Reverse engineering finite
state machines from rich internet applications. In Proceedings of the 2008 15th Working Conference
on Reverse Engineering, WCRE '08, pages 69-73, Washington, DC, USA, 2008. IEEE Computer
Society.
Domenico Amalfitano, Anna Rita Fasolino, and Porfirio Tramontana. Rich internet application
testing using execution trace data. In Proceedings of the 2010 Third International Conference on
Software Testing, Verification, and Validation Workshops, ICSTW '10, pages 274-283,
Washington, DC, USA, 2010. IEEE Computer Society.
Creria. http://wpage.unina.it/ptramont/downloads.htm.
Domenico Amalfitano, Anna Rita Fasolino, Armando Polcaro, and Porfirio Tramontana. Dynaria:
A tool for ajax web application comprehension. In ICPC, pages 46-47. IEEE Computer Society,
Atif M. Memon, Ishan Banerjee, and Adithya Nagarajan. GUI ripping: Reverse engineering of
graphical user interfaces for testing. In Proceedings of The 10th Working Conference on Reverse
Engineering, November 2003.
Domenico Amalfitano, Anna Rita Fasolino, and Porfirio Tramontana. A gui crawling-based
technique for android mobile application testing. In Proceedings of the 2011 IEEE Fourth International
Conference on Software Testing, Verification and Validation Workshops, ICSTW '11, pages 252-
, Washington, DC, USA, 2011. IEEE Computer Society.
Domenico Amalfitano, Anna Rita Fasolino, Porfirio Tramontana, Salvatore De Carmine, and
Atif M. Memon. Using gui ripping for automated testing of android applications. In
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ASE
, pages 258-261, New York, NY, USA, 2012. ACM.
M. Erfani and A. Mesbah. Reverse engineering ios mobile applications. In 19th Working
Conference on Reverse Engineering, (WCRE'12), 2012.
James Lo, Eric Wohlstadter, and Ali Mesbah. Imagen: Runtime migration of browser sessions
for javascript web applications. In Proceedings of the International World Wide Web Conference
(WWW), pages 815-825. ACM, 2013.