FlakyLoc: Flakiness Localization for Reliable Test Suites in Web Applications
Testing web applications is challenging because of complex asynchronous communication, concurrency between clients and servers, and the heterogeneity of the resources involved. It is difficult to ensure that a test case is re-run under the same conditions, because its execution can be affected by environmental factors that are hard to control at a fine-grained level, such as network bottlenecks, memory issues, or screen resolution. These environmental factors can cause flakiness, which occurs when the same test case, run against the same application, sometimes yields one outcome and sometimes another because the environmental factors differ between executions. Testers usually stop relying on flaky test cases because their outcome varies across re-executions. To reduce and fix flakiness, it is important to locate and understand the environmental factors that cause it. This paper focuses on localizing the root cause of flakiness in web applications, based on a characterization of the environmental factors that are not controlled during testing. The root cause is located by means of spectrum-based localization techniques that analyse the test execution under different combinations of the environmental factors that can trigger the flakiness. The technique is evaluated on an educational web platform called FullTeaching. In this evaluation, the technique automatically located the root cause of flakiness and provided enough information to both understand and fix it.
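The idea of spectrum-based localization over environmental factors can be illustrated with a minimal sketch. It assumes the Ochiai coefficient as the suspiciousness formula, which is a common choice in spectrum-based fault localization but is not stated in the abstract. The factor names (`network`, `resolution`), their values, and the function `rank_factors` are hypothetical and serve only as an example.

```python
from collections import defaultdict
from math import sqrt


def rank_factors(executions):
    """Rank (factor, value) pairs by Ochiai suspiciousness.

    executions: list of (config, failed) tuples, where config maps a
    hypothetical environmental factor name to the value used in that run.
    """
    total_failed = sum(1 for _, failed in executions if failed)
    fail_count = defaultdict(int)  # failing runs in which the pair was active
    pass_count = defaultdict(int)  # passing runs in which the pair was active

    for config, failed in executions:
        for item in config.items():
            if failed:
                fail_count[item] += 1
            else:
                pass_count[item] += 1

    scores = {}
    for item in set(fail_count) | set(pass_count):
        ef, ep = fail_count[item], pass_count[item]
        # Ochiai (assumed formula): ef / sqrt(total_failed * (ef + ep))
        denom = sqrt(total_failed * (ef + ep))
        scores[item] = ef / denom if denom else 0.0

    # Most suspicious first; ties are broken by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))


# Illustrative data only: one test case run under four configurations.
executions = [
    ({"network": "fast", "resolution": "1080p"}, False),
    ({"network": "fast", "resolution": "720p"}, True),
    ({"network": "slow", "resolution": "1080p"}, False),
    ({"network": "slow", "resolution": "720p"}, True),
]

ranking = rank_factors(executions)
for (factor, value), score in ranking:
    print(f"{factor}={value}: {score:.2f}")
```

In this made-up data the test fails only when the resolution is `720p`, so that pair ranks first, while both network values receive the same intermediate score. A real study would need many more executions per configuration, since a flaky test may also pass under a triggering configuration.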