A Unified Model Representation of Machine Learning Knowledge

Authors

  • J. G. Enríquez Computer Languages and Systems Department. Escuela Técnica Superior de Ingeniería Informática, Avenida Reina Mercedes, s/n, 41012, Sevilla. Spain
  • A. Martínez-Rojas Computer Languages and Systems Department. Escuela Técnica Superior de Ingeniería Informática, Avenida Reina Mercedes, s/n, 41012, Sevilla. Spain
  • D. Lizcano Universidad a distancia de Madrid. Carretera de La Coruña, KM.38,500, vía de Servicio, no 15, 28400, Collado Villalba, Madrid. Spain
  • A. Jiménez-Ramírez Computer Languages and Systems Department. Escuela Técnica Superior de Ingeniería Informática, Avenida Reina Mercedes, s/n, 41012, Sevilla. Spain

DOI:

https://doi.org/10.13052/jwe1540-9589.1929

Keywords:

Machine Learning, Automated Machine Learning, Knowledge Representation, Model-Driven Engineering

Abstract

Nowadays, Machine Learning (ML) algorithms are widely applied in virtually every scenario. However, developing an ML project requires the effort of many ML experts who, among other things, have to select and configure the appropriate algorithm to process the data to learn from. Since thousands of algorithms exist, this is a time-consuming and challenging task. AutoML recently emerged to provide mechanisms that automate parts of this process. However, most efforts focus on brute-force procedures that try different algorithms or configurations and select the one that gives the best results. A smarter and more efficient selection requires a repository of knowledge. To this end, this paper proposes (1) an approach towards a common language to consolidate the currently distributed knowledge sources related to algorithm selection in ML, (2) a method to join the knowledge gathered through this language in a unified store that can be exploited later on, and (3) a mechanism to maintain traceability links. The preliminary evaluation of this approach made it possible to create a unified store collecting the knowledge of 13 different sources and to identify several research lines to pursue.
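As a rough illustration of the three contributions listed in the abstract, the following Python sketch shows one way a common representation, a unified store, and traceability links could fit together. It is not the authors' metamodel: the class names `Recommendation` and `UnifiedStore`, their fields, and the choice to trace each entry back to the sources that state it are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Recommendation:
    """One piece of algorithm-selection knowledge in a shared format.

    Hypothetical structure: the fields are assumptions for illustration.
    """
    task: str        # kind of ML problem, e.g. "classification"
    condition: str   # when the advice applies, e.g. "small dataset"
    algorithm: str   # the algorithm being recommended


@dataclass
class UnifiedStore:
    """Maps each distinct recommendation to the sources that state it.

    The mapping itself plays the role of the traceability links (assumed design).
    """
    trace: dict = field(default_factory=dict)

    def add_source(self, source_name, recommendations):
        # Identical recommendations from different sources are stored once;
        # only the set of contributing sources grows.
        for rec in recommendations:
            self.trace.setdefault(rec, set()).add(source_name)

    def remove_source(self, source_name):
        # Drop a source and any recommendation no other source supports.
        for rec in list(self.trace):
            self.trace[rec].discard(source_name)
            if not self.trace[rec]:
                del self.trace[rec]

    def algorithms_for(self, task):
        # Query the consolidated knowledge instead of trying every algorithm.
        return sorted({rec.algorithm for rec in self.trace if rec.task == task})
```

Under these assumptions, two sources that give the same advice produce a single stored entry with two traceability links, and withdrawing a source removes only the knowledge that no other source supports.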

Author Biographies

J. G. Enríquez, Computer Languages and Systems Department. Escuela Técnica Superior de Ingeniería Informática, Avenida Reina Mercedes, s/n, 41012, Sevilla. Spain

J. G. Enríquez has held a Ph.D. in Computer Science from the University of Seville since 2017. He is a Lecturer with the Department of Computing Languages and Systems, University of Seville. He has been part of the organizing committee of different international conferences. He has collaborated with many universities in different countries, such as Southampton, Cuba, and Berkeley, among others. In 2015, he received the Innovation Award for his innovative activity in the field of research and development of the Platform for Dynamic Data Integration Andalusian Historical Heritage by Fujitsu Laboratories of Europe.

A. Martínez-Rojas, Computer Languages and Systems Department. Escuela Técnica Superior de Ingeniería Informática, Avenida Reina Mercedes, s/n, 41012, Sevilla. Spain

A. Martínez-Rojas is a senior computer engineering student at the University of Seville. He has a great interest in research and intends to pursue doctoral studies. He already has several high-impact publications. His main areas of interest are RPA and Machine Learning.

D. Lizcano, Universidad a distancia de Madrid. Carretera de La Coruña, KM.38,500, vía de Servicio, no 15, 28400, Collado Villalba, Madrid. Spain

D. Lizcano holds a Ph.D. in Computer Science from the UPM (2010) and an M.Sc. degree in Research in Complex Software Development (2008), also from UPM. He held a research grant from the European Social Fund under their Research Personnel Training program, and received the Extraordinary Graduation Prize for best academic record at UPM and the National Accenture Prize for the Best Final-Year Computing Project. He is a Professor and Senior Researcher at the Madrid Open University (UDIMA). He is currently involved in several national and European funded projects related to EUP, Web Engineering, Paradigms of Programming and HCI. He has published more than 25 papers in prestigious international journals and attended more than 70 international conferences.

A. Jiménez-Ramírez, Computer Languages and Systems Department. Escuela Técnica Superior de Ingeniería Informática, Avenida Reina Mercedes, s/n, 41012, Sevilla. Spain

A. Jiménez-Ramírez is a lecturer and researcher at the University of Seville, Spain. In 2014 he obtained his Ph.D. degree in Computer Science. His research focuses on intelligent techniques for Business Process Management and Flexible Business Processes, combining different disciplines, including Constraint Programming, Imperative and Declarative Business Process Modeling, and Planning and Scheduling. His current research interests are related to Decision Support Systems applied to Flexible Business Processes. Andrés has published his research in international journals and conferences such as Data and Knowledge Engineering, Information and Software Technology, Journal of Web Engineering, and CAiSE.

Published

2020-06-03

How to Cite

Enríquez, J. G., Martínez-Rojas, A., Lizcano, D., & Jiménez-Ramírez, A. (2020). A Unified Model Representation of Machine Learning Knowledge. Journal of Web Engineering, 19(2), 319–340. https://doi.org/10.13052/jwe1540-9589.1929

Issue

Section

SPECIAL ISSUE: Advanced Practices in Web Engineering 2020