A Unified Model Representation of Machine Learning Knowledge
Machine Learning (ML) algorithms are now widely applied across a broad range of scenarios. However, developing an ML project requires the effort of many ML experts who, among other things, have to select and configure the appropriate algorithm for the data to learn from. Since thousands of algorithms exist, this is a time-consuming and challenging task. AutoML recently emerged to provide mechanisms that automate parts of this process. However, most of these efforts apply brute-force procedures that try different algorithms or configurations and select the one that gives the best results. Making a smarter and more efficient selection requires a repository of knowledge. To this end, this paper proposes (1) an approach towards a common language to consolidate the currently distributed knowledge sources related to algorithm selection in ML, (2) a method to join the knowledge gathered through this language in a unified store that can be exploited later on, and (3) the maintenance of traceability links. The preliminary evaluation of this approach made it possible to create a unified store collecting the knowledge of 13 different sources and to identify several research lines to pursue.