JITA4DS: Disaggregated Execution of Data Science Pipelines Between the Edge and the Data Centre

Authors

  • Genoveva Vargas-Solar French Council of Scientific Research (CNRS)-LIRIS, France
  • Md Sahil Hassan University of Arizona, USA
  • Ali Akoglu University of Arizona, USA

DOI:

https://doi.org/10.13052/jwe1540-9589.2111

Keywords:

Disaggregated data centers, data science pipelines, edge computing

Abstract

This paper targets the execution of data science (DS) pipelines supported by data processing, transmission and sharing across several resources executing greedy processes. Current data science pipeline environments provide various infrastructure services with computing resources such as general-purpose processors (GPPs), Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs) and Tensor Processing Units (TPUs), coupled with platform and software services to design, run and maintain DS pipelines. These one-size-fits-all solutions impose the complete externalization of data pipeline tasks. However, some tasks can be executed at the edge, and the backend can provide just-in-time resources to ensure ad hoc and elastic execution environments.
This paper introduces an innovative composable “Just in Time Architecture” for configuring data centers (DCs) for Data Science Pipelines (JITA-4DS) and associated resource management techniques. JITA-4DS is a cross-layer management system that is aware of both the application characteristics and the underlying infrastructure in order to break the barriers between the application, middleware/operating system, and hardware layers. Vertical integration of these layers is needed to build a customizable Virtual Data Center (VDC) that meets the dynamically changing requirements of data science pipelines, such as performance, availability, and energy consumption. Accordingly, the paper presents an experimental simulation that runs data science workloads to determine the best strategies for scheduling the allocation of resources implemented by JITA-4DS.

Author Biographies

Genoveva Vargas-Solar, French Council of Scientific Research (CNRS)-LIRIS, France

Genoveva Vargas-Solar (http://www.vargas-solar.com) is a senior scientist of the French CNRS at LIRIS. Her research interests concern distributed and heterogeneous database architectures, efficient execution of data science pipelines and service-based database systems. She conducts fundamental and applied research to address these challenges on different architectures, including ARM, Raspberry Pi, cluster, cloud, and HPC.

Md Sahil Hassan, University of Arizona, USA

Md Sahil Hassan is pursuing his PhD in the Electrical and Computer Engineering department of the University of Arizona. His research involves resource management for heterogeneous computing platforms, and the design, validation and efficient implementation of innovative architectures on FPGA platforms. He is also involved in research on neuromorphic computing.

Ali Akoglu, University of Arizona, USA

Ali Akoglu (https://uweb.engr.arizona.edu/~akoglu/) is a Professor in the Department of Electrical and Computer Engineering and the BIO5 Institute at the University of Arizona. He is the site director of the National Science Foundation (NSF) Industry-University Cooperative Research Center on Cloud and Autonomic Computing. His research interests lie in high performance computing, reconfigurable computing, and cloud computing, with the goal of bridging the gap between the domain scientist, the programming environment, and highly parallel hardware architectures.

Published

2021-11-28

Issue

Section

Articles