Towards Byzantine Fault-resistance in Workflows: Challenges and Directions

Sofia Vaz; Vitor A. Cunha; Rui L. Aguiar

doi:10.13052/jmm1550-4646.2214

Authors

Sofia Vaz Instituto de Telecomunicações, Universidade de Aveiro, 3810-164 Aveiro, Portugal, Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, 3810-164 Aveiro, Portugal https://orcid.org/0009-0000-2976-3608
Vitor A. Cunha Instituto de Telecomunicações, Universidade de Aveiro, 3810-164 Aveiro, Portugal, Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, 3810-164 Aveiro, Portugal https://orcid.org/0000-0002-4566-0198
Rui L. Aguiar Instituto de Telecomunicações, Universidade de Aveiro, 3810-164 Aveiro, Portugal, Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, 3810-164 Aveiro, Portugal

Keywords:

Byzantine Fault Tolerance, Distributed Workflows

Abstract

As distributed workflows increase in complexity, they become more vulnerable to Byzantine faults, in which components may behave adversarially. Addressing such faults is critical for ensuring system reliability, especially in fields such as MLOps, blockchain, and quantum computing. Byzantine Fault Tolerance (BFT) refers to approaches that allow a system to continue operating correctly in the presence of faulty nodes. This paper explores core concepts in Byzantine fault-tolerant systems and discusses their relevance to modern distributed environments, such as serverless approaches applied to scientific workloads, IoT, and APIs that may power 6G and the compute continuum. Several defensive strategies and aggregation techniques are analyzed to underscore their role in alleviating the impact of Byzantine faults. Ultimately, this study highlights the continued significance of Byzantine fault tolerance in safeguarding the resilience, security, and functionality of distributed systems in a new era of increasing threats.

Author Biographies

Sofia Vaz, Instituto de Telecomunicações, Universidade de Aveiro, 3810-164 Aveiro, Portugal, Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, 3810-164 Aveiro, Portugal

Sofia Vaz received her B.Sc. in computer engineering in 2021 and her M.Sc. in cybersecurity in 2024 from the University of Aveiro. She is currently a Ph.D. student in a joint program of the Universities of Aveiro, Porto, and Minho. She is also a research fellow for EXIGENCE, and her research interests center on security-based observation in distributed systems.

Vitor A. Cunha, Instituto de Telecomunicações, Universidade de Aveiro, 3810-164 Aveiro, Portugal, Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, 3810-164 Aveiro, Portugal

Vítor A. Cunha (Ph.D.’2022) is an Assistant Professor at the University of Aveiro and a researcher at Instituto de Telecomunicações. He is currently working on sustainable networks and dynamic security mechanisms for softwarized and virtualized networks. His interests include network security, SDN, NFV, and the computing continuum.

Rui L. Aguiar, Instituto de Telecomunicações, Universidade de Aveiro, 3810-164 Aveiro, Portugal, Departamento de Eletrónica, Telecomunicações e Informática, Universidade de Aveiro, 3810-164 Aveiro, Portugal

Rui L. Aguiar received his degree in telecommunication engineering in 1990 and his Ph.D. degree in electrical engineering in 2001 from the University of Aveiro. He is currently a full professor at the University of Aveiro, responsible for the networking area, and was previously an adjunct professor at the INI, Carnegie Mellon University. He was a visiting research scholar at Universidade Federal de Uberlândia, Brazil, and served as an advisor to the Portuguese government on 5G policies. He coordinates a nationwide research line at Instituto de Telecomunicações in the area of networks and services. For six years, he led the Technological Platform on Connected Communities, a regional cross-disciplinary, industry-oriented activity on smart environments. His current research interests are centred on the implementation of advanced wireless networks and systems, with special emphasis on 5G networks and the Future Internet.
