Evaluation and Analysis of Distributed Graph-Parallel Processing Frameworks
DOI: https://doi.org/10.13052/jcsm2245-1439.333

Keywords: Big data, Graph-parallel computing, Distributed processing

Abstract
A number of graph-parallel processing frameworks have been proposed in recent years to address the need to process complex, large-scale graph-structured datasets. Although significant performance improvements by those frameworks have been reported, the comparative advantages of each framework over the others have not been fully studied, which impedes the best utilization of those frameworks for a specific graph computing task and setting. In this work, we conducted a systematic comparison of parallel processing systems for large-scale graph computations, aiming to reveal the characteristics of those systems when performing common graph algorithms on real-world datasets on the same ground. We selected three popular graph-parallel processing frameworks (Giraph, GPS, and GraphLab) for the study and also included a representative general data-parallel computing system, Spark, in order to understand how well a general data-parallel system can handle graph problems. We applied basic performance metrics measuring speed, resource utilization, and scalability to answer the question of which graph-parallel processing platform is better suited for which applications and datasets. Three widely used graph algorithms, clustering coefficient, shortest path length, and PageRank score, were used for benchmarking on the targeted computing systems. We ran those algorithms against three real-world network datasets with diverse characteristics and scales on a research cluster and obtained a number of interesting observations. For instance, all evaluated systems showed poor scalability (i.e., the runtime increased with more computing nodes) on small datasets, likely due to communication overhead. Furthermore, among the evaluated graph-parallel computing platforms, PowerGraph consistently exhibited better performance than the others.
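As context for the PageRank benchmark mentioned in the abstract, the sketch below shows a minimal single-machine power iteration in Python. It is illustrative only, not the distributed vertex-centric implementation evaluated in the paper; the damping factor of 0.85 and the uniform redistribution of rank from dangling nodes are common assumptions rather than details taken from the study.

```python
def pagerank(edges, num_nodes, damping=0.85, iters=20):
    """Plain power-iteration PageRank over a directed edge list.

    edges: iterable of (src, dst) pairs with node ids in [0, num_nodes).
    Returns a list of rank scores summing to ~1.0.
    """
    # Count outgoing edges per node.
    out_deg = [0] * num_nodes
    for u, _ in edges:
        out_deg[u] += 1

    # Start from a uniform rank distribution.
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iters):
        # Teleportation term: (1 - d) spread uniformly.
        new = [(1.0 - damping) / num_nodes] * num_nodes
        # Each node sends its damped rank evenly along its out-edges.
        for u, v in edges:
            new[v] += damping * rank[u] / out_deg[u]
        # Rank held by dangling nodes (no out-edges) is redistributed uniformly.
        dangling = damping * sum(
            rank[u] for u in range(num_nodes) if out_deg[u] == 0
        ) / num_nodes
        rank = [r + dangling for r in new]
    return rank
```

In Pregel-style systems such as Giraph or GPS, the inner loop above corresponds to one superstep: each vertex sends `rank/out_degree` messages to its neighbors and then sums the messages it receives.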
References
Message Passing Interface Forum, MPI: A Message-Passing Interface Standard. 1994, University of Tennessee.
Dagum, L. and R. Menon, OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Comput. Sci. Eng., 1998. 5(1): p. 46–55.
Dean, J. and S. Ghemawat, MapReduce: simplified data processing on large clusters. Commun. ACM, 2008. 51(1): p. 107–113.
Low, Y., et al., Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 2012. 5(8): p. 716–727.
Low, Y., et al., GraphLab: A new framework for parallel machine learning. arXiv preprint arXiv:1006.4990, 2010.
Gonzalez, J.E., et al., PowerGraph: distributed graph-parallel computation on natural graphs, in Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation. 2012, USENIX Association: Hollywood, CA, USA. p. 17–30.
Katz, R.F., et al., Numerical simulation of geodynamic processes with the Portable Extensible Toolkit for Scientific Computation. Physics of the Earth and Planetary Interiors, 2007. 163(1–4): p. 52–68.
Chen, W.-Y., et al., Parallel spectral clustering in distributed systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011. 33(3): p. 568–586.
Zaharia, M., et al., Spark: cluster computing with working sets, in Proceedings of the 2nd USENIX conference on Hot topics in cloud computing. 2010, USENIX Association: Boston, MA. p. 10–10.
Zaharia, M., et al., Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing, in Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. 2012, USENIX Association: San Jose, CA. p. 2–2.
Guo, Y., et al., How well do graph-processing platforms perform? An empirical performance evaluation and analysis.
Scott, J. and P.J. Carrington, The SAGE handbook of social network analysis. 2011: SAGE publications.
Newman, M., Networks: An Introduction. 2010: Oxford University Press, Inc. 720.
Malewicz, G., et al., Pregel: a system for large-scale graph processing, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 2010, ACM: Indianapolis, Indiana, USA. p. 135–146.
Elser, B. and A. Montresor, An evaluation study of Big Data frameworks for graph processing, in Proceedings of the 2013 IEEE International Conference on Big Data. 2013.
Guo, Y., et al., Towards Benchmarking Graph-Processing Platforms.
Salihoglu, S. and J. Widom, GPS: a graph processing system, in Proceedings of the 25th International Conference on Scientific and Statistical Database Management. 2013, ACM.
The Apache Software Foundation, Apache Giraph. 2014 [cited 2014]; Available from: http://giraph.apache.org/.
Page, L., et al., The PageRank citation ranking: Bringing order to the web. 1999.
Haveliwala, T., S. Kamvar, and G. Jeh, An analytical comparison of approaches to personalizing PageRank. 2003.
Langville, A.N. and C.D. Meyer, Deeper inside pagerank. Internet Mathematics, 2004. 1(3): p. 335–380.
Watts, D.J. and S.H. Strogatz, Collective dynamics of ‘small-world’ networks. Nature, 1998. 393(6684): p. 440–442.
Minas Gjoka, Maciej Kurant, Carter T. Butts and Athina Markopoulou, Walking in Facebook: A Case Study of Unbiased Sampling of OSNs, Proceedings of IEEE INFOCOM ’10, San Diego, CA, 2010.
Crankshaw, Daniel, Ankur Dave, Reynold S. Xin, Joseph E. Gonzalez, Michael J. Franklin, and Ion Stoica, The GraphX Graph Processing System.
D. Gregor and A. Lumsdaine, “The Parallel BGL: A Generic Library for Distributed Graph Computations,” POOSC, 2005.
K. Kambatla, G. Kollias, and A. Grama, “Efficient Large-Scale Graph Analysis in MapReduce,” in PMAA, 2012.
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica, Fast and interactive analytics over Hadoop data with Spark. USENIX ;login:, 37(4), 2012.
Yue Zhao, Kenji Yoshigoe, Mengjun Xie, Suijian Zhou, Remzi Seker, and Jiang Bian, LightGraph: Lighten Communication in Distributed Graph-Parallel Processing, in Proceedings of the 3rd IEEE International Congress on Big Data (BigData 2014), Anchorage, Alaska, USA, 2014.
Power, R. and J. Li, Piccolo: building fast, distributed programs with partitioned tables, in OSDI. 2010.