SCALABLE RDF GRAPH QUERYING USING CLOUD COMPUTING

REN LI; DAN YANG; HAIBO HU; JUAN XIE; LI FU

Authors

REN LI, College of Computer Science, Chongqing University, Chongqing, China
DAN YANG, School of Software Engineering, Chongqing University, Chongqing, China
HAIBO HU, School of Software Engineering, Chongqing University, Chongqing, China
JUAN XIE, School of Software Engineering, Chongqing University, Chongqing, China
LI FU, School of Software Engineering, Chongqing University, Chongqing, China

Keywords:

Semantic Web, RDF, SPARQL, Cloud Computing, MapReduce, HBase

Abstract

With the rapid growth of semantic web technologies, conventional SPARQL processing tools do not scale well to large amounts of RDF data because they are designed for a single-machine context. Several optimization solutions that build on cloud computing technologies have been proposed to overcome these drawbacks. However, these approaches consider only SPARQL Basic Graph Pattern processing, and their file system-based schemas make random modification of large-scale RDF data difficult. This paper presents a scalable SPARQL Group Graph Pattern (GGP) processing framework for large RDF graphs. We design a novel storage schema on HBase to store RDF data. Furthermore, we propose a query plan generation algorithm that determines jobs using a greedy selection strategy. Several query algorithms are also presented to answer SPARQL GGP queries in the MapReduce paradigm. An experiment in a simulated cloud computing environment shows that our framework is more scalable and efficient than traditional approaches when storing and retrieving large volumes of RDF data.
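
The abstract does not spell out the query plan generation algorithm, so the following is only a minimal sketch of what a greedy selection strategy for grouping joins into MapReduce jobs could look like, not the authors' method. It assumes that each triple pattern is a tuple of three strings, that variables start with "?", and that within one job the planner repeatedly picks the variable shared by the most not-yet-joined patterns. The names `variables` and `greedy_plan` are hypothetical, and the sketch covers only the joins of a single basic graph pattern, not full GGP processing.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def variables(triple: Triple) -> FrozenSet[str]:
    """Return the variables of a triple pattern (terms starting with '?')."""
    return frozenset(term for term in triple if term.startswith("?"))


def greedy_plan(triples: List[Triple]) -> List[List[Tuple[str, int]]]:
    """Group joins into jobs with a greedy rule (an assumption, see lead-in).

    Each job is a list of (join_variable, number_of_joined_patterns).
    Within one job a pattern takes part in at most one join; the results
    of a job become the inputs of the next one.
    """
    relations = [variables(t) for t in triples]
    plan = []
    while len(relations) > 1:
        job = []
        remaining = list(relations)
        produced = []
        while True:
            # Count in how many unused relations each variable occurs.
            counts = {}
            for rel in remaining:
                for var in rel:
                    counts[var] = counts.get(var, 0) + 1
            candidates = {v: c for v, c in counts.items() if c >= 2}
            if not candidates:
                break
            # Greedy choice: most shared variable, ties broken by name.
            best = min(candidates, key=lambda v: (-candidates[v], v))
            joined = [rel for rel in remaining if best in rel]
            remaining = [rel for rel in remaining if best not in rel]
            produced.append(frozenset().union(*joined))
            job.append((best, len(joined)))
        if not job:
            raise ValueError("patterns share no variable; cross product needed")
        relations = produced + remaining
        plan.append(job)
    return plan


if __name__ == "__main__":
    query = [
        ("?x", "rdf:type", "ub:Student"),
        ("?x", "ub:takesCourse", "?c"),
        ("?x", "ub:advisor", "?p"),
        ("?p", "ub:teacherOf", "?c"),
    ]
    for number, job in enumerate(greedy_plan(query), start=1):
        print("job", number, job)
```

Picking the most widely shared variable first is a local heuristic: it tends to reduce the number of jobs but does not guarantee the minimum, which is why it is presented here only as an illustration of the general idea.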
