Geospatially Partitioning Public Transit Networks for Open Data Publishing

Authors

  • Harm Delva IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium https://orcid.org/0000-0001-8272-0754
  • Julián Andrés Rojas IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium https://orcid.org/0000-0002-6645-1264
  • Pieter Colpaert IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium https://orcid.org/0000-0001-6917-2167
  • Ruben Verborgh IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium

DOI:

https://doi.org/10.13052/jwe1540-9589.2045

Keywords:

Linked Data, Open Data, Mobility, Maintainability, Web API engineering

Abstract

Public transit operators often publish their open data in a data dump, but developers with limited computational resources may not have the means to process all this data efficiently. In our prior work we have shown that geospatially partitioning an operator’s network can improve query times for client-side route planning applications by a factor of 2.4. However, it remains unclear whether this works for all network types, or other kinds of applications. To answer these questions, we must evaluate the same method on more networks and analyze the effect of geospatial partitioning on each network separately. In this paper we process three networks in Belgium: (i) the national railways, (ii) the regional operator in Flanders, and (iii) the network of the city of Brussels, using both real and artificially generated query sets. Our findings show that on the regional network, we can make query processing 4 times more efficient, but we could not improve the performance over the city network by more than 12%. Both the network’s topography, and to a lesser extent how users interact with the network, determine how suitable the network is for partitioning. Thus, we come to a negative answer to our question: our method does not work equally well for all networks. Moreover, since the network’s topography is the main determining factor, we expect this finding to apply to other graph-based geospatial data, as well as other Link Traversal-based applications.

Downloads

Download data is not yet available.

Author Biographies

Harm Delva, IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium

Harm Delva. He is a PhD student at IDLab, affiliated with both Ghent University and imec. My original research subject lay at the intersection of the Semantic Web technologies and Linked Open Data publishing, but with a focus on researching how applications can find and use this data – beyond the traditional SPARQL usecase. You can find more information on hdelva.be.

Julián Andrés Rojas, IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium

Julián Andrés Rojas. He got his MSc on Telematics Engineering from University of Cauca, Colombia in 2014. He started his PhD at Ghent University – imec in Pieter Colpaert’s team in 2017 on Linked Data publishing and querying on the Web. As part of his research, he has investigated the design and performance of Web interfaces for cost-efficient publishing and querying of Linked Open Data. Julián has applied his research and participated on projects about decentralized data sharing ecosystems in different domains such as public transit route planning, bicycle infrastructure, railway infrastructure and scientific research. Julián is author of several peer-reviewed publications presented in recognized venues such as ISWC, ESWC, TheWebConference and ICWE.

Pieter Colpaert, IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium

Pieter Colpaert. He is a post-doctoral researcher at IDLab – Ghent University – imec working on cost-efficient and maintable Linked Data Web APIs. Together with his team he applies this to different domains such as transport and mobility, cultural heritage, base registries, governmental data, environmental sensor data, or scholarly communication. He co-founded Open Knowledge Belgium, and is still a voluntary board member there, to fight for a world where knowledge creates power for the many, and not the few.

Ruben Verborgh, IDLab, Department of Electronics and Information Systems, Ghent University – imec, Ghent, Belgium

Ruben Verborgh. He is a professor of Decentralized Web technology at IDLab, Ghent University – imec, and a research affiliate at the Decentralized Information Group of CSAIL at MIT. He is also a technology advocate for Inrupt, supporting the Solid ecosystem that gives you back control and choice-online and offline. He loves discussing about the Web, Linked Data, decentralization, Web APIs, hypermedia clients, and much more.

References

T. Berners-Lee, “Linked data-design issues,” 2006. [Online]. Available: https://www.w3.org/DesignIssues/LinkedData.html.

P. Colpaert, A. Llaves, R. Verborgh, O. Corcho, E. Mannens and R. Van de Walle, “Intermodal public transit routing using Liked Connections,” in International Semantic Web Conference: Posters and Demos, 2015.

H. Delva, J. A. Rojas, P.-J. Vandenberghe, P. Colpaert and R. Verborgh, “Geospatial Partitioning of Open Transit Data.,” in International Conference on Web Engineering, Helsinki, 2020.

E. Miller, “An introduction to the resource description framework,” Bulletin of the American Society for Information Science and Technology, vol. 25, pp. 15–19, 1998.

R. Verborgh, O. Hartig, B. De Meester, G. Haesendonck, L. De Vocht, M. Vander Sande, R. Cyganiak, P. Colpaert, E. Mannens and R. Van de Walle, “Querying datasets on the web with high availability,” in International Semantic Web Conference, 2014.

M. Lanthaler and C. Gutl, “Hydra: A Vocabulary for Hypermedia-Driven Web APIs.,” LDOW, vol. 996, 2013.

O. Hartig, C. Bizer and J.-C. Freytag, “Executing SPARQL queries over the web of linked data,” in International Semantic Web Conference, 2009.

O. Hartig, “Zero-knowledge query planning for an iterator implementation of link traversal based query execution,” in Extended Semantic Web Conference, 2011.

O. Hartig and M. T. Ozsu, Walking without a map: optimizing response times of traversal-based linked data queries, 2016.

A. Antrim and S. J. Barbeau, “The many uses of GTFS data - opening the door to transit and multimodal applications,” Location-Aware Information Systems Laboratory at the University of South Florida, 2013.

A. Berger, D. Delling, A. Gebhardt and M. Müller-Hannemann, “Accelerating time-dependent multi-criteria timetable information is harder than expected,” in 9th Workshop on Algorithmic Approaches for Transportation Modeling, Optimization, and Systems (ATMOS’09), 2009.

R. Bauer, D. Delling and D. Wagner, “Experimental study of speed up techniques for timetable information systems,” Networks, 2011.

D. Delling, J. Dibbelt, T. Pajor and T. Zündorf, “Faster transit routing by hyper partitioning,” in 17th Workshop on Algorithmic Approaches for Transportation Modelling, Optimization, and Systems (ATMOS 2017), 2017.

H. Bast, M. Hertel and S. Storandt, “Scalable transfer patterns,” in 2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX), 2016.

H. Bast, E. Carlsson, A. Eigenwillig, R. Geisberger, C. Harrelson, V. Raychev and F. Viger, “Fast routing in very large public transportation networks using transfer patterns,” in European Symposium on Algorithms, 2010.

G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs,” SIAM Journal on scientific Computing, vol. 20, pp. 359–392, 1998.

D. Delling, A. V. Goldberg, I. Razenshteyn and R. F. Werneck, “Graph partitioning with natural cuts,” in 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.

R. Battle and D. Kolas, “Geosparql: enabling a geospatial semantic web,” Semantic Web Journal, vol. 3, pp. 355–370, 2011.

J. Dibbelt, T. Pajor, B. Strasser and D. Wagner, “Connection Scan Algorithm,” 2017.

H. H. Wickham, K. Kafadar and Hadley, “Letter-value plots: Boxplots for large data,” 2011.

Downloads

Published

2021-07-08

Issue

Section

ICWE 2020