Towards Adaptive Continuous Trajectory Clustering Over a Distributed Web Data Stream

Authors

  • Yang Wu, School of Computer Science and Technology, Soochow University, China
  • Junhua Fang, School of Computer Science and Technology, Soochow University, China
  • Pingfu Chao, School of Computer Science and Technology, Soochow University, China
  • Zhicheng Pan, School of Data Science and Engineering, East China Normal University, Shanghai, China
  • Wei Chen, School of Computer Science and Technology, Soochow University, China
  • Lei Zhao, School of Computer Science and Technology, Soochow University, China

DOI:

https://doi.org/10.13052/jwe1540-9589.2216

Keywords:

Spatio-temporal data, continuous trajectory clustering, distributed stream processing, trajectory analysis

Abstract

With the spread of mobile devices and GPS technology, large volumes of location-tagged web stream data are continuously generated and collected. Sequential positions form a trajectory, and clustering trajectories benefits a wide range of applications, e.g., route recommendation. Over the past decades, extensive effort has gone into improving the efficiency of static trajectory clustering. However, trajectory stream data arrives incrementally, and continuous trajectory clustering inevitably faces two problems: (1) the physical structure used to represent trajectories incurs severe space overhead, and (2) dynamically maintaining trajectory semantics and the associated retrieval structure is computationally intensive. To overcome these problems, this paper proposes an adaptive continuous trajectory clustering framework (ACTOR). It comprises three key components: (1) the Simplifier represents each trajectory with a purpose-built PT structure; (2) the Partitioner uses a hexagon-based indexing strategy to improve local computational efficiency; and (3) the Executor adaptively selects between the P-clustering and R-clustering approaches according to the ROC (rate of change) matrix. Empirical studies on real-world data validate the usefulness of the proposal and show a substantial advantage over existing solutions in the literature.
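
As a rough illustration of two ideas named in the abstract, hexagon-based partitioning and a rate-of-change-driven choice between two processing modes, the following Python sketch may help. It is not the ACTOR implementation: the abstract does not define the PT structure, the ROC matrix, or the P-clustering and R-clustering procedures, so the function names, the per-cell rate-of-change formula, the threshold, and the labels "stable" and "volatile" are all assumptions made for illustration. Points are assumed to be already projected to planar coordinates.

```python
import math
from collections import Counter


def hex_cell(x, y, size):
    """Map a planar point to the axial (q, r) index of a pointy-top hexagon.

    `size` is the hexagon circumradius. This is a generic hexagonal grid,
    assumed here for illustration; it is not the index used in the paper.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon centre."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so that
    # q + r + s == 0 still holds after rounding.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def rate_of_change(prev_counts, curr_counts):
    """Per-cell relative change in point count between two windows.

    Hypothetical stand-in for a rate-of-change measure; the paper's ROC
    matrix may be defined differently.
    """
    cells = set(prev_counts) | set(curr_counts)
    return {
        c: abs(curr_counts.get(c, 0) - prev_counts.get(c, 0))
        / max(prev_counts.get(c, 0), 1)
        for c in cells
    }


def choose_mode(roc, threshold=0.5):
    """Label each cell 'volatile' or 'stable' by comparing against a threshold.

    How such labels would map onto P-clustering and R-clustering is not
    stated in the abstract, so no mapping is assumed here.
    """
    return {c: ("volatile" if v > threshold else "stable") for c, v in roc.items()}


# Two consecutive windows of points, counted per hexagonal cell.
prev = Counter(hex_cell(x, y, 1.0) for x, y in [(0, 0)] * 4)
curr = Counter(hex_cell(x, y, 1.0) for x, y in [(0, 0)] * 6 + [(math.sqrt(3), 0)] * 2)
modes = choose_mode(rate_of_change(prev, curr))
```

In this sketch each cell is decided independently, so a burst of new points in one region would not force a different processing mode on quiet regions; whether ACTOR makes its selection at the same granularity is not stated in the abstract.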

Author Biographies

Yang Wu, School of Computer Science and Technology, Soochow University, China

Yang Wu is a postgraduate student at Soochow University. His research interests mainly include trajectory data analysis and trajectory data stream clustering.

Junhua Fang, School of Computer Science and Technology, Soochow University, China

Junhua Fang is an associate professor at Soochow University. His research interests mainly include spatio-temporal databases, cloud computing, and distributed stream processing.

Pingfu Chao, School of Computer Science and Technology, Soochow University, China

Pingfu Chao is an associate professor at Soochow University. His research interests mainly include spatio-temporal data management, trajectory data mining, distributed database systems, and smart grid data analysis.

Zhicheng Pan, School of Data Science and Engineering, East China Normal University, Shanghai, China

Zhicheng Pan is a Ph.D. student at East China Normal University. His research interests mainly include spatio-temporal databases and machine learning for databases.

Wei Chen, School of Computer Science and Technology, Soochow University, China

Wei Chen is an associate professor at Soochow University. His research interests mainly include spatio-temporal data analysis, knowledge graphs, heterogeneous information networks, and data fusion.

Lei Zhao, School of Computer Science and Technology, Soochow University, China

Lei Zhao is a professor at Soochow University. His research focuses on graph databases, social media analysis, query outsourcing, and parallel and distributed computing.

Published

2023-04-20

How to Cite

Wu, Y., Fang, J., Chao, P., Pan, Z., Chen, W., & Zhao, L. (2023). Towards Adaptive Continuous Trajectory Clustering Over a Distributed Web Data Stream. Journal of Web Engineering, 22(01), 105–130. https://doi.org/10.13052/jwe1540-9589.2216

Section

ICWE2022