Mining Simple Path Traversal Patterns in Knowledge Graph
Keywords:Knowledge graph, Path traversal, Data mining
The data mining has remained a subject of unfailing charm for research. The knowledge graph is rising and showing infinite life force and strong developing potential in recent years, where it is observed that acyclic knowledge graph has capacity for enhancing usability. Though the development of knowledge graphs has provided an ample scope for appearing the abilities of data mining, related researches are still insufficient. In this paper, we introduce path traversal patterns mining to knowledge graph. We design a novel simple path traversal pattern mining framework for improving the representativeness of result. A divide-and-conquer approach of combining each path is proposed to discover the most frequent traversal patterns in knowledge graph. To support the algorithm, we design a linked list structure indexed by the length of sequences with handy operations. The correctness of algorithm is proven. Experiments show that our algorithm reaches a high coverage with low output amounts compared to existing frequent sequence mining algorithms.
Augenstein, I., Das, M., Riedel, S., Vikraman, L. and McCallum, A., 2017. Semeval 2017 task 10: Scienceie-extracting keyphrases and relations from scientific publications. arXiv preprint arXiv:1704.02853.
Ayres, J., Flannick, J., Gehrke, J. and Yiu, T., 2002, July. Sequential pattern mining using a bitmap representation. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 429–435).
Ashwin Balani. Another alternative for generating synthetic sequences databases. http://philippe-fournier-viger.com/spmf/SequentialPatternGenerator.m, 2015.
Chapela-Campa, D., Mucientes, M. and Lama, M., 2019. Mining frequent patterns in process models. Information Sciences, 472, pp. 235–257.
Chen, M.S., Park, J.S. and Yu, P.S., 1998. Efficient data mining for path traversal patterns. IEEE Transactions on knowledge and data engineering, 10(2), pp. 209–221.
Cormen, T.H., Leiserson, C.E., Rivest, R.L. and Stein, C., 2009. Introduction to algorithms. MIT press.
Costa, J.D.J., Bernardini, F., Artigas, D. and Viterbo, J., 2019. Mining direct acyclic graphs to find frequent substructures—an experimental analysis on educational data. Information Sciences, 482, pp. 266–278.
Galárraga, L. and Suchanek, F., 2016. Rule Mining in Knowledge Bases (Doctoral dissertation, Thèse, Spécialité “Informatique”, TELECOM ParisTech).
Deshpande, O., Lamba, D.S., Tourn, M., Das, S., Subramaniam, S., Rajaraman, A., Harinarayan, V. and Doan, A., 2013, June. Building, maintaining, and using knowledge bases: a report from the trenches. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (pp. 1209–1220).
Galárraga, L., 2014, November. Applications of rule mining in knowledge bases. In Proceedings of the 7th Workshop on Ph. D Students (pp. 45-49).
Galárraga, L.A., Preda, N. and Suchanek, F.M., 2013, October. Mining rules to align knowledge bases. In Proceedings of the 2013 workshop on Automated knowledge base construction (pp. 43–48).
Gao, J., Qiu, H., Jiang, X., Wang, T. and Yang, D., 2010, October. Fast top-k simple shortest paths discovery in graphs. In Proceedings of the 19th ACM international conference on Information and knowledge management (pp. 509–518).
Hershberger, J., Maxel, M. and Suri, S., 2007. Finding the k shortest simple paths: A new algorithm and its implementation. ACM Transactions on Algorithms (TALG), 3(4), pp. 45–es.
Huan, J., Wang, W., Prins, J. and Yang, J., 2004, August. Spin: mining maximal frequent subgraphs from graph databases. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 581–586).
Ingalalli, V., Ienco, D. and Poncelet, P., 2018. Mining frequent subgraphs in multigraphs. Information Sciences, 451, pp. 50–66.
Jiang, C., Coenen, F. and Zito, M., 2010, August. Frequent sub-graph mining on edge weighted graphs. In International Conference on Data Warehousing and Knowledge Discovery (pp. 77–88). Springer, Berlin, Heidelberg.
Li, X., Taheri, A., Tu, L. and Gimpel, K., 2016, August. Commonsense knowledge base completion. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1445–1455).
Liang, J., Xiao, Y., Zhang, Y., Hwang, S.W. and Wang, H., 2017, February. Graph-based wrong isa relation detection in a large-scale lexical taxonomy. In Thirty-First AAAI Conference on Artificial Intelligence.
Luo, T. and Yu, J., 2019. Friends along supply chain and relationship-specific investments. Review of Quantitative Finance and Accounting, 53(3), pp. 895–931.
Muñoz, E. and Nickles, M., 2017, August. Mining cardinalities from knowledge bases. In International Conference on Database and Expert Systems Applications (pp. 447–462). Springer, Cham.
Nanopoulos, A. and Manolopoulos, Y., 2001. Mining patterns from graph traversals. Data & Knowledge Engineering, 37(3), pp. 243–266.
Nowak-Brzeziñska, A., 2017, July. Outlier mining in rule-based knowledge bases. In 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA) (pp. 391–396). IEEE.
Oswald, C., Kumar, I.A., Avinash, J. and Sivaselvan, B., 2017, December. A graph-based frequent sequence mining approach to text compression. In International Conference on Mining Intelligence and Knowledge Exploration (pp. 371–380). Springer, Cham.
Han, J., Pei, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U. and Hsu, M., 2001, April. Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In proceedings of the 17th international conference on data engineering (pp. 215–224). IEEE Washington, DC, USA.
Petitjean, F., Li, T., Tatti, N. and Webb, G.I., 2016. Skopus: Mining top-k sequential patterns under leverage. Data Mining and Knowledge Discovery, 30(5), pp. 1086–1111.
Poole, D.L. and Mackworth, A.K., 2010. Artificial Intelligence: foundations of computational agents. Cambridge University Press.
Pujara, J. and Singh, S., 2018, February. Mining knowledge graphs from text. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 789–790).
Rahman, M.M., Ahmed, C.F. and Leung, C.K.S., 2019. Mining weighted frequent sequences in uncertain databases. Information Sciences, 479, pp. 76–100.
Shang, F., Ding, Q., Du, R., Cao, M. and Chen, H., 2021. Construction and Application of the User Behavior Knowledge Graph in Software Platforms. J. Web Eng., 20(2), pp. 387–412.
Shin, J., Wu, S., Wang, F., De Sa, C., Zhang, C. and Ré, C., 2015, July. Incremental knowledge base construction using deepdive. In Proceedings of the VLDB Endowment International Conference on Very Large Data Bases (Vol. 8, No. 11, p. 1310). NIH Public Access.
Blue Martini Software. Bmswebview2 (gazelle) dataset. http://www.kdd.org/sites/default/files/kddcup/site/2000/files/KDDCup2000.zip
Srikant, R. and Agrawal, R., 1996, March. Mining sequential patterns: Generalizations and performance improvements. In International conference on extending database technology (pp. 1–17). Springer, Berlin, Heidelberg.
Symeonidou, D., Galárraga, L., Pernelle, N., Saïs, F. and Suchanek, F., 2017, October. Vickey: Mining conditional keys on knowledge bases. In International Semantic Web Conference (pp. 661–677). Springer, Cham.
Talukder, N. and Zaki, M.J., 2016. A distributed approach for graph mining in massive networks. Data Mining and Knowledge Discovery, 30(5), pp. 1024–1052.
Ting, I.H., Kimble, C. and Kudenko, D., 2009. Finding unexpected navigation behaviour in clickstream data for website design improvement. Journal of Web Engineering, 8(1), p. 71.
Wu, W., Li, H., Wang, H. and Zhu, K.Q., 2012, May. Probase: A probabilistic taxonomy for text understanding. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (pp. 481–492).
Yan, X., Cheng, H., Han, J. and Yu, P.S., 2008, June. Mining significant graph patterns by leap search. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (pp. 433–444).
Yan, X. and Han, J., 2002, December. gspan: Graph-based substructure pattern mining. In 2002 IEEE International Conference on Data Mining, 2002. Proceedings. (pp. 721–724). IEEE.
Yang, Z., Wang, Y. and Kitsuregawa, M., 2007, April. LAPIN: effective sequential pattern mining algorithms by last position induction for dense databases. In International Conference on Database systems for advanced applications (pp. 1020–1023). Springer, Berlin, Heidelberg.
Yen, J.Y., 1971. Finding the k shortest loopless paths in a network. management Science, 17(11), pp. 712–716.
Zaki, M.J., 2001. SPADE: An efficient algorithm for mining frequent sequences. Machine learning, 42(1), pp. 31–60.