A NOVEL TWIG-JOIN SWIFT USING SST-BASED REPRESENTATION FOR EFFICIENT RETRIEVAL OF INTERNET XML

Authors

  • YI-WEI KUNG Department of Computer Science and Engineering, National Sun Yat-sen University
  • HSU-KUANG CHANG Department of Information Engineering, I-Shou University
  • CHUNG-NAN LEE Department of Computer Science and Engineering, National Sun Yat-sen University

Keywords:

XML, query tree pattern, structural summary tree, TJSwift, TJFast

Abstract

Compiling documents in extensible markup language (XML) plays an important role in accessing data services when both rapid response and the precise use of search engines are required. The main operation in XML query processing is to find nodes that match the given query tree pattern (QTP) in the document. An efficient query service should be based on a skillful representation that can support low complexity and high precision search capabilities. However, accessing too many useless nodes in order to match a query pattern is very time-consuming. This paper proposes a structural summary tree (SST) algorithm that is not only able to satisfy a query, but also has better time-saving efficiency compared with the existing twig-join algorithms such as the TJFast algorithm. A novel twig-join Swift (TJSwift) associated with adjacent linked (AL) lists for the provision of efficient XML query services is also proposed, in which queries can be versatile in terms of predicates. TJSwift can completely preserve hierarchical information, and the new index generated from SST is used to save semantic information. The TJSwift approach can provide template-based indexing for fast data searches. An experiment is also conducted for the evaluation of the TJSwift approach.

 

Downloads

Download data is not yet available.

References

World Wide Web Consortium. The document object model. http://www.w3.org/DOM/

A. Berglund, S. Boag, and D. Chamberlin, XML Path Language (XPath) 2.0, W3C

Recommendation, http://www.w3.org/TR/xpath20/, January 2007.

J. Robie and R. Hat, IEEE Internet Computing, "XML Processing and Data Integration with

XQuery", vol. 11 no. 4, pp. 62-67, August 2007.

T. Dalamagas et al., "Clustering XML Documents using Structural Summaries", EDBT Work-shop

on Clustering Information over the Web (ClustWeb04), Heraklion, Greece, 2004.

A. Nierman and H. V. Jagadish, "Evaluating Structural Similarity in XML Documents", Fifth

International Workshop on the Web and Databases (WebDB 2002), 2002.

S. Flesca et al., "Fast Detection of XML Structural Similarity", IEEE Transactions on Knowledge

and Data Engineering, vol. 17, no. 2, pp. 160-175, February 2004.

W. Lian et al., "An Efficient and Scalable Algorithm for Clustering XML Documents by

Structure", IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 1, pp. 82-96,

January 2004.

M. Kozielski, "Improving the Results and Performance of Clustering Bit-encoded XML

Documents", Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06),

J. S. Yuan, X. Y. Li and L. N. Ma, "An Improved XML Document Clustering Using Path Feature",

vol. 2, pp.400-404, 2008 Fifth International Conference on Fuzzy Systems and Knowledge

Discovery, 2008.

H. P. Leung, F. L. Chung, S. C. F. Chan and R. Luk, "XML Document Clustering Using Common

Xpath", Proceedings of the International Workshop on Challenges in Web Information Retrieval

and Integration, Tokyo, pp. 91-96, April 2005.

A. Termier, M. C. Rousset and M. Sebag, "treefinder: a first step towards XML data mining",

Proceedings of IEEE International Conference on Data Mining, Maebashi, pp. 450-457, December

J. Yang, W. K. Cheung and X. Chen, "Learning the Kernel Matrix for XML Document

Clustering", Proceedings of the 2005 IEEE International Conference on e-Technology, e-

Commerce and e-Service (EEE'05), Hong Kong, pp. 353-358, April 2005.

J. Liu, J. T. L. Wang, W. Hsu and K. G. Herbert, "XML Clustering by Principal Component

Analysis", Proceedings. Of the 16th IEEE International Conference on Tools with Artificial

Intelligence (ICTAI'04), Boca Raton, pp. 658-662, November 2004.

J. W. Lee, K. Lee and W. Kim, "Preparation For Semantic-Based XML Mining", The 2001 IEEE

International Conference on Data Mining, San Jose, pp. 345-352, November 2001.

M. H. Qureshi and M. H. Samadzadeh, "Determining the Complexity of XML Documents",

Proceedings of the International Conference on Information Technology: Coding and Computing

(ITCC'05), vol. 02, pp. 416 - 421, 2005.

X. Y. Li, "Using Clustering Technology to Improve XML Semantic Search", Proceedings of the

Seventh International Conference on Machine Learning and Cybernetics, vol. 5, pp. 2635-2639,

July 2008.

B. Nicolas, K. Nick and S. Divesh, "Holistic Twig joins: Optimal XML pattern matching", In

Proceedings of the SIGMOD Conference, 2002.

S. Chen, H. G. Li, J. Tatemura, W. P. Hsiung, D. Agrawal and K. S. Candan, "Twig2Stack:

Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents", In

Proceedings of the VLDB Conference, pp. 283-294, September 12-15, 2007.

Jiang H., Wang W., Lu H., Yu J. X., "Holistic twig joins on indexed XML documents",

Proceedings of the VLDB Conference, pp. 273–284, 2003.

J. Lu, T. W. Ling, Z. Bao and C. Wang, "Extended XML Tree Pattern Matching: Theories and

Algorithms," IEEE Trans. Knowledge and Data Eng., vol. 23, no. 3, pp. 402-416, Mar 2011.

J. Lu, T. W. Ling, C. Y. Chan and T. Chen, "From Region Encoding To Extended Dewey: On

Efficient Processing of XML Twig Pattern Matching", In Proceedings of VLDB Conference, pp.

-204, Norway 2005.

R. Busse, M. Carey, D. Florescu, M. Kersten, I. Manolescu, A. Schmidt and F. Waas, XMark an

XML benchmark project. http://monetdb.cwi.nl/xml/index.html

University of Washington XML Repository. http://www.cs.washington.edu/research/xmldatasets/

X. Wu and G. Liu, "XML Twig Pattern Matching Using Version Tree", Data and Knowledge Eng.,

vol. 64, no. 3, pp. 580-599, 2008.

S. K. Izadi, T. Harder, and M. S. Haghjoo, "S3: Evaluation of Tree-Pattern Queries supported by

Structural Summaries," Data and Knowledge Eng., vol. 68, no 1, pp. 126-145, 2009.

M. Hachicha and J. Darmont, "A Survey of XML Tree Patterns", IEEE transactions on knowledge

and Data Engineering, vol. 25, no. 1, January 2013.

H. K. Chang, K. C. Hung, I. C. Jou, "Efficient XML Retrieval Service with Complete Path

Representation" IEICE Transactions on Information and Systems, vol.E96-D, no. 4, pp. 906-917,

L. Qin, J. Xu Yu, B. Ding, TwigList: make twig pattern matching fast, in: Proceedings of the

DASFAA Conference, pp. 850–862, 2007.

M. Ley, DBLP Computer Science Bibliography. http://dblp.uni-trier.de/xml/dblp.xml.

Downloads

Published

2015-03-18

Issue

Section

Articles