A NOVEL TWIG-JOIN SWIFT USING SST-BASED REPRESENTATION FOR EFFICIENT RETRIEVAL OF INTERNET XML
Keywords:XML, query tree pattern, structural summary tree, TJSwift, TJFast
Compiling documents in extensible markup language (XML) plays an important role in accessing data services when both rapid response and the precise use of search engines are required. The main operation in XML query processing is to find nodes that match the given query tree pattern (QTP) in the document. An efficient query service should be based on a skillful representation that can support low complexity and high precision search capabilities. However, accessing too many useless nodes in order to match a query pattern is very time-consuming. This paper proposes a structural summary tree (SST) algorithm that is not only able to satisfy a query, but also has better time-saving efficiency compared with the existing twig-join algorithms such as the TJFast algorithm. A novel twig-join Swift (TJSwift) associated with adjacent linked (AL) lists for the provision of efficient XML query services is also proposed, in which queries can be versatile in terms of predicates. TJSwift can completely preserve hierarchical information, and the new index generated from SST is used to save semantic information. The TJSwift approach can provide template-based indexing for fast data searches. An experiment is also conducted for the evaluation of the TJSwift approach.
World Wide Web Consortium. The document object model. http://www.w3.org/DOM/
A. Berglund, S. Boag, and D. Chamberlin, XML Path Language (XPath) 2.0, W3C
Recommendation, http://www.w3.org/TR/xpath20/, January 2007.
J. Robie and R. Hat, IEEE Internet Computing, "XML Processing and Data Integration with
XQuery", vol. 11 no. 4, pp. 62-67, August 2007.
T. Dalamagas et al., "Clustering XML Documents using Structural Summaries", EDBT Work-shop
on Clustering Information over the Web (ClustWeb04), Heraklion, Greece, 2004.
A. Nierman and H. V. Jagadish, "Evaluating Structural Similarity in XML Documents", Fifth
International Workshop on the Web and Databases (WebDB 2002), 2002.
S. Flesca et al., "Fast Detection of XML Structural Similarity", IEEE Transactions on Knowledge
and Data Engineering, vol. 17, no. 2, pp. 160-175, February 2004.
W. Lian et al., "An Efficient and Scalable Algorithm for Clustering XML Documents by
Structure", IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 1, pp. 82-96,
M. Kozielski, "Improving the Results and Performance of Clustering Bit-encoded XML
Documents", Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06),
J. S. Yuan, X. Y. Li and L. N. Ma, "An Improved XML Document Clustering Using Path Feature",
vol. 2, pp.400-404, 2008 Fifth International Conference on Fuzzy Systems and Knowledge
H. P. Leung, F. L. Chung, S. C. F. Chan and R. Luk, "XML Document Clustering Using Common
Xpath", Proceedings of the International Workshop on Challenges in Web Information Retrieval
and Integration, Tokyo, pp. 91-96, April 2005.
A. Termier, M. C. Rousset and M. Sebag, "treefinder: a first step towards XML data mining",
Proceedings of IEEE International Conference on Data Mining, Maebashi, pp. 450-457, December
J. Yang, W. K. Cheung and X. Chen, "Learning the Kernel Matrix for XML Document
Clustering", Proceedings of the 2005 IEEE International Conference on e-Technology, e-
Commerce and e-Service (EEE'05), Hong Kong, pp. 353-358, April 2005.
J. Liu, J. T. L. Wang, W. Hsu and K. G. Herbert, "XML Clustering by Principal Component
Analysis", Proceedings. Of the 16th IEEE International Conference on Tools with Artificial
Intelligence (ICTAI'04), Boca Raton, pp. 658-662, November 2004.
J. W. Lee, K. Lee and W. Kim, "Preparation For Semantic-Based XML Mining", The 2001 IEEE
International Conference on Data Mining, San Jose, pp. 345-352, November 2001.
M. H. Qureshi and M. H. Samadzadeh, "Determining the Complexity of XML Documents",
Proceedings of the International Conference on Information Technology: Coding and Computing
(ITCC'05), vol. 02, pp. 416 - 421, 2005.
X. Y. Li, "Using Clustering Technology to Improve XML Semantic Search", Proceedings of the
Seventh International Conference on Machine Learning and Cybernetics, vol. 5, pp. 2635-2639,
B. Nicolas, K. Nick and S. Divesh, "Holistic Twig joins: Optimal XML pattern matching", In
Proceedings of the SIGMOD Conference, 2002.
S. Chen, H. G. Li, J. Tatemura, W. P. Hsiung, D. Agrawal and K. S. Candan, "Twig2Stack:
Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents", In
Proceedings of the VLDB Conference, pp. 283-294, September 12-15, 2007.
Jiang H., Wang W., Lu H., Yu J. X., "Holistic twig joins on indexed XML documents",
Proceedings of the VLDB Conference, pp. 273–284, 2003.
J. Lu, T. W. Ling, Z. Bao and C. Wang, "Extended XML Tree Pattern Matching: Theories and
Algorithms," IEEE Trans. Knowledge and Data Eng., vol. 23, no. 3, pp. 402-416, Mar 2011.
J. Lu, T. W. Ling, C. Y. Chan and T. Chen, "From Region Encoding To Extended Dewey: On
Efficient Processing of XML Twig Pattern Matching", In Proceedings of VLDB Conference, pp.
-204, Norway 2005.
R. Busse, M. Carey, D. Florescu, M. Kersten, I. Manolescu, A. Schmidt and F. Waas, XMark an
XML benchmark project. http://monetdb.cwi.nl/xml/index.html
University of Washington XML Repository. http://www.cs.washington.edu/research/xmldatasets/
X. Wu and G. Liu, "XML Twig Pattern Matching Using Version Tree", Data and Knowledge Eng.,
vol. 64, no. 3, pp. 580-599, 2008.
S. K. Izadi, T. Harder, and M. S. Haghjoo, "S3: Evaluation of Tree-Pattern Queries supported by
Structural Summaries," Data and Knowledge Eng., vol. 68, no 1, pp. 126-145, 2009.
M. Hachicha and J. Darmont, "A Survey of XML Tree Patterns", IEEE transactions on knowledge
and Data Engineering, vol. 25, no. 1, January 2013.
H. K. Chang, K. C. Hung, I. C. Jou, "Efficient XML Retrieval Service with Complete Path
Representation" IEICE Transactions on Information and Systems, vol.E96-D, no. 4, pp. 906-917,
L. Qin, J. Xu Yu, B. Ding, TwigList: make twig pattern matching fast, in: Proceedings of the
DASFAA Conference, pp. 850–862, 2007.
M. Ley, DBLP Computer Science Bibliography. http://dblp.uni-trier.de/xml/dblp.xml.