A STRUCTURAL APPROACH TO EXTRACTING CHINESE POSITION RELATIONS FROM WEB PAGES

Authors

  • PEIQUAN JIN University of Science and Technology of China, China
  • JIA YANG University of Science and Technology of China, China
  • JIE ZHAO Anhui University, China
  • YANHONG LIU University of Science and Technology of China, China

Keywords:

Position Relation, Relation Extraction, Structural File Segment

Abstract

Position relations, which describe the positions people hold in an organization, are a valuable source of competitive intelligence for enterprises. The rapid growth of data on the Web creates new opportunities to extract position relations of interest. In this paper, we propose a new algorithm for extracting position relations from the Web. The algorithm exploits a structural feature of position relations on the Web: a position relation is usually presented in a Web page as a table or a list. To capture this structural feature, we first introduce a structural coefficient for each Web page, which is then used to generate structural file segments for the page. A structural file segment consists of all candidate position relations that share a similar structure. We then apply a pattern-matching method to extract position relations from the structural file segments. Finally, we conduct experiments on a real data set of 6028 Chinese Web pages gathered through the Baidu search engine and evaluate the precision and recall of our approach. The experimental results show that our algorithm achieves a precision of over 96% and a recall of over 87%.
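
The pipeline outlined in the abstract (group similarly structured candidates, pattern-match inside each group, then measure precision and recall) can be illustrated with a minimal sketch. The abstract does not define the structural coefficient, so this sketch does not implement it; it substitutes a simple stand-in, a per-line delimiter signature, to form groups of similarly shaped rows. The names `signature`, `structural_segments`, `extract_relations`, `precision_recall`, the `Name | Position` row pattern, and the sample data are all assumptions made for illustration and are not taken from the paper.

```python
import re
from itertools import groupby

# Assumed stand-in for the paper's structural feature: the sequence of
# delimiter characters on a line. Rows of a table or list tend to share it.
DELIMS = re.compile(r"[|:,\t]")

# Hypothetical row pattern for "Name | Position" lines.
ROW = re.compile(r"^\s*(?P<person>[^|]+?)\s*\|\s*(?P<position>[^|]+?)\s*$")


def signature(line):
    """Return the delimiter characters of a line, in order."""
    return "".join(DELIMS.findall(line))


def structural_segments(lines, min_rows=2):
    """Group consecutive lines that share the same non-empty signature."""
    segments = []
    stripped = (line.strip() for line in lines)
    for sig, group in groupby(stripped, key=signature):
        rows = [row for row in group if row]
        if sig and len(rows) >= min_rows:
            segments.append(rows)
    return segments


def extract_relations(segment):
    """Pattern-match each row of a segment into (person, position) pairs."""
    pairs = []
    for row in segment:
        match = ROW.match(row)
        if match:
            pairs.append((match.group("person"), match.group("position")))
    return pairs


def precision_recall(extracted, gold):
    """Standard precision and recall over sets of extracted pairs."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up sample page text and gold standard, for illustration only.
page = [
    "About our company",
    "Zhang San | General Manager",
    "Li Si | Chief Engineer",
    "Wang Wu | Sales Director",
    "Contact us: info@example.com",
]
gold = [
    ("Zhang San", "General Manager"),
    ("Li Si", "Chief Engineer"),
    ("Wang Wu", "Sales Director"),
    ("Zhao Liu", "Chairman"),  # appears nowhere on the page, so it is missed
]

segments = structural_segments(page)
relations = [pair for seg in segments for pair in extract_relations(seg)]
precision, recall = precision_recall(relations, gold)
```

In this toy setting the three table-like rows form one segment, and the missed gold pair lowers recall while precision is unaffected. A real system would work on parsed HTML and Chinese text rather than pre-split lines.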

 



Published

2013-03-22

How to Cite

JIN, P., YANG, J., ZHAO, J., & LIU, Y. (2013). A STRUCTURAL APPROACH TO EXTRACTING CHINESE POSITION RELATIONS FROM WEB PAGES. Journal of Web Engineering, 12(1-2), 363–382. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/4187

Section

Articles