A Structural Approach to Extracting Chinese Position Relations from Web Pages

Authors

  • Peiquan Jin, University of Science and Technology of China, China
  • Jia Yang, University of Science and Technology of China, China
  • Jie Zhao, Anhui University, China
  • Yanhong Liu, University of Science and Technology of China, China

Keywords:

Position Relation, Relation Extraction, Structural File Segment

Abstract

Position relations, which describe the positions people hold in an organization, are a valuable source of competitive intelligence for enterprises. The rapid growth of Web data creates new opportunities to extract position relations of interest. In this paper, we propose a new algorithm for extracting position relations from the Web. The algorithm exploits a structural feature of position relations: in Web pages, they are usually presented as a table or a list. To capture this feature, we first introduce a structural coefficient for each Web page, which is then used to generate structural file segments. A structural file segment consists of all candidate position relations that share a similar structure. We then apply a pattern-matching method to extract position relations from the structural file segments. Finally, we conduct experiments on a real data set of 6028 Chinese Web pages gathered through the Baidu search engine and evaluate the precision and recall of our approach. The experimental results show that our algorithm achieves a precision of over 96% and a recall of over 87%.
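The two-stage idea described above (group structurally similar candidates, then pattern-match inside each group) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not define the structural coefficient, so the sketch uses the number of cells per table row or list item as a crude stand-in, and the title lexicon `POSITION_TITLES`, the regular expression, and all function names are hypothetical. It also only handles entries where the title precedes the name.

```python
import re
from html.parser import HTMLParser
from itertools import groupby

# Hypothetical title lexicon; a real system would need a much larger one.
POSITION_TITLES = ["董事长", "总经理", "副总经理", "财务总监"]

# Longest titles first so a longer title wins over a shorter alternative.
_TITLE_ALT = "|".join(sorted(POSITION_TITLES, key=len, reverse=True))

# Assumed pattern: a title, an optional colon, then a 2-4 character Chinese name.
PATTERN = re.compile(
    rf"(?P<title>{_TITLE_ALT})\s*[:：]?\s*(?P<name>[\u4e00-\u9fff]{{2,4}})"
)


class RowCollector(HTMLParser):
    """Collects the cell texts of every <tr> and the text of every <li>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per row or list item
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "li"):
            self._row = []
        if tag in ("td", "th", "li") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th", "li") and self._cell is not None:
            text = "".join(self._cell).strip()
            if text:
                self._row.append(text)
            self._cell = None
        if tag in ("tr", "li") and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def build_segments(rows):
    """Group consecutive rows with the same cell count (stand-in for similarity)."""
    return [list(group) for _, group in groupby(rows, key=len)]


def extract_position_relations(html):
    """Return (name, title) pairs found in table rows and list items."""
    collector = RowCollector()
    collector.feed(html)
    relations = []
    for segment in build_segments(collector.rows):
        for row in segment:
            match = PATTERN.search(" ".join(row))
            if match:
                relations.append((match.group("name"), match.group("title")))
    return relations
```

Calling `extract_position_relations` on a page fragment returns a list of `(name, title)` tuples in document order; entries that do not match the assumed title-then-name pattern are skipped rather than guessed at.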

Published

2013-03-28

How to Cite

Jin, P., Yang, J., Zhao, J., & Liu, Y. (2013). A Structural Approach to Extracting Chinese Position Relations from Web Pages. Journal of Web Engineering, 12(5), 363–382. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/4437

Section

Articles