Abstract Concept Instantiation with Context Relevance Measurement

  • Shengwei Gu School of Computer Engineering and Science, Shanghai University, Shanghai, China and School of Computer and Information Engineering, Chuzhou University, Chuzhou, China https://orcid.org/0000-0003-1003-0185
  • Xiangfeng Luo School of Computer Engineering and Science, Shanghai University, Shanghai, China and Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China
  • Hao Wang School of Computer Engineering and Science, Shanghai University, Shanghai, China and Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China
  • Jing Huang Ant Financial Services Group, Hangzhou, China
  • Subin Huang School of Computer Engineering and Science, Shanghai University, Shanghai, China and School of Computer and Information, Anhui Polytechnic University, Wuhu, China
Keywords: Abstract concept instantiation, contextual constraint, instance ranking

Abstract

In different contexts, a single abstract concept (e.g., fruit) may map to different sets of concrete instances; this mapping is called abstract concept instantiation. It is widely used in applications such as web search and intelligent recommendation. However, most abstract concept instantiation models have the following problems: (1) they neglect incorrect and incomplete labels in the category structure on which instance selection relies; (2) they rely on subjectively designed instance profiles to calculate the relevance between an instance and a contextual constraint. These problems lead to false predictions in abstract concept instantiation. To tackle them, we propose a novel model for instantiating abstract concepts. First, to alleviate incorrect labels and remedy label incompleteness in the category structure, we propose an improved random-walk algorithm, called InstanceRank, which not only utilizes category information but also exploits association information to infer the right instances of an abstract concept. Second, to better measure the relevance between instances and contextual constraints, we learn the proper instance profile from profiles of different granularities, which are designed based on the text surrounding the instance. Finally, noise reduction and instance filtering are introduced to further enhance model performance. Experiments on a Chinese food abstract concept set show that the proposed model can effectively reduce false positives and false negatives in instantiation results.
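As a rough illustration of how a random walk can combine category labels with association links, the sketch below runs a generic random walk with restart over a small instance graph. It is not the paper's InstanceRank, whose details are not given in this abstract. The function name, the toy graph, the seed set, and the damping value are all assumptions made for illustration only.

```python
def random_walk_with_restart(adjacency, seeds, damping=0.85, iterations=50):
    """Score nodes by a random walk that restarts at seed nodes.

    adjacency: dict mapping node -> {neighbor: weight}; every neighbor
               must also appear as a key (hypothetical association links).
    seeds:     dict mapping node -> prior weight (hypothetical category labels).
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    # Restart distribution: mass only on instances already labeled with the concept.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for src in nodes:
            out_weight = sum(adjacency[src].values())
            if out_weight == 0:
                # Dangling node: send its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += damping * scores[src] * restart[n]
                continue
            for dst, weight in adjacency[src].items():
                nxt[dst] += damping * scores[src] * weight / out_weight
        scores = nxt
    return scores


# Toy example: "pear" lacks the "fruit" label but is associated with labeled
# instances; "hammer" has neither a label nor an association link.
adjacency = {
    "apple": {"pear": 1.0},
    "banana": {"pear": 1.0},
    "pear": {"apple": 1.0, "banana": 1.0},
    "hammer": {},
}
seeds = {"apple": 1.0, "banana": 1.0}
scores = random_walk_with_restart(adjacency, seeds)
```

In this toy graph the unlabeled but associated instance "pear" receives a positive score while "hammer" stays at zero, which is the kind of label-incompleteness repair the abstract attributes to using association information alongside category information.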

Author Biographies

Shengwei Gu, School of Computer Engineering and Science, Shanghai University, Shanghai, China and School of Computer and Information Engineering, Chuzhou University, Chuzhou, China

Shengwei Gu received his master’s degree from the School of Mathematics and Computer Science, Nanjing Normal University, China, in 2008. He is currently pursuing a PhD degree in the School of Computer Engineering and Science, Shanghai University, China. His main research interests include information retrieval and question answering systems.

Xiangfeng Luo, School of Computer Engineering and Science, Shanghai University, Shanghai, China and Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China

Xiangfeng Luo is a professor in the School of Computer Engineering and Science, Shanghai University, China. He received his master’s and PhD degrees from the Hefei University of Technology in 2000 and 2003, respectively. He was a postdoctoral researcher with the China Knowledge Grid Research Group, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), from 2003 to 2005. His main research interests include Web Wisdom, Cognitive Informatics, and Text Understanding. He has authored or co-authored more than 50 publications, which have appeared in IEEE Trans. on Automation Science and Engineering, IEEE Trans. on Systems, Man, and Cybernetics-Part C, IEEE Trans. on Learning Technology, Concurrency and Computation: Practice and Experience, etc. He has served as a Guest Editor of ACM Transactions on Intelligent Systems and Technology and as a PC member of more than 40 conferences and workshops.

Hao Wang, School of Computer Engineering and Science, Shanghai University, Shanghai, China and Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China

Hao Wang received his PhD degree from Waseda University in 2019, partly supported by the Oversea Graduate Student Project of the China Scholarship Council. He is currently an assistant professor at Shanghai University. His research interests include natural language processing, especially machine translation.

Jing Huang, Ant Financial Services Group, Hangzhou, China

Jing Huang received master’s degrees from Nankai University and Boston University in 1995 and 1998, respectively. He is currently working at Ant Financial Services Group, Hangzhou, China. His research interests include data mining and knowledge graphs.

Subin Huang, School of Computer Engineering and Science, Shanghai University, Shanghai, China and School of Computer and Information, Anhui Polytechnic University, Wuhu, China

Subin Huang received his master’s degree from the School of Computer and Information, Anhui Polytechnic University, China, in 2012. He is currently pursuing a PhD degree in the School of Computer Engineering and Science, Shanghai University, China. His main research interests include information retrieval, data mining, and knowledge graphs.

Published
2020-09-27
Section
Articles