Semantic Relation Extraction from Cultural Heritage Archives
Keywords: Digital Archive, Relation Extraction, Cultural Archive, Word Vector Representation, Information Extraction
Cultural heritage organizations are increasingly adopting digital preservation technologies. Cultural heritage data is often disseminated as digital text through channels such as Wikipedia and cultural heritage archives. Information extraction is therefore an important step in acquiring knowledge from this data. However, digital text is often ambiguous and, in languages such as Thai, has complex grammatical structures, which makes it challenging to extract information with high accuracy. We thus propose a method for improving the performance of extraction techniques based on word features, multiple instance learning, and unseen word mapping. Word features improve the representation of a word by concatenating its part of speech (POS) and its position with the word and converting the result into a vector. Multiple instance learning addresses cases where the words do not fully express the meaning of the triple. We also cluster words to find the predicate by removing irrelevant words between the subject and the object. Words that never appeared in training are handled by unseen word mapping, which uses sub-word and nearest-neighbor word mapping. We conducted several experiments on a cultural heritage knowledge graph to evaluate the proposed method. The results show that it outperforms existing models currently used in relation extraction systems, achieving precision, recall, and F1 scores of 0.89, 0.88, and 0.89, respectively. It also performed well on unseen word prediction, with precision, recall, and F1 scores of 0.81, 0.87, and 0.84, respectively.
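To make the word-feature and unseen-word ideas concrete, the following is a minimal sketch rather than the method evaluated in the paper. It assumes a toy embedding table, a three-tag POS set, character trigrams as the sub-word unit, and Jaccard overlap as the nearest-neighbor criterion. All of these choices, and the names `EMBEDDINGS`, `POS_TAGS`, `char_ngrams`, `nearest_known_word`, and `word_features`, are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy data: the vectors and the tag set are illustrative only.
EMBEDDINGS: Dict[str, List[float]] = {
    "temple": [0.9, 0.1],
    "built": [0.2, 0.8],
    "king": [0.7, 0.3],
}
POS_TAGS = ["NOUN", "VERB", "OTHER"]


def char_ngrams(word: str, n: int = 3) -> Set[str]:
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def nearest_known_word(word: str, vocab: Iterable[str]) -> str:
    """Map an unseen word to the known word with the highest n-gram overlap."""
    grams = char_ngrams(word)

    def jaccard(other: str) -> float:
        other_grams = char_ngrams(other)
        return len(grams & other_grams) / len(grams | other_grams)

    return max(vocab, key=jaccard)


def word_features(word: str, pos: str, index: int, sentence_length: int) -> List[float]:
    """Concatenate a word vector, a one-hot POS tag, and a relative position."""
    # Assumption: an unseen word borrows the vector of its nearest known word.
    known = word if word in EMBEDDINGS else nearest_known_word(word, EMBEDDINGS)
    pos_one_hot = [1.0 if pos == tag else 0.0 for tag in POS_TAGS]
    relative_position = index / max(sentence_length - 1, 1)
    return EMBEDDINGS[known] + pos_one_hot + [relative_position]


# "temples" is not in the table, so it is mapped to "temple" before concatenation.
features = word_features("temples", "NOUN", 0, 3)
```

In this sketch the feature vector has six entries: two embedding dimensions, three POS indicators, and one position value. A real system would likely use learned sub-word embeddings and vector-space neighbors instead of string overlap.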