Metaheuristics for Ontology-Based Information Extraction Rule Learning

Authors

  • Michel Capelle, Erasmus School of Economics, Erasmus University Rotterdam, 3062 PA Rotterdam, the Netherlands
  • Flavius Frasincar, Erasmus School of Economics, Erasmus University Rotterdam, 3062 PA Rotterdam, the Netherlands, https://orcid.org/0000-0002-8031-758X
  • Finn van der Knaap, Erasmus School of Economics, Erasmus University Rotterdam, 3062 PA Rotterdam, the Netherlands

DOI:

https://doi.org/10.13052/jwe1540-9589.2549

Keywords:

metaheuristics, information extraction rules, ontology learning, Semantic Web

Abstract

The Semantic Web aims to make information intelligible for computers. In the Semantic Web, unstructured information from text is represented using ontologies, such that computers can understand text better. However, adding text information to existing ontologies by hand is time-consuming. Information extraction rules can help to automate this process. In the process of learning information extraction rules, patterns are constructed that consist of lexico-syntactic and lexico-semantic features from text, which aim to extract Resource Description Framework subject-predicate-object expressions. In this paper, we investigate the following four metaheuristics for learning ontology-based information extraction rules: Particle Swarm Optimization, 2-Phase Optimization, Ant Colony Optimization, and Genetic Algorithm (GA). We evaluate all methods using financial news data. GA gives the best F1-measure results, but the other metaheuristics are faster.

Author Biographies

Michel Capelle, Erasmus School of Economics, Erasmus University Rotterdam, 3062 PA Rotterdam, the Netherlands

Michel Capelle received the B.S. degree in economics and informatics in 2012 and the M.S. degree in computational economics, specialized in Semantic Web technology, in 2015, both from Erasmus University Rotterdam, Rotterdam, the Netherlands. His research interests include machine learning, linked data, information extraction rule learning, natural language processing, and sentiment analysis. He has published in various conferences and journals in the areas of machine learning and the Semantic Web.

Flavius Frasincar, Erasmus School of Economics, Erasmus University Rotterdam, 3062 PA Rotterdam, the Netherlands

Flavius Frasincar received the M.S. degree in computer science, in 1996, and the M.Phil. degree in computer science, in 1997, from Politehnica University of Bucharest, Bucharest, Romania, and the P.D.Eng. degree in computer science, in 2000, and the Ph.D. degree in computer science, in 2005, from Eindhoven University of Technology, Eindhoven, the Netherlands.

Since 2005, he has been an Assistant Professor in computer science at Erasmus University Rotterdam, Rotterdam, the Netherlands. He has published in numerous conferences and journals in the areas of databases, Web information systems, personalization, machine learning, and the Semantic Web. He is a member of the editorial board of Computational Linguistics in the Netherlands Journal and co-editor-in-chief of the Journal of Web Engineering. He is also a member of the Association for Computing Machinery.

Finn van der Knaap, Erasmus School of Economics, Erasmus University Rotterdam, 3062 PA Rotterdam, the Netherlands

Finn van der Knaap received the B.S. degree cum laude in Econometrics and Operations Research in 2023 and the M.S. degree summa cum laude in Econometrics and Management Science, specialized in Quantitative Finance, in 2024, both from Erasmus University Rotterdam, Rotterdam, the Netherlands. In 2025, he received the M.S. degree with distinction in Artificial Intelligence from the University of Edinburgh, Scotland. His research interests include machine learning, finance, and sentiment analysis. He has published in various conferences and journals in the areas of quantitative finance, machine learning, and the Semantic Web.

Published

2026-05-24

How to Cite

Capelle, M., Frasincar, F., & Knaap, F. van der. (2026). Metaheuristics for Ontology-Based Information Extraction Rule Learning. Journal of Web Engineering, 25(04), 699–736. https://doi.org/10.13052/jwe1540-9589.2549

Section

Articles