Ontology-Driven News Classification with Aethalides

Authors

  • Wouter Rijvordt, Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, the Netherlands
  • Frederik Hogenboom, Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, the Netherlands
  • Flavius Frasincar, Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, the Netherlands

DOI:

https://doi.org/10.13052/jwe1540-9589.1873

Keywords:

News personalization, word sense disambiguation, ontology learning, semantic web

Abstract

The ever-increasing amount of Web information offered to news readers (e.g., news analysts) stimulates the need for news selection, so that informed decisions can be made with up-to-date knowledge. Hermes is an ontology-based framework for building news personalization services. It uses an ontology crafted from available news sources, allowing users to select and filter concepts of interest from a domain ontology. The Aethalides framework enhances Hermes by enabling news classification through lexicographic and semantic properties. To this end, Aethalides applies word sense disambiguation and ontology learning methods to news items. When tested on a set of news items on finance and politics, the Aethalides implementation achieves a precision of 74.4% and a recall of 49.4%, which gives an F0.5-measure of 67.6% when precision is valued more than recall.
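
The reported F0.5-measure can be checked against the stated precision and recall. The sketch below assumes the standard F-beta definition, F = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 0.5; the function name `f_beta` is illustrative and does not come from the Aethalides implementation.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    A beta below 1 weights precision more heavily than recall.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Precision and recall as reported in the abstract.
score = f_beta(0.744, 0.494, beta=0.5)
print(f"F0.5 = {score:.1%}")
```

With beta = 0.5, precision counts more than recall, which is why the score sits closer to the 74.4% precision than to the 49.4% recall.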

Author Biographies

Wouter Rijvordt, Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, the Netherlands

Wouter Rijvordt received his bachelor's and master's degrees in economics and informatics from the Erasmus University Rotterdam, the Netherlands, in 2013. His current research interests lie in the fields of machine learning and Big Data. He currently works at Eneco as a data engineer and developer.

Frederik Hogenboom, Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, the Netherlands

Frederik Hogenboom obtained his master's degree with honours in (computational) economics and informatics at the Erasmus University Rotterdam, the Netherlands, in 2009. During his bachelor's and master's programmes, he published research mainly in the fields of the Semantic Web and learning agents. In 2014, he received his PhD degree in computer science from the Erasmus University Rotterdam, the Netherlands, where he focused on financial event extraction from news applied to algorithmic trading, work disseminated in numerous publications. His current research interests lie mainly in natural language processing and Semantic Web technologies.

Flavius Frasincar, Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, the Netherlands

Flavius Frasincar obtained his master's degree in computer science from the Politehnica University Bucharest, Romania, in 1998. In 2000, he received a professional doctorate degree in software engineering from the Eindhoven University of Technology, the Netherlands. He received his PhD degree in computer science from the Eindhoven University of Technology, the Netherlands, in 2005. Since 2005, he has been an assistant professor in information systems at the Erasmus University Rotterdam, the Netherlands. He has published numerous papers in the areas of databases, Web information systems, personalization, and the Semantic Web. He is a member of the editorial boards of the Journal of Web Engineering, International Journal of Web Engineering and Technology, Decision Support Systems, and Computational Linguistics in the Netherlands.

Published

2019-11-05

Section

Articles