Semantic Integration in Big Data: State-of-the-Art


  • Zaoui Sayah LINATI Laboratory, Department of Computer Science and New Technologies, KASDI Merbah University of Ouargla, Algeria
  • Okba Kazar Intelligent Computer Science Laboratory, Computer Science Department, University Mohamed Khider of Biskra, Algeria
  • Ahmed Ghenabzia LINATI Laboratory, Department of Computer Science and New Technologies, KASDI Merbah University of Ouargla, Algeria



Big data, Hadoop, ontology, semantic integration, interoperability


Web users and systems continually load the web with massive, exponentially growing amounts of data. As a result, big data has become increasingly important in domains such as social networks, the Internet of Things, health care, e-commerce, and aviation safety. The use of big data has become increasingly crucial for companies because of the significant growth in information providers and users on the web. However, big data remains meaningless without semantics. To understand big data well, we ask how big data and semantics are related to each other and how semantics may help. To address this problem, researchers devote considerable time to integrating ontologies into big data, with the aim of ensuring reliable interoperability between systems and making big data more useful, readable, and exploitable. This technology can hide the heterogeneity of different data sources. Moreover, within a given domain, users can exchange knowledge without having to choose the suitable semantics that make their content more expressive. This paper aims to give readers a comprehensive overview of big data and of the appropriate tools to manipulate and analyse it, such as Hadoop. We then discuss ontologies and how they can be used to improve big data management and analysis for decision makers. Finally, different semantic integration approaches are examined in a comparative study. The survey concludes with a discussion and some perspectives.



Author Biographies

Zaoui Sayah, LINATI Laboratory, Department of Computer Science and New Technologies, KASDI Merbah University of Ouargla, Algeria

Zaoui Sayah is a Ph.D. student and research associate in the Artificial Intelligence and Information Technologies Laboratory (LINATI), Department of Computer Science and Information Technologies, Ouargla University, Algeria. He obtained a scientific baccalaureate in 2001. He received his DEUA in computer science from Biskra University, Algeria, in 2004. He earned his licence and Master's degrees in distributed systems and artificial intelligence in 2014 and 2016, respectively, from El Oued University, Algeria. His current research interests include big data, ontology, MAS, IoT, and energy saving.

Okba Kazar, Intelligent Computer Science Laboratory, Computer Science Department, University Mohamed Khider of Biskra, Algeria

Okba Kazar obtained his magister degree in 1997 from Constantine University (Algeria) for work in the field of artificial intelligence. He obtained his PhD degree from the same university in 2005. He is a member of the editorial boards of several journals. He has published more than 307 papers in international journals and international conference proceedings. He has served as a session chair at international conferences, and he has also published the books “Manual d’Intelligence artificielle” and “Bigdata security” as well as five book chapters. His main research field is artificial intelligence, and he is interested in multi-agent systems and their applications, PHM in medical and industrial fields, ERP, advanced information systems, Web services, the semantic Web, big data, the Internet of Things, and cloud computing. Okba Kazar is currently a full professor in the computer science department of Biskra University and director of the smart computer science laboratory (LINFI).

Ahmed Ghenabzia, LINATI Laboratory, Department of Computer Science and New Technologies, KASDI Merbah University of Ouargla, Algeria

Ahmed Ghenabzia is a Ph.D. student and research associate in the Artificial Intelligence and Information Technologies Laboratory (LINATI) at Kasdi Merbah University-Ouargla, Algeria. He received his IT engineering degree from ESI in Algiers, Algeria, in 2013. He earned his licence and Master's degrees in distributed systems and artificial intelligence in 2016 respectively from El Oued University, Algeria. His current research interests include data science, big data, multi-agent systems, IoT, ontology, and renewable energy.

