Semantic Integration in Big Data: State-of-the-Art
Keywords: Big data, Hadoop, ontology, semantic integration, interoperability
Web users and systems continually generate an ever-growing volume of data. As a result, big data has become important in several domains, such as social networks, the Internet of Things, health care, e-commerce, and aviation safety. The use of big data has become increasingly crucial for companies as the number of information providers and users on the web has grown significantly. However, big data remains meaningless without semantics. To better understand big data, we ask how big data and semantics relate to each other and how semantics may help. To address this lack of meaning, researchers have devoted considerable effort to integrating ontologies into big data, with the goal of ensuring reliable interoperability between systems and making big data more useful, readable, and exploitable. Ontologies can hide the heterogeneity of different data sources. Moreover, within a given domain, users can exchange knowledge without having to choose the semantics that make their content more expressive. This paper first gives readers a comprehensive overview of big data and of the tools used to manipulate and analyse it, such as Hadoop. We then discuss ontologies and how they can be used to improve big data management and analysis for decision makers. Finally, we compare different semantic integration approaches. The survey concludes with a discussion and some perspectives.