DATA INTELLIGENCE IN THE CONTEXT OF BIG DATA: A SURVEY
Keywords:
big data, data mining techniques, literature review, knowledge discovery
Abstract
Mining Big Data is the capability of finding new, useful information in complex, massive datasets that may be continuously changing and may contain varied data types. Big Data is helpful only when it is transformed into knowledge or useful information. Data Intelligence is about transforming data into information, information into knowledge, and knowledge into value. It refers to intelligent interaction with data in rich, semantically meaningful ways, where data is used to learn and to obtain knowledge. However, extracting valuable information from this data by following the classical Knowledge Discovery process reveals new, previously unknown challenges due to the properties of Big Data. These challenges have received considerable attention in recent years and still call for further contributions and research. A large number of publications have yielded a plethora of proposed methods and algorithms. In this paper, we provide a comprehensive literature review of the current status of Big Data. We present the Data Intelligence framework in the context of Big Data, from data acquisition to insight extraction; we highlight its main issues and identify its progress from both technological and algorithmic perspectives. We summarize and analyse relevant research papers in the field, collected from different scientific databases. This investigation will help researchers understand the current status of Data Intelligence, discover new research opportunities, and gain information about this field.
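The abstract refers to the classical Knowledge Discovery (KDD) process. As a rough illustration of how its stages turn raw data into information and then into knowledge, the following minimal Python sketch chains the commonly cited stages (selection, preprocessing, transformation, data mining, interpretation/evaluation) on a tiny, made-up market-basket dataset. The dataset, the function names, and the 0.5 support threshold are hypothetical. The mining step is a simple frequent-pair count, not a specific algorithm from the surveyed literature. At Big Data scale, each stage would typically run on a distributed platform rather than in a single process.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (customer_id, basket); None marks a corrupt record.
RAW = [
    ("c1", ["bread", "milk"]),
    ("c2", ["bread", "butter", "milk"]),
    ("c3", None),
    ("c4", ["butter", "milk"]),
    ("c5", ["Bread ", "butter", "milk"]),
]


def select(records):
    """Selection: keep only the attribute of interest (the basket)."""
    return [basket for _, basket in records]


def preprocess(baskets):
    """Preprocessing: drop missing records and normalise item names."""
    return [[item.strip().lower() for item in b] for b in baskets if b]


def transform(baskets):
    """Transformation: represent each basket as a set of distinct items."""
    return [frozenset(b) for b in baskets]


def mine(transactions, min_support):
    """Data mining: count item pairs and keep those meeting the support threshold."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def interpret(patterns):
    """Interpretation/evaluation: rank patterns by support so they can be reported."""
    return sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))


def kdd(records, min_support=0.5):
    """Run the stages in order: raw data -> information -> ranked patterns (knowledge)."""
    return interpret(mine(transform(preprocess(select(records))), min_support))


if __name__ == "__main__":
    for pair, support in kdd(RAW):
        print(pair, round(support, 2))
```

Keeping each stage as a separate function mirrors the process view described in the abstract: any single step (for example, the mining step) could be swapped for a scalable or distributed implementation without changing the others.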