MARIA: A PROCESS TO MODEL ENTITY RECONCILIATION PROBLEMS
Keywords: Web Engineering, Model-Driven Engineering, Entity Reconciliation, NDT
Among software systems, web applications may be among the most widespread at present because of the advantages they offer, such as multiplatform support, speed of access, and modest hardware requirements. Because so many web applications are being developed, an enormous volume of information is generated daily. Managing this information raises the entity reconciliation (ER) problem: identifying objects that refer to the same real-world entity. This paper proposes a solution to this problem from a web perspective based on the Model-Driven Engineering paradigm. To this end, the Navigational Development Techniques (NDT) methodology, which provides a formal and complete set of processes supporting software lifecycle management, has been taken as a reference and extended with new activities, artefacts, and documents to cover ER. All these elements are defined by a process named Model-Driven Entity ReconcilIAtion (MaRIA), which can be integrated into any software development methodology and allows the ER problem to be defined from the early stages of development. In addition, the proposal has been validated in a real-world case study, helping companies reduce costs when developing a software product that must solve an ER problem.