Leveraging Conceptual Data Models to Ensure the Integrity of Cassandra Databases

Authors

  • Pablo Suárez-Otero, Computer Science Department, University of Oviedo, Campus de Viesques, Gijón, Spain
  • María José Suárez-Cabal, Computer Science Department, University of Oviedo, Campus de Viesques, Gijón, Spain
  • Javier Tuya, Computer Science Department, University of Oviedo, Campus de Viesques, Gijón, Spain

Keywords:

NoSQL, Cloud, Conceptual Model, Logical Model, Cassandra, Logical Data Integrity

Abstract

The use of NoSQL databases in cloud environments has been increasing due to their performance advantages when working with big data. One of the most popular NoSQL databases used for cloud services is Cassandra, in which each table is created to satisfy one query. Because the same data may be retrieved by several queries, these data may be repeated in several tables. The integrity of these data must be maintained by the application that works with the database, rather than by the database itself as in relational databases. In this paper, we propose a method to ensure data integrity when data are modified, using a conceptual model that is directly connected to the logical model representing the Cassandra tables. This method identifies which tables are affected by the modification and proposes how the data integrity of the database may be ensured. We detail the process of this method along with two examples in which we apply it to two insertions of tuples in a conceptual model. We also apply the method to a case study in which we insert several tuples in the conceptual model, and we discuss the results. We observed that in most cases ensuring data integrity requires several insertions, as well as looking up values in the database.
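
The idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Table` class, the two helper functions, and the Author/Book model with its three tables are hypothetical names chosen only for illustration. The sketch assumes that each Cassandra table is described by the conceptual attributes it stores, written as "Entity.attribute". Under that assumption, it shows how an insertion of one tuple can map to several tables, and how some column values may have to be looked up in the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical description of one Cassandra table in the logical model."""
    name: str
    columns: frozenset  # conceptual attributes stored, written as "Entity.attribute"


def affected_tables(tables, entity):
    """Return the tables that store at least one attribute of `entity`."""
    prefix = entity + "."
    return [t for t in tables if any(c.startswith(prefix) for c in t.columns)]


def columns_to_look_up(table, supplied):
    """Return the columns of `table` whose values the inserted tuple does not supply."""
    return sorted(table.columns - set(supplied))


# Hypothetical logical model: one table per query, so data are repeated.
LOGICAL_MODEL = [
    Table("books_by_id", frozenset({"Book.id", "Book.title"})),
    Table("books_by_author",
          frozenset({"Author.id", "Author.name", "Book.id", "Book.title"})),
    Table("authors_by_id", frozenset({"Author.id", "Author.name"})),
]

# Hypothetical insertion of a Book tuple related to an existing author.
new_book = {"Book.id": 7, "Book.title": "Example", "Author.id": 1}

for table in affected_tables(LOGICAL_MODEL, "Book"):
    print(table.name, columns_to_look_up(table, new_book))
```

In this sketch the single conceptual insertion touches two tables, and one of them needs a value (the author's name) that the tuple does not carry, so it would have to be read from the database before writing.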

Author Biographies

Pablo Suárez-Otero, Computer Science Department, University of Oviedo, Campus de Viesques, Gijón, Spain

Pablo Suárez-Otero received his B.Sc. degree in Computer Engineering in 2015 and his M.Sc. degree in Computer Engineering in 2017, both from the University of Oviedo. He is currently a PhD candidate and an Assistant Professor at the University of Oviedo, and a member of the Software Engineering Research Group. His research interests include software testing, NoSQL databases and data modelling.

María José Suárez-Cabal, Computer Science Department, University of Oviedo, Campus de Viesques, Gijón, Spain

María José Suárez-Cabal is an assistant professor at the University of Oviedo, Spain, and is a member of the Software Engineering Research Group (GIIS, giis.uniovi.es). She obtained her PhD in Computing from the University of Oviedo in 2006. Her research focusses on software testing, and more specifically on testing database applications.

Javier Tuya, Computer Science Department, University of Oviedo, Campus de Viesques, Gijón, Spain

Javier Tuya is a Professor in the Computing Department at the University of Oviedo, Spain. His current research interests in the field of software testing include database-driven applications, data engineering, testing techniques and automation. He has managed many research and technology transfer projects and has published in various international conferences and journals. He held the position of CIO of the University of Oviedo. He is currently Director of the Indra-Uniovi Chair, a member of the ISO working group developing the new software testing standard ISO/IEC/IEEE 29119, and convenor of the UNE national body workgroup on software testing.

Published

2019-06-01

Issue

Section

Articles