Database Security Enhancement by Eliminating the Redundant and Incorrect Spelled Data Entries

Keywords: Database security, redundancy, spell checker, Bloom filter, Edit distance


A database stores data in an easy and efficient format. In recent years, a large volume of data has been generated by numerous applications and stored in databases. Given the importance of data in every sector of the digitized world, securing that data is of foremost importance, and database security has therefore been given prime importance in every organization. Redundant data entries may stop the database from functioning. Such entries may be inserted because a primary key is absent or because data is incorrectly spelled. This article addresses database security by protecting the database from redundant data entries using a Bloom filter. Incorrectly spelled query values are first corrected with the edit distance algorithm, and a data redundancy check follows. The article also compares the performance of the proposed technique with that of the MongoDB database for document search functionality.
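The two steps named in the abstract, spelling correction by edit distance followed by a Bloom filter redundancy check, can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary, the filter size, the number of hashes, the distance threshold, and every function and class name below are assumptions chosen for illustration only.

```python
import hashlib


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if none is close enough."""
    best = min(vocabulary, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_distance else word


class BloomFilter:
    """Hypothetical small Bloom filter; size and hash count are arbitrary."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive several hash positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


def insert_if_new(value: str, vocabulary: list[str],
                  seen: BloomFilter, table: list[str]) -> bool:
    """Correct the spelling, then insert only if the filter has not seen it."""
    cleaned = correct(value.strip().lower(), vocabulary)
    if seen.might_contain(cleaned):
        return False  # treated as redundant
    seen.add(cleaned)
    table.append(cleaned)
    return True


vocabulary = ["pune", "mumbai", "delhi"]  # assumed dictionary of valid values
seen, table = BloomFilter(), []
first = insert_if_new("Pune", vocabulary, seen, table)
second = insert_if_new("Puen", vocabulary, seen, table)  # misspelling of "pune"
print(first, second, table)
```

In this sketch the misspelled "Puen" is corrected to "pune" before the check, so it is rejected as a duplicate rather than stored as a new entry. Because a Bloom filter can report false positives, a practical system would likely confirm a reported duplicate against the database itself before discarding the entry.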



Author Biographies

Rupali Chopade, Department of Computer Engineering and IT, College of Engineering Pune, Savitribai Phule Pune University, India

Rupali Chopade is a full-time Research Scholar under the AICTE-QIP scheme at the Department of Computer Engineering and IT, College of Engineering Pune, India. She is working as an Assistant Professor at the Department of Information Technology, Marathwada Mitra Mandal’s College of Engineering Pune, India. She has 17 years of teaching experience. Her research interests include database forensics and database security. She received the “Distinguished HOD” Award from the Computer Society of India (CSI) in 2017.

Vinod Pachghare, Department of Computer Engineering and IT, College of Engineering Pune, Savitribai Phule Pune University, India

Vinod Pachghare is an Associate Professor in the Department of Computer Engineering and Information Technology, College of Engineering, Pune (an autonomous institute of the Government of Maharashtra), India. He has 29 years of teaching experience and has published books on Cloud Computing and Computer Graphics. Dr. Pachghare has over 37 research publications in various international journals and conferences. His area of research is network security. He is also a member of the Board of Studies in Computer Engineering/Information Technology of a number of autonomous institutes. He is an Investigator for the Information Security Education and Awareness [ISEA] Project, Ministry of Information Technology, Govt. of India. He was a Principal Investigator for the research project “Wireless IDS”, sponsored by AICTE, New Delhi. He has delivered lectures on recent and state-of-the-art topics in Computer Engineering and Information Technology as an invited speaker. He received the “Best Faculty Award” 2018 from CSI, Mumbai Chapter.



Emerging Trends in Cyber Security and Cryptography