Database Security Enhancement by Eliminating the Redundant and Incorrect Spelled Data Entries
Keywords: Database security, redundancy, spell checker, Bloom filter, edit distance
Databases store data in an accessible and efficient format. Applications now generate large volumes of data, all of which end up in databases. Because data matters in every sector of the digitized world, securing it is essential, and organizations give database security high priority. Redundant data entries can stop a database from functioning. Such entries may be inserted because a primary key is absent or because the data is incorrectly spelled. This article addresses database security by protecting the database from redundant data entries using a Bloom filter. Incorrectly spelled query values are first corrected with an edit distance algorithm, and the corrected values then undergo a data redundancy check. The article also compares the performance of the proposed technique with that of the MongoDB database for document search.
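The two-stage idea described above (spell correction by edit distance, then a Bloom filter redundancy check) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `edit_distance`, `correct`, `BloomFilter`, and `insert_if_new`, the reference `vocabulary`, the `max_distance` threshold, the filter size, and the use of seeded SHA-256 hashes are all assumptions made for illustration only.

```python
import hashlib


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def correct(word, vocabulary, max_distance=2):
    """Return the closest vocabulary entry within max_distance, else the word unchanged.

    The vocabulary and the threshold are hypothetical inputs for this sketch.
    """
    if not vocabulary:
        return word
    best = min(vocabulary, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_distance else word


class BloomFilter:
    """Toy Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Assumption: derive the hash functions by seeding SHA-256.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def insert_if_new(value, vocabulary, seen, store):
    """Correct the spelling, then store the value only if the filter has not seen it."""
    corrected = correct(value, vocabulary)
    if seen.might_contain(corrected):
        return False  # probably redundant; a false positive is possible
    seen.add(corrected)
    store.append(corrected)
    return True
```

In this sketch, inserting "London" and then the misspelling "Londn" against a vocabulary containing "London" stores only one entry, because the second value is corrected before the membership test. A Bloom filter can report a false positive, so a production design would confirm a hit against the database before rejecting an insert.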