Additional Detection of Clones Using Locally Sensitive Hashing

Authors

DOI:

https://doi.org/10.13052/jcsm2245-1439.123.6

Keywords:

language-independent incremental repeat detector, locally sensitive hashing, incremental approach, incremental step, experiment, hash segment, hash function, clone index, shingles, MinHashing, shingling

Abstract

Many methods exist for detecting repeated and redundant blocks in program code. Most of them, however, depend on the programming language in which the software is written, and they try to detect complex types of repeated blocks. The goal of this research was therefore to develop a language-independent repeat detector and to expand its capabilities. During the development and operation of the language-independent incremental repeat detector (LIIRD), experiments were conducted on five open-source systems, with evaluation against the industrial detector of SIG (Software Improvement Group), including the use of a syntactic analysis tool. This raised two questions: how to extend the algorithm for additional detection of duplication and redundancy in code proposed by Hummel et al., and how to improve it to achieve independence from the programming language. Particular attention was paid to the empirical results presented in the original study, because their effectiveness is questionable. The main parameters considered when creating the index for LIIRD and for its LSH (locally sensitive hashing) extension were time, memory, and the creation of an incremental step. The experimental results reported by Hummel et al. motivated the development of an extended approach. According to the original study, computing the entire index of repeated and redundant blocks from scratch is very time-consuming. LSH is therefore proposed as a way to estimate the similarity of software project files efficiently.
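The following is a minimal sketch of the general idea behind the keywords shingling, MinHashing, and LSH. It is not the detector described in the article. The whitespace tokenization, the function names, the shingle length `k`, the number of hash functions, and the number of bands are all assumptions chosen for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-token shingles of a text (assumed whitespace tokens)."""
    tokens = text.split()
    if len(tokens) < k:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(shingle, seed):
    """Deterministic 64-bit hash of a shingle; the seed selects the hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingle_set, num_hashes=128):
    """For each hash function, keep the minimum hash value over all shingles."""
    if not shingle_set:
        raise ValueError("cannot build a signature from an empty shingle set")
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def band_keys(signature, bands=32):
    """Split a signature into bands; each band acts as one LSH bucket key."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def is_candidate_pair(sig_a, sig_b, bands=32):
    """Two files become candidates if they agree on at least one whole band."""
    return bool(set(band_keys(sig_a, bands)) & set(band_keys(sig_b, bands)))
```

In such a scheme only files that share a bucket would be compared in detail, so a full pairwise comparison of all project files is avoided. Changing the number of bands trades missed similar pairs against spurious candidates.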

Author Biography

Nataliia I. Pravorska, Khmelnitsky National University, Ukraine

Nataliia I. Pravorska received the degree of Candidate of Pedagogical Sciences in 2005, a PhD in Pedagogy (theory and methods of informatics (computer science)) in 2011, and an MS in Software Engineering in 2021. She has been an associate professor of the Department of Software Engineering at Khmelnytskyi National University since 2004. Research interests: C++ and Java programming, object-oriented programming, development of software products based on mathematical models, Internet of Things. Educational activity: she teaches the disciplines Applied information systems, Software design, Object-oriented programming, Basics of team software development, Java programming technologies, and Software systems development methodologies and technologies.

References

Benjamin Hummel, Elmar Juergens, Lars Heinemann, and Michael Conradt. Index-based code clone detection: incremental, distributed, scalable. In 2010 IEEE International Conference on Software Maintenance, pages 1–9. IEEE, 2010.

Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pages 604–613, 1998.

Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, USA, 2nd edition, 2014. ISBN 1107077230.

Pravorska N.I., Barmak O.V., Medzatiy D.M., Shestakevych T.V. The process of detecting blocks with repetitions and redundancy when using a language-independent incremental detector. KHNU Bulletin, Technical Sciences series, 3, 2021, pp. 39–45.

Pravorska N.I., Bedratyuk L.P., Forkun Y.V., Yashina O.M. Language-independent detector for detection and elimination of repetitions and redundancies of the program code. Measuring and computing equipment in technological processes, Khmelnytskyi, 1, 2021, pp. 56–61.

Wei Zhou, Jiankun Hu, and Song Wang. Enhanced locality-sensitive hashing for fingerprint forensics over large multi-sensor databases. IEEE Transactions on Big Data, 2017.

Published

2023-05-18

How to Cite

1. Pravorska NI. Additional Detection of Clones Using Locally Sensitive Hashing. JCSANDM [Internet]. 2023 May 18 [cited 2024 Apr. 24];12(03):367-88. Available from: https://journals.riverpublishers.com/index.php/JCSANDM/article/view/18799

Section

Assurance of Information Systems’ Quality and Security