Additional Detection of Clones Using Locally Sensitive Hashing

Nataliia I. Pravorska

doi:10.13052/jcsm2245-1439.123.6

2023, Assurance of Information Systems’ Quality and Security

https://doi.org/10.13052/jcsm2245-1439.123.6

Published 2023-05-18

Nataliia I. Pravorska

Khmelnitsky National University, Ukraine

https://orcid.org/0000-0001-6001-3311

Keywords

language-independent incremental repeat detector
locally sensitive hashing
incremental approach
incremental step
experiment
hash segment
hash function
clone index
shingles
MinHashing
shingling

How to Cite

[1]

N. I. Pravorska, “Additional Detection of Clones Using Locally Sensitive Hashing”, JCSANDM, vol. 12, no. 03, pp. 367–388, May 2023.

Abstract

Many methods exist today for detecting repeated and redundant blocks in program code. Most of them, however, depend on the programming language in which the software is written and try to detect complex types of repeating blocks. The goal of this research was therefore to develop a language-independent repetition detector and to expand its capabilities. During the development and operation of the language-independent incremental repeat detector (LIIRD), experiments were conducted on five open source systems, with evaluation against the industrial detector of SIG (Software Improvement Group), including the use of a syntactic analysis tool. This raised the question of how to extend the algorithm for additional detection of duplication and redundancy in code, proposed by Hummel, and what improvements could make it independent of the programming language. Particular attention was paid to the empirical results presented in the original study, as their effectiveness is questionable. The main parameters considered when creating the index for LIIRD and its LSH (locally sensitive hashing) extension were time, memory, and the creation of an incremental step. The results of the experiments conducted by Hummel and his co-authors motivated the development of an extended approach. According to the original study, calculating the entire index of repeated and redundant blocks from scratch is very time consuming. It is therefore proposed to use LSH to obtain an efficient estimate of the similarity of the files of a software project.
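
The similarity estimate mentioned in the abstract can be illustrated with a minimal sketch that combines shingling, MinHashing and LSH banding, the three techniques named in the keywords. This is not the LIIRD implementation: the function names, the whitespace tokenization, the shingle length `k`, the number of hash functions and the number of bands are all assumptions chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-token shingles of a text (tokens split on whitespace)."""
    tokens = text.split()
    if len(tokens) < k:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def seeded_hash(seed, item):
    """Deterministic 64-bit hash of a string; the seed selects the hash function."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingle_set, num_hashes=64):
    """MinHash signature: the minimum hash value under each seeded hash function."""
    return [min((seeded_hash(seed, s) for s in shingle_set), default=0)
            for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands=16):
    """LSH banding: files sharing at least one identical band become candidates."""
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


# Hypothetical project files, used only to exercise the sketch.
files = {
    "a.py": "def add ( a , b ) : return a + b",
    "b.py": "def add ( a , b ) : return a + b",
    "c.py": "class Stack : items = [ ] ; push pop peek size empty",
}
signatures = {name: minhash_signature(shingles(text)) for name, text in files.items()}
pairs = candidate_pairs(signatures)
```

Only the pairs returned by `candidate_pairs` would need a detailed comparison, which is the reason LSH avoids comparing every file with every other file. Splitting on whitespace rather than parsing keeps the sketch independent of the programming language, in the spirit of the abstract.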

References

Benjamin Hummel, Elmar Juergens, Lars Heinemann, and Michael Conradt. Index-based code clone detection: incremental, distributed, scalable. In 2010 IEEE International Conference on Software Maintenance, pages 1–9. IEEE, 2010.

Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604–613, 1998.

Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. Mining of Massive Datasets. Cambridge University Press, USA, 2nd edition, 2014. ISBN 1107077230.

Pravorska N.I., Barmak O.V., Medzatiy D.M., Shestakevych T.V. The process of detecting blocks with repetitions and redundancy when using a language-independent incremental detector. KHNU Bulletin, Technical Sciences series, 3, 2021, pp. 39–45.

Pravorska N.I., Bedratyuk L.P., Forkun Y.V., Yashina O.M. Language-independent detector for detection and elimination of repetitions and redundancies of the program code. Measuring and computing equipment in technological processes. Khmelnytskyi, 2021. 1, pp. 56–61.

Wei Zhou, Jiankun Hu, and Song Wang. Enhanced locality-sensitive hashing for fingerprint forensics over large multi-sensor databases. IEEE Transactions on Big Data, 2017.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
