Software clone detection techniques

Keywords: clone detection, procedural abstraction, refactoring. 1 Introduction. Duplicated code arises in software for many reasons. Several studies of software clones have been published, taking different points of view and covering many related features and areas that are particularly relevant to software maintenance and evolution. We begin with an overall survey based on criteria that capture the main features of detection techniques. Clone detection can tolerate changes between duplicated fragments, including not only renamed variables but also inserted and deleted statements. Enhancing program dependency graph based clone detection.

A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Applications of clone detection research to other domains of software engineering, and ways in which other domains can in turn assist clone detection research, have also been pointed out. This paper proposes a new clone detection technique, which consists of a transformation of the input source text followed by a token-by-token comparison. The obvious gap between clone detection tools and clone analysis tools complicates the work of programmers who refactor duplicated code. Code duplication is one of the factors that severely complicates the maintenance and evolution of large software systems. In this paper, we provide a qualitative comparison and evaluation of the current state of the art in clone detection techniques and tools, and organize the large amount of information into a coherent conceptual framework. Code clone detection and analysis using software metrics and.

An efficient Type-4 clone detection technique for software testing. Code clones have an influence on the difficulty of maintaining code. Code clone detection can be useful for maintenance, reengineering and plagiarism detection. Over the past few years, numerous researchers have introduced software clone detection tools and techniques.

Algorithm, code clone, clone testing, software testing. It is easier to use: clone detection does not require tedious information such as compiler or build options. With clone detection, developers can apply clone removal refactorings. It will also help in the selection of appropriate techniques for detecting and managing clones. All tokens of source code are converted back into token sequence lines. Unfortunately, despite a decade of active research, there is a marked lack of clone detectors that scale to large software repositories. Many code clone detection techniques have been proposed [1].

With time, as InspectorClone is used, the number of human-validated clone pairs in the dataset will increase. Finally, this paper concludes by pointing out several open problems. Research article: Software clone detection and refactoring. Report by Advances in Natural and Applied Sciences. A code clone is a code portion in source files that is identical or similar to another. In this study, code clones, common types of clones, phases of clone detection, the state of the art in code clone detection techniques and tools, and challenges faced by clone detection techniques are discussed. Deep learning code fragments for code clone detection (IEEE). Based on the level of analysis applied to the source code, the techniques can roughly be classi. It detects bugs: clone detection is the only tool that can detect copy-paste errors. Code clone detection is an important problem for software maintenance and evolution. In this paper, we provide a comprehensive survey of the capabilities of currently available clone detection techniques.

Software clones may propagate bugs and cause serious maintenance problems. Many approaches consider either structure or identifiers, but none of the existing detection techniques model both sources of information. For different reasons, developers may produce code that is cloned. Clone detection locates exact or similar pieces of code, known as clones, within or between software systems. How detected clones can be removed automatically and areas related to clone detection are discussed in Section 7. Substring matching for clone detection and change tracking (1994). Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. The research paper published by IJSER (ISSN 2229-5518) is about the reverse engineering in oriented aspect detection of semantics clones. Clone detection techniques aim at finding duplicated code, which may have been adapted slightly from the original. Various intermediate representations and match detection techniques used in clone detection techniques and tools are reported. Code clones are copied fragments that occur at different levels of abstraction and may have different origins in a software system.

IWSC 2018: 12th International Workshop on Software Clones. For a comparison and evaluation of code clone detection techniques and tools. Detection and analysis of software clones. Dissertation directed by Professor Jugal Kalita. Effective detection of code clones is important for software maintenance. From the discussion it is concluded that clone detection using software metrics and an artificial neural network is the best technique for code clone detection, analysis and clone prediction.

Research article: Software clone detection and refactoring. Francesca Arcelli Fontana, Marco Zanoni, Andrea Ranchetti, and Davide Ranchetti. University of Milano-Bicocca, Viale Sarca, Milano, Italy. Correspondence should be addressed to Marco Zanoni. Using the results of this study one can more easily choose the right tools to meet the. The plagiarists can easily acquire the techniques of clone detection tools, and by studying the tools they can camou. In token-based clone detection techniques, tokens are first extracted from the source code by lexical analysis.
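The token extraction step can be sketched in a few lines. This is a minimal illustration, not the method of any tool named above: it assumes the input is Python source, uses the standard `tokenize` module as the lexer, and the function name `normalized_tokens` and the placeholder symbols `ID` and `LIT` are hypothetical choices made for this example.

```python
import io
import keyword
import tokenize


def normalized_tokens(source: str) -> list[str]:
    """Lex Python source and abstract identifiers and literals.

    Identifiers become "ID" and number/string literals become "LIT", so two
    fragments that differ only in names or constants yield the same sequence.
    """
    # Layout-only tokens and comments carry no information for matching.
    skip = {
        tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
    }
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")      # user-chosen name
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")     # literal value
        else:
            out.append(tok.string)  # keyword or operator, kept verbatim
    return out
```

With this normalization, `total = price * 2` and `s = p * 7  # sum` produce the same token sequence, which is what lets a token-based detector match fragments whose variables were renamed.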

With the amount of source code increasing steadily, large-scale clone detection has become a necessity. Comparison and evaluation of code clone detection techniques and. These techniques also depend on generic, handcrafted features to. The main focus of these techniques is on the detection of clones. In such techniques, all source code lines are divided into a sequence of tokens during the lexical analysis phase of a compiler. Over the last decade many techniques and tools for software clone detection have been proposed, such as textual approaches, lexical approaches, syntactic approaches, semantic approaches, etc. The existing clone detection techniques detect only single granularity clones, such as. For example, the comparison granularity used by the clone detection techniques is discussed, including lines, tokens, AST nodes, and PDG nodes. Various clone detection techniques and tools have been proposed, organized by the types of clones they detect.

An empirical evaluation of clone detection tools and techniques is presented. Simple clone detection techniques fail to determine the reasons of code. The 8th International Workshop on Software Clones will emphasize clone management in practice, that is, use cases and experiences with clones and clone management in the software lifecycle. Over the last decade many techniques for software clone detection have been proposed. Comparison and evaluation of code clone detection techniques. Several clone detection approaches, techniques and tools are discussed in great detail in Section 5. Comparative study of software clone detection techniques (IEEE). Sometimes programmers need to understand the clones manually with the help of clone detection tools and decide how they should be refactored. Software clones are often a result of copying and pasting as an act of ad hoc reuse by programmers, and can occur at many levels, from simple statement sequences to blocks, methods, classes, source files, subsystems, models, architectures and entire designs, and in all software artifacts (code, models, requirements or architecture documentation, etc.). Many code clone detection techniques have been proposed for this purpose. In this thesis, we advance the state of the art in clone. In this paper, we describe our experience on clone detection through three different tools and investigate the impact of clone refactoring on different software quality metrics.

Therefore, the software clone detectors may have to deal. Supervised deep features for software functional clone. Comparison and evaluation of code clone detection techniques and tools. The AST nodes are serialized in preorder traversal; a suffix tree is created for these serialized AST nodes; the AST node sequences are cut according to their syntactic region; clones are then found on the transformed AST using a sequence matching algorithm. Detection of duplicated bugs within a piece of software is challenging, especially when duplica. Oct 22, 2019: We performed preprocessing, indexing, and clone detection for more than 324 billion LOC using a Hadoop distributed environment, which is faster and more efficient than existing distributed indexing and clone detection techniques. In the analysis of different clone detection techniques, most of the matches tend to be methods or functions of 15 lines of code.
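The first step of the AST-based pipeline, serializing nodes in preorder, can be illustrated as follows. This is a sketch under assumptions: it uses Python's standard `ast` module on Python input, the function name `preorder_node_types` is hypothetical, and the later suffix tree and sequence matching steps are left out.

```python
import ast


def preorder_node_types(source: str) -> list[str]:
    """Serialize the AST of `source` as node type names in preorder."""
    tree = ast.parse(source)
    sequence = []

    def visit(node: ast.AST) -> None:
        # Preorder: record the node itself before any of its children.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence
```

Because only node types are recorded, `a = b + 1` and `x = y + 2` serialize to the same sequence, while `x = y * 2` differs in its operator node. A sequence matcher run over such serializations would therefore treat the first two as structurally identical.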

Clone types, clone detection techniques and different procedures. Section V presents various dimensions of clone detection techniques, Section VI discusses various evaluation metrics, and a comparative analysis of the clone detection techniques is then performed in Section VII. The availability of many distinct programming paradigms and languages has led to a number of clone variants and software clone detection techniques. IWSC 2019: International Workshop on Software Clones. IJCA: A survey of software clone detection techniques. Software clone detection using cosine distance similarity. To fully unleash the power of deep learning for detecting code. Evaluation of clone detection techniques is discussed in Section 6.

Sequences of duplicate code are sometimes known as code clones or just clones; the automated process of finding duplications in source code is called clone detection. Literature survey: Roy and Cordy [1], Comparison and evaluation of clone detection techniques and tools. In this paper, Roy and Cordy [1] compared different clone detection techniques, such as textual. Applications and related research for clone detection: plagiarism detection, origin analysis and software evolution, multi-version program analysis, bug detection, malicious software detection. It helps in understanding the clone detection techniques. Transferring code-clone detection and analysis to practice. Source line equality assumes that the cloning process introduced no changes in identifiers, comments, spacing, or other non-semantic details, and thus limits clone detection to exact matches. Dec 21, 2012: Several studies of software clones have been published, taking different points of view and covering many related features and areas that are particularly relevant to software maintenance and evolution. These techniques also depend on generic, handcrafted features to represent code fragments.
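Source line equality, combined with a minimum run length so that short coincidental matches are ignored, might look like the sketch below. The function name `exact_line_clones` and the default threshold of three lines are assumptions made for illustration; neither comes from a tool discussed here.

```python
from collections import defaultdict


def exact_line_clones(lines: list[str], min_lines: int = 3) -> list[list[int]]:
    """Group start indices of identical runs of `min_lines` consecutive lines.

    Lines are compared verbatim, so any change in an identifier, comment, or
    spacing prevents a match.
    """
    windows = defaultdict(list)
    for start in range(len(lines) - min_lines + 1):
        # A tuple of raw lines is hashable and serves as the window's key.
        window = tuple(lines[start:start + min_lines])
        windows[window].append(start)
    # Only windows that occur at least twice are reported as clones.
    return [starts for starts in windows.values() if len(starts) > 1]
```

For a file whose lines 0 to 2 are repeated verbatim at lines 4 to 6, the function returns `[[0, 4]]`. Renaming a single variable in the second copy makes the result empty, which shows the limitation described above.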

Then some set of tokens at a specific granularity level is formed into a sequence. Various tools and techniques for clone detection developed to date are mentioned along with their usage. The importance of semantic clone detection and model-based clone detection has led to different classifications. Analysis of code clone detection using object oriented.

Suffix tree or suffix array based token-by-token comparison is the heart of token-based clone detection algorithms. Software code clone detection techniques and tools play a major role in improving software quality as well as saving maintenance cost and effort. Program dependency graph (PDG) based clone detection techniques have a key advantage over other techniques, as they are capable of detecting non-contiguous code clones in addition to contiguous clones. Over the years, many techniques have been proposed in order to minimize or prevent code cloning problems. What you need is access to the source code of your program's different versions, for example access to the repository.
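The suffix array idea can be shown with a deliberately naive sketch. It assumes the input is already a list of tokens, builds the suffix array by plain sorting (far slower than the linear-time constructions real tools would need), and reports only the single longest repeated run. The function name `longest_repeated_run` is hypothetical, and the two occurrences it reports may overlap.

```python
def longest_repeated_run(tokens: list[str]) -> tuple[int, int, int]:
    """Return (length, start_a, start_b) of the longest token run occurring twice.

    Returns (0, -1, -1) when no token is repeated.
    """
    n = len(tokens)
    # Naive suffix array: sort start positions by the suffix beginning there.
    suffix_array = sorted(range(n), key=lambda i: tokens[i:])
    best = (0, -1, -1)
    # The longest repeat is always a common prefix of two neighbouring suffixes.
    for a, b in zip(suffix_array, suffix_array[1:]):
        length = 0
        while (a + length < n and b + length < n
               and tokens[a + length] == tokens[b + length]):
            length += 1  # token-by-token comparison
        if length > best[0]:
            best = (length, min(a, b), max(a, b))
    return best
```

For the token list obtained from `"x = a + b ; y = 0 ; x = a + b".split()`, the result is `(5, 0, 10)`: the five-token statement at position 0 reappears at position 10.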

However, clone detection is not a task carried out in a closed and static environment. Lexical approaches are also called token-based clone detection techniques. These pieces can be statements, blocks of code, functions, classes, or even complete source. Cordy (a), Rainer Koschke (b). (a) School of Computing, Queen's University, Canada; (b) University of Bremen, Germany. Abstract: Over the last decade many techniques and tools for software clone detection have been proposed. Code cloning has a negative impact on code quality, and code clones are one of the most frequent problems that may appear in a software project. Clone detection techniques have been proposed with free granularity, mostly with more than six lines of code (Kamiya et al.). Neural detection of semantic code clones via tree-based. Given the availability of large-scale source code repositories, there have been a large number of applications for clone detection. Scenario-based comparison of clone detection techniques.

In this paper, various types of metric-based clone detection approaches and techniques are discussed. Cordy: A survey on software clone detection research. Existing literature about software clones is classified broadly into different categories. Techniques for detecting duplicated code exist but rely mostly on parsers, a technology that has proven to be brittle in the face of different languages and dialects.

Detecting software clones can decrease software maintenance cost. The existing clone detection techniques can be roughly classi. Nontrivial software clone detection using program dependency graph. Code clones may become an important problem in software development cycle and. Towards automating precision studies of clone detectors.

IWSC 2020: 14th International Workshop on Software Clones. This condition significantly increases software maintenance costs and the effort and duration required to understand the code. Overview of clone detection techniques. Many clone detection approaches have been proposed in the literature. Introduction. When a programmer copies and pastes a fragment of code, possibly.

However, several research studies have demonstrated that removal or refactoring of cloned code is sometimes harmful. This document describes how to use our incremental clone detection tool iClones to extract clone evolution data from a program's history. Various code clone detection techniques and tools. Keywords: software clone, code clone, duplicated code detection, clone detection. Such high tunability of XIAO is critical in applying an approach of code clone detection such as XIAO to a broad scope of software engineering tasks such as refactoring and bug detection since.
