Problem description
I'm looking for a tool that finds duplicate or similar code in Java/JavaScript. I can't give an exact definition of "similar", but I'd like the tool to be smart enough to give me advice on refactoring the code, e.g.,
(1) class A and class B have similar methods (e.g., 5 methods with the same name, the same arguments, and similar implementations appear in both classes); the tool should then advise moving these similar methods into a base class.
(2) class A has similar lines of code in several different places; the tool should advise moving these lines into a single method.
I tried PMD, which can find duplicate code lines, but it's not clever enough. It did not find the similar code that is widely spread across one of my projects.
Is there such a tool?
Our CloneDR tool finds duplicated code by comparing abstract syntax trees from parsers. (It comes in language-specific versions for many languages, including Java and JavaScript).
This means it can find cloned code in spite of format changes and modifications of the body of the clone, both of which are often done while cloning. Found clones match language concepts such as expression, declaration, statements, functions, and even classes. Clones that are similar are reported along with the differences/variation points as proposed parameters.
It can find clone sets with multiple instances (we've some applications with hundreds of clones of a single bit of code), and it can find clones across many source files.
It produces HTML reports that are directly readable by people, and XML reports that can be processed by other downstream tools. (You can see some sample HTML reports via the link).
Similarity is hard to define, and in fact you can define it in many ways. CloneDR defines it as the number of identical elements (technically, AST nodes) across a clone set divided by the total number of elements across the clone set. This ratio is a value between 0 and 1. It is compared against a threshold; we've found that 95% (a ratio of 0.95) is surprisingly robust as a threshold in terms of the quality of reported clones.
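To make the ratio concrete, here is a minimal Java sketch for a single clone pair. It is not CloneDR's implementation: the Node class, the position-by-position matching in shared(), and the pairwise formula 2 * shared / (size(a) + size(b)) are all simplifying assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class CloneSimilarity {
    /** Hypothetical minimal AST node: a label plus ordered children. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Total number of nodes in the tree rooted at n. */
    static int size(Node n) {
        int total = 1;
        for (Node child : n.children) {
            total += size(child);
        }
        return total;
    }

    /**
     * Counts nodes that carry the same label at the same position in both trees.
     * Simplification: once two nodes differ, their subtrees are not compared.
     */
    static int shared(Node a, Node b) {
        if (!a.label.equals(b.label)) {
            return 0;
        }
        int total = 1;
        int common = Math.min(a.children.size(), b.children.size());
        for (int i = 0; i < common; i++) {
            total += shared(a.children.get(i), b.children.get(i));
        }
        return total;
    }

    /** Identical nodes (counted once per tree) divided by all nodes in the pair. */
    static double similarity(Node a, Node b) {
        return 2.0 * shared(a, b) / (size(a) + size(b));
    }

    /** A pair is reported only if its similarity reaches the threshold, e.g. 0.95. */
    static boolean isClone(Node a, Node b, double threshold) {
        return similarity(a, b) >= threshold;
    }
}
```

With trees for f(a + 1) and f(b + 1), each of 5 nodes, 4 nodes match at the same position, so the ratio is 8/10 = 0.8. Such a tiny pair falls below a 0.95 threshold because a single differing leaf is a large fraction of the tree; in a clone of realistic size the same difference barely moves the ratio.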
It is useful to establish a minimum size for interesting clones. a*b is a clone of x*y (with 2 parameters) but isn't useful to report because it is too small. CloneDR also uses a size threshold which we call "line count", but in fact is the size of the clone in elements divided by the average number of elements per line across the entire code base. This produces clones which usually have more lines than the threshold, but it will find clones for enormous expressions that are within a line. We've found that 5-6 "lines" is also fairly robust in terms of reported clone quality.
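The "line count" arithmetic can be sketched as follows. The class and method names are hypothetical, and the code base figures in the note below (120,000 elements over 10,000 lines) are made-up numbers chosen only to make the division easy to follow.

```java
final class CloneSizeFilter {
    /**
     * Converts a clone's size in elements (AST nodes) into "lines" by dividing
     * by the average number of elements per line across the whole code base.
     */
    static double effectiveLines(int cloneElements, long totalElements, long totalLines) {
        double averagePerLine = (double) totalElements / totalLines;
        return cloneElements / averagePerLine;
    }

    /** Keeps only clones whose effective size reaches the minimum, e.g. 5 or 6 "lines". */
    static boolean isBigEnough(int cloneElements, long totalElements, long totalLines,
                               double minLines) {
        return effectiveLines(cloneElements, totalElements, totalLines) >= minLines;
    }
}
```

Under those assumed figures the average is 12 elements per line. A 3-element expression such as a*b counts as 0.25 "lines" and is dropped, while a 90-element expression written on one physical line counts as 7.5 "lines" and passes a threshold of 6.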
This table shows how effective the AST matching approach of CloneDR is compared to many other clone detection tools (ranking it "very well"). The only one that comes close is CCDIML …. which is an academic re-implementation of the CloneDR approach. There are other approaches (namely PDG-based approaches) which can detect clones that are scattered about more effectively, but in practice, in my personal experience, people that clone code don’t usually cut the cloned part into a bunch of separate parts to scatter them about; they are just too lazy. YMMV.
[Table from: Roy, Cordy, Koschke: Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach, Science of Computer Programming, Volume 74, Issue 7, May 2009. This paper sketches many different clone detection approaches and evaluates their effectiveness.]
[PMD isn't listed, but it apparently uses Rabin-Karp string matching, which makes it "text based" in the terms of the above table, rather than AST matching.]
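For contrast with AST matching, here is a rough Java sketch of text-based detection with a Rabin-Karp rolling hash over windows of lines. It is not PMD's code; the class name, the choice to hash whole trimmed lines, and the BASE constant are assumptions for illustration. Because it compares text, renaming a single variable inside a copied block is enough to hide the clone from it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RabinKarpLineClones {
    private static final long BASE = 1_000_003L; // arbitrary multiplier (assumption)

    /**
     * Returns pairs {i, j} with i < j: 0-based start lines of two windows of
     * `window` consecutive lines that are identical after trimming whitespace.
     */
    static List<int[]> duplicateWindows(List<String> lines, int window) {
        List<int[]> result = new ArrayList<>();
        int n = lines.size();
        if (window <= 0 || n < window) {
            return result;
        }
        long[] lineHash = new long[n];
        for (int i = 0; i < n; i++) {
            lineHash[i] = lines.get(i).trim().hashCode();
        }
        long high = 1; // BASE^(window-1); arithmetic wraps modulo 2^64
        for (int i = 1; i < window; i++) {
            high *= BASE;
        }
        long hash = 0;
        for (int i = 0; i < window; i++) {
            hash = hash * BASE + lineHash[i];
        }
        Map<Long, List<Integer>> seen = new HashMap<>();
        for (int start = 0; ; start++) {
            List<Integer> bucket = seen.computeIfAbsent(hash, k -> new ArrayList<>());
            for (int previous : bucket) {
                // Equal hashes can collide, so confirm with a real comparison.
                if (sameWindow(lines, previous, start, window)) {
                    result.add(new int[] {previous, start});
                }
            }
            bucket.add(start);
            if (start + window >= n) {
                break;
            }
            // Roll the hash: drop the first line of the window, append the next line.
            hash = (hash - lineHash[start] * high) * BASE + lineHash[start + window];
        }
        return result;
    }

    private static boolean sameWindow(List<String> lines, int a, int b, int window) {
        for (int k = 0; k < window; k++) {
            if (!lines.get(a + k).trim().equals(lines.get(b + k).trim())) {
                return false;
            }
        }
        return true;
    }
}
```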
Re OP's requirements:
CloneDR (like every other tool I know of) will NOT find a set of similar methods spread across multiple classes if those methods occur in different orders in the different classes. In this case, CloneDR is more likely to report the individual methods as clones; the net result is the same. It will find such a set if the members occur sequentially in the same order in the different classes, as happens when one class body has been copied wholesale from another.
Similar code blocks across multiple methods are quite commonly detected. The generated report shows how the similar code blocks are related, including an abstracted version of the code, which is essentially the parameterized code block you need for a method body.
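As an illustration of turning such an abstracted block into a method, here is a small Java sketch. The code is invented for this example and is not taken from a CloneDR report: two near-identical blocks differ in one spot (the variation point), and that spot becomes a parameter of the extracted method.

```java
final class ExtractedCloneDemo {
    // Before: clone instance 1. The only variation point is the factor 2.
    static int totalWeightBefore(int[] weights) {
        int total = 0;
        for (int w : weights) {
            if (w > 0) {
                total += w * 2;
            }
        }
        return total;
    }

    // Before: clone instance 2. Same block, but the factor is 3.
    static int totalPriceBefore(int[] prices) {
        int total = 0;
        for (int p : prices) {
            if (p > 0) {
                total += p * 3;
            }
        }
        return total;
    }

    // After: the abstracted block, with the variation point as a parameter.
    static int scaledPositiveSum(int[] values, int factor) {
        int total = 0;
        for (int v : values) {
            if (v > 0) {
                total += v * factor;
            }
        }
        return total;
    }

    // After: each original method shrinks to a call into the shared block.
    static int totalWeight(int[] weights) {
        return scaledPositiveSum(weights, 2);
    }

    static int totalPrice(int[] prices) {
        return scaledPositiveSum(prices, 3);
    }
}
```

The renamed loop variables (w, p) do not need parameters because they are local to the block; only the differing constant does.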