A tool similar to Simian (http://www.harukizaemon.com/simian/), designed to identify duplicate code inside a project. Unlike Simian, it is open source.
Licensed under GNU Affero General Public License 3.0.
Using dcd commercially? If you want priority support for dcd you can purchase a year's worth at https://boyter.gumroad.com/l/wajuc, which entitles you to priority direct email support from the developer.
If you are comfortable using Go and have >= 1.19 installed:
go install github.com/boyter/dcd@latest
Binaries for GNU/Linux and macOS for i386, x86_64 and ARM64 machines are available from the releases page.
Why use dcd?
- It's reasonably fast and works with large projects
- Works very well across multiple platforms without slowdown (GNU/Linux, macOS)
Command line usage of dcd is designed to be as simple as possible.
Full details can be found in dcd --help or dcd -h. Note that the below reflects the state of master, not a release.
$ dcd -h
dcd
Version 1.1.0
Ben Boyter <ben@boyter.org>

Usage:
  dcd [flags]

Flags:
  -x, --exclude-pattern strings   file and directory locations matching case sensitive patterns will be ignored [comma separated list: e.g. vendor,_test.go]
  -f, --fuzz uint8                fuzzy value where higher numbers allow increasingly fuzzy lines to match, values 0-255 where 0 indicates exact match
  -h, --help                      help for dcd
  -i, --include-ext strings       limit to file extensions [comma separated list: e.g. go,java,js]
  -m, --match-length int          min match length (default 6)
      --max-read-size-bytes int   number of bytes to read into a file with the remaining content ignored (default 10000000)
      --min-line-length int       number of bytes per average line for file to be considered minified (default 255)
      --no-gitignore              disables .gitignore file logic
      --no-ignore                 disables .ignore file logic
      --process-same-file
  -v, --verbose                   verbose output
      --version                   version for dcd
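The --min-line-length description above suggests that a file is treated as minified when its average line is very long. The following is a minimal sketch of such a check, not dcd's actual implementation: the function name isMinified is hypothetical, and whether the real comparison is "greater than" or "greater than or equal" is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// isMinified reports whether content looks minified, meaning its average
// number of bytes per line is at least minLineLength.
// Hypothetical helper for illustration; dcd's own check may differ.
func isMinified(content []byte, minLineLength int) bool {
	if len(content) == 0 {
		return false
	}
	// Count lines; a final line without a trailing newline still counts.
	lines := bytes.Count(content, []byte("\n"))
	if content[len(content)-1] != '\n' {
		lines++
	}
	return len(content)/lines >= minLineLength
}

func main() {
	source := []byte("package main\n\nfunc main() {}\n")
	minified := bytes.Repeat([]byte("a=1;"), 200) // a single 800-byte line

	fmt.Println(isMinified(source, 255))   // false
	fmt.Println(isMinified(minified, 255)) // true
}
```

With the default of 255, ordinary source files fall well below the threshold, while bundled or minified assets that pack everything onto a few lines exceed it.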
Output should look something like the below for any project
$ dcd
Found duplicate lines in processor/cocomo_test.go:
lines 0-8 match 0-8 in processor/workers_tokei_test.go (length 8)
Found duplicate lines in processor/cocomo_test.go:
lines 0-8 match 0-8 in processor/detector_test.go (length 8)
Found duplicate lines in processor/cocomo_test.go:
lines 0-6 match 0-6 in processor/helpers_test.go (length 6)
Found duplicate lines in processor/detector_test.go:
lines 0-8 match 0-8 in processor/processor_test.go (length 8)
Found duplicate lines in processor/detector_test.go:
lines 0-8 match 0-8 in processor/workers_tokei_test.go (length 8)
Found duplicate lines in processor/detector_test.go:
lines 0-8 match 0-8 in processor/cocomo_test.go (length 8)
Found duplicate lines in processor/detector_test.go:
lines 0-6 match 0-6 in processor/helpers_test.go (length 6)
Found duplicate lines in processor/detector_test.go:
lines 0-8 match 2-10 in processor/processor_unix_test.go (length 8)
Found duplicate lines in processor/filereader.go:
lines 0-7 match 0-7 in processor/workers.go (length 7)
Found duplicate lines in processor/filereader.go:
lines 0-6 match 0-6 in processor/formatters.go (length 6)
>> SNIP <<
Found 98634 duplicate lines in 140 files
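To make the report format above concrete, here is a naive sketch of how runs of matching lines between two files could be found and printed. It is an illustration only and not how dcd works internally: the names match, findDuplicateRuns and sameLine are hypothetical, lines are compared exactly after trimming whitespace (roughly what a fuzz value of 0 implies), and the quadratic comparison would be far too slow for large projects.

```go
package main

import (
	"fmt"
	"strings"
)

// match describes a run of identical consecutive lines shared by two files.
type match struct {
	startA, startB, length int
}

// sameLine compares two lines while ignoring leading and trailing whitespace.
func sameLine(x, y string) bool {
	return strings.TrimSpace(x) == strings.TrimSpace(y)
}

// findDuplicateRuns returns every maximal run of at least minLength
// consecutive matching lines between a and b. Naive O(len(a)*len(b)) sketch.
func findDuplicateRuns(a, b []string, minLength int) []match {
	var matches []match
	for i := range a {
		for j := range b {
			// Only start a run where the previous pair did not match,
			// so each run is reported once at its full length.
			if i > 0 && j > 0 && sameLine(a[i-1], b[j-1]) {
				continue
			}
			n := 0
			for i+n < len(a) && j+n < len(b) && sameLine(a[i+n], b[j+n]) {
				n++
			}
			if n >= minLength {
				matches = append(matches, match{i, j, n})
			}
		}
	}
	return matches
}

func main() {
	a := strings.Split("x\nimport a\nimport b\nimport c\ny", "\n")
	b := strings.Split("import a\nimport b\nimport c\nz", "\n")
	for _, m := range findDuplicateRuns(a, b, 3) {
		fmt.Printf("lines %d-%d match %d-%d (length %d)\n",
			m.startA, m.startA+m.length, m.startB, m.startB+m.length, m.length)
	}
	// Prints: lines 1-4 match 0-3 (length 3)
}
```

The minLength parameter plays the role of --match-length: shorter runs are dropped, which keeps common one or two line fragments out of the report.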
Note that you don't have to specify the directory you want to run against. Running dcd will assume you want to run against the current directory.
dcd mostly supports .ignore files inside directories that it scans. This is similar to how ripgrep, ag and tokei work.
.ignore files use exactly the same syntax as .gitignore files, and dcd will ignore the files and directories listed in them. You can add .ignore files to skip things like vendored dependencies that are checked in.
The idea is to let you add a file or folder to git and still have it ignored in the count.
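As a rough illustration of the .ignore idea, the sketch below parses ignore-style content and tests paths against it. It is deliberately simplified and is not dcd's implementation: parseIgnore and isIgnored are hypothetical names, and only comments, blank lines, trailing slashes and simple globs are handled. Real .gitignore syntax also includes negation with "!", anchored paths and "**", none of which are covered here.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// parseIgnore reads .ignore-style content and returns its patterns,
// skipping blank lines and "#" comments.
func parseIgnore(content string) []string {
	var patterns []string
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// A trailing slash marks a directory; this sketch simply drops it.
		patterns = append(patterns, strings.TrimSuffix(line, "/"))
	}
	return patterns
}

// isIgnored reports whether any segment of a slash-separated path
// matches one of the patterns.
func isIgnored(file string, patterns []string) bool {
	for _, segment := range strings.Split(file, "/") {
		for _, p := range patterns {
			if ok, err := path.Match(p, segment); err == nil && ok {
				return true
			}
		}
	}
	return false
}

func main() {
	patterns := parseIgnore("# vendored dependencies\nvendor/\n*.min.js\n")

	fmt.Println(isIgnored("vendor/lib/util.go", patterns))   // true
	fmt.Println(isIgnored("web/app.min.js", patterns))       // true
	fmt.Println(isIgnored("processor/workers.go", patterns)) // false
}
```

Matching each path segment separately is what lets a single "vendor/" entry exclude everything beneath that directory in this sketch.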
If you want to hack away, feel free! PRs are generally accepted.
The below produces all the packages for binary releases.
GOOS=darwin GOARCH=amd64 go build -ldflags="-s -w" && zip -r9 dcd-1.0.0-x86_64-apple-darwin.zip dcd
GOOS=darwin GOARCH=arm64 go build -ldflags="-s -w" && zip -r9 dcd-1.0.0-arm64-apple-darwin.zip dcd
GOOS=windows GOARCH=amd64 go build -ldflags="-s -w" && zip -r9 dcd-1.0.0-x86_64-pc-windows.zip dcd.exe
GOOS=windows GOARCH=386 go build -ldflags="-s -w" && zip -r9 dcd-1.0.0-i386-pc-windows.zip dcd.exe
GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" && zip -r9 dcd-1.0.0-x86_64-unknown-linux.zip dcd
GOOS=linux GOARCH=386 go build -ldflags="-s -w" && zip -r9 dcd-1.0.0-i386-unknown-linux.zip dcd
GOOS=linux GOARCH=arm64 go build -ldflags="-s -w" && zip -r9 dcd-1.0.0-arm64-unknown-linux.zip dcd
Some of the ideas for detection were taken from this paper: https://ieeexplore.ieee.org/document/792593