A CLI tool that helps you de-duplicate images in a folder.
Suppose you enjoy collecting images and download a large number of them from various websites to your hard drive every day. Soon, your hard drive holds tens of thousands of CG images from different sources and at different resolutions.
However, you discover that a significant portion of these downloaded CG images are duplicates, occupying precious disk space. Therefore, you urgently need a fast method to identify and remove duplicates from your CG directory. Since the sources of these downloads vary, traditional methods such as searching by file names are clearly inadequate for solving this problem; it's necessary to analyze the content of the images.
Run the following command to install this tool:
dotnet tool install --global Aiursoft.NiBot
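If the shell cannot find the command after installation, the .NET global tools directory may be missing from PATH. The sketch below assumes the default location on Linux and macOS, `$HOME/.dotnet/tools`; the `DOTNET_TOOLS_DIR` variable and the function name are illustrative, not part of this tool.

```shell
# Default location of .NET global tools on Linux and macOS (assumption:
# the tool was installed with --global and no custom --tool-path).
DOTNET_TOOLS_DIR="${DOTNET_TOOLS_DIR:-$HOME/.dotnet/tools}"

# Append the directory to PATH only if it is not already listed.
ensure_dotnet_tools_on_path() {
    case ":$PATH:" in
        *":$DOTNET_TOOLS_DIR:"*) ;;                        # already present
        *) PATH="$PATH:$DOTNET_TOOLS_DIR"; export PATH ;;  # add it once
    esac
}

ensure_dotnet_tools_on_path
```

To make the change permanent, the same lines can go into your shell profile.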
After getting the binary, run it directly in the terminal.
$ NiBot dedup
Option '--path' is required.
Description:
De-duplicate images in a folder.
Usage:
nibot dedup [options]
Options:
-p, --path <path> (REQUIRED) Path of the folder to dedup.
-ds, --duplicate-similar <duplicate-similar> Similarity bar. This value means two image are considered as duplicates if their similarity is greater than it. Setting too small may cause different images to be considered as
duplicates. Suggested values: [96-100] [default: 96]
-r, --recursive Recursively search for similar images in subdirectories. [default: False]
-k, --keep <keep> Preference for sorting images by quality to determine which to keep when duplicates are found. Available options:
Colorful|GrayScale|Newest|Oldest|Smallest|Largest|HighestResolution|LowestResolution. [default: Colorful|HighestResolution|Largest|Newest]
-a, --action <Delete|MoveToTrash|Nothing> Action to take when duplicates are found. Available options: Nothing, Delete, MoveToTrash. [default: MoveToTrash]
-y, --yes No interactive mode. Taking action without asking for confirmation. [default: False]
-e, --extensions <extensions> Extensions of files to dedup. [default: jpg|jpeg|png|jfif]
-t, --threads <threads> Number of threads to use for image indexing. Default is 32. [default: 32]
-v, --verbose Show detailed log
-?, -h, --help Show help and usage information
It will fetch all images in the folder and compare them with each other. If two images are similar enough, it will consider them as duplicates.
It will pick the best one based on the --keep option. If --action is set to Delete, it will delete the rest of the duplicates. If --action is set to MoveToTrash, it will move the rest of the duplicates to the trash.
Unless the --yes option is given, it will preview each photo and ask for confirmation before deleting files.
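A cautious way to combine these options is to run a pass with `--action Nothing` first, which presumably only reports what was found, and then a second pass that moves duplicates to the trash. The sketch below wraps both invocations in small shell functions, using only flags listed in the help output above. The function names, the `NIBOT_BIN` variable, and the similarity value of 98 are illustrative assumptions. The command name is assumed to be `nibot` as in the usage line; set `NIBOT_BIN` if your installation exposes it as `NiBot`.

```shell
# Name or path of the installed tool (assumption: "nibot" is on PATH).
NIBOT_BIN="${NIBOT_BIN:-nibot}"

# First pass: scan the folder in $1 recursively and take no action on duplicates.
dedup_preview() {
    "$NIBOT_BIN" dedup --path "$1" --recursive --duplicate-similar 98 --action Nothing
}

# Second pass: same scan, but move the duplicates that were not kept to the trash.
# Without --yes the tool still asks for confirmation before acting.
dedup_to_trash() {
    "$NIBOT_BIN" dedup --path "$1" --recursive --duplicate-similar 98 --action MoveToTrash
}
```

For example, run `dedup_preview ~/Pictures/CG`, review the output, and then run `dedup_to_trash ~/Pictures/CG` (the folder is a placeholder).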
You can also install this tool as a class library.
dotnet add package Aiursoft.NiBot.Core
Then you can use the DedupEngine class to de-duplicate images in your own code.
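using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
// Also import the namespaces from Aiursoft.NiBot.Core (and its dependencies) that
// declare DedupEngine, ImageHasher, BestPhotoSelector, FilesHelper and AddTaskCanon.

// Create the service collection that the registrations below are added to.
var services = new ServiceCollection();
services.AddLogging(builder =>
{
    builder.AddConsole();
    builder.AddDebug();
});
// Register the engine and its supporting services.
services.AddTransient<DedupEngine>();
services.AddTransient<ImageHasher>();
services.AddTransient<BestPhotoSelector>();
services.AddTransient<FilesHelper>();
services.AddTaskCanon();

// Build the container and resolve the engine.
var sp = services.BuildServiceProvider();
var dedupEngine = sp.GetRequiredService<DedupEngine>();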
That's it!
There are many ways to contribute to the project: logging bugs, submitting pull requests, reporting issues, and creating suggestions.
Even if you have push rights on the repository, you should create a personal fork and create feature branches there when you need them. This keeps the main repository clean and your workflow cruft out of sight.
We're also interested in your feedback on the future of this project. You can submit a suggestion or feature request through the issue tracker. To make this process more effective, we ask that each request include enough detail to define it clearly.