
Commit 0d44021 (1 parent: 89d6505)

Update readme description of new feature

1 file changed: README.md (+25, -13 lines)
# Duplicate Image Finder

This Python script finds duplicate images using a [perceptual hash (pHash)](http://www.phash.org) to compare images. pHash ignores the image size and file size and instead creates a hash based on the pixels of the image. This allows you to find duplicate pictures that have been rotated, have had their metadata changed, or have been edited.
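To illustrate the idea only (the real script relies on an actual pHash implementation; the downscaling and frequency-domain steps are omitted here), a toy average-style hash over a grayscale pixel grid looks like this:

```python
# Toy illustration of a perceptual hash -- NOT the script's real pHash.
# Each pixel contributes one bit: 1 if it is at least as bright as the
# image's mean brightness. Similar pictures produce similar bit patterns.

def average_hash(pixels):
    """Return an integer hash built from a 2D list of 0-255 gray values."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; a small distance means visually similar."""
    return bin(a ^ b).count("1")

img1 = [[10, 200], [220, 30]]
img2 = [[12, 198], [219, 33]]   # same picture, slightly re-encoded
print(hamming(average_hash(img1), average_hash(img2)))  # -> 0
```

Because the hash depends on relative brightness rather than the file bytes, re-saving or re-sizing the picture usually leaves the hash unchanged.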

This script hashes images added to it, storing the hash in a database. To find duplicate images, hashes are compared. If the hash is the same between two images, they are marked as duplicates. A web interface is provided to delete duplicate images easily. If you are feeling lucky, there is an option to automatically delete duplicate files.
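The comparison step boils down to grouping stored paths by hash; a minimal sketch of that idea (hypothetical names, not the script's actual code):

```python
# Sketch of duplicate detection by exact hash match. The (path, hash)
# pairs stand in for rows loaded from the script's database.
from collections import defaultdict

def find_duplicate_groups(hashes):
    groups = defaultdict(list)
    for path, h in hashes:
        groups[h].append(path)
    # Any hash shared by two or more files marks a duplicate group.
    return [paths for paths in groups.values() if len(paths) > 1]

print(find_duplicate_groups([("a.jpg", 6), ("b.jpg", 6), ("c.jpg", 9)]))
# -> [['a.jpg', 'b.jpg']]
```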

As a word of caution, pHash is not perfect. I have found that duplicate pictures sometimes have different hashes and similar (but not the same) pictures have the same hash. This script is a great starting point for cleaning your photo library of duplicate pictures, but make sure you look at the pictures before you delete them. You have been warned! I hold no responsibility for any family memories that might be lost because of this script.

This script has only been tested with Python 3 and is still pretty rough around the edges. Use at your own risk.
First, install this script. This can be done by either cloning the repository or

```
git clone https://github.com/philipbl/duplicate-images.git
```

Next, download all required modules. This script has only been tested with Python 3. I would suggest that you make a virtual environment, setting Python 3 as the default python executable (`mkvirtualenv --python=/usr/local/bin/python3 <name>`).

```
pip install -r requirements.txt
python duplicate_finder.py
```

## Usage

```
Usage:
    duplicate_finder.py add <path> ... [--db=<db_path>] [--parallel=<num_processes>]
    duplicate_finder.py remove <path> ... [--db=<db_path>]
    duplicate_finder.py clear [--db=<db_path>]
    duplicate_finder.py show [--db=<db_path>]
    duplicate_finder.py find [--print] [--match-time] [--trash=<trash_path>] [--db=<db_path>]
    duplicate_finder.py dedup [--confirm] [--match-time] [--trash=<trash_path>]
    duplicate_finder.py -h | --help

Options:
    -h, --help                  Show this screen

    --db=<db_path>              The location of the database. (default: ./db)

    --parallel=<num_processes>  The number of parallel processes to run to hash the image
                                files (default: 8).

find:
    --print                     Only print duplicate files rather than displaying HTML file
    --match-time                Adds the extra constraint that duplicate images must have the
                                same capture times in order to be considered.
    --trash=<trash_path>        Where files will be put when they are deleted (default: ./Trash)

dedup:
    --confirm                   Confirm you realize this will delete duplicates automatically.
```

### Add
```
python duplicate_finder.py add /path/to/images
```

When a path is added, it is searched recursively for image files. In particular, `JPEG`, `PNG`, `GIF`, and `TIFF` images are searched for. Any image files found will be hashed. Adding a path uses 8 processes (by default) to hash images in parallel, so the CPU usage is very high.

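The recursive search can be pictured as follows (a sketch under assumed names, not the script's exact code):

```python
# Sketch: recursively collect files whose extension looks like an image.
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".tiff"}

def image_files(root):
    """Yield every path under `root` with an image-like extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                yield os.path.join(dirpath, name)

# The script then hands these paths to a pool of worker processes
# (8 by default), along the lines of:
#   multiprocessing.Pool(8).map(hash_file, image_files(root))
```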
### Remove
```
python duplicate_finder.py remove /path/to/images
```
### Show
```
python duplicate_finder.py show
```

Prints the contents of the database.

### Find
```
python duplicate_finder.py find [--print] [--match-time] [--trash=<trash_path>]
```

Finds duplicate pictures that have been hashed. This will find images that have the same hash stored in the database. There are a few options associated with `find`. By default, when this command is run, a webpage is displayed showing duplicate pictures and a server is started that allows for the pictures to be deleted (images are not actually deleted, but moved to a trash folder -- I really don't want you to make a mistake). The first option, `--print`, prints all duplicate pictures and does not display a webpage or start the server. `--match-time` adds the extra constraint that images must have the same EXIF time stamp to be considered duplicate pictures. Last, `--trash=<trash_path>` lets you select a path where you want files to be put when they are deleted. The trash path must already exist before an image is deleted.
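"Deleting" through the web interface is really a move into the trash directory. Roughly, with a hypothetical helper (not the script's actual code):

```python
# Sketch: move a file into the trash directory instead of deleting it.
import os
import shutil

def send_to_trash(path, trash_dir="./Trash"):
    """Move `path` into `trash_dir`, which must already exist."""
    if not os.path.isdir(trash_dir):
        raise FileNotFoundError(f"trash path {trash_dir!r} does not exist")
    return shutil.move(path, os.path.join(trash_dir, os.path.basename(path)))
```

This mirrors the behavior described above: the trash path is not created for you, and nothing is removed from disk by `find`.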

### Dedup
```
python duplicate_finder.py dedup [--confirm] [--match-time] [--trash=<trash_path>]
```

Similar to `find`, except that it deletes any duplicate picture it finds rather than bringing up a webpage. To make sure you really want to do this, you must provide the `--confirm` flag. See `find` for a description of the other options.
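An automatic dedup of this kind usually keeps one file per duplicate group and trashes the rest; a sketch under that assumption (the function name is a hypothetical stand-in):

```python
# Sketch of a keep-one policy: for each group of duplicates, keep the
# first path and return the others for trashing.
def pick_files_to_trash(groups):
    to_trash = []
    for paths in groups:
        _keep, *rest = paths
        to_trash.extend(rest)
    return to_trash

print(pick_files_to_trash([["a.jpg", "b.jpg", "c.jpg"], ["d.png", "e.png"]]))
# -> ['b.jpg', 'c.jpg', 'e.png']
```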

## Requirements
