Hadoop-Duplicate-Files-Finder

A Python script that counts and lists the files in a Hadoop Distributed File System (HDFS) directory and reports statistics on duplicate files. Duplicates are matched by content, so files with different names but the same content are also reported.

The script is recursive: it descends into every subdirectory of the given directory.
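The idea of recursive, content-based duplicate detection can be sketched as below. This is a minimal illustration and not the code in this repository: it assumes a local directory tree walked with os.walk as a stand-in for HDFS, and it assumes SHA-256 as the way to compare content. The names file_digest and find_duplicates are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Walk root recursively and group file paths by content digest.

    Assumption: root is a local directory (a stand-in for an HDFS path).
    Each returned list holds the paths that share the same content; a list
    with more than one item means duplicates are present.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)
    # Sort each group so the result is deterministic.
    return [sorted(paths) for paths in groups.values()]
```

Because files are grouped by digest rather than by name, two files with different names and identical bytes land in the same list. Hashing every file reads all of its content, so on large trees it may be worth grouping by file size first and hashing only the files whose sizes collide.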

Usage: python dir_dedup.py

Output Format: [<NAMES AND DIRECTORIES OF FILES IN LIST FORMAT (multiple items if duplicates are present)>],

