A Python script that counts and lists duplicate files in a Hadoop Distributed File System (HDFS). It also detects files that have different names but the same content.

Yoga07/Hadoop-Duplicate-Files-Finder

Hadoop-Duplicate-Files-Finder

A Python script that counts and lists files, and reports duplicate-file statistics, for a directory in a Hadoop Distributed File System (HDFS). It also detects files that have different names but the same content.

The script is recursive, so it also scans every subdirectory of the given directory.

Usage: python dir_dedup.py

Output Format: [<NAMES AND DIRECTORIES OF FILES IN LIST FORMAT (multiple items if duplicates are present)>],
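
The internals of `dir_dedup.py` are not shown here. As a minimal sketch of how content-based duplicate detection can work, the example below recursively walks a local directory (standing in for an HDFS directory) and groups file paths by a digest of their contents, so files with different names but identical bytes end up in the same list. The function names `file_digest` and `group_by_content`, the choice of SHA-256, and the use of `os.walk` are assumptions made for illustration and are not taken from the script.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(root):
    """Recursively group the files under `root` by their content digest.

    Returns a list of lists of paths. A list with more than one path
    holds files whose contents are identical, whatever their names.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)
    return [sorted(paths) for paths in groups.values()]


if __name__ == "__main__":
    # Hypothetical invocation: scan the current directory and print
    # one list of paths per distinct file content.
    for paths in group_by_content("."):
        print(paths)
```

Each printed list mirrors the output format described above: a single path for a unique file, and several paths when duplicates are present. Grouping by digest rather than by file name is what lets renamed copies be matched.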
