A collection of simple bash scripts to compute MD5 hashes of audio content only, ignoring metadata and container formats. These tools are useful for detecting duplicate audio files, verifying audio integrity across different formats, and comparing audio content regardless of tags or file format.
- Content-only hashing: Ignores ID3 tags, album art, and other metadata
- Format-agnostic input: Works with MP3, FLAC, WAV, M4A, OGG, and any other format supported by ffmpeg (hashes match only when the encoded audio streams are bit-identical)
- Stream selection: Choose specific audio streams in multi-track files
- Fast operation: Uses ffmpeg's native MD5 muxer for efficient hashing
- Multiple output modes: Verbose, quiet, and hash-only modes for different use cases
- Finding duplicate music files across different containers (e.g., the same encoded audio stream stored in two container formats)
- Verifying that a copy or remux left the audio stream unaltered
- Detecting re-tagged files where metadata changed but audio content is identical
- Batch processing to identify duplicate audio content in large libraries
- Quality assurance for audio processing pipelines
- Archival verification to ensure audio backups are bit-identical
- ffmpeg must be installed and available in your PATH
- Bash shell (standard on macOS and Linux)
macOS:
brew install ffmpeg
Ubuntu/Debian:
sudo apt-get install ffmpeg
Other systems: See ffmpeg.org
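Once ffmpeg is installed, it can help to confirm that it is actually reachable from your PATH before running the scripts. The following is a minimal sketch; `require_cmd` is a hypothetical helper introduced here for illustration and is not part of this repository.

```bash
#!/bin/bash
# require_cmd is a hypothetical helper, not one of the repository's scripts.
# It returns 0 if the named command is found on PATH, otherwise it prints
# a message to stderr and returns 1.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Error: '$1' not found in PATH" >&2
        return 1
    fi
}

# Example use (commented out so that sourcing this file has no side effects):
# require_cmd ffmpeg && ffmpeg -version | head -n 1
```

If the check fails, revisit the installation steps above or adjust your PATH.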
- Clone this repository or download the scripts:
git clone https://github.com/yourusername/audio_hash.git
cd audio_hash
- Make the scripts executable:
chmod +x audio_md5.sh audio_compare.sh
- (Optional) Add to your PATH for easy access:
# Add to ~/.bashrc or ~/.zshrc
export PATH="$PATH:/path/to/audio_hash"

Calculate the MD5 hash of an audio file's content (ignoring metadata).
Basic usage:
./audio_md5.sh song.mp3
# Output: MD5=a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
Options:
-h, --help Show help message
-v, --verbose Show ffmpeg output and processing details
-q, --quiet Only output the hash value (no MD5= prefix)
-s, --stream NUM Select specific audio stream (default: all audio streams)
Examples:
# Get hash in machine-readable format (hash only)
./audio_md5.sh -q song.mp3
# Output: a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
# Debug processing with verbose output
./audio_md5.sh -v song.mp3
# Compare specific audio track in a multi-track file
./audio_md5.sh -s 0 concert_recording.mkv
# Batch process all MP3 files
for file in *.mp3; do
    echo "$file: $(./audio_md5.sh -q "$file")"
done

Compare the audio content of two files by computing and comparing their MD5 hashes.
Basic usage:
./audio_compare.sh original.mp3 retagged.mp3
# Output:
# MD5=a1b2c3d4... (original.mp3)
# MD5=a1b2c3d4... (retagged.mp3)
#
# ✓ Audio content is IDENTICAL
Options:
-h, --help Show help message
-v, --verbose Show ffmpeg output and processing details
-q, --quiet Only show match/no-match via exit code (0=match, 1=different)
-s, --stream NUM Select specific audio stream (default: all audio streams)
--hash-only Only show hashes without comparison result
Examples:
# Use in conditional statements (quiet mode)
if ./audio_compare.sh -q original.mp3 retagged.mp3; then
    echo "Audio streams are identical"
else
    echo "Audio differs - the stream was re-encoded or altered"
fi
# Show only hashes without comparison
./audio_compare.sh --hash-only file1.wav file2.mp3
# Compare with verbose output for debugging
./audio_compare.sh -v track1.m4a track2.ogg
# Compare specific audio streams in multi-track files
./audio_compare.sh -s 0 video1.mkv video2.mkv

These scripts use ffmpeg's MD5 muxer to compute a hash of the raw audio stream data:
ffmpeg -i input.mp3 -map 0:a -c copy -f md5 -

This command:
- Reads the input file (-i input.mp3)
- Selects all audio streams (-map 0:a)
- Copies the audio without re-encoding (-c copy)
- Outputs in MD5 format (-f md5)
- Writes to stdout (-)
The hash is computed on the encoded audio bitstream itself, not the file container or metadata, so it is unaffected by tags and container changes but still depends on the codec and encoding settings.
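The command above can be wrapped in a small function. The following is a sketch only: the helper names (`audio_map_spec`, `strip_md5_prefix`, `audio_stream_md5`) are hypothetical, and the mapping of a stream number to `0:a:NUM` is an assumption about how a `--stream` option might be implemented, not a description of the actual scripts.

```bash
#!/bin/bash
# Hypothetical helpers sketching the approach; the real scripts may differ.

# Build the -map specifier: all audio streams, or a single audio stream by
# index (assumed mapping: stream NUM -> 0:a:NUM).
audio_map_spec() {
    local stream="${1:-}"
    if [ -n "$stream" ]; then
        echo "0:a:$stream"
    else
        echo "0:a"
    fi
}

# Strip the "MD5=" prefix that ffmpeg's md5 muxer prints before the hash.
strip_md5_prefix() {
    echo "${1#MD5=}"
}

# Hash the copied (not re-encoded) audio packets of a file.
# -v error silences the banner; -nostdin stops ffmpeg from reading stdin,
# which matters when the function is called inside a "while read" loop.
audio_stream_md5() {
    local file="$1" stream="${2:-}" line
    line=$(ffmpeg -v error -nostdin -i "$file" \
        -map "$(audio_map_spec "$stream")" -c copy -f md5 -) || return 1
    strip_md5_prefix "$line"
}
```

For example, `audio_stream_md5 song.mp3` would print the bare hash for all audio streams, and `audio_stream_md5 concert_recording.mkv 0` would restrict it to the first audio stream.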
- Encoding matters: An MP3 and a WAV of the same source will have different hashes because the audio data itself is encoded differently. These scripts detect if two files contain bit-identical audio streams, not if they sound the same.
- Lossy conversions: Converting between lossy formats (e.g., MP3 to OGG) will always result in different hashes even if they sound similar.
- Sample rate/bit depth: Files with different sample rates or bit depths will have different hashes.
- Multi-channel layout: Changes in channel mapping may result in different hashes even if the audio content is the same.
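If you need to compare files whose encodings differ (for example a WAV and a FLAC made from it), one alternative is to hash the decoded audio instead of the encoded stream. This is not what the scripts in this repository do; the sketch below is an illustration under assumptions, with hypothetical helper names (`decoded_audio_md5`, `same_decoded_audio`). It decodes the first audio stream to 16-bit PCM before hashing, so it is slower than stream copy.

```bash
#!/bin/bash
# Hypothetical helpers, not part of audio_md5.sh or audio_compare.sh.

# Hash the decoded samples of the first audio stream as 16-bit PCM.
# Unlike "-c copy", this decodes the audio, so the codec used for storage
# no longer affects the hash (sample rate and channel count still do).
decoded_audio_md5() {
    local file="$1" line
    line=$(ffmpeg -v error -nostdin -i "$file" -map 0:a:0 \
        -c:a pcm_s16le -f md5 -) || return 1
    echo "${line#MD5=}"
}

# Return 0 if both files decode to the same samples, 1 if they differ,
# and 2 if either file could not be hashed.
same_decoded_audio() {
    local a b
    a=$(decoded_audio_md5 "$1") || return 2
    b=$(decoded_audio_md5 "$2") || return 2
    [ "$a" = "$b" ]
}
```

Lossless encodings of the same source at the same sample rate and channel count should generally produce matching hashes this way. Lossy encodings will still differ, because decoding them does not reproduce the original samples.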
These tools are best used for:
- Comparing files in the same format
- Verifying exact copies or stream copies (remuxes) where the encoded audio is unchanged
- Detecting duplicate files that have been re-tagged or moved between containers
#!/bin/bash
# find_audio_duplicates.sh
# Requires bash 4+ for associative arrays (macOS ships bash 3.2 by default)
declare -A hashes
for file in *.{mp3,flac,m4a,wav,ogg}; do
    # Skip patterns that matched no files
    [ -f "$file" ] || continue
    hash=$(./audio_md5.sh -q "$file" 2>/dev/null)
    if [ -n "$hash" ]; then
        if [ -n "${hashes[$hash]}" ]; then
            echo "Duplicate found:"
            echo " Original: ${hashes[$hash]}"
            echo " Duplicate: $file"
        else
            # First file seen with this hash
            hashes[$hash]="$file"
        fi
    fi
done

#!/bin/bash
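# verify_conversion.sh - Compare original FLACs with converted MP3s
# Note: the hash covers the encoded stream (-c copy), so a FLAC and an MP3
# encoded from it are expected to report DIFFERENT. Adapt the extensions to
# pairs that should share the same encoded audio (e.g., remuxed copies).
for flac in *.flac; do
    # Skip the literal pattern when no FLAC files exist
    [ -f "$flac" ] || continue
    mp3="${flac%.flac}.mp3"
    if [ -f "$mp3" ]; then
        if ./audio_compare.sh -q "$flac" "$mp3"; then
            echo "✓ $flac -> $mp3 [OK]"
        else
            echo "✗ $flac -> $mp3 [DIFFERENT]"
        fi
    fi
done

#!/bin/bash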
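# create_hash_database.sh - Create a database of audio hashes
output="audio_hashes.txt"
# Create or truncate the output file
: > "$output"
# -print0 with read -d '' keeps filenames containing spaces or backslashes intact
find . -type f \( -name "*.mp3" -o -name "*.flac" -o -name "*.m4a" \) -print0 |
while IFS= read -r -d '' file; do
    # Redirect stdin so ffmpeg cannot consume the file list from the pipe
    hash=$(./audio_md5.sh -q "$file" 2>/dev/null </dev/null)
    if [ -n "$hash" ]; then
        echo "$hash $file" >> "$output"
    fi
done
echo "Hash database created: $output"

Contributions are welcome! Please feel free to submit issues or pull requests.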
MIT License - feel free to use these scripts for any purpose.
Created using ffmpeg's powerful multimedia processing capabilities.