numblr/audiohash

Audio Hash Tools

A collection of simple Bash scripts that compute MD5 hashes of audio content only, ignoring metadata and the container format. They are useful for detecting duplicate audio files, verifying that an audio stream survived a copy or remux unchanged, and comparing audio content regardless of tags or container.

Features

  • Content-only hashing: Ignores ID3 tags, album art, and other metadata
  • Container-agnostic: Works with MP3, FLAC, WAV, M4A, OGG, and any other format supported by ffmpeg; hashes match only when the encoded audio stream is identical (see Limitations)
  • Stream selection: Choose specific audio streams in multi-track files
  • Fast operation: Uses ffmpeg's MD5 muxer with stream copy, so audio is hashed without being decoded or re-encoded
  • Multiple output modes: Verbose, quiet, and hash-only modes for different use cases

Use Cases

  • Finding duplicate music files whose encoded audio is identical, even when tags or containers differ (e.g., the same audio stream in an .m4a file and an .mkv file)
  • Verifying stream copies and remuxes to ensure the audio data was not altered
  • Detecting re-tagged files where metadata changed but audio content is identical
  • Batch processing to identify duplicate audio content in large libraries
  • Quality assurance for audio processing pipelines
  • Archival verification to ensure audio backups are bit-identical

Requirements

  • ffmpeg must be installed and available in your PATH
  • Bash shell (standard on macOS and Linux)

Installing ffmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt-get install ffmpeg

Other systems: See ffmpeg.org
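
Before running the scripts, it can help to confirm that ffmpeg is actually reachable from the shell. The check below is a minimal sketch; `require_cmd` is a hypothetical helper name and is not part of this repository.

```shell
#!/bin/sh
# require_cmd is a hypothetical helper, not part of this repository.
# It succeeds if the shell can find the named command and prints a
# message to stderr otherwise.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found in PATH" >&2
    return 1
}

# Typical use at the top of a script:
#   require_cmd ffmpeg || exit 1
```
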

Installation

  1. Clone this repository or download the scripts:
git clone https://github.com/yourusername/audio_hash.git
cd audio_hash
  2. Make the scripts executable:
chmod +x audio_md5.sh audio_compare.sh
  3. (Optional) Add to your PATH for easy access:
# Add to ~/.bashrc or ~/.zshrc
export PATH="$PATH:/path/to/audio_hash"

Usage

audio_md5.sh - Compute Audio Hash

Calculate the MD5 hash of an audio file's content (ignoring metadata).

Basic usage:

./audio_md5.sh song.mp3
# Output (example): MD5=a1b2c3d4e5f60718293a4b5c6d7e8f90

Options:

-h, --help          Show help message
-v, --verbose       Show ffmpeg output and processing details
-q, --quiet         Only output the hash value (no MD5= prefix)
-s, --stream NUM    Select specific audio stream (default: all audio streams)

Examples:

# Get hash in machine-readable format (hash only)
./audio_md5.sh -q song.mp3
# Output (example): a1b2c3d4e5f60718293a4b5c6d7e8f90

# Debug processing with verbose output
./audio_md5.sh -v song.mp3

# Hash a specific audio track in a multi-track file
./audio_md5.sh -s 0 concert_recording.mkv

# Batch process all MP3 files
for file in *.mp3; do
    echo "$file: $(./audio_md5.sh -q "$file")"
done
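
When the output of -q is consumed by other scripts, it may be worth checking that the value really looks like an MD5 digest (32 hexadecimal characters) before storing or comparing it. The following is a small sketch; `is_md5` is a hypothetical helper, not something the scripts provide.

```shell
#!/bin/sh
# is_md5 is a hypothetical helper: it succeeds only if its argument is
# exactly 32 hexadecimal characters, the shape of an MD5 digest.
is_md5() {
    case "$1" in
        *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 32 ]
}

# Possible use:
#   hash=$(./audio_md5.sh -q song.mp3)
#   is_md5 "$hash" || echo "unexpected output: $hash" >&2
```
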

audio_compare.sh - Compare Two Audio Files

Compare the audio content of two files by computing and comparing their MD5 hashes.

Basic usage:

./audio_compare.sh original.mp3 retagged.mp3
# Output:
# MD5=a1b2c3d4... (original.mp3)
# MD5=a1b2c3d4... (retagged.mp3)
#
# ✓ Audio content is IDENTICAL

Options:

-h, --help          Show help message
-v, --verbose       Show ffmpeg output and processing details
-q, --quiet         Only show match/no-match via exit code (0=match, 1=different)
-s, --stream NUM    Select specific audio stream (default: all audio streams)
--hash-only         Only show hashes without comparison result

Examples:

# Use in conditional statements (quiet mode)
if ./audio_compare.sh -q original.mp3 retagged.mp3; then
    echo "Audio stream is identical"
else
    echo "Audio differs - the audio stream was altered or re-encoded"
fi

# Show only hashes without comparison
./audio_compare.sh --hash-only file1.wav file2.mp3

# Compare with verbose output for debugging
./audio_compare.sh -v track1.m4a track2.ogg

# Compare specific audio streams in multi-track files
./audio_compare.sh -s 0 video1.mkv video2.mkv
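
The documented exit codes for quiet mode are 0 (match) and 1 (different). In pipelines it can be useful to also tell "different" apart from "could not hash". The sketch below builds such a check on top of audio_md5.sh -q; the `same_audio` function, the `AUDIO_MD5` variable, and the exit code 2 are assumptions made for illustration, not behavior of audio_compare.sh.

```shell
#!/bin/sh
# Hypothetical wrapper around audio_md5.sh (only its documented -q flag is used).
# AUDIO_MD5 may be overridden to point at the script's actual location.
AUDIO_MD5=${AUDIO_MD5:-./audio_md5.sh}

# Exit status: 0 = same hash, 1 = different hash,
#              2 = at least one hash could not be computed.
same_audio() (
    h1=$("$AUDIO_MD5" -q "$1" 2>/dev/null)
    h2=$("$AUDIO_MD5" -q "$2" 2>/dev/null)
    if [ -z "$h1" ] || [ -z "$h2" ]; then
        exit 2
    fi
    [ "$h1" = "$h2" ]
)

# Possible use:
#   same_audio a.mp3 b.mp3
#   case $? in
#       0) echo "identical" ;;
#       1) echo "different" ;;
#       *) echo "error while hashing" >&2 ;;
#   esac
```
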

How It Works

These scripts use ffmpeg's MD5 muxer to compute a hash of the encoded audio stream data (the packets as stored in the file, without decoding):

ffmpeg -i input.mp3 -map 0:a -c copy -f md5 -

This command:

  1. Reads the input file (-i input.mp3)
  2. Selects all audio streams (-map 0:a)
  3. Copies the audio without re-encoding (-c copy)
  4. Outputs in MD5 format (-f md5)
  5. Writes to stdout (-)

The hash is computed on the audio bitstream itself, not the file container or metadata, making it container-agnostic: the same encoded stream yields the same hash regardless of tags or container.
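
To make the steps above concrete, here is a minimal sketch of how the hashing step and the stream selection could be wrapped in shell functions. It is an illustration, not the repository's implementation: `stream_map` and `stream_md5` are hypothetical names, and mapping `-s NUM` to the stream specifier `0:a:NUM` is an assumption about what the option does.

```shell
#!/bin/sh
# Hypothetical re-creation of the core hashing step; the real scripts may differ.

# Build the -map argument: all audio streams by default, or a single
# audio stream when an index is given (assumed meaning of -s NUM).
stream_map() {
    if [ -n "${1:-}" ]; then
        printf '0:a:%s\n' "$1"
    else
        printf '0:a\n'
    fi
}

# Hash the encoded audio packets of a file and print only the digest.
# -v error hides ffmpeg's banner; -nostdin keeps ffmpeg from reading the
# caller's standard input.
stream_md5() (
    out=$(ffmpeg -nostdin -v error -i "$1" \
        -map "$(stream_map "${2:-}")" -c copy -f md5 -) || exit 1
    printf '%s\n' "${out#MD5=}"   # strip the "MD5=" prefix
)

# Possible use:
#   stream_md5 song.mp3        # all audio streams
#   stream_md5 concert.mkv 1   # second audio stream only
```
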

Limitations

  • Encoding matters: An MP3 and a WAV of the same source will have different hashes because the audio data itself is encoded differently. These scripts detect if two files contain bit-identical audio streams, not if they sound the same.
  • Lossy conversions: Converting between lossy formats (e.g., MP3 to OGG) will always result in different hashes even if they sound similar.
  • Sample rate/bit depth: Files with different sample rates or bit depths will have different hashes.
  • Multi-channel layout: Changes in channel mapping may result in different hashes even if the audio content is the same.

These tools are best used for:

  • Comparing files in the same format
  • Verifying exact copies or lossless remuxes (stream copies into another container)
  • Detecting duplicate files that have been re-tagged or moved between containers
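
For files stored in different lossless codecs (for example FLAC and WAV), the stream-copy hash cannot match, but a hash of the decoded samples can. The sketch below is not part of these scripts. It relies on ffmpeg's md5 muxer hashing decoded audio when `-c copy` is omitted, and it names the PCM codec explicitly so both files are converted the same way. `decoded_md5` is a hypothetical helper; the default of `pcm_s16le` truncates higher bit depths, so `pcm_s24le` may be a better choice for 24-bit sources. Lossy encodes of the same source will generally still differ, as will files with different sample rates or channel counts.

```shell
#!/bin/sh
# decoded_md5 is a hypothetical helper, not part of this repository.
# It hashes the decoded PCM samples rather than the encoded packets.
#   $1 = input file
#   $2 = PCM codec used for the comparison (default: pcm_s16le)
decoded_md5() (
    codec=${2:-pcm_s16le}
    out=$(ffmpeg -nostdin -v error -i "$1" \
        -map 0:a -c:a "$codec" -f md5 -) || exit 1
    printf '%s\n' "${out#MD5=}"   # strip the "MD5=" prefix
)

# Possible use:
#   [ "$(decoded_md5 album.flac)" = "$(decoded_md5 album.wav)" ] && echo "same samples"
```

Decoding is slower than stream copy, so this variant trades speed for the ability to compare across lossless formats.
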

Advanced Examples

Find All Duplicate Audio Files in a Directory

#!/bin/bash
# find_audio_duplicates.sh
# Requires Bash 4+ for associative arrays (the Bash shipped with macOS is 3.2).

declare -A hashes

for file in *.{mp3,flac,m4a,wav,ogg}; do
    [ -f "$file" ] || continue
    hash=$(./audio_md5.sh -q "$file" 2>/dev/null)
    if [ -n "$hash" ]; then
        if [ -n "${hashes[$hash]}" ]; then
            echo "Duplicate found:"
            echo "  Original: ${hashes[$hash]}"
            echo "  Duplicate: $file"
        else
            hashes[$hash]="$file"
        fi
    fi
done

Verify a Batch Conversion

#!/bin/bash
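# verify_conversion.sh - Compare originals with stream-copied (remuxed) versions,
# e.g. files produced with: ffmpeg -i song.m4a -c copy song.mka
# Note: re-encoded files (such as FLAC -> MP3) always report DIFFERENT.

for src in *.m4a; do
    [ -f "$src" ] || continue
    dst="${src%.m4a}.mka"
    if [ -f "$dst" ]; then
        if ./audio_compare.sh -q "$src" "$dst"; then
            echo "$src -> $dst [OK]"
        else
            echo "$src -> $dst [DIFFERENT]"
        fi
    fi
done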

Create a Hash Database

#!/bin/bash
# create_hash_database.sh - Create a database of audio hashes

output="audio_hashes.txt"
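: > "$output"  # create the output file or truncate an existing one

# IFS= and -r keep leading spaces and backslashes in file names intact.
find . -type f \( -name "*.mp3" -o -name "*.flac" -o -name "*.m4a" \) | while IFS= read -r file; do
    # </dev/null keeps ffmpeg from consuming the file list on stdin.
    hash=$(./audio_md5.sh -q "$file" 2>/dev/null </dev/null)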
    if [ -n "$hash" ]; then
        echo "$hash  $file" >> "$output"
    fi
done

echo "Hash database created: $output"
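
Once a database in the "hash, two spaces, path" layout exists, duplicates can be listed without associative arrays, which also works on older Bash versions. This is a sketch under that layout assumption; `find_dupes` is a hypothetical helper name.

```shell
#!/bin/sh
# find_dupes is a hypothetical helper. It reads a database of
# "<hash>  <path>" lines and prints every group of paths sharing a hash.
find_dupes() {
    sort "$1" | awk '
        NF == 0 { next }                  # skip blank lines
        {
            hash = $1
            path = $0
            sub(/^[^ ]+ +/, "", path)     # drop hash and separator, keep spaces in the path
            if (hash == prev) {
                if (!shown) {             # first repeat: print the group header once
                    print "Duplicate group " hash ":"
                    print "  " first
                    shown = 1
                }
                print "  " path
            } else {
                prev = hash
                first = path
                shown = 0
            }
        }'
}

# Possible use:
#   find_dupes audio_hashes.txt
```

Sorting first places identical hashes on adjacent lines, so a single pass with awk is enough to group them.
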

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - feel free to use these scripts for any purpose.

Credits

Created using ffmpeg's powerful multimedia processing capabilities.

About

Audio-only comparison and hashing for audio files
