A high-performance audio processing library with built-in performance monitoring, using Essentia C++ library.
Ensure you have Python 3.12.6 installed. Install dependencies using pip:
pip install -r requirements.txt
- This project uses
essentia
for audio processing and employs multiprocessing for performance optimization.
Requirements:
- Python 3.12.6
- For Apple Silicon users:
xcode-select --install
may be required
pip install git+ssh://git@github.com/parsasabetz/audiopro_essentia.git
# Analyze all features (creates output.msgpack)
python -m audiopro input.wav output
# Analyze specific features only (computes only the selected features):
python -m audiopro input.wav output --features rms spectral_centroid mfcc
# To compute all features, either omit --features or pass an empty list:
python -m audiopro input.wav output --features
# Output as JSON (creates output.json)
python -m audiopro input.wav output --format json
# Skip performance monitoring
python -m audiopro input.wav output --skip-monitoring
# Enable gzip compression for msgpack output
python -m audiopro input.wav output --gzip
# INCORRECT - don't include extension in output name
❌ python -m audiopro input.wav output.json # This will raise an error
# CORRECT - specify format with --format flag
✓ python -m audiopro input.wav output --format json
To analyze only a specific range of the input audio file, use the --start
and --end
arguments:
python -m audiopro input.wav output --start 10.0 --end 20.0
This will process the audio from 10 seconds to 20 seconds. Omitting --end
will process until the end of the file:
python -m audiopro input.wav output --start 10.0
import asyncio
from audiopro import analyze_audio, FeatureConfig
# Basic usage - analyze all features
await analyze_audio(
file_path="input.wav",
output_path="output", # No extension needed
output_format="msgpack" # This determines the extension (.msgpack)
)
# Analyze specific features only.
# Note: Passing an empty dictionary (or one with all False values) computes all features.
feature_config: FeatureConfig = {
"rms": True,
"spectral_centroid": True,
"mfcc": True,
# Features not included or set to False will be excluded.
# To compute all features, simply pass None or an empty dict.
}
await analyze_audio(
file_path="input.wav",
output_path="output", # No extension needed
output_format="json", # This determines the extension (.json)
feature_config=feature_config
time_range={"start": 40, "end": 50}, # Analyze only the range from 40 to 50 seconds
)
# Batch processing multiple files
from audiopro.batch import batch_process_audio
results = await batch_process_audio(
input_files=["file1.wav", "file2.mp3"],
output_dir="output", # Directory name only
feature_config=feature_config,
max_concurrent=2 # Process 2 files at a time
)
The library supports two output formats:
msgpack
(default): Binary MessagePack format for efficient storage (creates.msgpack
files)json
: Human-readable JSON format (creates.json
files)
Important: Never include the file extension in the output path. Instead:
- Provide the output path without extension
- Use
output_format
or--format
to specify the format - The library will automatically add the correct extension
Note: Gzip compression applies only to the msgpack format.
For example:
# INCORRECT ❌
await analyze_audio(
file_path="input.wav",
output_path="output.json", # Don't include extension
output_format="json"
)
# CORRECT ✓
await analyze_audio(
file_path="input.wav",
output_path="output", # No extension
output_format="json" # This determines the extension
)
You can selectively enable/disable any of these features:
rms
: Root Mean Square energy valuevolume
: Volume level in decibels (dBFS), computed as20 * log10(rms)
spectral_centroid
: Weighted mean of frequenciesspectral_bandwidth
: Variance of frequencies around the centroidspectral_flatness
: Measure of how noise-like the signal isspectral_rolloff
: Frequency below which most spectral energy existszero_crossing_rate
: Rate of signal polarity changesmfcc
: Mel-frequency cepstral coefficients (13 values)frequency_bands
: Energy in different frequency bandschroma
: Distribution of spectral energy across pitch classes
Feature selection can be done in three ways:
- Pass
None
asfeature_config
to compute all features (default behavior) - Include only the features you want with
True
values - Explicitly disable features with
False
values (optional)
Important:
- The
feature_config
argument controls which features to compute.- If you pass
None
, the analysis computes all features. - If you pass a dictionary with selected features enabled, only those will be computed.
- If an empty dictionary (or a dictionary where no feature is enabled) is provided, it will default to computing all features.
- If you pass
- The output includes an
included_features
field that lists which features were computed.- An empty list indicates that all available features were computed.
Example configurations:
from audiopro import FeatureConfig
# Compute all features
feature_config = None // or
feature_config = {}
# Compute only RMS and MFCC
feature_config: FeatureConfig = {
"rms": True,
"mfcc": True
}
# Compute everything _except_ spectral features
feature_config: FeatureConfig = {
"spectral_centroid": False,
"spectral_bandwidth": False,
"spectral_flatness": False,
"spectral_rolloff": False
}
# Compute only volume levels
feature_config: FeatureConfig = {
"volume": True
}
# Compute volume and RMS together
feature_config: FeatureConfig = {
"volume": True,
"rms": True
}
# Compute everything except volume
feature_config: FeatureConfig = {
"volume": False,
# other features set to True...
}
{
"metadata": {
"file_info": {
"filename": str,
"format": str,
"codec": str,
"size_mb": float,
"created_date": str, # Unix timestamp
"mime_type": str, # MIME type of the file (or "unknown" if not determined)
"md5_hash": str
},
"audio_info": {
"duration_seconds": float,
"sample_rate": int,
"bit_rate": int,
"channels": int,
"peak_amplitude": float,
"rms_amplitude": float,
"dynamic_range_db": float,
"quality_metrics": {
"dc_offset": float,
"silence_ratio": float,
"potentially_clipped_samples": int
}
}
},
"tempo": float, # Beats per minute
"beats": [float], # List of beat timestamps in seconds
"included_features": [], // List of enabled features; empty means all were computed.
"features": [
{
"time": float, # Time in milliseconds (always included)
"rms": float, # Optional
"volume": float, # Optional, in dBFS (decibels relative to full scale)
"spectral_centroid": float, # Optional
"spectral_bandwidth": float, # Optional
"spectral_flatness": float, # Optional
"spectral_rolloff": float, # Optional
"zero_crossing_rate": float, # Optional
"frequency_bands": { # Optional
"sub_bass": float, # 20-60 Hz
"bass": float, # 60-250 Hz
"low_mid": float, # 250-500 Hz
"mid": float, # 500-2000 Hz
"upper_mid": float, # 2000-5000 Hz
"treble": float # 5000-20000 Hz
},
"mfcc": [float], # Optional, 13 coefficients
"chroma": [float] # Optional, 12 values representing pitch classes
}
]
}
- Selective Feature Computation: Choose which audio features to compute to optimize processing time and output size
- Audio Analysis: Extract tempo, beats, and spectral features using
essentia
- Performance Monitoring: Built-in CPU and memory usage tracking leveraging multiprocessing
- Multiprocessing: Parallel processing for faster analysis of large audio files
- Flexible Output: Choose between JSON and MessagePack formats
- Resource Efficient: Optimized for large audio files with efficient multiprocessing
- Metadata Extraction: Extract detailed audio metadata (e.g., duration, sample rate)
- File Validation: Check for valid audio files before processing, with detailed error messages, allowing only for files up to 50MB in size.
- Partial Processing: Analyze only a specific range of an audio file. Note that if only
start
is provided, the analysis will continue until the end of the file, similarly if onlyend
is provided, the analysis will start from the beginning of the file. - Batch Processing: Process multiple files concurrently with error handling
audiopro/
└── src/
└── audiopro/
├── __init__.py # Public API for the library
├── main.py # CLI and programmatic entry point
├── arg_parser.py # Command line argument parsing
│
├── analysis/ # Audio analysis module
│ ├── __init__.py
│ └── controller.py # Core analysis logic
│
├── audio/ # Audio processing modules
│ ├── __init__.py
│ ├── audio_loader.py # Audio file loading
│ ├── extractor.py # Feature extraction
│ ├── metadata.py # File metadata extraction
│ ├── models.py # Data model definitions
│ ├── processors.py # Frame feature extraction
│ └── validator.py # Audio validation utilities
│
├── errors/ # Custom exception definitions
│ ├── __init__.py
│ ├── exceptions.py # Audio processing exceptions
│ └── tracking.py # Error tracking utilities
│
├── output/ # Output handling
│ ├── __init__.py
│ ├── output_handler.py # Result file writing
│ ├── modules.py # JSON/MessagePack serialization
│ └── types.py # Output type definitions
│
├── monitor/ # Performance monitoring
│ ├── __init__.py
│ ├── monitor.py # CPU/memory monitoring
│ └── loader.py # Monitor function loading
│
└── utils/ # Utility modules
├── __init__.py
├── audio.py # Audio processing utilities
├── logger.py # Logging configuration
├── path.py # File path handling
├── process.py # Process management
└── constants.py # Configuration constants
└── utils.py # General utilities
git clone git@github.com:parsasabetz/audiopro_essentia.git
cd audiopro
python -m venv venv
source venv/bin/activate
pip install -e .
Follow the Commitizen format:
<type>(<scope>): <subject>
Make commits using:
git cz
Version bumping:
cz bump
This project is closed source. All rights reserved.