A command-line tool for comparing two data sources (CSV/JSONL) and identifying differences.
- Compare two data files (supports CSV and JSONL formats):

      uv run data_diff source.csv target.jsonl

- Compare specific columns using a mapping file:

      uv run data_diff --mapping column-map.json source.csv target.csv

- Specify ID columns for matching records:

      uv run data_diff --id-columns id,email source.csv target.csv

- Output differences to a file:

      uv run data_diff --output diff-report.txt source.csv target.csv

- Show all available options:

      uv run data_diff --help

The mapping file (JSON) specifies how columns correspond between files:

    {
      "source_column1": "target_column1",
      "source_column2": "target_column2"
    }

The tool will show:
- Added records (in target but not source)
- Removed records (in source but not target)
- Modified records (matching IDs but different values)
- Summary statistics of differences found
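
The categories above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: `load_records` and `diff_records` are hypothetical helpers, the sample data is made up, and the sketch assumes that ID columns are given by their target-side names after the mapping is applied and that values are compared as strings.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Parse CSV or JSONL text into a list of dicts (hypothetical helper)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def diff_records(source, target, id_columns, mapping=None):
    """Classify records as added, removed, or modified (hypothetical helper).

    mapping uses the same {"source_column": "target_column"} shape as the
    mapping file; unmapped columns keep their names.
    """
    mapping = mapping or {}
    # Rename source columns so both sides use the target column names.
    renamed = [{mapping.get(k, k): v for k, v in row.items()} for row in source]

    def key(row):
        # Assumption: IDs are compared as strings, since CSV values are text.
        return tuple(str(row[c]) for c in id_columns)

    src = {key(r): r for r in renamed}
    tgt = {key(r): r for r in target}

    added = [tgt[k] for k in tgt if k not in src]      # in target, not source
    removed = [src[k] for k in src if k not in tgt]    # in source, not target
    modified = []
    for k in src:
        if k in tgt:
            # Compare only the columns present on both sides.
            shared = sorted(src[k].keys() & tgt[k].keys())
            changes = {
                c: (src[k][c], tgt[k][c])
                for c in shared
                if str(src[k][c]) != str(tgt[k][c])
            }
            if changes:
                modified.append((k, changes))

    summary = {
        "added": len(added),
        "removed": len(removed),
        "modified": len(modified),
    }
    return added, removed, modified, summary


# Made-up sample data: a CSV source and a JSONL target.
source_csv = "user_id,email,name\n1,a@x.com,Ann\n2,b@x.com,Bob\n"
target_jsonl = (
    '{"id": 1, "email": "a@x.com", "name": "Anne"}\n'
    '{"id": 3, "email": "c@x.com", "name": "Cy"}\n'
)
mapping = {"user_id": "id"}

source = load_records(source_csv, "csv")
target = load_records(target_jsonl, "jsonl")
added, removed, modified, summary = diff_records(
    source, target, ["id", "email"], mapping
)
print(summary)  # {'added': 1, 'removed': 1, 'modified': 1}
```

In this sample, the record with ID 3 is added, the record with ID 2 is removed, and the record with ID 1 is modified because `name` changed from `Ann` to `Anne`. Comparing values as strings keeps the CSV value `"1"` and the JSONL value `1` from being reported as a difference; the real tool may treat types differently.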

Run tests:

    uv run pytest