dq

Query CSV, JSON, Avro, and Parquet files from the command line. Pipe operations together, like jq but for tables.

dq 'users.csv | filter { age > 25 } | select name city | sorta name'
dq -o csv 'users.json | filter { age > 25 }' > filtered.csv

Install

Homebrew:

brew tap razeghi71/tap
brew install dq

Go:

go install github.com/razeghi71/dq/cmd/dq@latest

Or build from source:

git clone https://github.com/razeghi71/dq.git
cd dq
go build -ldflags="-s -w" -o dq ./cmd/dq

How It Works

Every query starts with a file and pipes it through operations. Each operation takes a table in and returns a table out.

dq 'file.csv | operation1 | operation2 | ...'

Wrap queries in single quotes so your shell doesn't interpret |, {, }, or >.
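You can see why the quotes matter without running dq at all, by inspecting how the shell splits the arguments (a plain-shell sketch; `set --` just builds an argument list):

```shell
# Right: single quotes deliver the whole query as one argument:
#   dq 'users.csv | filter { age > 25 }'
# Wrong: unquoted, the shell splits at | and tries to run a command named "filter":
#   dq users.csv | filter { age > 25 }
# The difference is visible without dq: quoted, the query is a single argv entry.
set -- dq 'users.csv | filter { age > 25 }'
echo "argc=$#"    # prints argc=2: the program name plus one query argument
```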

Operations

head / tail - Get rows from the start or end

dq 'users.csv | head'          # first 10 rows (default)
dq 'users.csv | head 20'       # first 20 rows
dq 'users.csv | tail'          # last 10 rows (default)
dq 'users.csv | tail 5'        # last 5 rows

select - Keep only the columns you want

dq 'users.csv | select name age city'
dq 'data.json | select name address.city'        # nested field -> column "address_city"

remove - Drop columns you don't need

dq 'users.csv | remove password ssn'

filter - Keep rows that match a condition

Expressions go inside { }. Use == for equality and the keywords and / or to combine conditions.

dq 'users.csv | filter { age > 25 }'
dq 'users.csv | filter { age > 25 and city == "NY" }'
dq 'users.csv | filter { email is not null }'
dq 'data.json | filter { address.city == "NY" }'    # nested field access

sorta / sortd - Sort rows ascending or descending

dq 'users.csv | sorta age'          # youngest first
dq 'users.csv | sortd age'          # oldest first
dq 'users.csv | sorta city age'     # sort by city, then age

count - Count how many rows

dq 'users.csv | count'
dq 'users.csv | filter { age > 30 } | count'

distinct - Remove duplicate rows

dq 'users.csv | distinct'             # unique rows
dq 'users.csv | distinct city'        # unique cities (keeps first occurrence)
dq 'users.csv | distinct city age'    # unique city+age combinations
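Like every other operation, distinct composes in a pipeline; for instance, counting unique values (a sketch reusing the sample file from above):

```shell
# number of different cities in the file
dq 'users.csv | distinct city | count'
```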

rename - Rename columns

Names are paired: old then new. Use backticks for column names with spaces.

dq 'users.csv | rename name username'
dq 'users.csv | rename `first name` first_name `last name` last_name'

transform - Create or overwrite columns with computed values

Assignments are comma-separated and evaluated row by row.

dq 'users.csv | transform age2 = age * 2'
dq 'users.csv | transform name = upper(name), age_months = age * 12'
dq 'sales.csv | transform total = coalesce(quantity, 0) * coalesce(price, 0)'

group - Group rows by column values

Collects rows that share the same value(s) into groups. The non-grouped columns are nested into a column called grouped (or a custom name with as).

dq 'users.csv | group city'
city | grouped
---- | ----------------------------------------------------------
NY   | [ {name:alice,age:30,city:NY}, {name:bob,age:25,city:NY} ]
LA   | [ {name:carol,age:28,city:LA} ]

All original columns (including group keys) are preserved in the nested records.

dq 'users.csv | group city as people'       # custom nested column name
dq 'users.csv | group city department'       # group by multiple columns
dq 'data.json | group address.city'          # group by nested field -> key column "address_city"

reduce - Aggregate over grouped rows

Runs aggregation functions (sum, avg, count, etc.) over the nested rows created by group. The nested column is kept after reduction -- use remove to drop it.

dq 'users.csv | group city | reduce avg_age = avg(age), n = count()'
city | grouped                                                    | avg_age | n
---- | ---------------------------------------------------------- | ------- | -
NY   | [ {name:alice,age:30,city:NY}, {name:bob,age:25,city:NY} ] | 27.5    | 2
LA   | [ {name:carol,age:28,city:LA} ]                            | 28      | 1

By default reduce operates on the column named grouped. If you used a custom name with group ... as, or if you have a pre-existing list column (e.g. from a Parquet/JSON file), pass the column name as the first argument:

dq 'users.csv | group city as people | reduce people avg_age = avg(age)'
dq 'orders.parquet | reduce orders total = sum(amount), n = count()'

group + reduce - Putting them together

# average age per city, clean output
dq 'users.csv | group city | reduce avg_age = avg(age) | remove grouped'

# total revenue per category, top 3
dq 'sales.csv | group category | reduce total = sum(price), n = count() | remove grouped | sortd total | head 3'

Functions

For reduce (aggregate across rows): count(), sum(col), avg(col), min(col), max(col), first(col), last(col)

For transform (compute per row): upper(s), lower(s), len(s), trim(s), substr(s, start, len), coalesce(a, b, ...), if(cond, then, else), year(d), month(d), day(d)

Operators (work everywhere): +, -, *, /, ==, !=, <, >, <=, >=, and, or, not
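The per-row and aggregate functions slot into transform and reduce respectively. A sketch combining a few of them (the stage and youngest/oldest column names are made up for illustration):

```shell
# per-row: normalize a name and bucket by age
dq 'users.csv | transform name = trim(upper(name)), stage = if(age >= 18, "adult", "minor")'

# aggregate: age range per city
dq 'users.csv | group city | reduce oldest = max(age), youngest = min(age) | remove grouped'
```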

Nested Fields

JSON, Avro, and Parquet files can contain nested records. Use dot notation to access sub-fields in filter, transform, select, and group:

dq 'data.json | filter { address.city == "Chicago" }'
dq 'data.json | transform city = address.city | select name city'
dq 'data.json | filter { profile.stats.logins > 10 }'
dq 'data.json | select name address.city'                          # -> columns: name, address_city
dq 'data.json | group address.city | reduce n = count() | remove grouped'

Dot paths in select and group flatten to underscore-separated column names (e.g., address.city becomes address_city). If a column with that name already exists, a numeric suffix is added (address_city_2).

Missing sub-fields return null.

Output Formats

By default dq prints a pretty ASCII table. Use -o to change the output format:

dq 'users.csv | select name age'                        # table (default)
dq -o csv  'users.csv | select name age' > out.csv      # CSV
dq -o json 'users.csv | select name age' > out.json     # JSON array of objects
dq -o jsonl 'users.csv | select name age' > out.jsonl   # one JSON object per line
Format | Flag     | Notes
------ | -------- | ----------------------------------------------------------
table  | default  | Pretty-printed ASCII table
csv    | -o csv   | Standard CSV. Nulls render as empty strings.
json   | -o json  | JSON array. Preserves types (ints, bools, nulls, nested).
jsonl  | -o jsonl | One JSON object per line. Same type preservation as JSON.
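Because jsonl emits one self-contained object per line, it streams cleanly into line-oriented tools such as jq (a usage sketch assuming jq is installed; jq is not part of dq):

```shell
# extract a single field from each row
dq -o jsonl 'users.csv | select name age' | jq -r '.name'
```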

Supported Input Formats

CSV (.csv), JSON (.json), JSONL (.jsonl), Avro (.avro), Parquet (.parquet)

License

MIT
