
tsv and/or csv output support #48

Closed
johan opened this issue Nov 20, 2012 · 46 comments

@johan

johan commented Nov 20, 2012

Could we get a command-line flag to spit out tab- or comma-separated values instead of concatenated json objects, csv/tsv-quoting cells as necessary?

It would make jq a real power house for reports and spreadsheets generation.

@stedolan
Contributor

stedolan commented Dec 3, 2012

Hrm. This would definitely be useful, but there are a lot of annoying special cases here. I'm not sure what should happen if the output's not a set of json objects, or if they have different fields. Also, CSV quoting rules are a nightmare, but I suppose I wouldn't have to parse the CSV afterwards...

@alexchamberlain
Contributor

Just make it Excel compliant...

@jdanbrown

+1

Just require the output to be arrays, and map those arrays directly to tsv/csv. Quoting shouldn't be too hard since you're generating it and not parsing it (as you point out).

@jdanbrown

In the meantime, here's a simple tsv hack that takes advantage of string substitution (and will break if you have any tabs embedded in strings):

$ echo '{"one":1,"two":"x"}' | jq --raw-output '"\(.one)\t\(.two)"'
1       x

stedolan added a commit that referenced this issue Dec 28, 2012
@ajmath

ajmath commented Jan 16, 2013

+1

@stedolan
Contributor

This is working but undocumented in master, with a slightly strange syntax that may change. @csv is a builtin function that reformats its argument (which must be an array) as a CSV string. So, something like:

./jq -r '@csv'

will output 1,2,"hello, world" when given [1,2,"hello, world"].
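A minimal check of that behavior (assuming a jq build that includes the @csv builtin, per the commit referenced above):

```shell
# @csv formats an array as one CSV record; -r (--raw-output) prints
# the resulting string raw instead of JSON-encoding it.
echo '[1,2,"hello, world"]' | jq -r '@csv'
# 1,2,"hello, world"
```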

@johan
Author

johan commented Jan 17, 2013

That sort of sounds like something else. If unclear, my wish is to get the csv output:

date,count,title
2011-01-12 13:14,17,"He's dead, Jim!"
2011-01-13 21:30,4711,"What do you mean, \"dead\"?"
2011-01-14 00:07,,Dead!

…apart from either of these two JSON/JSONish ones currently supported of the same data:

[ { "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
, { "date": "2011-01-13 21:30", "count": 4711, "title":"What do you mean, \"dead\"?" }
, { "date": "2011-01-14 00:07", "title":"Dead!" }
]

…or:

{ "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
{ "date": "2011-01-13 21:30", "count": 4711, "title":"What do you mean, \"dead\"?" }
{ "date": "2011-01-14 00:07", "title":"Dead!" }

Is that what you are shooting for, or something else that's useful for some other purpose?

@ghost

ghost commented Jan 18, 2013

One issue is that order is significant for csv, but not for JSON fields: we can't rely on the order of fields in JSON. This can be solved by specifying the mapping from JSON named fields to csv positional fields, by constructing an array of those fields, using [.date,.count,.title]:

input: { "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
jq -r '[.date,.count,.title] | @csv'
"2011-01-12 13:14",17,"He's dead, Jim!"

The csv output has slightly overzealous quoting (more than in your target), but that does no harm.

A second issue is you want to apply this to an array of objects, not just one. The .[] operator streams each item of an array in turn:

jq  -r '.[] | [.date, .count, .title] | @csv'
"2011-01-12 13:14",17,"He's dead, Jim!"
"2011-01-13 21:30",4711,"What do you mean, ""dead""?"
"2011-01-14 00:07",,"Dead!"

Finally, you also want a header, stating the csv field names at the top. The easiest way to do this is literally (we need parentheses because , binds tighter than |):

jq  -r '["date", "count", "title"], (.[] | [.date, .count, .title]) | @csv'
"date","count","title"
"2011-01-12 13:14",17,"He's dead, Jim!"
"2011-01-13 21:30",4711,"What do you mean, ""dead""?"
"2011-01-14 00:07",,"Dead!"

But duplication is bad. We can avoid repeating the same list of field names, by reusing the header array to lookup the fields in each object - but this looks confusing to me, I wonder if there's a clearer way?:

jq  -r '["date", "count", "title"] as $fields| $fields, (.[] | [.[$fields[]]]) | @csv'
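For reference, running that one-liner against the sample array from earlier in the thread reproduces the header-plus-rows output (a quick check; sample.json is a file name of my choosing):

```shell
# Write the sample data to a file, then reuse the header array
# to look up each object's fields in the stated order.
cat > sample.json <<'EOF'
[ { "date": "2011-01-12 13:14", "count": 17, "title": "He's dead, Jim!" },
  { "date": "2011-01-13 21:30", "count": 4711, "title": "What do you mean, \"dead\"?" },
  { "date": "2011-01-14 00:07", "title": "Dead!" } ]
EOF
jq -r '["date", "count", "title"] as $fields | $fields, (.[] | [.[$fields[]]]) | @csv' sample.json
# "date","count","title"
# "2011-01-12 13:14",17,"He's dead, Jim!"
# "2011-01-13 21:30",4711,"What do you mean, ""dead""?"
# "2011-01-14 00:07",,"Dead!"
```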

EDIT: here it is as a function, with a slightly nicer field syntax, using path() (but you still need to remember the -r switch - maybe a --csv flag does make sense):

def csv(fs): [path(null|fs)[]] as $fields| $fields, (.[] | [.[$fields[]]]) | @csv;
USAGE: csv(.date, .count, .title)

PS: if the input is not an array of objects, but just a sequence of objects (your second, JSONish, input), then we could just omit the .[] - but then we can't get the header at the top. It's probably best to convert it to an array, using the --slurp/-s option (or put [] around it, if it's generated within jq).

PPS: You can also get an array of the fields of an object using keys, but it returns them in alphabetical ("unicode codepoint") order. Of course, one could create a tool that does retain the ordering of JSON fields - but it's not part of the JSON spec, and JSON tools and web APIs don't seem to do it. (Though I think it would be convenient in some ways, e.g. ordering the string/number fields first makes JSON much more human-readable when deeply nested.)

@johan
Author

johan commented Jan 18, 2013

While JSON attribute order can be arbitrary, I would be happy if (in situations where the jq command line does not stipulate an ordering) the csv column order were given by the order of attribute discovery in the input file - i.e. date, count and title in my example, as given by the input order of the first object (ignoring the order of all subsequent input lines, since none of them introduce additional attributes/columns).

For (typical?) cases where the jq command line itself names what fields the output should have, it would indeed be nice to figure out a DRY syntax that infers order by your already stated order on that command line, and either always print out the header names on the first line, or have a special csv-mode flag to specifically disable that output (make the rare case a flag).

My gut feeling is that asking for csv output is best done as a command-line mode flag rather than as a step in the pipeline. That way the housekeeping logic becomes the responsibility of jq rather than yours, and it also becomes easier to switch the output format back to json again, should you want to - but maybe that is substantially harder to implement?

@neq1337

neq1337 commented Mar 14, 2013

Sorry for digging out this topic, but does @csv work in the current version? I get a "compile error" when I try to use @csv, @html or the others. I am using the Windows jq version.

@lluchs

lluchs commented Mar 14, 2013

No, it's not included in the current version.

@neq1337

neq1337 commented Mar 14, 2013

Is it possible to download somewhere the previous version that had this implemented? I found that it is the only way to parse json...

@nicowilliams
Contributor

IMO this could be left to a separate utility.

@neq1337

neq1337 commented May 7, 2013

@svnpenn Thank you! CSV works, however this build adds some extra sequences to the parsed data. For example:

←[0m←[34;1m"id"←[0m←[37m: ←[0m←[0m518377←[0m←[37m,

The original data is id:518377.

Does anybody know how to resolve this issue?

@neq1337

neq1337 commented May 7, 2013

@svnpenn Thanks a lot! :)

@factom

factom commented May 17, 2013

👍 working well for me in master. love this feature.

@jdanbrown

@csv is great! Can we get a @tsv as well?

@pablomendes

+1 for @TSV 👍

Update: if you arrived here trying to find a way to do tsv formatting, the command below should also work:
jq -r '"\(.date)\t\(.count)\t\(.title)"'
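For readers on a newer jq (1.5 and later, I believe - see the confirmation further down the thread), there is now a @tsv builtin that also escapes embedded tabs and newlines, unlike the interpolation hack. A sketch:

```shell
# The JSON string "x\ty" contains a real tab character; @tsv joins
# the fields with literal tabs and re-escapes the embedded tab as \t,
# so every record stays on a single line.
printf '%s' '{"one":1,"two":"x\ty"}' | jq -r '[.one, .two] | @tsv'
# prints: 1, a literal tab, then x\ty (embedded tab re-escaped)
```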

@stevezieglerva

Here's the syntax for tab-delimited output when using DOS (the inner quotes and the string interpolations need backslash escapes):
jq -r ".aggregations[] | .buckets[] | \"\(.key)\t\(.doc_count)\""

@drorata

drorata commented Sep 11, 2014

I fail to use the @csv and the suggestions of @stevezieglerva . Assume that I have:

[
  {
    "a": 1,
    "b": 2
  },
  {
    "a": 3,
    "b": 3
  },
  {
    "a": 2,
    "b": 1
  }
]

stored in foo.bar. How can I export it to CSV? I tried some combinations, but didn't manage to make it work...

@drorata

drorata commented Sep 11, 2014

OK. Got it...

cat foo.bar | jq '.[] | [.a, .b] | @csv'

Follow up: how can I get rid of the "'s and their escapes (in the case where the value is a string for instance)?

@pkoppstein
Contributor

@drorata - You might find the -r command-line option useful:

$ jq -n '["a","b","a\"b"]|@csv'
"\"a\",\"b\",\"a\"\"b\""

$ jq -r -n '["a","b","a\"b"]|@csv'
"a","b","a""b"

Notice in particular the double-double-quotes in the last line.

However, using the "-r" option may also not be what you want, depending on which variant of CSV you expect. For example, the "-r" option will cause the newline in "a\nb" to become an actual newline.

@csv is useful but it was not designed to produce output that conforms strictly with RFC 4180 (which, for example, requires CR/LF). It looks as though the intent was to provide a simple "UTF-8-style CSV", so that if some specific style of CSV is required, an external tool (or tool chain) can be used.

@ghost

ghost commented Sep 12, 2014

Perhaps @csv should output actual CSVs?

With -r, an embedded newline in a string becomes a literal line break, splitting the CSV row:

"a
","b"

@pkoppstein
Contributor

@slapresta asked:

Perhaps @csv should output actual CSVs?

@csv must produce valid JSON, since it's a jq filter, not a "post-filter". I believe that jq in combination with -r does in fact produce "valid CSV" (see below), it being understood that there are many variations in common use.

Regarding the particular point about embedded newlines, jq is in conformance with the "800 pound gorilla CSV" standard:

http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs

In fact, this document also sanctions the use of EITHER CR or CRLF as a record separator:

Each record is one line   ...but
A record separator may consist of a line feed (ASCII/LF=0x0A), or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A).

Perhaps what's needed is a reference to some "CSV standard" to which @csv conforms? Or @csv4180? In addition to @TSV, of course :-)

@JohnParkerXNU

Instead of polluting jq, why not use an external tool to convert JSON to CSV (e.g. https://github.com/jehiah/json2csv) ?

@ghost

ghost commented Sep 12, 2014

Agree with JohnParkerXNU

@joelpurra
Contributor

Update: see jq-hopkok's tabular folder.


For reference: RFC4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files
http://www.ietf.org/rfc/rfc4180.txt

I use two scripts I pipe JSON into, and get CSV/TSV output. Object keys are used to generate a header.
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-csv.sh
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-tsv.sh

Because of implementation details in to_entries (see #561 to_entries always outputs keys sorted - can it be avoided?), I have to name keys so they're sorted in the output. I follow the format 01--My first column, 02--My second column, 03--A third column, and clean them with another script.
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/clean-csv-sorted-header.sh
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/clean-tsv-sorted-header.sh

I'll throw the scripts into the (future) pile of utilities that is jq-hopkok when I have time to clean up the code.

[
    {
        "01--Employee name": "Homer Simpson",
        "02--Logins": 578
    },
    {
        "01--Employee name": "Carl Carlsson",
        "02--Logins": 75926
    }
]
<"input.json" array-of-objects-to-tsv.sh | clean-tsv-sorted-header.sh >"output.tsv"
Employee name   Logins
Homer Simpson   578
Carl Carlsson   75926

The same input piped through the CSV scripts produces this.

"Employee name","Logins"
"Homer Simpson",578
"Carl Carlsson",75926

I prefer TSV as it's much easier to split the raw data into columns (some tex/latex packages don't play nice with RFC4180 input). I've taken a shortcut in replacing \t in strings with \\t - in my own data I "assume" there are no tabs in the output.

Hope these scripts can help someone.



@kmatt

kmatt commented Sep 22, 2014

If I have a structure like this:

{
    "timestamp": 1319730758,
    "base": "USD",
    "rates": {
        "AED": 3.672626,
        "AFN": 48.3775,
        "ALL": 110.223333,
        "AMD": 409.604993
     } 
}

jq '.rates | @csv' responds with "jq: error: object cannot be csv-formatted, only array"

Is it necessary to "cast" the object to an array? If so, how?

@wtlangford
Contributor

There's no real casting here. The data types are too distinct for casting (except via string interpolation, but then all you have is a string). The issue is that all @csv does is print an array out in CSV form. How exactly did you want the output printed?

@kmatt

kmatt commented Sep 22, 2014

@wtlangford In this case, I am converting a json source to flat delimited for bulk loading into a relational database table (PostgreSQL). I can pipe the output of " jq '.rates' " into awk, filter for the pairs and replace delimiter characters, but keeping everything in jq would be cleaner.

Example:

jq '.rates' rates.json | awk -v OFS=',' '/:/ { gsub(/[\":,]/,"",$0); print "'\''" $1 "'\''" , $2 }'

@wtlangford
Contributor

Looks like .rates | to_entries | .[] | [.key,.value] | @csv is what you want, then.
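Put together with the rates input from above (rates.json is a hypothetical file name), that pipeline gives one key/value pair per row:

```shell
cat > rates.json <<'EOF'
{ "timestamp": 1319730758, "base": "USD",
  "rates": { "AED": 3.672626, "AFN": 48.3775, "ALL": 110.223333, "AMD": 409.604993 } }
EOF
# to_entries turns the object into [{key, value}, ...]; each pair
# becomes a two-element array, which @csv renders as one CSV row.
jq -r '.rates | to_entries | .[] | [.key, .value] | @csv' rates.json
# "AED",3.672626
# "AFN",48.3775
# "ALL",110.223333
# "AMD",409.604993
```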

@kmatt

kmatt commented Sep 22, 2014

@wtlangford Very close! Is there a way to omit the brackets, and print key, value on same line?

"AZN", 0.786006

instead of

[
  "AZN",
  0.786006
]

@wtlangford
Contributor

Don't forget the @csv on the end. You might also want to pass -r to jq when you invoke it to strip out the quoting escapes. Keep in mind that this may not quite be what you want, however. If any of your strings has a newline in it, that'll get printed out unescaped.

@wtlangford
Contributor

Just out of curiosity, why was @csv not sufficient at the end?

On Mon, Sep 22, 2014, Matt Keranen wrote:

Thanks - this is close enough:

jq -c '.rates | to_entries | .[] | [.key,.value]' rates.json | tr -d "[]" | sed s/"/'/g

@kmatt

kmatt commented Sep 22, 2014

It was, I neglected it in a cut-and-paste error:

jq -c -r '.rates | to_entries | .[] | [.key,.value] | @csv'

@wtlangford
Contributor

Excellent! Glad to have helped.

@Fjelley85

Hi, I also have a problem with "Cannot index array with string"

My code: curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | ./jq -r '.[] | .committer | {login: .login,id: .id} | @csv'

Help would be appreciated

@pkoppstein
Contributor

The login and id values are associated with the "author" field:

jq -c -r '.[] | .author | {login,id} ' 

To output them using @csv:

jq -c -r '.[] | .author | [.login, .id] | @csv ' 
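The GitHub API call needs the network, so here is the same pipeline checked offline against a mocked one-element commits array (the octocat data is made up for the demo):

```shell
# A single fake commit object standing in for the API response.
printf '%s' '[{"author":{"login":"octocat","id":1}}]' |
  jq -r '.[] | .author | [.login, .id] | @csv'
# "octocat",1
```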

@Fjelley85

Thank you!

@sankalp-khare

Thanks! the @csv option is very helpful 👍

@joelpurra
Contributor

I've finally moved my CSV/TSV utility scripts to jq-hopkok's tabular folder -- see also other helpful shell scripts for parallel processing etcetera.

@AlJohri

AlJohri commented Jul 10, 2019

It would be really great to have a way to turn lists of objects into csvs (with header) automatically.

Right now I'm resorting to json2csv:

curl 'http://.../nfl/participants/' | jq -c '.apiResults[].league.players[] | {teamId: .team.teamId, nickname: .team.nickname, playerId: .playerId, firstName: .firstName, lastName: .lastName}' | json2csv > players.csv

I would love to instead do | @csv directly to a list of objects.

curl 'http://.../nfl/participants/' | jq -r '.apiResults[].league.players[] | {teamId: .team.teamId, nickname: .team.nickname, playerId: .playerId, firstName: .firstName, lastName: .lastName} | @csv' > players.csv

Is there any chance something like this can be implemented?

EDIT: while https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-csv.sh exists, I really need something that's either built into jq or at least easily installable, like npm install -g json2csv. Built into jq would be the ideal here.

EDIT2: woah, did not see this before commenting: https://github.com/joelpurra/jqnpm

@pkoppstein
Contributor

pkoppstein commented Jul 10, 2019

Here's a def that takes a stream of JSON objects and produces output suitable for feeding directly to @csv or @tsv. It does NOT require the keys to be consistently ordered, and it only pays attention to the keys in the first object. An illustrative example follows.

def object2array(stream):
  foreach stream as $x (null;
    if . == null then $x | [true, keys_unsorted] else .[0]=false end;
    (if .[0] then .[1] else empty end),
    .[1] as $keys | $x | [getpath( $keys[] | [.]) ] );

Example

def data: [{a:1,b:2}, {b:22,a:11,c:0}];

object2array(data[])

Output

["a","b"]
[1,2]
[11,22]
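A sketch of piping object2array straight into @csv (the def is copied from above, with parentheses added around the second extract clause for readability - the parse is the same):

```shell
printf '%s' '[{"a":1,"b":2},{"b":22,"a":11,"c":0}]' | jq -r '
  # Emit the first object header once, then one row of values per object.
  def object2array(stream):
    foreach stream as $x (null;
      if . == null then $x | [true, keys_unsorted] else .[0]=false end;
      (if .[0] then .[1] else empty end),
      (.[1] as $keys | $x | [getpath($keys[] | [.])]));
  object2array(.[]) | @csv'
# "a","b"
# 1,2
# 11,22
```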

to_table

to_table (available here) takes a very different approach. It can handle JSON regardless of the depth of nesting, but it is only useful if there is sufficient regularity, notably in the ordering of keys within corresponding objects.

@AlJohri

AlJohri commented Jul 10, 2019

How can I easily use object2array myself and on my team? Is there some way to define global functions and share them? I would like to just run "| object2array" and have it work on my laptop and my colleagues' laptops. What's the best distribution strategy?

@pkoppstein
Contributor

Unfortunately, all the simple options that I know of have one drawback or another.

Probably the simplest approach would be for everyone to add the desired utility functions to a file called ~/.jq since, if it exists, it will automatically be read in on startup. The main drawback of this approach is that it prevents ~/.jq from being used as a directory, which jq's module system by default recognizes as the location of jq modules.

Another option would be to take advantage of jq's module system. This is a tiny bit complicated, but is certainly worth considering. Here is one approach:

  1. Copy the desired def into ~/.jq/jq.jq
  2. Invoke jq as usual but preface the jq program with include "jq";
    e.g. jq 'include "jq"; ...' input.json
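A sketch of option 2, using a throwaway HOME so it does not clobber a real ~/.jq file (the def name hello is made up for the demo):

```shell
export HOME="$(mktemp -d)"       # sandbox; drop this line for real use
mkdir -p "$HOME/.jq"
# Shared defs live in ~/.jq/jq.jq and are pulled in with: include "jq";
printf 'def hello: "jq modules work";\n' > "$HOME/.jq/jq.jq"
jq -n 'include "jq"; hello'
# "jq modules work"
```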

A third option which I've had some success with is to piggy-back off npm, but as best I can tell, it's not completely trivial.

@richardjharris

@tsv seems to work for me on jq-1.6 (Manjaro Linux)
