
tsv and/or csv output support #48

Closed
johan opened this issue Nov 20, 2012 · 46 comments

@johan

johan commented Nov 20, 2012

Could we get a command-line flag to spit out tab- or comma-separated values instead of concatenated json objects, csv/tsv-quoting cells as necessary?

It would make jq a real power house for reports and spreadsheets generation.

@stedolan
Contributor

stedolan commented Dec 3, 2012

Hrm. This would definitely be useful, but there are a lot of annoying special cases here. I'm not sure what should happen if the output's not a set of json objects, or if they have different fields. Also, CSV quoting rules are a nightmare, but I suppose I wouldn't have to parse the CSV afterwards...

@alexchamberlain
Contributor

Just make it Excel compliant...

@jdanbrown

+1

Just require the output to be arrays, and map those arrays directly to tsv/csv. Quoting shouldn't be too hard since you're generating it and not parsing it (as you point out).

@jdanbrown

In the meantime, here's a simple tsv hack that takes advantage of string substitution (and will break if you have any tabs embedded in strings):

$ echo '{"one":1,"two":"x"}' | jq --raw-output '"\(.one)\t\(.two)"'
1       x

stedolan added a commit that referenced this issue Dec 28, 2012
@ajmath

ajmath commented Jan 16, 2013

+1

@stedolan
Contributor

This is working but undocumented in master, with a slightly strange syntax that may change. @csv is a builtin function that reformats its argument (which must be an array) as a CSV string. So, something like:

./jq -r '@csv'

will output 1,2,"hello, world" when given [1,2,"hello, world"].
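A minimal check of that behavior (assuming a jq build that includes the @csv builtin, per the commit referenced above):

```shell
# @csv formats an array as one CSV record; -r (--raw-output) prints
# the resulting string raw instead of JSON-encoding it.
echo '[1,2,"hello, world"]' | jq -r '@csv'
# 1,2,"hello, world"
```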

@johan
Author

johan commented Jan 17, 2013

That sort of sounds like something else. If unclear, my wish is to get the csv output:

date,count,title
2011-01-12 13:14,17,"He's dead, Jim!"
2011-01-13 21:30,4711,"What do you mean, \"dead\"?"
2011-01-14 00:07,,Dead!

…apart from either of these two JSON/JSONish ones currently supported of the same data:

[ { "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
, { "date": "2011-01-13 21:30", "count": 4711, "title":"What do you mean, \"dead\"?" }
, { "date": "2011-01-14 00:07", "title":"Dead!" }
]

…or:

{ "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
{ "date": "2011-01-13 21:30", "count": 4711, "title":"What do you mean, \"dead\"?" }
{ "date": "2011-01-14 00:07", "title":"Dead!" }

Is that what you are shooting for, or something else that's useful for some other purpose?

@ghost

ghost commented Jan 18, 2013

One issue is that order is significant for csv, but not for JSON fields: we can't rely on the order of fields in JSON. This can be solved by specifying the mapping from JSON named fields to csv positional fields, by constructing an array of those fields, using [.date,.count,.title]:

input: { "date": "2011-01-12 13:14", "count": 17, "title":"He's dead, Jim!" }
jq -r '[.date,.count,.title] | @csv'
"2011-01-12 13:14",17,"He's dead, Jim!"

The csv output has slightly overzealous quoting (more than in your target), but that does no harm.

A second issue is you want to apply this to an array of objects, not just one. The .[] operator streams each item of an array in turn:

jq  -r '.[] | [.date, .count, .title] | @csv'
"2011-01-12 13:14",17,"He's dead, Jim!"
"2011-01-13 21:30",4711,"What do you mean, ""dead""?"
"2011-01-14 00:07",,"Dead!"

Finally, you also want a header, stating the csv field names at the top. The easiest way to do this is literally (we need parentheses because , binds tighter than |):

jq  -r '["date", "count", "title"], (.[] | [.date, .count, .title]) | @csv'
"date","count","title"
"2011-01-12 13:14",17,"He's dead, Jim!"
"2011-01-13 21:30",4711,"What do you mean, ""dead""?"
"2011-01-14 00:07",,"Dead!"

But duplication is bad. We can avoid repeating the same list of field names, by reusing the header array to lookup the fields in each object - but this looks confusing to me, I wonder if there's a clearer way?:

jq  -r '["date", "count", "title"] as $fields| $fields, (.[] | [.[$fields[]]]) | @csv'
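For reference, running that one-liner against the sample array from earlier in the thread reproduces the header-plus-rows output (a quick check; sample.json is a file name of my choosing):

```shell
# Write the sample data to a file, then reuse the header array
# to look up each object's fields in the stated order.
cat > sample.json <<'EOF'
[ { "date": "2011-01-12 13:14", "count": 17, "title": "He's dead, Jim!" },
  { "date": "2011-01-13 21:30", "count": 4711, "title": "What do you mean, \"dead\"?" },
  { "date": "2011-01-14 00:07", "title": "Dead!" } ]
EOF
jq -r '["date", "count", "title"] as $fields | $fields, (.[] | [.[$fields[]]]) | @csv' sample.json
# "date","count","title"
# "2011-01-12 13:14",17,"He's dead, Jim!"
# "2011-01-13 21:30",4711,"What do you mean, ""dead""?"
# "2011-01-14 00:07",,"Dead!"
```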

EDIT: here it is as a function, with a slightly nicer field syntax, using path() (but you still need to remember the -r switch - maybe a --csv flag does make sense):

def csv(fs): [path(null|fs)[]] as $fields| $fields, (.[] | [.[$fields[]]]) | @csv;
USAGE: csv(.date, .count, .title)

PS: if the input is not an array of objects, but just a sequence of objects (your second, JSONish, input), then we could just omit the .[] - but then we can't get the header at the top. It's probably best to convert it to an array, using the --slurp/-s option (or put [] around it, if it's generated within jq).

PPS: You can also get an array of the fields of an object using keys, but it returns them in alphabetical ("unicode codepoint") order. Of course, one could create a tool that does retain the ordering of JSON fields - but it's not part of the JSON spec, and JSON tools and web APIs don't seem to do it. (Though I think it would be convenient in some ways, e.g. ordering the string/number fields first makes JSON much more human-readable when deeply nested.)

@johan
Author

johan commented Jan 18, 2013

While JSON attribute order can be arbitrary, I would be happy if (in situations where the jq command line does not stipulate an ordering) the csv column order were given by the order of attribute discovery in the input file - i.e. date, count and title in my example, as given by the input order of the first object (ignoring the order of all subsequent input lines, since none of them introduce additional attributes/columns).

For (typical?) cases where the jq command line itself names what fields the output should have, it would indeed be nice to figure out a DRY syntax that infers order by your already stated order on that command line, and either always print out the header names on the first line, or have a special csv-mode flag to specifically disable that output (make the rare case a flag).

My gut feeling is that asking for csv output is best done as a command-line mode flag rather than as a step in the pipeline. That way the housekeeping logic becomes the responsibility of jq rather than yours, and it also becomes easier to switch the output format back to json again, should you want to - but maybe that is substantially harder to implement?

@neq1337

neq1337 commented Mar 14, 2013

Sorry for digging out this topic, but does @csv work in the current version? I get a "compile error" when I try to use @csv, @html or the others. I am using the Windows jq version.

@lluchs

lluchs commented Mar 14, 2013

No, it's not included in the current version.

@neq1337

neq1337 commented Mar 14, 2013

Is it possible to download somewhere the previous version that had this implemented? I found that it is the only way to parse json...

@nicowilliams
Contributor

IMO this could be left to a separate utility.

@neq1337

neq1337 commented May 7, 2013

@svnpenn Thank you! CSV works, however this build adds some extra sequences to the parsed data. For example:

←[0m←[34;1m"id"←[0m←[37m: ←[0m←[0m518377←[0m←[37m,

The original data is id:518377.

Does anybody know how to resolve this issue?

@neq1337

neq1337 commented May 7, 2013

@svnpenn Thanks a lot! :)

@factom

factom commented May 17, 2013

👍 working well for me in master. love this feature.

@jdanbrown

@csv is great! Can we get a @tsv as well?

@pablomendes

+1 for @TSV 👍

Update: if you arrived here trying to find a way to do tsv formatting, the command below should also work:
jq -r '"\(.date)\t\(.count)\t\(.title)"'
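For readers on a newer jq (1.5 and later, I believe - see the confirmation further down the thread), there is now a @tsv builtin that also escapes embedded tabs and newlines, unlike the interpolation hack. A sketch:

```shell
# The JSON string "x\ty" contains a real tab character; @tsv joins
# the fields with literal tabs and re-escapes the embedded tab as \t,
# so every record stays on a single line.
printf '%s' '{"one":1,"two":"x\ty"}' | jq -r '[.one, .two] | @tsv'
# prints: 1, a literal tab, then x\ty (embedded tab re-escaped)
```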

@stevezieglerva

Here's the syntax for tab-delimited output when using DOS (the inner quotes and the string interpolations need backslash escapes):
jq -r ".aggregations[] | .buckets[] | \"\(.key)\t\(.doc_count)\""

@drorata

drorata commented Sep 11, 2014

I fail to use the @csv and the suggestions of @stevezieglerva . Assume that I have:

[
  {
    "a": 1,
    "b": 2
  },
  {
    "a": 3,
    "b": 3
  },
  {
    "a": 2,
    "b": 1
  }
]

stored in foo.bar. How can I export it to CSV? I tried some combinations, but didn't manage to make it work...

@drorata

drorata commented Sep 11, 2014

OK. Got it...

cat foo.bar | jq '.[] | [.a, .b] | @csv'

Follow up: how can I get rid of the "'s and their escapes (in the case where the value is a string for instance)?

@pkoppstein
Contributor

@drorata - You might find the -r command-line option useful:

$ jq -n '["a","b","a\"b"]|@csv'
"\"a\",\"b\",\"a\"\"b\""

$ jq -r -n '["a","b","a\"b"]|@csv'
"a","b","a""b"

Notice in particular the double-double-quotes in the last line.

However, using the "-r" option may also not be what you want, depending on which variant of CSV you expect. For example, the "-r" option will cause the newline in "a\nb" to become an actual newline.

@csv is useful but it was not designed to produce output that conforms strictly with RFC 4180 (which, for example, requires CR/LF). It looks as though the intent was to provide a simple "UTF-8-style CSV", so that if some specific style of CSV is required, an external tool (or tool chain) can be used.

@ghost

ghost commented Sep 12, 2014

Perhaps @csv should output actual CSVs?

With -r, an embedded newline in a string becomes a literal line break, splitting the CSV row:

"a
","b"

@pkoppstein
Contributor

@slapresta asked:

Perhaps @csv should output actual CSVs?

@csv must produce valid JSON, since it's a jq filter, not a "post-filter". I believe that jq in combination with -r does in fact produce "valid CSV" (see below), it being understood that there are many variations in common use.

Regarding the particular point about embedded newlines, jq is in conformance with the "800 pound gorilla CSV" standard:

http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs

In fact, this document also sanctions the use of EITHER CR or CRLF as a record separator:

Each record is one line   ...but
A record separator may consist of a line feed (ASCII/LF=0x0A), or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A).

Perhaps what's needed is a reference to some "CSV standard" to which @csv conforms? Or @csv4180? In addition to @TSV, of course :-)

@JohnParkerXNU

Instead of polluting jq, why not use an external tool to convert JSON to CSV (e.g. https://github.com/jehiah/json2csv) ?

@ghost

ghost commented Sep 12, 2014

Agree with JohnParkerXNU

@joelpurra
Contributor

Update: see jq-hopkok's tabular folder.


For reference: RFC4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files
http://www.ietf.org/rfc/rfc4180.txt

I use two scripts I pipe JSON into, and get CSV/TSV output. Object keys are used to generate a header.
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-csv.sh
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-tsv.sh

Because of implementation details in to_entries (see #561 to_entries always outputs keys sorted - can it be avoided?), I have to name keys so they're sorted in the output. I follow the format 01--My first column, 02--My second column, 03--A third column, and clean them with another script.
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/clean-csv-sorted-header.sh
https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/clean-tsv-sorted-header.sh

I'll throw the scripts into the (future) pile of utilities that is jq-hopkok when I have time to clean up the code.

[
    {
        "01--Employee name": "Homer Simpson",
        "02--Logins": 578
    },
    {
        "01--Employee name": "Carl Carlsson",
        "02--Logins": 75926
    }
]
<"input.json" array-of-objects-to-tsv.sh | clean-tsv-sorted-header.sh >"output.tsv"
Employee name   Logins
Homer Simpson   578
Carl Carlsson   75926

The same input piped through the CSV scripts produces this.

"Employee name","Logins"
"Homer Simpson",578
"Carl Carlsson",75926

I prefer TSV as it's much easier to split the raw data into columns (some tex/latex packages don't play nice with RFC4180 input). I've taken a shortcut in replacing \t in strings with \\t - in my own data I "assume" there are no tabs in the output.

Hope these scripts can help someone.



@kmatt

kmatt commented Sep 22, 2014

If I have a structure like this:

{
    "timestamp": 1319730758,
    "base": "USD",
    "rates": {
        "AED": 3.672626,
        "AFN": 48.3775,
        "ALL": 110.223333,
        "AMD": 409.604993
     } 
}

jq '.rates | @csv' responds with "jq: error: object cannot be csv-formatted, only array"

Is it necessary to "cast" the object to an array? If so, how?

@wtlangford
Contributor

There's no real casting here. The data types are too distinct for casting (except via string interpolation, but then all you have is a string). The issue is that all @csv does is print an array out in CSV form. How exactly did you want the output printed?

@kmatt

kmatt commented Sep 22, 2014

@wtlangford In this case, I am converting a json source to flat delimited for bulk loading into a relational database table (PostgreSQL). I can pipe the output of " jq '.rates' " into awk, filter for the pairs and replace delimiter characters, but keeping everything in jq would be cleaner.

Example:

jq '.rates' rates.json | awk -v OFS=',' '/:/ { gsub(/[\":,]/,"",$0); print "'\''" $1 "'\''" , $2 }'

@wtlangford
Contributor

Looks like .rates | to_entries | .[] | [.key,.value] | @csv is what you want, then.
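Put together with the rates input from above (rates.json is a hypothetical file name), that pipeline gives one key/value pair per row:

```shell
cat > rates.json <<'EOF'
{ "timestamp": 1319730758, "base": "USD",
  "rates": { "AED": 3.672626, "AFN": 48.3775, "ALL": 110.223333, "AMD": 409.604993 } }
EOF
# to_entries turns the object into [{key, value}, ...]; each pair
# becomes a two-element array, which @csv renders as one CSV row.
jq -r '.rates | to_entries | .[] | [.key, .value] | @csv' rates.json
# "AED",3.672626
# "AFN",48.3775
# "ALL",110.223333
# "AMD",409.604993
```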

@kmatt

kmatt commented Sep 22, 2014

@wtlangford Very close! Is there a way to omit the brackets, and print key, value on same line?

"AZN", 0.786006

instead of

[
  "AZN",
  0.786006
]

@wtlangford
Contributor

Don't forget the @csv on the end. You might also want to pass -r to jq when you invoke it to strip out the quoting escapes. Keep in mind that this may not quite be what you want, however. If any of your strings has a newline in it, that'll get printed out unescaped.

@wtlangford
Contributor

Just out of curiosity, why was @csv not sufficient at the end?

On Mon, Sep 22, 2014, Matt Keranen wrote:

Thanks - this is close enough:

jq -c '.rates | to_entries | .[] | [.key,.value]' rates.json | tr -d "[]" | sed s/"/'/g

@kmatt

kmatt commented Sep 22, 2014

It was, I neglected it in a cut-and-paste error:

jq -c -r '.rates | to_entries | .[] | [.key,.value] | @csv'

@wtlangford
Contributor

Excellent! Glad to have helped.

@Fjelley85

Hi, I also have a problem with "Cannot index array with string"

My code: curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | ./jq -r '.[] | .committer | {login: .login,id: .id} | @csv'

Help would be appreciated

@pkoppstein
Contributor

The login and id values are associated with the "author" field:

jq -c -r '.[] | .author | {login,id} ' 

To output them using @csv:

jq -c -r '.[] | .author | [.login, .id] | @csv ' 
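The GitHub API call needs the network, so here is the same pipeline checked offline against a mocked one-element commits array (the octocat data is made up for the demo):

```shell
# A single fake commit object standing in for the API response.
printf '%s' '[{"author":{"login":"octocat","id":1}}]' |
  jq -r '.[] | .author | [.login, .id] | @csv'
# "octocat",1
```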

@Fjelley85

Thank you!

@sankalp-khare

Thanks! the @csv option is very helpful 👍

@joelpurra
Contributor

I've finally moved my CSV/TSV utility scripts to jq-hopkok's tabular folder -- see also other helpful shell scripts for parallel processing etcetera.

@AlJohri

AlJohri commented Jul 10, 2019

It would be really great to have a way to turn lists of objects into csvs (with header) automatically.

Right now I'm resorting to json2csv:

curl 'http://.../nfl/participants/' | jq -c '.apiResults[].league.players[] | {teamId: .team.teamId, nickname: .team.nickname, playerId: .playerId, firstName: .firstName, lastName: .lastName}' | json2csv > players.csv

I would love to instead do | @csv directly to a list of objects.

curl 'http://.../nfl/participants/' | jq -r '.apiResults[].league.players[] | {teamId: .team.teamId, nickname: .team.nickname, playerId: .playerId, firstName: .firstName, lastName: .lastName} | @csv' > players.csv

Is there any chance something like this can be implemented?

EDIT: while https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-csv.sh exists, I really need something that's either built into jq or at least easily installable, like npm install -g json2csv. Built into jq would be the ideal here.

EDIT2: woah, did not see this before commenting: https://github.com/joelpurra/jqnpm

@pkoppstein
Contributor

pkoppstein commented Jul 10, 2019

Here's a def that takes a stream of JSON objects and produces output suitable for feeding directly to @csv or @tsv. It does NOT require the keys to be consistently ordered, and it only pays attention to the keys in the first object. An illustrative example follows.

def object2array(stream):
  foreach stream as $x (null;
    if . == null then $x | [true, keys_unsorted] else .[0]=false end;
    (if .[0] then .[1] else empty end),
    .[1] as $keys | $x | [getpath( $keys[] | [.]) ] );

Example

def data: [{a:1,b:2}, {b:22,a:11,c:0}];

object2array(data[])

Output

["a","b"]
[1,2]
[11,22]
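A sketch of piping object2array straight into @csv (the def is copied from above, with parentheses added around the second extract clause for readability - the parse is the same):

```shell
printf '%s' '[{"a":1,"b":2},{"b":22,"a":11,"c":0}]' | jq -r '
  # Emit the first object header once, then one row of values per object.
  def object2array(stream):
    foreach stream as $x (null;
      if . == null then $x | [true, keys_unsorted] else .[0]=false end;
      (if .[0] then .[1] else empty end),
      (.[1] as $keys | $x | [getpath($keys[] | [.])]));
  object2array(.[]) | @csv'
# "a","b"
# 1,2
# 11,22
```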

to_table

to_table (available here) takes a very different approach. It can handle JSON regardless of the depth of nesting, but it is only useful if there is sufficient regularity, notably in the ordering of keys within corresponding objects.

@AlJohri

AlJohri commented Jul 10, 2019

How can I easily use object2array myself and on my team? Is there some way to define global functions and share them? I would like to just run "| object2array" and have it work on my laptop and my colleagues' laptops. What's the best distribution strategy?

@pkoppstein
Contributor

Unfortunately, all the simple options that I know of have one drawback or another.

Probably the simplest approach would be for everyone to add the desired utility functions to a file called ~/.jq since, if it exists, it will automatically be read in on startup. The main drawback of this approach is that it prevents ~/.jq from being used as a directory, which jq's module system by default recognizes as the location of jq modules.

Another option would be to take advantage of jq's module system. This is a tiny bit complicated, but is certainly worth considering. Here is one approach:

  1. Copy the desired def into ~/.jq/jq.jq
  2. Invoke jq as usual but preface the jq program with include "jq";
    e.g. jq 'include "jq"; ...' input.json
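A sketch of option 2, using a throwaway HOME so it does not clobber a real ~/.jq file (the def name hello is made up for the demo):

```shell
export HOME="$(mktemp -d)"       # sandbox; drop this line for real use
mkdir -p "$HOME/.jq"
# Shared defs live in ~/.jq/jq.jq and are pulled in with: include "jq";
printf 'def hello: "jq modules work";\n' > "$HOME/.jq/jq.jq"
jq -n 'include "jq"; hello'
# "jq modules work"
```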

A third option which I've had some success with is to piggy-back off npm, but as best I can tell, it's not completely trivial.

@richardjharris

@tsv seems to work for me on jq-1.6 (Manjaro Linux)
