tsv and/or csv output support #48
Hrm. This would definitely be useful, but there are a lot of annoying special cases here. I'm not sure what should happen if the output's not a set of json objects, or if they have different fields. Also, CSV quoting rules are a nightmare, but I suppose I wouldn't have to parse the CSV afterwards... |
Just make it Excel compliant... |
+1 Just require the output to be arrays, and map those arrays directly to tsv/csv. Quoting shouldn't be too hard since you're generating it and not parsing it (as you point out). |
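The "generate, don't parse" point above — that CSV quoting is easy when you only produce it — can be illustrated with a small sketch (in Python rather than jq; the function names here are illustrative, not part of jq):

```python
# Minimal CSV quoting on the generating side: quote a field only when it
# contains a delimiter, quote, or line break, and double embedded quotes.
def csv_field(value):
    s = str(value)
    if any(c in s for c in ',"\n\r'):
        return '"' + s.replace('"', '""') + '"'
    return s

def row_to_csv(row):
    return ",".join(csv_field(v) for v in row)

print(row_to_csv(["plain", 'has "quotes"', "has,comma", 42]))
# → plain,"has ""quotes""","has,comma",42
```

This is essentially the behavior jq's later `@csv` builtin provides for arrays.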
In the meantime, here's a simple tsv hack that takes advantage of string substitution (and will break if you have any tabs embedded in strings):
|
+1 |
This is working but undocumented in master, with a slightly strange syntax that may change.
will output |
That sort of sounds like something else. If unclear, my wish is to get the csv output:
…apart from either of these two JSON/JSONish ones currently supported of the same data:
…or:
Is that what you are shooting for, or something else that's useful for some other purpose? |
One issue is that order is significant for csv, but not for JSON fields: we can't rely on the order of fields in JSON. This can be solved by specifying the mapping from JSON named fields to csv positional fields, by constructing an array of those fields, using
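The idea of fixing column order by mapping named JSON fields into a positional array can be sketched as follows (Python standing in for the jq filter; the field names are examples):

```python
# Fix CSV column order explicitly, since JSON object field order cannot
# be relied on: pick fields by name into a positional list, like jq's
# [.name, .age] construction.
import json

record = json.loads('{"age": 38, "name": "Homer"}')
columns = ["name", "age"]           # explicit csv column order
row = [record[c] for c in columns]  # positional array for one csv row
print(row)  # → ['Homer', 38]
```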
The csv output has slightly overzealous quoting (more than your target), but does no harm. A second issue is you want to apply this to an array of objects, not just one. The
Finally, you also want a header, stating the csv field names at the top. The easiest way to do this is literally (we need parentheses because
But duplication is bad. We can avoid repeating the same list of field names, by reusing the header array to lookup the fields in each object - but this looks confusing to me, I wonder if there's a clearer way?:
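The "reuse the header array" idea — naming the columns once and indexing each object by them — looks like this as a Python sketch (illustrative data, not from the thread):

```python
# Avoid repeating the field list: declare the header once, then use it to
# look up each object's fields, producing header row + data rows.
import json

data = json.loads('[{"name": "Homer", "age": 38}, {"name": "Carl", "age": 41}]')
header = ["name", "age"]
table = [header] + [[obj[h] for h in header] for obj in data]
print(table)  # → [['name', 'age'], ['Homer', 38], ['Carl', 41]]
```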
EDIT here it is as a function, with a slightly nicer field syntax, using
PS: if the input is not an array of objects, but just a sequence of objects (your second, JSONish, input), then we could just omit the
PPS: You can also get an array of the fields of an object using |
While json attribute order can be arbitrary, I would be happy if (in situations where the
For (typical?) cases where the
My gut feel is that asking for csv output is best done as a command-line mode flag rather than as a step in the pipeline, so that kind of housekeeping logic becomes the responsibility of |
No, it's not included in the current version. |
Is it possible to download somewhere the previous version that had this implemented? I found that it is the only way to parse json... |
IMO this could be left to a separate utility. |
@svnpenn Thank you! CSV works, however this build adds some extra characters to the parsed data. For example:
←[0m←[34;1m"id"←[0m←[37m: ←[0m←[0m518377←[0m←[37m,
The original data is id:518377. Does anybody know how to resolve this issue? |
@svnpenn Thanks a lot! :) |
👍 working well for me in master. love this feature. |
+1 for @TSV 👍
Update: if you arrived here trying to find a way to do tsv formatting, the command below should also work: |
Here's the syntax for tab delimited when using DOS: |
I fail to use the
stored in |
OK. Got it...
Follow up: how can I get rid of the |
@drorata - You might find the -r command-line option useful:
Notice in particular the double-double-quotes in the last line. However, using the "-r" option may also not be what you want, depending on which variant of CSV you expect. For example, the "-r" option will cause the newline in "a\nb" to become an actual newline. @csv is useful but it was not designed to produce output that conforms strictly with RFC 4180 (which, for example, requires CR/LF). It looks as though the intent was to provide a simple "UTF-8-style CSV", so that if some specific style of CSV is required, an external tool (or tool chain) can be used. |
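As the comment suggests, strict RFC 4180 output (CRLF record separators) can be produced by an external tool outside jq; Python's csv module is one such option (this is an external post-processing sketch, not a jq feature):

```python
# Produce RFC 4180-style CSV with CRLF record separators; the csv module
# also quotes the field containing an embedded newline.
import csv, io

rows = [["id", "note"], [1, "a\nb"]]
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
writer.writerows(rows)
print(repr(buf.getvalue()))  # → 'id,note\r\n1,"a\nb"\r\n'
```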
Perhaps @csv should output actual CSVs?
2014-09-11 15:20 GMT+02:00 pkoppstein notifications@github.com:
|
@slapresta asked:
Regarding the particular point about embedded newlines, jq is in conformance with the "800 pound gorilla CSV" standard: http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm#EmbedBRs In fact, this document also sanctions the use of EITHER CR or CRLF as a record separator:
Perhaps what's needed is a reference to some "CSV standard" to which @csv conforms? Or @csv4180? In addition to @TSV, of course :-) |
Instead of polluting jq, why not use an external tool to convert JSON to CSV (e.g. https://github.com/jehiah/json2csv) ? |
Agree with JohnParkerXNU |
Update: see jq-hopkok's tabular folder.
For reference: RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files.
I use two scripts I pipe JSON into, and get CSV/TSV output. Object keys are used to generate a header. Because of implementation details in
Input (`input.json`):

```json
[
  {
    "01--Employee name": "Homer Simpson",
    "02--Logins": 578
  },
  {
    "01--Employee name": "Carl Carlsson",
    "02--Logins": 75926
  }
]
```

Command:

```sh
<"input.json" array-of-objects-to-tsv.sh | clean-tsv-sorted-header.sh >"output.json"
```

Output:

```
Employee name	Logins
Homer Simpson	578
Carl Carlsson	75926
```

The same input piped through the CSV scripts produces this.
I prefer TSV as it's much easier to split the raw data into columns (some tex/latex packages don't play nice with RFC 4180 input). I've taken a shortcut in replacing
Hope these scripts can help someone. |
If I have a structure like this:
jq '.rates | @csv' responds with "jq: error: object cannot be csv-formatted, only array" Is it necessary to "cast" the object to an array? If so, how? |
There's no real casting here. All the data types are too discrete, overall, for casting (except via string interpolation, but then all you have is a string). The issue is that all |
@wtlangford In this case, I am converting a json source to flat delimited for bulk loading into a relational database table (PostgreSQL). I can pipe the output of " jq '.rates' " into awk, filter for the pairs and replace delimiter characters, but keeping everything in jq would be cleaner. Example:
|
Looks like |
@wtlangford Very close! Is there a way to omit the brackets, and print key, value on same line?
instead of
|
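The flattening discussed above — one key/value pair per output line, as when bulk-loading into a database table — can be mirrored in Python like this (the rates data is made up for illustration, in the spirit of jq's `to_entries`):

```python
# Emit an object's entries as one "key<TAB>value" line each, the flat
# delimited shape wanted for bulk loading.
import json

rates = json.loads('{"USD": 1.0, "EUR": 0.78, "GBP": 0.61}')
lines = ["\t".join([k, str(v)]) for k, v in rates.items()]
print("\n".join(lines))
```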
Don't forget the |
Just out of curiosity, why was
On Mon, Sep 22, 2014 at 6:09 PM, Matt Keranen notifications@github.com
|
It was, I neglected it in a cut-and-paste error:
|
Excellent! Glad to have helped. |
Hi, I also have a problem with "Cannot index array with string". My code:

```sh
curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | ./jq -r '.[] | .committer | {login: .login,id: .id} | @csv'
```

Help would be appreciated |
The login and id values are associated with the "author" field:
To output them using @csv:
|
Thank you! |
Thanks! the @csv option is very helpful 👍 |
I've finally moved my CSV/TSV utility scripts to jq-hopkok's tabular folder -- see also other helpful shell scripts for parallel processing etcetera. |
It would be really great to have a way to turn lists of objects into csvs (with header) automatically. Right now I'm resorting to

```sh
curl 'http://.../nfl/participants/' | jq -c '.apiResults[].league.players[] | {teamId: .team.teamId, nickname: .team.nickname, playerId: .playerId, firstName: .firstName, lastName: .lastName}' | json2csv > players.csv
```

I would love to instead do

```sh
curl 'http://.../nfl/participants/' | jq -r '.apiResults[].league.players[] | {teamId: .team.teamId, nickname: .team.nickname, playerId: .playerId, firstName: .firstName, lastName: .lastName} | @csv' > players.csv
```

Is there any chance something like this can be implemented?
EDIT: while https://github.com/joelpurra/jq-hopkok/blob/master/src/tabular/array-of-objects-to-csv.sh exists, I really need something that's either built into jq or at least easily installable like
EDIT2: woah, did not see this before commenting: https://github.com/joelpurra/jqnpm |
Here's a def that takes a stream of JSON objects and produces output suitable for feeding directly to @csv or @TSV. It does NOT require the keys to be consistently ordered, and only pays attention to the keys in the first object. An illustrative example follows.
Example
Output
to_table
|
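The behavior described for that def — take the key order from the first object only, so later objects may list their keys in any order — can be sketched in Python (illustrative data, not the jq source):

```python
# Header comes from the first object's keys; every row is then built by
# looking those keys up, so field order in later objects doesn't matter.
import json

stream = json.loads("""[
  {"a": 1, "b": 2},
  {"b": 20, "a": 10}
]""")
header = list(stream[0].keys())
table = [header] + [[obj.get(k) for k in header] for obj in stream]
print(table)  # → [['a', 'b'], [1, 2], [10, 20]]
```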
How can I easily use “object2array” myself and on my team? Is there some way to define global functions and share them? I would like to just run “| object2array” and have it work on my laptop and my colleagues’ laptops. What’s the best distribution strategy? |
Unfortunately, all the simple options that I know of have one drawback or another. Probably the simplest approach would be for everyone to add the desired utility functions to a file called ~/.jq since, if it exists, it will automatically be read in on startup. The main drawback of this approach is that it prevents ~/.jq from being used as a directory, which jq's module system by default recognizes as the location of jq modules. Another option would be to take advantage of jq's module system. This is a tiny bit complicated, but is certainly worth considering. Here is one approach:
A third option which I've had some success with is to piggy-back off npm, but as best I can tell, it's not completely trivial. |
Could we get a command-line flag to spit out tab- or comma-separated values instead of concatenated json objects, csv/tsv-quoting cells as necessary?
It would make jq a real power house for reports and spreadsheets generation.