Made a tool to rename columns, want to integrate it #814

Closed
culebron opened this issue Mar 27, 2017 · 8 comments

culebron commented Mar 27, 2017

Example: I often have to export data from MongoDB, where columns are extracted from a hierarchy and end up with weird names.

mongoexport -d my_db -c my_collection --csv \
       -f id,geopos.coordinates.0,geopos.coordinates.1,address > data.csv

There's no direct way in csvkit to rename those columns. You can only remove them completely or write a pipeline, something like echo "column1,column2" > my_file.csv && tail -n +2 data.csv >> my_file.csv.

I suggest adding a tool or an option to rename columns.

  • Option 1: a parameter to csvcut:

     csvcut source.csv --rename-columns geopos.coordinates.0:x,geopos.coordinates.1:y > dest.csv
    
  • Option 2: a separate tool:

     csvrename source.csv -c geopos.coordinates.0:x,geopos.coordinates.1:y > dest.csv
    

I have a working version of the latter in my repo. Please write what you think of this.
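
For illustration, here is a minimal sketch of what a streaming rename could look like, using only Python's standard csv module. It is not the implementation from the repo mentioned above and not part of csvkit; the function name rename_columns and its arguments are hypothetical, and the sketch assumes the first row is the header.

```python
import csv
import io


def rename_columns(reader, writer, mapping):
    """Copy rows from reader to writer, renaming header cells found in mapping.

    Hypothetical helper, not a csvkit API.
    """
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    missing = set(mapping) - set(header)
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    writer.writerow([mapping.get(name, name) for name in header])
    # Data rows are passed through one at a time, so memory use stays flat.
    writer.writerows(reader)


source = io.StringIO(
    "id,geopos.coordinates.0,geopos.coordinates.1,address\n"
    "1,30.5,50.4,Main St\n"
)
dest = io.StringIO()
rename_columns(
    csv.reader(source),
    csv.writer(dest, lineterminator="\n"),
    {"geopos.coordinates.0": "x", "geopos.coordinates.1": "y"},
)
print(dest.getvalue(), end="")
```

Only the header row is rewritten; everything else is copied unchanged, which is what keeps the approach streaming.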

@jpmckinney
Member

Adding link to review: master...culebron:master

@jpmckinney
Member

How do you handle a column name containing a colon? (or whatever separator character you might switch to)

@jpmckinney
Member

For reference, earlier issues include #530. The above implementation is streaming (good). Just noting that the implementation of agate.Table.rename may be relevant.

@culebron
Author

Good question about the colon. Let me look into agate.Table.rename first. I looked at the tools and think we may just extend csvcut a bit, otherwise it's just too much pipelining and name repetition in the scripts.

My idea now is to add another parameter: --rename-columns old1:new1,old2:new2. I'll probably use backslash to quote colons and commas in the names.
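
One way the backslash quoting could work is sketched below. This is an assumption about the intended behaviour, not code from csvkit or from the repo: the name parse_rename_spec is hypothetical, and the sketch treats a backslash as escaping whatever single character follows it (colon, comma, or another backslash).

```python
def _store(parts, pairs):
    """Validate one old:new pair and record it (hypothetical helper)."""
    if len(parts) != 2 or "" in parts:
        raise ValueError(f"expected old:new, got {parts!r}")
    old, new = parts
    pairs[old] = new


def parse_rename_spec(spec):
    """Parse 'old1:new1,old2:new2' into a dict of old name -> new name."""
    pairs = {}
    parts = [""]      # pieces of the pair currently being read
    escaped = False   # True right after an unescaped backslash
    for ch in spec:
        if escaped:
            parts[-1] += ch      # take the character literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":":
            parts.append("")     # switch from old name to new name
        elif ch == ",":
            _store(parts, pairs)
            parts = [""]
        else:
            parts[-1] += ch
    if escaped:
        raise ValueError("spec ends with a dangling backslash")
    _store(parts, pairs)
    return pairs


print(parse_rename_spec(r"time\:utc:timestamp,a\,b:ab"))
# {'time:utc': 'timestamp', 'a,b': 'ab'}
```

A character-by-character scan is used instead of str.split because splitting on "," or ":" cannot tell an escaped separator from a real one.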

@halloleo

I'm in favour of a rename tool in csvkit! BTW, renaming columns is very close to adding columns - any ideas to integrate this functionality in the new tool? (I know about the adding via csvjoin, but it doesn't work that well.)

@jpmckinney
Member

@halloleo In what way does it not work well with csvjoin (as described in the docs)?

@halloleo

I have to clarify: it does work as advertised, but it is a bit cumbersome. First you create a new column at the end, then you move the column into the position you want with a second csvcut call...

@jpmckinney
Member

Closing in favor of #396
