
csvsql Killed #633

Closed

NickBuchny opened this issue Jun 30, 2016 · 11 comments

Comments
@NickBuchny

NickBuchny commented Jun 30, 2016

I've been attempting to use csvsql to create a table from a ~4 GB .csv file, and after around 10-15 minutes the Ubuntu terminal prints 'Killed'. I'm fairly new to using this module, so if you need more information let me know.

Input:

csvsql train.csv

csvsql --no-constraints train.csv

Output:

Killed

Edit: if anyone could let me know what's happening, or suggest another approach, I'd appreciate it.

@vielmetti

Can you run the "top" command while this is going on? You might be running out of some system resource along the way.
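
A minimal way to check this on a Linux box (a sketch; the dmesg check assumes the kernel OOM killer logged the kill):

top                                  # watch memory and swap while csvsql runs (or: free -h)
dmesg | grep -i "out of memory"      # after the kill, see whether the OOM killer fired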

@jpmckinney
Member

If you run csvsql -v you should see some more verbose output.
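
For example:

csvsql -v train.csv

As I recall, --verbose makes csvkit print a full traceback when an exception is raised; a process killed outright by the OS may still exit with nothing but "Killed".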

@NickBuchny
Author

Will do! I'm in the middle of summer finals, but I'll pick my project up right after.

@aoa4eva

aoa4eva commented Aug 28, 2016

I'm having a similar problem, but with a file that's only about 40 MB. There is no additional output when I use -v, just "Killed".

The table is created but is empty, so running the same command a second time does yield verbose output indicating that the table already exists.

Small files work well, though.

@polvoazul

I suspect your RAM was full! Search for 'OOM killer'.

@aoa4eva

aoa4eva commented Aug 30, 2016

Thanks. I monitored memory usage as I ran the command: yes, memory usage climbs well beyond what the machine uses 'at rest', so that's likely. Are there any settings I can use to extend available memory using storage? I've already started splitting the file; that may be the easiest workaround for now.

@polvoazul

If you are on Linux you can increase the size of your swap partition. That will save your program from being killed, but it will probably slow everything down, since the OS will use the HDD as a substitute for RAM, which is MUCH slower.
I suggest closing everything that consumes memory (a browser is a big one) before running the memory-intensive program.
Your idea of splitting the files is probably the best solution for big files.
Buying more RAM is also good; it is cheap, and the whole system benefits from it automatically (through OS file caches).
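
A minimal sketch of adding swap on Linux (the 4G size and the /swapfile path are illustrative; adjust for your system):

sudo fallocate -l 4G /swapfile   # allocate the backing file
sudo chmod 600 /swapfile         # restrict permissions
sudo mkswap /swapfile            # format it as swap space
sudo swapon /swapfile            # enable it for the current session
free -h                          # confirm the extra swap is visible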

@aoa4eva

aoa4eva commented Aug 31, 2016

It's running on a cloud server, so there's no browser, but I'll get more memory if I need to process more files. So far, it's just the one. Thanks for your help!

@dannguyen
Contributor

csvsql is not performant on huge data, last I tried (and as I remember from the docs). I recommend using csvsql just for creating schemas or for smaller jobs, and piping your CSV data into a database with another tool. csvkit is great for getting the data ready for a database, but I don't think it was meant for big-data tasks.

As an example of another tool, datanews/tables worked quite beautifully when I used it a while back, while it was still in development. It's a CLI tool too, but built in Node:

https://github.com/datahoarder/fec_individual_donors#use-datanewstables-a-data-to-sql-importer-in-nodejs
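
One possible version of that workflow (a sketch, assuming a PostgreSQL target; the mydb database name is illustrative):

csvsql -i postgresql train.csv > create_train.sql       # use csvsql only to infer and emit the CREATE TABLE statement
psql mydb -f create_train.sql                           # create the (empty) table
psql mydb -c "\copy train FROM 'train.csv' CSV HEADER"  # bulk-load the rows with Postgres's own loader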

@aoa4eva

aoa4eva commented Oct 19, 2016

It worked when I split the file into chunks of at most 20k records. It took some experimentation.

I wrote a script to load each of the split files, and it worked well.
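
A sketch of that kind of script (assuming GNU coreutils and a SQLite target; big.csv, data.db, and the mytable table name are illustrative):

head -n 1 big.csv > header.csv                 # keep the header row
tail -n +2 big.csv | split -l 20000 - chunk_   # split the body into 20k-row pieces
first=1
for f in chunk_*; do
  cat header.csv "$f" > part.csv               # re-attach the header to each piece
  if [ "$first" = 1 ]; then
    csvsql --db sqlite:///data.db --tables mytable --insert part.csv
    first=0
  else
    # --no-create skips re-creating the table for the remaining chunks
    csvsql --db sqlite:///data.db --tables mytable --insert --no-create part.csv
  fi
done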


@onyxfish
Collaborator

Closing this since there was a resolution. Opened a documentation ticket #735.

lcorbasson pushed a commit to lcorbasson/csvkit that referenced this issue Sep 7, 2020
Add a global ‘max_precision’ argument to print_table