
csvsql Killed #633

Closed

NickBuchny opened this issue Jun 30, 2016 · 11 comments

Comments
@NickBuchny

NickBuchny commented Jun 30, 2016

I've been attempting to use csvsql to create a table from a ~4 GB .csv file, and after around 10-15 minutes the Ubuntu terminal prints 'Killed'. I'm fairly new to using this module, so if you need more information let me know.

Input:

csvsql train.csv

csvsql --no-constraints train.csv

Output:

Killed

Edit: if anyone could let me know what's happening, or suggest another approach, I'd appreciate it.

@vielmetti

Can you run the "top" command while this is going on? You might be running out of some system resource along the way.
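
A minimal way to check this on a Linux box (a sketch; the dmesg check assumes the kernel OOM killer logged the kill):

top                                  # watch memory and swap while csvsql runs (or: free -h)
dmesg | grep -i "out of memory"      # after the kill, see whether the OOM killer fired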

@jpmckinney
Member

If you run csvsql -v you should see some more verbose output.
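
For example:

csvsql -v train.csv

As I recall, --verbose makes csvkit print a full traceback when an exception is raised; a process killed outright by the OS may still exit with nothing but "Killed".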

@NickBuchny
Author

Will do! I'm in the middle of summer finals, but I'll pick my project up right after.

@aoa4eva

aoa4eva commented Aug 28, 2016

I'm having a similar problem, but with a file that's only about 40 MB. There is no additional output when I use -v, just "Killed".

The table is created but is empty, so running the same command a second time does yield verbose output indicating that the table already exists.

Small files work well, though.

@polvoazul

I suspect your RAM was full! Search for 'OOM killer'.

@aoa4eva

aoa4eva commented Aug 30, 2016

Thanks. I monitored memory usage as I ran the command: yes, memory usage climbs well beyond what the machine uses 'at rest', so that's likely. Are there any settings I can use to extend available memory using storage? I've already started splitting the file; that may be the easiest workaround for now.

@polvoazul

If you are on Linux you can increase the size of your swap partition. That will save your program from being killed, but it will probably slow everything down, since the OS will use the HDD as a substitute for RAM, which is MUCH slower.
I suggest closing everything that consumes memory (a browser is a big one) before running the memory-intensive program.
Your idea of splitting the files is probably the best solution for big files.
Buying more RAM is also good; it is cheap, and the whole system benefits from it automatically (through OS file caches).
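
A minimal sketch of adding swap on Linux (the 4G size and the /swapfile path are illustrative; adjust for your system):

sudo fallocate -l 4G /swapfile   # allocate the backing file
sudo chmod 600 /swapfile         # restrict permissions
sudo mkswap /swapfile            # format it as swap space
sudo swapon /swapfile            # enable it for the current session
free -h                          # confirm the extra swap is visible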

@aoa4eva

aoa4eva commented Aug 31, 2016

It's running on a cloud server, so there's no browser, but I'll get more memory if I need to process more files. So far, it's just the one. Thanks for your help!

@dannguyen
Contributor

csvsql is not performant on huge data, last I tried (and as I remember from the docs). I recommend using csvsql just for creating schemas or for smaller jobs, and piping your CSV data into a database with another tool. csvkit is great for getting the data ready for a database, but I don't think it was meant for big-data tasks.

As an example of another tool, datanews/tables worked quite beautifully when I used it a while back, while it was still in development. It's a CLI tool too, but built in Node:

https://github.com/datahoarder/fec_individual_donors#use-datanewstables-a-data-to-sql-importer-in-nodejs
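
One possible version of that workflow (a sketch, assuming a PostgreSQL target; the mydb database name is illustrative):

csvsql -i postgresql train.csv > create_train.sql       # use csvsql only to infer and emit the CREATE TABLE statement
psql mydb -f create_train.sql                           # create the (empty) table
psql mydb -c "\copy train FROM 'train.csv' CSV HEADER"  # bulk-load the rows with Postgres's own loader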

@aoa4eva

aoa4eva commented Oct 19, 2016

It worked when I split the file into chunks of at most 20k records. It took some experimentation.

I wrote a script to load each of the split files, and it worked well.
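
A sketch of that kind of script (assuming GNU coreutils and a SQLite target; big.csv, data.db, and the mytable table name are illustrative):

head -n 1 big.csv > header.csv                 # keep the header row
tail -n +2 big.csv | split -l 20000 - chunk_   # split the body into 20k-row pieces
first=1
for f in chunk_*; do
  cat header.csv "$f" > part.csv               # re-attach the header to each piece
  if [ "$first" = 1 ]; then
    csvsql --db sqlite:///data.db --tables mytable --insert part.csv
    first=0
  else
    # --no-create skips re-creating the table for the remaining chunks
    csvsql --db sqlite:///data.db --tables mytable --insert --no-create part.csv
  fi
done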


@onyxfish
Collaborator

Closing this since there was a resolution. Opened a documentation ticket #735.

lcorbasson pushed a commit to lcorbasson/csvkit that referenced this issue Sep 7, 2020
Add a global ‘max_precision’ argument to print_table