Feature request: specify chunksize for read_sql #2908
The exact error I got was this on pandas version 0.10.1:
The TypeError is a little confusing; it took me a while to figure out that it was happening because I was hitting the memory limit. Maybe a clearer error message would be enough ("max query size reached" or something), perhaps suggesting that the user use the SQL LIMIT clause to prevent this problem (see http://php.about.com/od/mysqlcommands/g/Limit_sql.htm).
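For illustration, here is a minimal sketch of that LIMIT-based workaround, paging through a query so that only one chunk of rows is in memory at a time. The database file and table name are hypothetical, and this assumes a recent pandas where `pd.read_sql` accepts a DB-API connection:

```python
import sqlite3
import pandas as pd

con = sqlite3.connect("example.db")  # hypothetical database and table
chunksize = 10_000
offset = 0
while True:
    chunk = pd.read_sql(
        f"SELECT * FROM my_table LIMIT {chunksize} OFFSET {offset}", con
    )
    if chunk.empty:  # no rows left
        break
    print(len(chunk))  # process each chunk here, then discard it
    offset += chunksize
```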
I just ran:
with no problem.
so this would need (to be consistent)
For the time being, here is a simple implementation of the requested functionality: https://gist.github.com/lebedov/6831387
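For readers who don't want to follow the link, the core idea is roughly the following (a sketch in the spirit of the gist, not its exact code): wrap a DB-API cursor and yield DataFrames of at most `chunksize` rows via `fetchmany`. The function name `read_sql_chunked` and the table are made up for illustration:

```python
import sqlite3
import pandas as pd

def read_sql_chunked(sql, con, chunksize=10_000):
    """Yield DataFrames of at most `chunksize` rows for the given query."""
    cur = con.cursor()
    cur.execute(sql)
    columns = [desc[0] for desc in cur.description]  # column names from the cursor
    while True:
        rows = cur.fetchmany(chunksize)
        if not rows:
            break
        yield pd.DataFrame.from_records(rows, columns=columns)

con = sqlite3.connect("example.db")  # hypothetical database
for chunk in read_sql_chunked("SELECT * FROM my_table", con, chunksize=5_000):
    print(len(chunk))  # process each chunk as it arrives
```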
@jorisvandenbossche @hayd does this need to go on the new SQL issues list?
Hmm, I'd prefer to keep the list in #6292 to the important to-dos that should ideally be finished before releasing it. This is a nice feature request, but not a blocker for the basic functionality. Just keep it as a separate issue?
OK... how about you create another issue (marked for 0.15) that includes items not in #6292? I think #3745, #5008, and #2754 should go on one of these as well (or, if already satisfied by another issue, go ahead and close them).
This came up again here: http://stackoverflow.com/q/25633830/1240268 |
I take full responsibility for asking the question about pulling large amounts of data from a remote server into a DataFrame that @hayd just referenced and answered in such good detail on SO (for which I thank you!). I've updated the SO question with more context, but if I can help or contribute in any way here, I'd be more than happy to.
@mariusbutuc if you want to try to implement it and send a pull request, that would be very welcome! I think this could be done inside the
It would be helpful to iterate through the rows returned from an SQL query (SQLite specifically) chunk by chunk, just as the read_csv and text-file functions allow, as described here: http://pandas.pydata.org/pandas-docs/stable/io.html#iterating-through-files-chunk-by-chunk
The return value should be an iterable object. This would prevent queries from returning too large an amount of data and (possibly) exceeding the system memory.
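For comparison, this is what the requested interface could look like: a chunksize argument that makes read_sql return an iterator of DataFrames, mirroring read_csv(chunksize=...). Modern pandas versions did later add exactly this; the table name below is hypothetical:

```python
import sqlite3
import pandas as pd

con = sqlite3.connect("example.db")  # hypothetical database

# With chunksize set, read_sql returns an iterator of DataFrames
# instead of one large frame, so memory use stays bounded.
total = 0
for chunk in pd.read_sql("SELECT * FROM my_table", con, chunksize=10_000):
    total += len(chunk)  # each chunk holds at most 10,000 rows
print(total)
```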