ENH: sql support #4163
Comments
I think it would be good if the "way to go" were defined before doing the remaining tasks. I think SQLAlchemy is a good way:
I don't know SQLAlchemy well. I can't tell whether all the wanted features can be implemented with it, or whether it will cause some tricky problems. Damien G.
@garaud I agree, I suspect it'll create fewer tricky problems than maintaining every flavour. :) I think I've pulled in all the open issues re sql (as they'll have to be addressed in either case).
And while we're addressing the big picture,
You should write the index for sure (and provide an option to skip it). I guess this is a bit tricky because you don't have control of the table format. Maybe provide an option on writing, or have the user provide a column name (which, if it's already being written, won't then do anything).
I guess the question is what to call it, then, if "index" is taken.
...and whether to attempt to use it as the PRIMARY KEY.
index.name or index.names (do we have MultiIndex support?), otherwise 'index' (or, if MultiIndex: [index_0, index_1, ...])? Can check whether it's already a column name and raise; I guess that also falls under the multiple-columns-with-the-same-name issue... raise (or mangle_dup_columns first)... :s Not sure we can use it as the PK, as the index may not be unique... does the PK matter?
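The naming scheme discussed in this exchange could be sketched roughly as follows. This is a hypothetical helper, not pandas API; the function name and fallback scheme are only illustrative:

```python
import pandas as pd


def index_column_names(frame):
    """Derive SQL column names for a DataFrame's index (hypothetical helper).

    Uses the index's own names where present; falls back to 'index'
    (or 'index_0', 'index_1', ... for a MultiIndex), and raises if a
    derived name collides with an existing column.
    """
    if isinstance(frame.index, pd.MultiIndex):
        names = [name if name is not None else "index_%d" % i
                 for i, name in enumerate(frame.index.names)]
    else:
        names = [frame.index.name if frame.index.name is not None else "index"]
    for name in names:
        if name in frame.columns:
            raise ValueError("index name %r clashes with a column name" % name)
    return names
```

This raises on a clash rather than mangling, which matches the "can check if already a column name and raise" option above.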
@danielballan what am I supposed to be adding to get mysql tests to pass? Does travis run them?
Travis does not run them. (I wrote those tests before I had any experience with Travis, but I will start a branch to install MySQL on the full_deps build.) Add a block to your MySQL configuration file that provides valid connection parameters for pandas to use.
I think #4110 should be covered here, or closed. Is it a dup?
No, not a dup, and still open I believe. @hayd can add it to the checklist. Half the issues on the list could be closed in an afternoon's work, but maybe we should first decide whether we are moving to SQLAlchemy. @mangecoeur?
I rebased @mangecoeur's branch cleanly (no conflicts!) onto master a few days ago, but couldn't get the tests working on my system. I haven't got time to play for a couple of days. Do you mind having a go, @danielballan?
Sure... To avoid git surgery down the road, I'll ask a dumb question now: I should add your fork of pandas as a remote and work on a local copy of this branch, correct?
fyi... we need to add SQLAlchemy as a build dep (in the requirements file) for full deps, @cpcloud? (it's already in ci/print_versions.py though)
fyi u can add to the requirements file "at will" (meaning, of course, when it makes sense) and things will be pulled from PyPI unless I put up a wheel. Wheels should be used only if the package is slow to install.
@danielballan sadly, because of workload I haven't had time to look into this much. I know pretty much what needs doing, but I just don't have time to actually implement and test it. At the simplest level, reading a frame using SQLAlchemy as a DB abstraction looks like this: (that's code I actually needed in production, so it works and is neater than what's in my branch). The code for writing is in my branch; the principle is simple: map each pandas dtype to an SQLAlchemy-supported type (most Python builtins included), create a Table, write the values. Testing is the kicker: I have very little experience with TDD, but I do know that running tests that require DBs is generally a pain. On the other hand, SQLAlchemy's cross-DB support is very good, so if it works with one DB it should work for them all, with SQLite as a kind of lowest common denominator. Finally, I would recommend making SQLAlchemy a soft dependency (like a lot of the other IO) with a fallback to SQLite support only, since that's built into Python; there's no point preventing people from accessing SQLite just because they don't have SQLAlchemy installed. It's a simple try/except import: set an "sqlalchemy mode" flag and choose either generic or SQLite functions accordingly.
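The soft-dependency pattern described here (try to import SQLAlchemy, fall back to the stdlib sqlite3 driver) could be sketched like this. Note this is an illustration, not the original production snippet or the eventual pandas implementation; `read_frame` and `_SQLALCHEMY_INSTALLED` are made-up names:

```python
import sqlite3

import pandas as pd

# Soft dependency: use SQLAlchemy when available, otherwise fall back
# to plain DBAPI connections (sqlite3 ships with Python).
try:
    import sqlalchemy
    _SQLALCHEMY_INSTALLED = True
except ImportError:
    _SQLALCHEMY_INSTALLED = False


def read_frame(sql, con):
    """Execute `sql` on `con` and return the result as a DataFrame.

    `con` may be a plain DBAPI connection, or an SQLAlchemy Engine
    when SQLAlchemy is installed.
    """
    if _SQLALCHEMY_INSTALLED and isinstance(con, sqlalchemy.engine.Engine):
        with con.connect() as conn:
            result = conn.execute(sqlalchemy.text(sql))
            rows = result.fetchall()
            columns = list(result.keys())
    else:
        cur = con.cursor()
        cur.execute(sql)
        rows = cur.fetchall()
        columns = [d[0] for d in cur.description]
    return pd.DataFrame.from_records(rows, columns=columns)
```

The same dispatch-on-connection-type idea is what later allowed the API to accept either an engine or a raw connection.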
@danielballan I didn't get the travis build working anyway, so feel free to throw away my commit (I'd have to rebase after we pull in your mysql travis testing anyway).
I think that
Perhaps get SQLAlchemy working (passing the same tests as MySQL) first and then worry about precisely how to call from read_sql and to_sql... Saying that, other operations use an
@jreback or maybe it should be more explicit: read_sqlite and read_sqlalchemy
added extra cool
@mangecoeur I disagree, I believe you could also provide a module level
that could go in
ahh... yes, that makes more sense; and fallback really doesn't make sense, and people rarely use more than one SQL backend
probably some of the other module-level variables floating around could go in
@jreback thought about it some more and actually I agree with you now. The SQL string given to
I think, however, we need to distinguish between 2 distinct user stories (use cases):
This case is served by
This is different from 1. because we have to assume a save-format convention, or save additional metadata. You don't want to have to specify an SQL string when loading, because the save/load layer should deal with it for you. This is akin to an ORM mapping, while 1. is dealing with raw SQL. I believe the
I think it's important to keep this difference in mind in order to have a clear understanding of what the IO functions should and should not attempt to do, and what problems they're actually trying to address. It should also make the API simpler since, as I said above, there's no need for the user to specify SQL flavours (it can all be done automatically), and we don't need any additional machinery for defining save conventions, because we assume they do not apply for this specific feature.
but if sqlalchemy is the flavor, then you don't need to specify DB-specific sql (which is a good thing), but you DO need to know the backend (I agree conn already has some of this info). Just trying to keep the API as simple as possible (but not too simple!)
@jreback I think you slightly misunderstood how SQLAlchemy works. It provides you with a variety of abstractions in order to generate DB-specific SQL (e.g. you create a Table object, and SQLAlchemy figures out the correct CREATE TABLE syntax for the DB you're using). It also deals with things like transactions and connection pools for you. However, if you tell the engine to execute a plain SQL string, it will not attempt to "translate" that string; how could it, seeing as there is in practice no "strictly correct" SQL syntax and feature set for it to translate from in the first place? So if you supply raw SQL, you do need to know the DB you are using and write SQL specific to that DB. SQLAlchemy also knows the DB you're using, since you have to specify it when you create the engine with the connection URI, and it deals with all the communication business for you. The main strength of using SQLAlchemy in this case is not for reading using a provided SQL string (where it just provides a light abstraction over DBAPI), but for writing to the DB, because we can programmatically generate an SQLAlchemy Table and let it generate the correct SQL for the DB you provided through create_engine. The only condition I see is if you have
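The write path described here (build a Table programmatically, let SQLAlchemy emit dialect-correct DDL for the backend named in the connection URI) can be sketched minimally as follows. The table name and columns are made up for illustration; any backend URI would work in place of the in-memory SQLite one:

```python
# SQLAlchemy learns the backend from the URI passed to create_engine,
# then generates dialect-specific CREATE TABLE statements from a
# programmatically built Table.
from sqlalchemy import (Column, Integer, MetaData, Table, Text,
                        create_engine, inspect)

# SQLite in-memory keeps the example self-contained; swap the URI
# for "mysql://..." or "postgresql://..." and the DDL changes dialect.
engine = create_engine("sqlite:///:memory:")

metadata = MetaData()
people = Table(
    "people", metadata,
    Column("id", Integer, primary_key=True),  # hypothetical schema
    Column("name", Text),
)
metadata.create_all(engine)  # emits CREATE TABLE in the engine's dialect
```

In the pandas case the `Column` definitions would be derived from the frame's dtypes rather than written by hand, which is exactly the dtype-to-type mapping mentioned earlier in the thread.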
Pull request: #5950
TST Import sqlalchemy on Travis.
DOC add docstrings to read sql
ENH read_sql connects via Connection, Engine, file path, or :memory: string
CLN Separate legacy code into new file, and fallback so that all old tests pass.
TST to use sqlalchemy syntax in tests
CLN sql into classes, legacy passes
FIX few engine vs con calls
CLN pep8 cleanup
add postgres support for pandas.io.sql.get_schema
WIP: cleanup of sql io module - imported correct SQLALCHEMY type, deleted redundant PandasSQLWithCon. TODO: renamed _engine_read_table, need to think of a better name. TODO: clean up get_connection function
ENH: cleanup of SQL io. TODO: check that legacy mode works. TODO: run tests correctly
enabled coerce_float option
Cleanup and bug-fixing, mainly on legacy mode sql. IMPORTANT: changed legacy to require a connection rather than a cursor. This is still not yet finalized. TODO: tests and doc
Added test coverage for basic functionality using an in-memory SQLite database
Simplified API by automatically distinguishing between engine and connection. Added warnings
ENH pandas-dev#4163 Added tests and documentation
Initial draft of doc updates
minor doc updates
Added tests and reduced code repetition. Updated docs.
Added test coverage for legacy names
Documentation updates, more tests
Added deprecation warnings for legacy names. Updated docs and test doc build
ENH pandas-dev#4163 - finalized tests and docs, ready for wider use…
TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3
ENH pandas-dev#4163 added versionadded comment
ENH pandas-dev#4163 added deprecation warning for tquery and uquery
ENH pandas-dev#4163 Documentation and tests
ENH pandas-dev#4163 Added more robust type coercion, datetime parsing, and parse-date options.
Updated optional dependencies
Added columns optional arg to read_table, removed failing legacy tests. Added columns to doc
ENH pandas-dev#4163 Fixed class renaming, expanded docs
ENH pandas-dev#4163 Fixed tests in legacy mode
ENH pandas-dev#4163 Use SQLAlchemy for DB abstraction
ENH pandas-dev#4163 Tweaks to docs, avoid mutable default args, mysql tests
ENH pandas-dev#4163 Introduce DataFrame Index support. Refactor to introduce PandasSQLTable for cleaner OOP design
ENH pandas-dev#4163 Fix bug in index + parse-date interaction, added test case for the problem
ENH pandas-dev#4163 Fixed missing basestring import for py3.3 compat
ENH pandas-dev#4163 Fixed missing string_types import for py3.3 compat
@jorisvandenbossche want to close this and open another issue with things that are not covered by #6292 (if they are at all relevant) but that we may want to address later?
@hayd You opened this issue; what do you think? Do you have a clear view of what on this list is still relevant?
Let's close this and move parts into 6292; this issue has got very long.
UPDATE: this is continued in #6292 after the refactor based on sqlalchemy.
This is a central issue to track sql support (migrated from the mysql support ticket - #2482 and #1662).
I think we are agreed we will be moving to use SQLAlchemy, in the hope of comprehensive cross-platform support:
Todo:

- `flavor` argument to `read_sql` / `read_frame`
- (Feature request: specify chunksize for read_sql #2908)
- `config` to specify default engine/connection for SQL (maybe?)
- Better dtype coercion: `datetime.date` conversions https://groups.google.com/forum/#!topic/pydata/KEBKxj4cWNI
- Performance
cc @garaud @mangecoeur @danielballan @changhiskhan