PyTables enhancements for selection #1996
1. added __str__ (__repr__ still to do)
2. row removal in tables is much faster if rows are consecutive
3. added Term class, refactored Selection (this is backwards compatible)
   Term is a concise way of specifying conditions for queries, e.g.
   Term(dict(field = 'index', op = '>', value = '20121114'))
   Term('index', '20121114')
   Term('index', '>', '20121114')
   Term('index', ['20121114','20121114'])
   Term('index', datetime(2012,11,14))
   Term('index>20121114')
   updated tests for same
   this should close GH pandas-dev#1996
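Item 2 relies on removing one contiguous block of rows being cheaper than removing the same rows one at a time. A minimal sketch of that idea, not the PR's code: `consecutive_runs` and `remove_runs` are hypothetical helpers, and `remove_range(start, stop)` stands in for whatever deletes a half-open range of rows (with PyTables it could wrap `Table.remove_rows`, which takes `start` and `stop`).

```python
from typing import Callable, Iterable, List, Tuple


def consecutive_runs(rows: Iterable[int]) -> List[Tuple[int, int]]:
    """Group row numbers into half-open (start, stop) runs of consecutive rows."""
    runs: List[Tuple[int, int]] = []
    for row in sorted(set(rows)):
        if runs and runs[-1][1] == row:
            # Row directly follows the current run: extend it by one.
            runs[-1] = (runs[-1][0], row + 1)
        else:
            # Gap in the numbering: start a new run covering just this row.
            runs.append((row, row + 1))
    return runs


def remove_runs(remove_range: Callable[[int, int], None], rows: Iterable[int]) -> int:
    """Issue one range removal per run; return the number of rows removed.

    Runs are removed last-first, so the row numbers of runs that are still
    pending do not shift underneath us.
    """
    removed = 0
    for start, stop in reversed(consecutive_runs(rows)):
        remove_range(start, stop)
        removed += stop - start
    return removed


if __name__ == "__main__":
    data = list(range(10))

    def remove_from_list(start: int, stop: int) -> None:
        del data[start:stop]

    remove_runs(remove_from_list, [7, 2, 3, 8, 4])
    print(data)  # [0, 1, 5, 6, 9]
```

Five scattered row numbers collapse into two range calls here; the fewer and longer the runs, the bigger the saving.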
1. added __str__ (__repr__ still to do)
2. added __delitem__ to support store deletion syntactic sugar
3. row removal in tables is much faster if rows are consecutive
4. added Term class, refactored Selection (this is backwards compatible)
   Term is a concise way of specifying conditions for queries, e.g.
   Term(dict(field = 'index', op = '>', value = '20121114'))
   Term('index', '20121114')
   Term('index', '>', '20121114')
   Term('index', ['20121114','20121114'])
   Term('index', datetime(2012,11,14))
   Term('index>20121114')
   added alias to the Term class; you can specify the nominal indexers (e.g. index in DataFrame, major_axis/minor_axis or alias in Panel)
   this should close GH pandas-dev#1996
5. added Col class to manage the column conversions
6. added min_itemsize parameter and checks in pytables to allow setting the minimum size of indexer columns
7. added indexing support via method create_table_index (requires PyTables 2.3); this now works quite well, as Int64 indices are used as opposed to Time64Col (which has a bug); includes a check on the PyTables version requirement
   this should close GH pandas-dev#698
8. significantly updated docs for pytables to reflect all changes; added docs for Table sections
9. BUG: a store would fail when appending if a put had not been done before (see test_append); this was the result of incompatibility testing on the index_kind
10. BUG: minor change to select and remove: require a table ONLY if where is also provided (and not None)

all tests pass; tests added for new features
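The construction forms listed under item 4 all boil down to a (field, op, value) triple. The sketch below is a hypothetical stand-in for the PR's Term class, not its actual code: `make_term` and the NamedTuple `Term` are illustrative names, treating '=' as the default operator and 'in' as the operator for list values are assumptions, and aliases are not handled.

```python
import re
from datetime import datetime
from typing import Any, NamedTuple


class Term(NamedTuple):
    field: str
    op: str
    value: Any


# Two-character operators come first so '>=' is not matched as '>' then '='.
_TERM_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<|=)\s*(.+?)\s*$")


def make_term(*args: Any) -> Term:
    """Normalize the accepted construction forms into one (field, op, value) Term."""
    if len(args) == 1 and isinstance(args[0], dict):
        d = args[0]
        return Term(d["field"], d.get("op", "="), d["value"])
    if len(args) == 1 and isinstance(args[0], str):
        # Compact string form such as 'index>20121114'.
        m = _TERM_RE.match(args[0])
        if m is None:
            raise ValueError(f"cannot parse term: {args[0]!r}")
        field, op, value = m.groups()
        return Term(field, op, value)
    if len(args) == 2:
        field, value = args
        # A list of values is read as "field is one of these values".
        op = "in" if isinstance(value, (list, tuple)) else "="
        return Term(field, op, value)
    if len(args) == 3:
        return Term(*args)
    raise TypeError("expected a dict, a string, (field, value) or (field, op, value)")


if __name__ == "__main__":
    print(make_term("index>20121114"))
    print(make_term("index", datetime(2012, 11, 14)))
```

Normalizing early keeps everything downstream (building the PyTables condition, validating the field name) dealing with a single shape.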
I wonder if it would be useful to extend this notation to regular DataFrames... has it been discussed before? (I think it may have been.) Someone was trying to roll their own DSL for this on SO...
Absolutely; hopefully going to implement this in #3202 / #3393.
The theory is to accept a numpy-like DSL (but with frames/series/constants) whose operands may need alignment, and then pass the numpified expression to … (which is also similar to the expressions in …). It's a bit non-trivial: you have to take the string expression, compile/parse it, walk the AST to find the sections that need aligning, then repackage it for numexpr. @hayd, up for it?
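The "parse it, then walk the AST to find what needs aligning" step can be sketched with Python's standard `ast` module. This is a minimal illustration under an assumption of ours: every bare name in the expression (other than a function being called) refers to a frame, series or constant that may need aligning. `operands_to_align` is a hypothetical name, and the actual alignment and the hand-off to numexpr are left out.

```python
import ast
from typing import List


class _OperandCollector(ast.NodeVisitor):
    """Collect bare names, in first-seen order, as candidate operands."""

    def __init__(self) -> None:
        self.names: List[str] = []

    def visit_Name(self, node: ast.Name) -> None:
        if node.id not in self.names:
            self.names.append(node.id)

    def visit_Call(self, node: ast.Call) -> None:
        # Skip the callee (e.g. `abs` in `abs(df1)`); only its arguments are operands.
        for arg in node.args:
            self.visit(arg)
        for kw in node.keywords:
            self.visit(kw.value)


def operands_to_align(expr: str) -> List[str]:
    """Parse `expr` and return the names whose objects may need aligning."""
    tree = ast.parse(expr, mode="eval")
    collector = _OperandCollector()
    collector.visit(tree)
    return collector.names


if __name__ == "__main__":
    print(operands_to_align("df1 + df2 * 2"))
    print(operands_to_align("abs(df1 - s) > 0.5"))
```

Once the operands are known, a real implementation would align them on their indexes and substitute the underlying arrays before evaluating.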
Do you think going via Terms is a good solution:
and then the eval'd string would be parsed into that. That way, we could first get select working with Terms (which shouldn't be too bad), and then write the parser for the DSL (we have to come to a consensus on the grammar...).
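Going via Terms means the DSL string only has to be turned into a list of conditions. A rough sketch with the standard `ast` module, assuming the grammar is restricted to a flat `and` of single comparisons whose left side is a field name; `string_to_terms` is a hypothetical name, and `or` is deliberately rejected because it would need a tree of Terms rather than a flat list.

```python
import ast
from typing import Any, List, Tuple

# Comparison node types we accept, mapped to their operator text.
_OPS = {
    ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<", ast.LtE: "<=",
    ast.Eq: "==", ast.NotEq: "!=", ast.In: "in", ast.NotIn: "not in",
}


def string_to_terms(expr: str) -> List[Tuple[str, str, Any]]:
    """Turn "index > 20121114 and minor in ['A']" into (field, op, value) tuples."""
    body = ast.parse(expr, mode="eval").body
    is_and = isinstance(body, ast.BoolOp) and isinstance(body.op, ast.And)
    parts = body.values if is_and else [body]
    terms = []
    for part in parts:
        if not (isinstance(part, ast.Compare) and len(part.ops) == 1
                and isinstance(part.left, ast.Name)):
            raise ValueError(f"unsupported condition: {ast.dump(part)}")
        op = _OPS.get(type(part.ops[0]))
        if op is None:
            raise ValueError(f"unsupported operator: {type(part.ops[0]).__name__}")
        # literal_eval accepts an AST node and only allows literal values.
        value = ast.literal_eval(part.comparators[0])
        terms.append((part.left.id, op, value))
    return terms


if __name__ == "__main__":
    print(string_to_terms("index > 20121114 and minor in ['A', 'B']"))
```

Using `ast.literal_eval` for the right-hand side means the parser never executes user code; anything other than a literal is rejected.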
That is definitely a good start on it. The thoughts I had were:
Roughly equivalent to:
which is an easy way to compose (not that user-friendly, though). This could then replace the syntax in … Want to give it a try?
Happy to give this a try. Will thrash this out to expressions later in the week, and ping back on the other thread. :)
Great! FYI the …
now
changes to pandas.io.pytables to support more natural selection (from tables):
store.select('mypanel', where=['major>=20120103', 'major<=20120401', dict(minor=['A', 'B', 'C'])])
rather than existing
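In the proposed where list, every element is ANDed together: a string is a single comparison and a dict means "field is one of these values". PyTables in-kernel queries (`Table.where`) take a numexpr-style condition string built with `&` and `|`, so one way to picture the translation is the sketch below. `where_to_condition` is a hypothetical helper, not the PR's code, and the quoting of string values is simplified and may need adjusting for a real table.

```python
import re
from typing import Dict, List, Union

# Two-character operators first so '>=' is not matched as '>' then '='.
_COND_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<|=)\s*(.+?)\s*$")


def where_to_condition(where: List[Union[str, Dict[str, list]]]) -> str:
    """AND together every element of a `where` list into one condition string."""
    clauses = []
    for item in where:
        if isinstance(item, dict):
            # dict(field=[v1, v2, ...]) becomes (field == v1) | (field == v2) | ...
            for field, values in item.items():
                ors = " | ".join(f"({field} == {v!r})" for v in values)
                clauses.append(f"({ors})")
        else:
            m = _COND_RE.match(item)
            if m is None:
                raise ValueError(f"cannot parse condition: {item!r}")
            field, op, value = m.groups()
            op = "==" if op == "=" else op  # numexpr spells equality '=='
            clauses.append(f"({field} {op} {value})")
    return " & ".join(clauses)


if __name__ == "__main__":
    print(where_to_condition(
        ["major>=20120103", "major<=20120401", dict(minor=["A", "B", "C"])]
    ))
    # (major >= 20120103) & (major <= 20120401) & ((minor == 'A') | (minor == 'B') | (minor == 'C'))
```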
future
not sure that pandas should get really fancy with operations just yet (e.g. 'or' operations, and actual value selection)
but it's probably necessary once pandas supports 'chunking'-type operations on pytables
need to build a full-fledged selection parser to translate to numexpr-type operations (maybe with a patsy backend?)
BUT this may actually be useful to support generic operations in this way on in-memory panels/frames
not sure of use cases here, though; I usually just read in roughly the data I need and sub-select from there
unless you have hundreds of millions of rows, I don't know if it's necessary to optimize more (in which case it is!)
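A selection parser that handles 'or' could start by rewriting Python-style `and`/`or`/`not` into the `&`/`|`/`~` operators numexpr-style conditions use. The sketch below uses the standard `ast` module in place of the patsy backend floated above; `to_numexpr` is a hypothetical name, comparisons are passed through unchanged, and `ast.unparse` needs Python 3.9 or later.

```python
import ast


def to_numexpr(expr: str) -> str:
    """Rewrite a Python-style boolean condition into numexpr-style operators."""
    return _render(ast.parse(expr, mode="eval").body)


def _render(node: ast.AST) -> str:
    if isinstance(node, ast.BoolOp):
        # `and` -> `&`, `or` -> `|`; parenthesize because & and | bind tighter
        # than comparisons in numexpr/Python operator precedence.
        joiner = " & " if isinstance(node.op, ast.And) else " | "
        return "(" + joiner.join(_render(v) for v in node.values) + ")"
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return "~" + _render(node.operand)
    if isinstance(node, ast.Compare):
        # Leaf comparisons are kept exactly as written, just wrapped in parens.
        return "(" + ast.unparse(node) + ")"
    raise ValueError(f"unsupported expression: {ast.unparse(node)}")


if __name__ == "__main__":
    print(to_numexpr("major >= 20120103 and (minor == 'A' or minor == 'B')"))
    # ((major >= 20120103) & ((minor == 'A') | (minor == 'B')))
```

The same rewritten string could, in principle, be evaluated against in-memory arrays as well as handed to an on-disk query, which is the in-memory use case mentioned above.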