WIP: use eval expression parsing as replacement for Term in HDFStore #4155

jreback · 2013-07-07T22:28:00Z

allows natural syntax for queries, with in-line variables allowed

some examples
"ts>=Timestamp('2012-02-01') & users=['a','b','c']"

['A > 0']

dates=pd.date_range('20130101',periods=5)
'index>dates'

Todos:

update docs / need new examples
API changes (disallow dict, and separated expressions)
tests for | and ~ operators, and invalid filter expressions
code clean up, maybe create new Expr base class to opt in/out of allowed operations
(e.g. Numexpr doesn't want to allow certain operations, but PyTables needs them,
so maybe define in the base class for simplicity, and just not allow them (checked in visit),
which right now doesn't allow a generic_visitor)
can ops.Value be equiv of ops.Constant?
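
A minimal sketch of the opt-in/opt-out idea from the todo list, using only the standard-library `ast` module. The class names (`BaseExprVisitor`, `ArithmeticVisitor`, `FilterVisitor`), the `allowed_ops` attribute, and the choice of operators are all hypothetical and are not taken from the pandas code; the point is only that the base class checks operators in `visit_*` and each subclass declares what it accepts.

```python
import ast


class BaseExprVisitor(ast.NodeVisitor):
    """Hypothetical base class: subclasses opt in to operators via `allowed_ops`."""

    # AST operator node types a subclass accepts (nothing by default).
    allowed_ops = ()

    def visit_BinOp(self, node):
        self._check(node.op)
        self.generic_visit(node)

    def visit_UnaryOp(self, node):
        self._check(node.op)
        self.generic_visit(node)

    def visit_Compare(self, node):
        for op in node.ops:
            self._check(op)
        self.generic_visit(node)

    def _check(self, op):
        # Reject any operator the subclass did not opt in to.
        if not isinstance(op, self.allowed_ops):
            raise NotImplementedError(
                f"{type(self).__name__} does not allow {type(op).__name__}"
            )


class ArithmeticVisitor(BaseExprVisitor):
    # Assumed operator set for an arithmetic-style engine.
    allowed_ops = (ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Gt, ast.Lt)


class FilterVisitor(BaseExprVisitor):
    # Assumed operator set for a filter-style engine (&, | and ~ included).
    allowed_ops = (ast.Gt, ast.GtE, ast.Lt, ast.LtE, ast.Eq,
                   ast.BitAnd, ast.BitOr, ast.Invert)


def is_allowed(visitor_cls, source):
    """Return True if every operator in `source` is accepted by `visitor_cls`."""
    try:
        visitor_cls().visit(ast.parse(source, mode="eval"))
    except NotImplementedError:
        return False
    return True
```

With this layout an invalid filter expression fails during the visit, before anything is evaluated, which is also a convenient hook for the `|` and `~` tests mentioned above.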

cpcloud · 2013-07-08T15:09:50Z

That should wait for meta data though. Would be nice if you could encapsulate a bunch of related frames of different size and columns with some overlapping without having to merge them to make a frame with all of them. That's a bit out of scope here, but you can raise an issue maybe. Maybe there's a way to simulate this currently?

alvorithm · 2013-07-08T15:10:53Z

I think what one needs for that is a Schema object that will point to in-memory and eventually offline columns in data frames.

jreback · 2013-07-08T15:12:03Z

@cpcloud ok....i am sold on the query, where you can specify column names and have them picked up.....i think pretty useful

so instead of

df[(df.a>0) & (df.b>0)]

this

df.query('a>0 & b>0') and as a bonus it uses numexpr (ne)
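
A small sketch comparing the two spellings, assuming a pandas version that provides `DataFrame.query`. The frame contents are made up for illustration.

```python
import pandas as pd

# Made-up data: only the first row has both a > 0 and b > 0.
df = pd.DataFrame({"a": [1, -1, 2, 0], "b": [1, 2, -3, 4]})

# Boolean-mask indexing: each comparison needs its own parentheses,
# because & binds more tightly than > in plain Python.
masked = df[(df.a > 0) & (df.b > 0)]

# String query: column names are picked up from the frame by name.
queried = df.query("a > 0 & b > 0")
```

Both spellings select the same rows; the string form avoids repeating `df.` and the extra parentheses.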

cpcloud · 2013-07-08T15:13:00Z

Exactly.

alvorithm · 2013-07-08T15:23:08Z

Excellent! Just to make sure, indexing cannot support this syntax because it would be hard to disambiguate column names from query expressions?

cpcloud · 2013-07-08T15:29:38Z

Should be ok since the parsed expression of a single column name would just return the column. It's either a column name or an expression so we could support that.
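
One way to picture the "column name or expression" rule, as a rough sketch only. A plain dict of lists stands in for a frame, and `lookup` and `_evaluate` are hypothetical helpers, not pandas functions. An exact column-name match is tried first, so unconventional names still resolve, and only then is the key parsed as an expression.

```python
import ast


def lookup(columns, key):
    """Return a column for an exact name match, else evaluate `key` as an expression.

    `columns` is a dict of name -> list of numbers standing in for a frame.
    """
    if key in columns:  # an exact column name wins, even with odd characters
        return columns[key]
    tree = ast.parse(key, mode="eval").body
    return _evaluate(tree, columns)


def _evaluate(node, columns):
    # Only names, constants and a single `>` comparison are handled in this sketch.
    if isinstance(node, ast.Name):
        return columns[node.id]
    if isinstance(node, ast.Constant):
        return node.value
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and isinstance(node.ops[0], ast.Gt)):
        left = _evaluate(node.left, columns)
        right = _evaluate(node.comparators[0], columns)
        # Assumes a column on the left and a scalar on the right.
        return [value > right for value in left]
    raise ValueError(f"unsupported expression node: {type(node).__name__}")


# A column literally named "a > 0" shadows the expression of the same text.
columns = {"a": [1, -2, 3], "a > 0": [10, 20, 30]}
by_name = lookup(columns, "a")
by_expr = lookup(columns, "a > 1")
odd_name = lookup(columns, "a > 0")
```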

alvorithm · 2013-07-08T15:40:26Z

As soon as a special character appears it will /not/ be a column.

Where would it be a good place to argue for KeyError's on columns to check first on named indexes before giving up? I could also give a try at implementing it if there is interest.

cpcloud · 2013-07-08T16:30:25Z

@Meteore
not sure what you mean exactly about the special character...

it's perfectly reasonable to have unconventional column names...in fact I sometimes name my columns with latex markup if i'm going to plot it later.

e.g., with df['å < ∑'], df['ß'], etc., then if you have columns with those names that would work, but maybe this isn't what you're talking about...unicode column names would be fine...

@jreback
how can we coordinate our commits with minimal fuss? can we set up a branch on master and push to that? and then just push and pull (no rebase for now)?

cpcloud · 2013-07-08T16:34:22Z

@Meteore

re raising KeyError: wait until we refactor the parsers' code. i think it should go in the name resolution method of Terms...i haven't fully grokked @jreback's stuff yet so that may change a bit...been busy setting up the faster travis stuff. should be able to get the refactor going tonight or later today.

jreback · 2013-07-08T16:34:27Z

@cpcloud
why don't u setup an eval branch in master then we can both push/pull
I think we can then track and rebase to master independently

cpcloud · 2013-07-08T16:40:14Z

@jreback done

pydata@f6feae7785f1331b01bd140653f8853d21bade1a

jreback · 2013-07-08T16:55:13Z

hmm....i must be doing something wrong....

i created a local branch which tracks eval-3393, then my branch eval3 tracks that
how do i push to update eval-3393 though? (i just created a new branch eval3 on the main repo, which is wrong)

cpcloud · 2013-07-08T17:18:50Z

assuming you're on eval3

git branch --set-upstream-to=upstream/eval-3393 # track the remote from eval3
git branch working-branch --track eval3 # working-branch tracks eval3
git fetch upstream
git rebase upstream/eval-3393
git checkout working-branch
git pull --rebase # <- rebase maybe not necessary

should do what you want and leaves you in the working branch

jreback · 2013-07-08T17:23:00Z

I think i am tracking origin/eval-3393

# On branch eval4
# Your branch is ahead of 'origin/eval-3393' by 11 commits.
#
nothing to commit (working directory clean)

and my commits are there
but when I

git push https://github.com/pydata/pandas.git origin/eval-3393

nothing gets updated (even if I -f)

??

cpcloud · 2013-07-08T17:26:39Z

what is the remote when u do git branch -vv?

jreback · 2013-07-08T17:27:59Z

This should be a direct tracking branch, not an indirect

* eval4           200b59f [origin/eval-3393: ahead 11] COMPAT: allow prior 0.12 query syntax for terms, e.g. Term('index','>',5) (and show deprecation warning)

cpcloud · 2013-07-08T17:29:18Z

what is origin? is that yours or pandas master?

cpcloud · 2013-07-08T17:31:40Z

i pushed your changes...

cpcloud · 2013-07-08T17:33:17Z

you should be able to checkout a branch now that tracks upstream/eval-3393 assuming upstream is git@github.com:pydata/pandas.git or https://github.com/pydata/pandas.git if you don't want to use the git protocol

cpcloud · 2013-07-08T17:36:09Z

https://github.com/pydata/pandas/tree/eval-3393

jreback · 2013-07-08T17:47:54Z

ok..let me start with that....origin is https://github.com/pydata/pandas.git

jreback · 2013-07-08T17:49:24Z

ok...all set, do we need to rebase? how do we act on this?

e.g. I make a commit and push...no prob

but I should git pull before (in case you pushed?)

what about rebasing?

cpcloud · 2013-07-08T17:51:08Z

everything is set to go. we can both push/pull, and we should both git pull before we git push every time just to make sure we have the latest

cpcloud · 2013-07-08T17:52:49Z

i'm not sure about rebasing though, i think we should be able to rebase at will...but i could be wrong there...e.g., if you squash a bunch and then i pull, will git want to keep the commits that i haven't squashed?

jreback · 2013-07-08T17:56:43Z

ok...cool.....should we do a PR on this just to have a central place?

jreback · 2013-07-08T17:56:55Z

and then close other ones?

jreback · 2013-07-08T18:18:08Z

closing in favor of #4162

cpcloud added 30 commits July 6, 2013 11:40

ENH: add new computation module and toplevel eval function

89a03be

ENH/TST: add new instance testing functions and their tests

bcd17b0

BUG: prevent certain index types from joining with DatetimeIndex

81bacd1

TST/ENH: add 2d bare numpy array and nan support

e380271

ENH: add modulus support

99a3d28

TST: add failing modulus tests

4db95fe

CLN: use format string for unicode

6000c89

CLN: remove engine detection and manip for datetimes

c25a1d4

CLN/ENH: add new interface to encapsulate Terms and Constants

1132bc4

ENH: allow an already-parsed expression to be passed to eval

54f1897

CLN: add automatic scope creating object

e20900a

CLN: make the environment an implementation detail

51d80f6

DOC: add docstring to eval

038d79c

CLN: cleanup pytables.py a bit

599cf32

CLN: clean up engines

ea769e6

CLN: clean up eval and have the Scope instance auto create the scope if none exists

ff78c08

CLN: add six.string_types checking instead of basestring

f9f7fd7

TST: clean up some tests, add minor assertions where none existed

48eff13

CLN: clean up frame.py a bit

d87f027

CLN: clean up pytables arguments a bit

5b58a08

CLN: use shiny new string mixin to refactor repring

7482a27

CLN: move align to its own file

0d40fe1

CLN: clean up and use new stringmixin for Expr

87957d2

ENH/CLN: be more careful about unicode

e35cb5c

CLN: run autopep8 on pandas/io/pytables.py

1ceec39

DOC: reference future enhancingperf.eval section

c665a85

CLN/DOC: clean up docstrings in pytables

cb27934

CLN: actually pass fletcher32 in get_store

63ba37d

CLN: remove unused variables

dcde590

CLN: more pep8 and get rid of most raise Exception clauses

3c4e2b3

COMPAT: allow prior 0.12 query syntax for terms, e.g. Term('index','>',5) (and show deprecation warning)

e712762

cpcloud mentioned this pull request Jul 8, 2013

ENH: add expression evaluation functionality via eval #4162

Merged

35 tasks

cpcloud closed this Jul 8, 2013
