ENH: eval function #3393
Comments
To follow on from your comment, shouldn't we be using Also, I haven't got my head around numexpr yet, so I may be talking complete nonsense. (I've moved Term to expressions without breaking things, and changed the repr so that it evals back to itself; was there a reason it didn't?) |
I agree about the operators (though I think you actually need to accept both), these are always going to be in a string expression in any event....because you need delayed evaluation e.g.
vs
|
i'm sure everyone involved in this thread knows this but just wanted to point out that the precedence of |
@cpcloud this is in a string context, so in theory you can make them the same. This is a big source of confusion when working in pandas, I think: people expect them to be the same (even though they are wrong) |
@jreback sure. i was just semi-thinking-out-loud here, thought that it might warrant a discussion. this goes back to python core devs not wanting to provide the ability to overload |
it's a valid point. the purpose of eval is to facilitate multi-expression parsing that we will evaluate in numexpr |
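For readers following the precedence point: in plain Python, `&` and `|` bind more tightly than comparisons, while `and`/`or` bind more loosely. The sketch below uses only the standard `ast` module to show how the two spellings parse; the expression strings are made up for illustration and are not taken from the thread.

```python
import ast


def top_level_node(expr):
    """Return the class name of the outermost node of a parsed expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# With `and`, each comparison is grouped first, then the two are combined.
print(top_level_node("a > 1 and b < 2"))    # BoolOp

# With `&`, Python groups `1 & b` first, so this is the chained
# comparison a > (1 & b) < 2.
print(top_level_node("a > 1 & b < 2"))      # Compare

# Explicit parentheses restore the intended grouping.
print(top_level_node("(a > 1) & (b < 2)"))  # BinOp
```

Because the expression arrives as a string, a parser is free to rewrite one form into the other before evaluation, which is the "string context" argument made above.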
@jreback u can do it with the |
@hayd said he was giving it a stab
@cpcloud |
@cpcloud I reread your question the issue is this: while
that's the main reason for this function |
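A rough illustration of the delayed-evaluation argument, using a toy `Vec` class rather than pandas or numexpr (both `Vec` and `fused_eval` are hypothetical names invented for this sketch). Eager Python evaluation materializes a full-size temporary for every operator, whereas a string expression can be evaluated in one pass so that only the final result is allocated.

```python
import ast


class Vec:
    """Toy column that counts how many full-size results get allocated."""

    allocations = 0

    def __init__(self, data):
        self.data = list(data)
        Vec.allocations += 1

    def __add__(self, other):
        # Eager evaluation: every `+` materializes a new full-size Vec.
        return Vec(x + y for x, y in zip(self.data, other.data))


def fused_eval(expr, **columns):
    """Evaluate `expr` row by row so only the final result is allocated."""
    code = compile(ast.parse(expr, mode="eval"), "<expr>", "eval")
    names = list(columns)
    rows = zip(*(columns[name].data for name in names))
    return Vec(
        eval(code, {"__builtins__": {}}, dict(zip(names, row)))
        for row in rows
    )


a, b, c = Vec([1, 2]), Vec([10, 20]), Vec([100, 200])

Vec.allocations = 0
eager = a + b + c  # one temporary for (a + b), then the final result
eager_allocations = Vec.allocations

Vec.allocations = 0
fused = fused_eval("a + b + c", a=a, b=b, c=c)
fused_allocations = Vec.allocations

print(eager.data, eager_allocations)
print(fused.data, fused_allocations)
```

numexpr achieves a similar effect by compiling the expression and evaluating it in chunks; the row-by-row loop here only stands in for that idea and says nothing about its performance.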
@jreback that is pretty cool. i haven't done much with numexpr, i assumed that pandas uses it when it can...is that a fallacious assumption? should i be explicitly using numexpr? |
it's used in pytables for query processing see the core/expressions module |
I haven't done much so far. I've moved Term to expressions and added some helper functions for that class, nor have I really looked into numexpr yet. I kind of lost my way on the road map... and may be totally confused atm. Am I way off here?
|
so there are 3 goals here:
into this (call this the parsed_expr)
so I think your 4 is my 3 I don't think you need 5 |
@jreback i know u said skip 1 but i can do that if u want (lots of nice python builtins for dealing with python source) while @hayd does 2 and 3. what would be allowed to be parsed? |
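On the question of what should be allowed to be parsed: one possible approach, sketched here with the standard `ast` module, is to parse the string and reject any node type that is not on a whitelist. The whitelist below is an assumption made for illustration; the thread does not settle which constructs should be accepted, and `check_expression` is a hypothetical helper name.

```python
import ast

# Hypothetical whitelist: names, constants, arithmetic, comparisons,
# and boolean/bitwise combinations. Anything else is rejected.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.BinOp, ast.UnaryOp, ast.Compare,
    ast.Name, ast.Load, ast.Constant,
    ast.And, ast.Or, ast.Not, ast.BitAnd, ast.BitOr, ast.Invert,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def check_expression(source):
    """Parse `source` and raise ValueError on any non-whitelisted syntax."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
    return tree


check_expression("(a > 1) & (b < 2) | ~c")  # accepted

try:
    check_expression("f(a)")  # function calls are not on the whitelist
except ValueError as exc:
    print(exc)
```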
the more the merrier! let's start with the example
This is really about the masks as that's where all the work is done, but I think it would be nice eventually to do something on the rhs as well:
so we can support (imagine engine = 'multi-process' or 'out-of-core')...... to the heck with blaze! (well maybe |
I think I was worried that nested Terms wouldn't come for free with _And and _Or, but I'll put something together imminently and we can see whether it does. :) |
We can just tell everyone it's blaze... |
i've got it parsing nested and terms already :) |
albeit they are strings right now and only |
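As a sketch of what parsing nested terms could look like, the following builds a small tree of `Term`, `And`, and `Or` objects from a string. The class definitions here are stand-ins written for this example; they are not the `Term`, `_And`, or `_Or` classes from the pandas code base, and only `&`, `|`, and simple `name <op> constant` comparisons are handled.

```python
import ast
from dataclasses import dataclass


@dataclass
class Term:
    field: str
    op: str
    value: object


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


_COMPARE = {
    ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<",
    ast.LtE: "<=", ast.Eq: "==", ast.NotEq: "!=",
}


def build(node):
    """Recursively convert an ast node into Term/And/Or objects."""
    if isinstance(node, ast.Expression):
        return build(node.body)
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.BitAnd, ast.BitOr)):
        cls = And if isinstance(node.op, ast.BitAnd) else Or
        return cls(build(node.left), build(node.right))
    if (
        isinstance(node, ast.Compare)
        and len(node.ops) == 1
        and type(node.ops[0]) in _COMPARE
        and isinstance(node.left, ast.Name)
        and isinstance(node.comparators[0], ast.Constant)
    ):
        return Term(node.left.id, _COMPARE[type(node.ops[0])],
                    node.comparators[0].value)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def parse_terms(source):
    return build(ast.parse(source, mode="eval"))


tree = parse_terms("(a > 1) & ((b < 2) | (c == 3))")
print(tree)
```

Nesting falls out of the recursion: each `&` or `|` node simply builds its two children first.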
@cpcloud I would just use the |
the end goal is to create a e.g.
Term align (pseudo codish)
maybe return a new expression that is aligned |
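The alignment step can be pictured with plain dictionaries standing in for labeled objects. This is only a sketch under that assumption: `align` is a hypothetical helper, and it mimics an outer join on labels with NaN for missing entries, which is how pandas aligns operands in arithmetic.

```python
import math


def align(left, right):
    """Reindex two label->value mappings onto the sorted union of their labels."""
    labels = sorted(set(left) | set(right))
    aligned_left = {k: left.get(k, math.nan) for k in labels}
    aligned_right = {k: right.get(k, math.nan) for k in labels}
    return aligned_left, aligned_right


a = {"x": 1.0, "y": 2.0}
b = {"y": 10.0, "z": 20.0}

aligned_a, aligned_b = align(a, b)
# After alignment both operands share the same labels, so an engine can
# operate on the raw values position by position.
total = {k: aligned_a[k] + aligned_b[k] for k in aligned_a}
print(total)
```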
ah i see. so an |
I think you actually need 3 classes here:
e.g.
yields
|
lol gh doesn't like ur rst flavored monospace |
This was where I was up to: https://github.com/hayd/pandas/tree/term-refactor |
possible engines right now are |
well....pytables target is the same, numexpr, only difference is that the Terms need to do different alignment (as they are scalar type conditions, e.g. |
@jreback @hayd fyi for some reason |
oh that is nice. still have the issue of the different behav tho |
interesting and possibly alarming bit....
i'm guessing this is because of the large power term and because the arrays are big, but i don't see why the L2 norm should be that different (order of magnitude of difference is
|
ideally the norm should be 0 |
i think |
well i don't think it's a bug in |
k, not "dumb" loop unrolling. maybe there is some other optimization technique, or this is just a straight-up bug
|
maybe some sort of overflow? |
i believe the optimization is the cause of the divergence:
|
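To make the kind of divergence being discussed concrete, here is a pure-Python sketch that compares the built-in power operator against a version that expands an integer power into multiplications, then reports the L2 norm of the difference. It assumes, purely for illustration, that the optimization in question is some form of power expansion; the thread does not confirm which optimization numexpr applies, and `pow_by_squaring` and `l2_norm` are names invented here.

```python
import math


def pow_by_squaring(x, n):
    """Compute x**n for a non-negative integer n using only multiplications."""
    result = 1.0
    base = x
    while n > 0:
        if n & 1:
            result *= base
        base *= base
        n >>= 1
    return result


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


xs = [0.5 + 0.001 * i for i in range(1000)]
n = 50

direct = [x ** n for x in xs]
expanded = [pow_by_squaring(x, n) for x in xs]

diff = l2_norm(d - e for d, e in zip(direct, expanded))
relative = diff / l2_norm(direct)
print(diff, relative)
```

With large exponents and large values the absolute norm of the difference can look alarming even when the relative difference is at rounding level, so comparing the relative figure is usually more informative.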
I would just cast them |
cast mod u mean right? can't really cast floordiv result as that would defeat the purpose of this... |
yes |
http://pandas.pydata.org/pandas-docs/dev/enhancingperf.html would be a good place for docs on eval |
I think it makes sense to cast to float, very simple to get back
|
yep this required a pretty large refactoring since to cast in a general way the op needs to know about the scope of its operands |
must cast recursively down the parse tree |
so that on eval the correct cast is performed...this will work unless there's floor division on both sides, but in that case you shouldn't be using eval anyway since that will run only on the python engine. in other news...implementing an operator in numexpr is not trivial...i thought about doing it but it's kind of a beast...maybe i will anyway |
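A minimal sketch of what "cast recursively down the parse tree" could look like, using the standard `ast.NodeTransformer` (and `ast.unparse`, which needs Python 3.9+). It wraps both operands of every `%` in a `float(...)` call. This is an assumption about the general shape of such a rewrite, not the implementation that ended up in pandas, and `CastModOperands` and `cast_mod` are hypothetical names.

```python
import ast


class CastModOperands(ast.NodeTransformer):
    """Wrap both operands of every `%` operation in a float(...) call."""

    def visit_BinOp(self, node):
        # Visit children first so nested `%` operations are rewritten too.
        self.generic_visit(node)
        if isinstance(node.op, ast.Mod):
            node.left = self._as_float(node.left)
            node.right = self._as_float(node.right)
        return node

    @staticmethod
    def _as_float(operand):
        return ast.Call(
            func=ast.Name(id="float", ctx=ast.Load()),
            args=[operand],
            keywords=[],
        )


def cast_mod(source):
    """Return `source` with the operands of every `%` cast to float."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(CastModOperands().visit(tree))
    return ast.unparse(tree)


print(cast_mod("a % b"))
print(cast_mod("a % b + (c % d) % e"))
```

The transformer handles the operator locally and relies on the tree walk for recursion, so an operand that is itself a `%` expression is rewritten before it gets wrapped.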
eval is useful for two things as stated above:
|
@jreback is it intentional that
|
basically force the frame on the lhs of modulus is what's happening |
i'll submit a pr to fix it |
Provide a top-level eval function, something like:
pd.eval(function_or_string, method=None, **kwargs)
to support things like:
out-of-core computation (locally) (see ENH: create out-of-core processing module #3202)
string evaluation which does not use python lazy evaluation (so pandas can process efficiently)
pd.eval('df + df2', method='numexpr') (or maybe default to numexpr)
see also: http://stackoverflow.com/questions/16527491/python-perform-operation-in-string
pd.eval('df + df2', method='picloud')
http://www.picloud.com/ (though they seem to have really old versions of pandas), but I think they handle it anyhow
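To make the proposed call shape concrete, here is a toy dispatcher with the same outline: an expression string, an optional `method` naming an engine, and keyword arguments. Everything in it is hypothetical: `toy_eval`, the engine registry, and the choice to treat the keyword arguments as the variable scope are assumptions made for this sketch, and only a pure-Python engine handling basic arithmetic is registered.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _python_engine(tree, scope):
    """Evaluate a parsed arithmetic expression by walking its tree."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return scope[node.id]
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(tree)


# Hypothetical registry; other engines would be added under their own names.
_ENGINES = {"python": _python_engine}


def toy_eval(expr, method=None, **scope):
    """Parse `expr` once, then hand the tree to the selected engine."""
    name = method or "python"
    if name not in _ENGINES:
        raise ValueError(f"unknown engine: {name!r}")
    return _ENGINES[name](ast.parse(expr, mode="eval"), scope)


print(toy_eval("df + df2 * 2", df=1, df2=3))
```

Parsing once and dispatching on the tree keeps the front end independent of the back end, which is what would let the same string target different engines.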