-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Enable unary math operations for pandas, sqlite #1071
Conversation
0dc84c3
to
13a0422
Compare
d6c9085
to
3322844
Compare
@wesm can you take a look here? bit of a rabbit hole with decimals, otherwise just adding unary ops to series. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, but otherwise LGTM
if column_name in schema: | ||
ibis_type = dt.validate_type(schema[column_name]) | ||
elif dtype == np.object_: | ||
inferred_dtype = infer_dtype(df[column_name].dropna()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes. I guess we should make a NaN-friendly type inference function someplace (seems like an oversight in infer_dtype originally)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u post an issue in pandas tracker about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done: pandas-dev/pandas#17059
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I PR'd it :) pandas-dev/pandas#17066.
def execute_series_unary_op(op, data, scope=None): | ||
function = getattr(np, type(op).__name__.lower()) | ||
if data.dtype == np.dtype(np.object_): | ||
return data.apply(functools.partial(execute_node, op, scope=scope)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Series.apply
different from Series.map
(in behavior or performance)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think so, @jreback any idea here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it looks like Series.map
accepts a dict
, to support a simple CASE
-statement-like operation, as well as callables, whereas apply only deals with callables. For callables, both methods use the same underlying function lib.map_infer
to call the passed in callable in a Cython loop. Here we could use either since we're only dealing with callables.
ibis/pandas/execution.py
Outdated
def execute_series_log_with_base(op, data, base, scope=None): | ||
if data.dtype == np.dtype(np.object_): | ||
func = np.vectorize(functools.partial(execute_node, op, scope=scope)) | ||
return pd.Series(func(data, base), index=data.index, name=data.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this bit could be factored out into a helper function since it's repeated a couple times (in case it's useful in future execution rules)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good idea!
|
||
def _floor_divide(t, expr): | ||
left, right = map(t.translate, expr.op().args) | ||
return sa.func.floor(left / right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does integer division in postgres yield doubles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it yields integers. This is implemented so that it works regardless of the type of left
and right
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I was just curious =)
Implement decimal for pandas Add SQLite unary ops Fix operations in postgres that require numeric
Merging on green. |
Implement decimal for pandas
Add SQLite unary ops
Fix operations in postgres that require numeric