If statements are a pain in Pandas. The .where method is confusing, and most folks resort to writing a function and using .apply to use old-school Python style conditionals.
Also, I'm not a fan of using sequences of .loc assignment because it breaks chaining.
I've resorted to using np.select. It would be great to have this as native functionality.
Consider calculating On-balance Volume (OBV)
Many resort to this:
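# naive
def calc_obv(df):
    df = df.copy()
    df["OBV"] = 0.0
    close, volume = df["Close"], df["Volume"]
    # Column position of OBV, so writes go through .iloc instead of
    # chained assignment (df["OBV"][i] = ...), which may not update df
    obv = df.columns.get_loc("OBV")
    # Loop through the data and calculate OBV
    for i in range(1, len(df)):
        if close.iloc[i] > close.iloc[i - 1]:
            df.iloc[i, obv] = df.iloc[i - 1, obv] + volume.iloc[i]
        elif close.iloc[i] < close.iloc[i - 1]:
            df.iloc[i, obv] = df.iloc[i - 1, obv] - volume.iloc[i]
        else:
            df.iloc[i, obv] = df.iloc[i - 1, obv]
    return df

calc_obv(aapl)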
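For comparison, the loop can be vectorized. The following is a minimal sketch, not necessarily the versions the issue refers to: it assumes a DataFrame with `Close` and `Volume` columns, and the names `obv_select`, `obv_where`, and `prices` are made up for illustration. Both functions sign the volume by the direction of the price change and then take a cumulative sum, starting OBV at 0 like the loop above.

```python
import numpy as np
import pandas as pd


def obv_select(df: pd.DataFrame) -> pd.DataFrame:
    """OBV with np.select (illustrative; assumes 'Close' and 'Volume' columns)."""
    # Day-over-day price change; the first row has no predecessor, so treat it as flat
    delta = df["Close"].diff().fillna(0)
    signed_volume = np.select(
        [delta > 0, delta < 0],         # conditions, checked in order
        [df["Volume"], -df["Volume"]],  # value taken where each condition holds
        default=0,                      # flat days add nothing
    )
    return df.assign(OBV=signed_volume.cumsum().astype(float))


def obv_where(df: pd.DataFrame) -> pd.DataFrame:
    """Same calculation with chained Series.where calls."""
    delta = df["Close"].diff().fillna(0)
    signed_volume = (
        df["Volume"]
        .where(delta > 0, -df["Volume"])  # keep +Volume on up days, else -Volume
        .where(delta != 0, 0)             # then zero out the flat days
    )
    return df.assign(OBV=signed_volume.cumsum().astype(float))


# Toy data, invented for this example
prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 11.0, 9.0, 12.0], "Volume": [100, 200, 300, 400, 500]}
)
print(obv_select(prices)["OBV"].tolist())  # [0.0, 200.0, 200.0, -200.0, 300.0]
```

Note that `.where` keeps values where the condition is true and replaces the rest, so each extra branch needs another `.where` call with the logic read "inside out", whereas `np.select` lists the branches in order.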
def select(self, condlist, choicelist, default=0):
    """
    Return an array drawn from elements in choicelist, depending on conditions.

    Parameters
    ----------
    condlist : list of bool ndarrays
        The list of conditions which determine from which array in `choicelist`
        the output elements are taken. When multiple conditions are satisfied,
        the first one encountered in `condlist` is used.
    choicelist : list of ndarrays
        The list of arrays from which the output elements are taken. It has
        to be of the same length as `condlist`.
    default : scalar, optional
        The element inserted in `output` when all conditions evaluate to False.

    Returns
    -------
    output : DataFrame/Series
        The output at position m is the m-th element of the array in
        `choicelist` where the m-th element of the corresponding array in
        `condlist` is True.
    """
    return np.select(condlist, choicelist, default)
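To illustrate how such a method might read inside a method chain, here is a rough sketch. It is hypothetical: it uses a free function named `select` passed to `.pipe` rather than a real pandas method, and it wraps the `np.select` result in a `pd.Series` with the caller's index (an assumption made so the return type matches the docstring above). The `prices` data is invented.

```python
import numpy as np
import pandas as pd


def select(obj, condlist, choicelist, default=0):
    """Hypothetical helper: np.select that returns a Series on obj's index."""
    return pd.Series(np.select(condlist, choicelist, default), index=obj.index)


# Toy data, invented for this example
prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 11.0, 9.0, 12.0], "Volume": [100, 200, 300, 400, 500]}
)

result = prices.assign(
    # Price change versus the previous row; the first row counts as flat
    delta=lambda d: d["Close"].diff().fillna(0),
    # Sign the volume by price direction, then accumulate, all without .loc
    OBV=lambda d: d["Volume"]
    .pipe(select, [d["delta"] > 0, d["delta"] < 0], [d["Volume"], -d["Volume"]])
    .cumsum(),
)
print(result["OBV"].tolist())  # [0, 200, 200, -200, 300]
```

Because the helper returns a Series, the `.cumsum()` call can follow directly in the chain; with a bare ndarray result the index would be lost at that point.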
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
If statements are a pain in Pandas. The .where method is confusing, and most folks resort to writing a function and using .apply to use old-school Python style conditionals.
Also, I'm not a fan of using sequences of .loc assignment because it breaks chaining.
I've resorted to using np.select. It would be great to have this as native functionality.
Consider calculating On-balance Volume (OBV)
Many resort to this:
Here's my attempt with .where:
Here's my np.select version:
Feature Description
New method (adapted from NumPy):
Alternative Solutions
np.select - https://numpy.org/doc/stable/reference/generated/numpy.select.html
Polars when - https://pola-rs.github.io/polars-book/user-guide/dsl/expressions.html#binary-functions-and-modification
Additional Context
No response