Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add option to use absolute value in pct_change #56618

Open
3 tasks done
Nick-Git2 opened this issue Dec 25, 2023 · 22 comments
Open
3 tasks done

ENH: Add option to use absolute value in pct_change #56618

Nick-Git2 opened this issue Dec 25, 2023 · 22 comments
Labels
Enhancement Needs Discussion Requires discussion from core team before further action Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@Nick-Git2
Copy link

Nick-Git2 commented Dec 25, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
test = pd.DataFrame()
test['testdata'] = [-68, 5144,10000]
test['pct_changes'] = test['testdata'].pct_change()
print (test['pct_changes'])
0          NaN
**1   -76.647059**
2     0.944012
Name: pct_changes, dtype: float64


Issue Description

correct result for row 1 should be 76.647059 and not -76.647059. It seems the current implementation of pct_change() lacks the abs() function and thus returns the wrong sign when the base is negative

Expected Behavior

percentage_change = (new_value - old_value) / abs(old_value)
(5144 - (-68)) / abs(-68) = 76.647059

Installed Versions

INSTALLED VERSIONS

commit : a671b5a
python : 3.11.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.133.1-microsoft-standard-WSL2
Version : #1 SMP Thu Oct 5 21:02:42 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.4
numpy : 1.24.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.17.2
pandas_datareader : 0.10.0
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2023.9.1
gcsfs : None
matplotlib : 3.8.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 13.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@Nick-Git2 Nick-Git2 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 25, 2023
@rhshadrach
Copy link
Member

Thanks for the report. Taking the absolute value is common but not universal. It also gives counter-intuitive results - e.g. going from -20 to 20 is a smaller percent change than going from -10 to 20 (200% vs 300%).

In my opinion, one should not think of "percent change" as a measure how much something is getting more positive, but rather how much it's moving away from zero. Positive percent change means it's moving more away from zero, negative percent change means it's moving toward (and possibly crossing) 0. With this interpretation, not taking absolute value is correct.

Since it's fairly common to take the absolute value however, I wouldn't be opposed to adding an argument.

@rhshadrach rhshadrach added Enhancement Needs Discussion Requires discussion from core team before further action and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 26, 2023
@rhshadrach rhshadrach changed the title BUG: pct_change not calculated correctly (wrong sign) ENH: Add option to use absolute value in pct_change Dec 26, 2023
@rhshadrach
Copy link
Member

I've reworked this issue as an enhancement request.

@paridhisharma24
Copy link

can I work on this

@rhshadrach rhshadrach added the Numeric Operations Arithmetic, Comparison, and Logical operations label Dec 26, 2023
@rhshadrach
Copy link
Member

cc @pandas-dev/pandas-core - any thoughts on adding an argument here?

@attack68
Copy link
Contributor

attack68 commented Jan 2, 2024

isn't there an absolute function that can be applied to the result to return exactly the intent without the need for an argument?

I dont believe percent change should be absolute by default and it feels more pythonic, to me, to manipulate the outcome with existing functions than adding a keyword

@rhshadrach
Copy link
Member

@attack68 - we need to apply abs to the denominator alone, not the entire result. This can currently be accomplished with .shift, .abs, a division, and a multiplication, but it's a bit messy.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 2, 2024

Adding an argument allows keeping the current behavior, which can also be non-intuitive:

>>> import pandas as pd
>>> s=pd.Series([-40, -30, -20])
>>> s.pct_change()
0         NaN
1   -0.250000
2   -0.333333
dtype: float64

It seems odd to report -0.25 when the value increased by 25%

One could also argue that computing percent change relative to a negative number is not well defined, and we should raise an exception.

So maybe the extra argument should have 3 options:

  • "Keep sign" (current behavior)
  • "Ignore sign" (take the absolute value)
  • "Raise exception" (if any number is negative, raise an exception)

Advantage of the third option is that it might help identify an error in your dataset or data pipeline.

@rhshadrach
Copy link
Member

It seems odd to report -0.25 when the value increased by 25%

I don't think so. Ref: #56618 (comment)

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 2, 2024

It seems odd to report -0.25 when the value increased by 25%

I don't think so. Ref: #56618 (comment)

It probably depends on what you are measuring in the original series, and how you interpret the result. Your example was flipping the sign between 2 adjacent entries. My example is keeping the sign negative between two entries.

It seems to me that a positive percent change means things increased, and a negative percent change means things decreased. In my example, seeing a -0.25 when the value increased seems counter-intuitive. And if the sign flipped, I'm not sure how to interpret the result (hence my suggestion to raise an exception)

@rhshadrach
Copy link
Member

It probably depends on what you are measuring in the original series, and how you interpret the result. Your example was flipping the sign between 2 adjacent entries. My example is keeping the sign negative between two entries.

I don't think this is significant. In your example, we are going from -40 to -30. We agree going from 40 to 30 or 40 to 50 is a -25% and 25% change respectively. My claim here is that you should interpret the negative sign not as "getting more positive" or "getting more negative", but rather, moving toward 0. In this interpretation, going from -40 to -30 being a -25% change is not counter-intuitive.

It seems to me that a positive percent change means things increased, and a negative percent change means things decreased

It does - as long as you consider the "size of a number" to be its absolute value. For example, -100000 is a large number when compared to 0.1.

@WillAyd
Copy link
Member

WillAyd commented Jan 2, 2024

I think there are already similar issues to this that have been closed / rejected in the past:

#40911
#22596
#44102

I don't think there is a universal agreement on what these values would represent, nor do I think expanding API for this is worth it

@rhshadrach
Copy link
Member

Thanks @WillAyd. No strong opinion here - but from #22596 the closing comment has 14 down votes and 4 confused emojis. This is also the fourth time this request has come up. I do think that user feedback is something to consider.

@attack68
Copy link
Contributor

attack68 commented Jan 3, 2024

@attack68 - we need to apply abs to the denominator alone, not the entire result. This can currently be accomplished with .shift, .abs, a division, and a multiplication, but it's a bit messy.

Good point. No opinion either way then.

@MarcoGorelli
Copy link
Member

Do you have a suggestion on what the new argument should be?

Personally, between

  • s.pct_change(denominator='ignore sign')
  • s.diff()/s.shift().abs()

I think I'd find the second one clearer

@attack68
Copy link
Contributor

attack68 commented Jan 3, 2024

Do you have a suggestion on what the new argument should be?

Personally, between

  • s.pct_change(denominator='ignore sign')
  • s.diff()/s.shift().abs()

I think I'd find the second one clearer

Agree I would do the second by default and probably not even assume that a pct_change function exists. To avoid op risk I would probably also avoid it due the ambiguity highligted in this thread and the definite nature of the more fundamental ops.

@bashtage
Copy link
Contributor

bashtage commented Jan 3, 2024

I think the fundamental issue here is that pct change doesn't make much sense when the sign of the a series is not consistent (and in almost all applications, strictly positive). I can see some real downsides of letting errors in code or data go unnoticed if abs was applied by the denominator by default.

I agree that the existence of s.diff()/s.shift().abs() as a simple one-liner alternative should be sufficient for users who want a modified pct_change behavior.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 3, 2024

I read through the discussions in the 3 links that @WillAyd provided.

While s.diff()/s.shift().abs() is the workaround, I don't think we can assume that users would be aware of the ambiguities that would occur by using pct_change() as currently implemented on a Series that has inconsistent signs.

I'd like us to consider either removing pct_change() from the API (which then forces people to implement it differently) or changing the current behavior to produce a warning (or raise an exception) if the Series has numbers with mixed signs. Right now, without such a warning, people would never know to use the workaround.

@MarcoGorelli
Copy link
Member

could just document s.diff()/s.shift().abs() in the pct_change docs, instead of removing pct_change?

@attack68
Copy link
Contributor

attack68 commented Jan 3, 2024

I'd like us to consider either removing pct_change() from the API (which then forces people to implement it differently)

I also thought removing it might be the way to go - trim down the library with a structured deprecation notice advising how best to replicate the behaviour.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 3, 2024

could just document s.diff()/s.shift().abs() in the pct_change docs, instead of removing pct_change?

I don't think that helps the users discover when they should use the workaround, so that's why I also suggested raising a warning or exception if there are mixed signs in the series.

@rhshadrach
Copy link
Member

I don't think that helps the users discover when they should use the workaround, so that's why I also suggested raising a warning or exception if there are mixed signs in the series.

The current behavior of pct_change with mixed signs is a mathematically correct and interpretable result. I'm -1 on raising a warning in such a case.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 5, 2024

The current behavior of pct_change with mixed signs is a mathematically correct and interpretable result. I'm -1 on raising a warning in such a case.

Yes, but from the multiple people that have raised issues indicating they were confused with the result, that indicates the values are subject to interpretation, and there is even a debate on what is "mathematically correct". IMHO, if you have mixed signs in your series, it is a possible indication that you have an error in your data, and that the results may come out in an unexpected way.

We could also add an argument such as check_for_mixed_signs: bool = False as the default, and if the value is True, then we check for mixed signs and raise a warning. Or do as suggested in the issue, which is to create an argument that lets users choose the behavior. At least by creating an argument, we are making people more aware of the issue should they encounter it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Needs Discussion Requires discussion from core team before further action Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants