Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Timestamp origin takes no effect in resample for 'MS' frequency #53938

Merged
merged 14 commits into from
Jul 25, 2023

Conversation

mcgeestocks
Copy link
Contributor

@mcgeestocks mcgeestocks commented Jun 29, 2023

@mcgeestocks mcgeestocks marked this pull request as ready for review June 29, 2023 20:50
@mcgeestocks mcgeestocks marked this pull request as draft June 29, 2023 20:51
@mcgeestocks
Copy link
Contributor Author

It seems the _get_timestamp_range_edges() never accounted for the edge case involving a combination of a DateTimeLike origin and a freq.

The change adds a simple check to modify the way the range is set ( first - freq ) vs ( freq.rollback() ) depending on the inputs.

This gives the behaviour you're after @MarcoGorelli and preserves all other functionality.
Let me know if I'm missing anything.

@mcgeestocks mcgeestocks marked this pull request as ready for review June 29, 2023 22:45
@MarcoGorelli
Copy link
Member

thanks for your pr - this looks a bit complex, but it's encouraging that tests pass - I'll check next week!

@mcgeestocks
Copy link
Contributor Author

thanks for your pr - this looks a bit complex, but it's encouraging that tests pass - I'll check next week!

Hey @MarcoGorelli just bumping this up.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR

Does this always work? For example, if we have

In [39]: df = pd.DataFrame({'ts': [datetime(1999, 12, 31, 20)], 'values': [10.]})

In [40]: df
Out[40]:
                   ts  values
0 1999-12-31 20:00:00    10.0

and do

df.resample('3YS', on='ts', closed='left', label='left', origin=datetime(1995, 1, 1))['values'].sum()

then I'd expect the windows to be:

  • [1995-01-01, 1998-01-01)
  • [1998-01-01, 2001-01-01)

But instead, I see:

In [42]: df.resample('3YS', on='ts', closed='left', label='left', origin=datetime(1995, 1, 1))['values'].sum()
Out[42]:
ts
1999-01-01    10.0
Freq: 3AS-JAN, Name: values, dtype: float64

@mcgeestocks
Copy link
Contributor Author

I'll need to switch approaches. My solution only covers cases where the origin is within the period specified.

@MarcoGorelli
Copy link
Member

could always raise an error if it's not, we don't need to do 4-D gymnastics, the most important thing is that for the cases when the computation succeeds, it's correct

@mcgeestocks
Copy link
Contributor Author

Alright @MarcoGorelli I updated what I was doing to be a bit more straightforward.
Now when the origin is a timestamp it's taken into account when setting timestamp_range_edges during resampling.
I also had to update a few test to account for the desired behavior.

@mroeschke mroeschke added Resample resample method Frequency DateOffsets labels Jul 10, 2023
@mcgeestocks
Copy link
Contributor Author

Looks like I also need to update the Docs with the new expected behavior

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tz-naive, this seems correct. Nice one!

I think we just need some extra care for the tz-aware case

@mcgeestocks
Copy link
Contributor Author

For tz-naive, this seems correct. Nice one!

I think we just need some extra care for the tz-aware case

Added a check on the equality of Daylight Savings Time via dst. All checks pass including your proposed edge case.
I believe everything else is taken care of.

pandas/core/groupby/grouper.py Outdated Show resolved Hide resolved
pandas/core/generic.py Outdated Show resolved Hide resolved
pandas/core/resample.py Outdated Show resolved Hide resolved
@MarcoGorelli
Copy link
Member

thanks for updating - this looks close 💪

@mcgeestocks
Copy link
Contributor Author

Anything else you want to see here @MarcoGorelli?

@MarcoGorelli
Copy link
Member

this is probably fine - will take a (hopefully final) look tomorrow

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, great work here

Leaving open a little in case anyone has objections, but I think this should go in for 2.1

@MarcoGorelli MarcoGorelli added this to the 2.1 milestone Jul 24, 2023
@MarcoGorelli MarcoGorelli merged commit dfc4c39 into pandas-dev:main Jul 25, 2023
32 checks passed
@mcgeestocks mcgeestocks deleted the resample-bug branch July 25, 2023 16:25
jamie-harness pushed a commit to jamie-harness/pandas1 that referenced this pull request Jul 25, 2023
…andas-dev#53938)

* add missing origin check

* whats new

* resolve edge cases, fix tests

* update docs

* cleanup

* accomidate daylight savings time

* correct docs and remove nested checks

* slim down example

* add back tz aware test
jamie-harness added a commit to jamie-harness/pandas1 that referenced this pull request Jul 26, 2023
…andas-dev#53938) (#17)

* add missing origin check

* whats new

* resolve edge cases, fix tests

* update docs

* cleanup

* accomidate daylight savings time

* correct docs and remove nested checks

* slim down example

* add back tz aware test

Co-authored-by: Conrad Mcgee Stocks <mcgee.stocks@gmail.com>
jamie-harness pushed a commit to jamie-harness/pandas1 that referenced this pull request Jul 26, 2023
…andas-dev#53938)

* add missing origin check

* whats new

* resolve edge cases, fix tests

* update docs

* cleanup

* accomidate daylight savings time

* correct docs and remove nested checks

* slim down example

* add back tz aware test
jamie-harness added a commit to jamie-harness/pandas1 that referenced this pull request Jul 26, 2023
…andas-dev#53938) (#29)

* add missing origin check

* whats new

* resolve edge cases, fix tests

* update docs

* cleanup

* accomidate daylight savings time

* correct docs and remove nested checks

* slim down example

* add back tz aware test

Co-authored-by: Conrad Mcgee Stocks <mcgee.stocks@gmail.com>
MarcoGorelli added a commit to MarcoGorelli/pandas that referenced this pull request Sep 9, 2023
MarcoGorelli added a commit to MarcoGorelli/pandas that referenced this pull request Sep 9, 2023
MarcoGorelli added a commit to MarcoGorelli/pandas that referenced this pull request Sep 9, 2023
MarcoGorelli added a commit to MarcoGorelli/pandas that referenced this pull request Sep 9, 2023
mroeschke pushed a commit that referenced this pull request Oct 9, 2023
…equency (#53938)" (#55077)

* Revert "BUG: Timestamp origin takes no effect in resample for 'MS' frequency (#53938)"

This reverts commit dfc4c39.

* note that origin only takes effect for tick frequencies

* fixup doctest

* move whatsnew to v2.1.2

---------

Co-authored-by: Thomas Li <47963215+lithomas1@users.noreply.github.com>
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Oct 9, 2023
MarcoGorelli added a commit that referenced this pull request Oct 24, 2023
…es no effect in resample for 'MS' frequency (#53938)") (#55459)

* Backport PR #55077: Revert "BUG: Timestamp origin takes no effect in resample for 'MS' frequency (#53938)"

* fixup doctest

* noop

* fixup doctests

---------

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>
Co-authored-by: MarcoGorelli <33491632+MarcoGorelli@users.noreply.github.com>
Co-authored-by: Thomas Li <47963215+lithomas1@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Timestamp origin takes no effect in resample for 'MS' frequency
4 participants