Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API/BUG: freq retention in value_counts #33830

Open
jbrockmendel opened this issue Apr 27, 2020 · 5 comments
Open

API/BUG: freq retention in value_counts #33830

jbrockmendel opened this issue Apr 27, 2020 · 5 comments
Assignees
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug freq retention User expects "freq" attribute to be preserved Frequency DateOffsets

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Apr 27, 2020

dti = pd.date_range('2016-01-01', periods=5)

dti.value_counts().index.freq    # <-- None
dti.factorize()[1].freq   # <-- None

mi = pd.MultiIndex.from_arrays([dti, dti])

mi.levels[0].freq   # <-- None

There is a comment in tests.indexes.datetimes.test_datetime test_factorize suggesting that freq should be preserved by factorize, but that is not checked and would fail if it were

        # freq must be preserved
        idx3 = date_range("2000-01", periods=4, freq="M", tz="Asia/Tokyo")
        exp_arr = np.array([0, 1, 2, 3], dtype=np.intp)
        arr, idx = idx3.factorize()
        tm.assert_numpy_array_equal(arr, exp_arr)
        tm.assert_index_equal(idx, idx3)

So the question: do we want to try to preserve freq in factorize?

xref #33677 for the MultiIndex case

Update One more: Categorical:

dti = pd.date_range('2016-01-01', periods=5)
cat = pd.Categorical(dti)
cat.categories.freq   # <-- None
@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 27, 2020
@jbrockmendel jbrockmendel changed the title API/BUG: freq retention in DatetimeIndex factorize, value_counts, MultiIndex API/BUG: freq retention in DatetimeIndex factorize, value_counts, MultiIndex, Categorical Apr 27, 2020
@TomAugspurger TomAugspurger removed the Needs Triage Issue that has not been reviewed by a pandas team member label Apr 28, 2020
@mroeschke
Copy link
Member

Looks like all but the value_counts example is preserving freq now. Could use tests for those

In [8]: dti = pd.date_range('2016-01-01', periods=5)
   ...:
   ...: dti.value_counts().index.freq

In [9]: dti.factorize()[1].freq
Out[9]: <Day>

In [10]: mi = pd.MultiIndex.from_arrays([dti, dti])

In [11]: mi.levels[0].freq
Out[11]: <Day>

In [12]: dti = pd.date_range('2016-01-01', periods=5)
    ...: cat = pd.Categorical(dti)
    ...: cat.categories.freq
Out[12]: <Day>

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions Frequency DateOffsets labels Jul 31, 2021
@das-anubhav
Copy link

Hi!

I would like to solve this issue.

@bgollop
Copy link
Contributor

bgollop commented Mar 21, 2022

take

andjhall added a commit to andjhall/pandas that referenced this issue Apr 5, 2022
… was preserving DateTimeIndex freq attribute
@bgollop
Copy link
Contributor

bgollop commented Apr 7, 2022

@mroeschke, I noticed that tests were made for freq retention in DatetimeIndex factorize and MutliIndex (PR #38120). I was able to make a test case for Categorical, however I wasn't sure if a test case for value_counts was needed yet as it looks like it is still not preserving freq. Should I include it?

@mroeschke
Copy link
Member

I wasn't sure if a test case for value_counts was needed yet as it looks like it is still not preserving freq

Feel free to skip a test for value_counts unless you'd like to also fix the bug

andjhall added a commit to andjhall/pandas that referenced this issue Apr 14, 2022
…egorical was preserving DateTimeIndex freq attribute"

This reverts commit e4b8cbf.
@mroeschke mroeschke added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff and removed good first issue Needs Tests Unit test(s) needed to prevent regressions labels Apr 18, 2022
mroeschke pushed a commit that referenced this issue Apr 18, 2022
…46779)

* Added test for github issue #33830 to test that categorical was preserving DateTimeIndex freq attribute

* Added an additional test for value_counts preserving frequency

* Modified tests in test_constructors.py to pass the pre-commit check

* Removed tests from test_constructors that were not needed

* Moved categorical freq retention test to pandas/tests/arrays/categorical/

Co-authored-by: Brian Gollop <bgollop@umich.edu>
@mroeschke mroeschke changed the title API/BUG: freq retention in DatetimeIndex factorize, value_counts, MultiIndex, Categorical API/BUG: freq retention in value_counts Apr 18, 2022
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this issue Jul 13, 2022
…andas-dev#46779)

* Added test for github issue pandas-dev#33830 to test that categorical was preserving DateTimeIndex freq attribute

* Added an additional test for value_counts preserving frequency

* Modified tests in test_constructors.py to pass the pre-commit check

* Removed tests from test_constructors that were not needed

* Moved categorical freq retention test to pandas/tests/arrays/categorical/

Co-authored-by: Brian Gollop <bgollop@umich.edu>
@jbrockmendel jbrockmendel added the freq retention User expects "freq" attribute to be preserved label Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug freq retention User expects "freq" attribute to be preserved Frequency DateOffsets
Projects
None yet
Development

No branches or pull requests

5 participants