Collaborator

@samukweku samukweku commented Jul 18, 2022

PR Description

Please describe the changes proposed in the pull request:

  • add a dropna parameter to drop nulls, similar to stack
  • improve speed for scenarios where .value is not involved (close to melt speed)
  • improve speed in some cases for multiple values_to

This PR resolves #1132.
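
For readers unfamiliar with the behaviour being requested, here is a minimal pandas-only sketch of the "reshape to long, then drop nulls" result that the new dropna parameter is meant to produce in one step. The toy frame and its column names (track, wk1, wk2, week, rank) are made up for illustration and are not taken from the billboard data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two "wk" columns with some missing ranks.
wide = pd.DataFrame(
    {
        "track": ["a", "b", "c"],
        "wk1": [1.0, 2.0, np.nan],
        "wk2": [np.nan, 3.0, np.nan],
    }
)

# Reshape to long form, keeping every (track, week) pair, nulls included.
long_all = wide.melt(id_vars="track", var_name="week", value_name="rank")

# Drop the rows whose value is null and renumber the index.
long_clean = long_all.dropna(subset=["rank"]).reset_index(drop=True)

print(long_clean)
```

Of the six (track, week) pairs, only the three with a recorded rank survive; track "c" disappears entirely because both of its weeks are null.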

Timings (take these with a pinch of salt):

import pandas as pd
import janitor  # pyjanitor: registers pivot_longer on DataFrame

url = 'https://raw.githubusercontent.com/tidyverse/tidyr/main/data-raw/billboard.csv'
df = pd.read_csv(url)
df = pd.concat([df]*100, ignore_index = True)
df.shape
(31700, 81)

%timeit df.melt(['year', 'artist', 'track', 'time', 'date.entered'], ignore_index = False).dropna(subset=['value'])
232 ms ± 3.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.pivot_longer(column_names = 'wk*', dropna = True, ignore_index = True)
196 ms ± 2.56 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

A = df.melt(['year', 'artist', 'track', 'time', 'date.entered'], ignore_index = False).dropna(subset=['value'])
B = df.pivot_longer(column_names = 'wk*', dropna = True, ignore_index = True)

A.reset_index(drop=True).equals(B)
True

%timeit df.pivot_longer(column_names = 'wk*', dropna = False)
146 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df.melt(['year', 'artist', 'track', 'time', 'date.entered'])
154 ms ± 1.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
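
The %timeit magics above only work inside IPython/Jupyter. As a rough sketch of how a similar measurement could be taken in a plain Python script, the snippet below times the melt-then-dropna baseline with the standard-library timeit module. It uses a small synthetic frame (an assumption, so that no download is needed) and a hypothetical helper name, melt_then_dropna; the numbers it prints are not comparable to the billboard timings reported here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the billboard data (assumption: 1,000 rows, 20 week columns).
rng = np.random.default_rng(0)
n_rows, n_weeks = 1_000, 20
values = rng.random((n_rows, n_weeks))
values[values < 0.3] = np.nan  # inject missing values
frame = pd.DataFrame(values, columns=[f"wk{i}" for i in range(1, n_weeks + 1)])
frame.insert(0, "track", np.arange(n_rows))


def melt_then_dropna(df):
    """Baseline: reshape to long form, then drop rows with a null value."""
    return df.melt(id_vars="track").dropna(subset=["value"])


# Three repeats of five calls each; report the best per-call time.
timings = timeit.repeat(lambda: melt_then_dropna(frame), number=5, repeat=3)
best_ms = min(timings) / 5 * 1000
print(f"best of 3: {best_ms:.2f} ms per call")
```

Taking the minimum over repeats, rather than the mean, reduces the influence of unrelated system noise on the reported figure.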

PR Checklist

Please ensure that you have done the following:

  1. Open the PR from a feature branch on your fork. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  2. If you're not on the contributors list, add yourself to AUTHORS.md.
  3. Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests are passed
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

@samukweku samukweku self-assigned this Jul 18, 2022
Member

ericmjl commented Jul 18, 2022

codecov bot commented Jul 18, 2022

Codecov Report

Merging #1136 (e335a70) into dev (a25c821) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev    #1136      +/-   ##
==========================================
- Coverage   97.37%   97.36%   -0.02%     
==========================================
  Files          77       77              
  Lines        3163     3186      +23     
==========================================
+ Hits         3080     3102      +22     
- Misses         83       84       +1     

Member

@ericmjl ericmjl left a comment

I approve! Once we have two approvals, the 2nd person who approves can do the honours of merging 😄.

@thatlittleboy thatlittleboy merged commit f748406 into dev Jul 24, 2022
@samukweku samukweku deleted the samukweku/pivot_longer_dropna branch July 25, 2022 12:00
Successfully merging this pull request may close these issues.

Allow dropna for pivot_longer