Speed up FERC 714 hourly demand transform #873
Merged
`pudl.transform.ferc714.demand_hourly_pa` took about 10 minutes to complete on 14 years of data. As of now, I have it down to 15 s. Below are some of the steps I took to speed things up. Timings below are for a single year (2006).
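For illustration, here is a minimal sketch of the kind of wide-to-long hourly demand reshape that the steps listed below target. It applies an explicit `format=` in `pd.to_datetime`, processes the `id_vars` before `melt` replicates them (25 hourly columns, matching the replication count mentioned below), uses `Series.map` rather than `Series.replace`, and uses plain column assignment. Everything in it is hypothetical: the `hour01` through `hour25` column names, the `timezone` codes, the `UTC_OFFSET` mapping, the hour-to-time convention, and the `make_wide`/`to_long` helpers are stand-ins, not the actual PUDL code or FERC 714 schema.

```python
import numpy as np
import pandas as pd

# Hypothetical wide layout: one row per respondent-day, 25 hourly columns.
HOURS = [f"hour{h:02d}" for h in range(1, 26)]
# Hypothetical timezone codes and their UTC offsets in hours.
UTC_OFFSET = {"EST": -5, "CST": -6}


def make_wide(n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Build a synthetic wide table with string dates in one fixed format."""
    rng = np.random.default_rng(seed)
    wide = pd.DataFrame(rng.random((n_rows, len(HOURS))) * 1000, columns=HOURS)
    wide["respondent_id"] = rng.integers(1, 5, n_rows)
    dates = pd.Timestamp("2006-01-01") + pd.to_timedelta(np.arange(n_rows) % 365, unit="D")
    wide["report_date"] = dates.strftime("%m/%d/%Y").to_numpy()
    wide["timezone"] = rng.choice(list(UTC_OFFSET), n_rows)
    return wide


def to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape to one row per respondent-hour without mutating `wide`."""
    # Process the id columns while there is still one row per respondent-day;
    # melt would otherwise repeat every value once per hour column.
    ids = pd.DataFrame(
        {
            "respondent_id": wide["respondent_id"],
            # Explicit format, so pandas does not have to infer it.
            "report_date": pd.to_datetime(wide["report_date"], format="%m/%d/%Y"),
            # Dict lookup via Series.map rather than Series.replace.
            "utc_offset": wide["timezone"].map(UTC_OFFSET),
        }
    )
    # Rename "hour01" -> 1, etc. on the 25 column labels, not on the melted rows.
    hours = wide[HOURS].rename(columns={c: i for i, c in enumerate(HOURS, start=1)})
    long = pd.concat([ids, hours], axis=1).melt(
        id_vars=list(ids.columns), var_name="hour", value_name="demand_mwh"
    )
    # Plain column assignment instead of a .assign() chain.
    long["hour"] = long["hour"].astype("int64")
    # Assumes hour N starts at local (N-1):00; subtracting the offset gives UTC.
    long["utc_datetime"] = long["report_date"] + pd.to_timedelta(
        long["hour"] - 1 - long["utc_offset"], unit="h"
    )
    return long
```

For example, `to_long(make_wide(365))` returns 365 × 25 rows. The design choice is that date parsing and code mapping happen on `ids`, which still has one row per respondent-day, so each value is processed once instead of 25 times; only `utc_datetime`, which genuinely depends on the hour, is computed after the melt.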
- If the `report_date` format is consistent, use an explicit `format=` in `pd.to_datetime`.
- Process the `id_vars` in `pd.DataFrame.melt` before they are replicated (25 times in this case) in the melt.
- Use `pd.Series.map`, not `pd.Series.replace`, which is much slower.
- Avoid `pd.DataFrame.assign(column=*)` in favor of `df['column'] = *`: chains are tidy but come at a performance cost that adds up with large dataframes. The use of helper functions that accept a dataframe and return a dataframe only complicates matters. Functions that do not alter the number of rows in the dataframe should probably just return the result (so you can choose what to do with it) rather than copy the dataframe, modify it, and return it. (I would also discourage functions like `f(df)` in favor of e.g. `f(report_date, demand_mwh)`, so that it is clear from the function signature what variables are needed for the calculation.) A contrived example to demonstrate: