ENH: Add sparse option in pivot_table / pivot / unstack #14493
Comments
This is possible; it would have to be integrated into unstacking proper. We do something like this in
I might give a pull request my best shot :)
@jreback I noticed that there aren't many options for converting to a sparse matrix (in my case I want something like df.to_csr_matrix). Is it OK if I add more sparse matrix conversions to the current collection, which is limited to scipy.coo_matrix?
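For the CSR case, one route that exists in recent pandas versions is to give a MultiIndexed Series a sparse dtype, export it with Series.sparse.to_coo, and convert the resulting SciPy COO matrix with tocsr(). A minimal sketch, assuming SciPy is installed; the order/item data is made up for illustration:

```python
import pandas as pd

# Made-up order/item quantities; the MultiIndex levels become matrix rows and columns.
idx = pd.MultiIndex.from_tuples(
    [("o1", "A"), ("o1", "B"), ("o2", "C")], names=["order", "item"]
)
s = pd.Series([3.0, 1.0, 2.0], index=idx)

# to_coo requires a sparse dtype; 0.0 is the fill value, which is not stored.
ss = s.astype(pd.SparseDtype("float64", 0.0))

# Returns a scipy.sparse COO matrix plus the row and column labels.
coo, row_labels, col_labels = ss.sparse.to_coo(
    row_levels=["order"], column_levels=["item"], sort_labels=True
)

# COO is convenient for construction; CSR is better for row slicing and arithmetic.
csr = coo.tocsr()
```

With sort_labels=True, rows follow the sorted order labels and columns the sorted item labels.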
Adding a sparse option here means returning pandas sparse structures: http://pandas.pydata.org/pandas-docs/stable/sparse.html
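In recent pandas versions, the basic sparse building block is pandas.arrays.SparseArray, which a Series can carry through pd.SparseDtype. A small sketch of what it actually stores (the values are made up):

```python
import pandas as pd

# A SparseArray stores only the values that differ from fill_value.
arr = pd.arrays.SparseArray([0.0, 0.0, 1.5, 0.0], fill_value=0.0)

stored = arr.sp_values   # the stored (non-fill) values
density = arr.density    # fraction of positions actually stored

# The same storage is available as a Series dtype.
s = pd.Series([0, 0, 7, 0]).astype(pd.SparseDtype("int64", 0))
```

A sparse pivot_table / pivot / unstack would ideally produce columns of this kind without first materializing the dense values.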
I think I have implemented most of the logic for sparse unstacking of a Series (or of homogeneous dtypes) at master...jnothman:sparse-unstack. So far I am:
I'd be very happy for someone else to complete the work!
Maybe then it can be downgraded to Effort Medium :P
In pandas 0.25, giving a Series a sparse dtype and then unstacking it seems to produce a sparse DataFrame (see https://stackoverflow.com/questions/58617185/converting-a-list-of-counters-to-sparse-pandas-dataframe/58617186). I'm assuming this doesn't build a dense in-memory structure along the way. Maybe this can be closed?
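A minimal way to check the dtype behaviour described above, assuming a recent pandas version; the data is made up. Passing fill_value=0.0 to unstack makes missing cells equal to the sparse fill value rather than NaN. This only checks that the resulting columns keep a sparse dtype; it does not show whether a dense intermediate is built along the way.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("o1", "A"), ("o1", "B"), ("o2", "C")], names=["order", "item"]
)
s = pd.Series([1.0, 2.0, 3.0], index=idx).astype(pd.SparseDtype("float64", 0.0))

# Move the "item" level into columns; absent (order, item) pairs get 0.0,
# which matches the sparse fill value and so is not stored.
wide = s.unstack(level="item", fill_value=0.0)
```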
I know this is a really old issue, but I believe there are many scenarios where this functionality would be useful. In my case, I'm trying to pivot order details (line items: one order has multiple lines, each for a different item with its own quantity and price). I want to see whether I can build a predictive model in which the quantities of particular items predict the outcome of an order, so I need to engineer features by pivoting the item numbers into columns. I have 10 million rows of item detail on 3.3 million orders with about 15K unique items, so the pivot would have 15K columns with only about 3 of them populated per order on average. I run out of memory when I try to pivot with pandas, even with 64GB of RAM.
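A back-of-the-envelope estimate shows why the dense pivot cannot fit. The order, item, and row counts come from the comment above; float64 values, roughly one stored value per line item (about 10 million non-zeros), and CSR storage with int32 indices are assumptions.

```python
# Figures from the comment; nnz assumes about one stored value per line item.
n_orders = 3_300_000
n_items = 15_000
nnz = 10_000_000

BYTES_FLOAT64 = 8
BYTES_INT32 = 4

# Dense result: every (order, item) cell is stored as a float64.
dense_bytes = n_orders * n_items * BYTES_FLOAT64

# CSR result: values, column indices, and one row-offset entry per row (+1).
csr_bytes = (
    nnz * BYTES_FLOAT64
    + nnz * BYTES_INT32
    + (n_orders + 1) * BYTES_INT32
)

print(f"dense: {dense_bytes / 1e9:.0f} GB")
print(f"CSR:   {csr_bytes / 1e6:.0f} MB")
```

Under these assumptions this prints dense: 396 GB and CSR: 133 MB, so the dense result alone is about six times larger than 64GB of RAM, while a sparse one is small.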
I need to pivot a big table (100 million rows, 4 columns). The pivoted table is enormous; it needs to be returned as a sparse matrix, but it isn't.
Here is a workaround (I would love to see this ENH become a feature): https://medium.com/@michelkluger/pivot-to-sparse-dataframe-d0b1759a9d14
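The linked post's code is not reproduced here. As an independent sketch of one common workaround, assuming SciPy is installed and that duplicates should be aggregated by summing, the pivot can be built directly as a SciPy sparse matrix from integer codes and then wrapped with DataFrame.sparse.from_spmatrix. The helper name sparse_pivot_sum is hypothetical:

```python
import pandas as pd
import scipy.sparse as sp


def sparse_pivot_sum(df, index, columns, values):
    """Hypothetical helper: a sparse stand-in for
    df.pivot_table(index=index, columns=columns, values=values,
                   aggfunc="sum", fill_value=0).
    Assumes the index and columns keys contain no missing values.
    """
    # Map each distinct key to an integer position (sorted, like pivot_table).
    row_codes, row_labels = pd.factorize(df[index], sort=True)
    col_codes, col_labels = pd.factorize(df[columns], sort=True)

    # Build the matrix from (row, col, value) triples; no dense grid is created.
    mat = sp.coo_matrix(
        (df[values].to_numpy(), (row_codes, col_codes)),
        shape=(len(row_labels), len(col_labels)),
    ).tocsr()  # duplicate (row, col) pairs are summed during conversion

    return pd.DataFrame.sparse.from_spmatrix(mat, index=row_labels, columns=col_labels)
```

For example, sparse_pivot_sum(orders, "order", "item", "qty") on a line-item table gives one sparse column per item. Because the matrix is built from triples, memory grows with the number of non-zero cells rather than with rows × columns of the result; other aggregations (mean, first) would need a groupby before building the matrix.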
A small, complete example of the issue
Expected Output
Output of pd.show_versions()