melt function implementation #1205

Closed
aregm opened this issue Apr 15, 2020 · 2 comments · Fixed by #1689
Labels
P1 Important tasks that we should complete soon pandas.dataframe Related to pandas.dataframe module
Comments

@aregm
Collaborator

aregm commented Apr 15, 2020

No description provided.

@aregm aregm added the pandas.dataframe Related to pandas.dataframe module label Apr 15, 2020
@devin-petersohn devin-petersohn added the P1 Important tasks that we should complete soon label May 13, 2020
@devin-petersohn devin-petersohn added this to the 0.7.4 milestone May 13, 2020
@dchigarev
Collaborator

Current state:
Two melt implementations are currently ready; both can be found in #1689. The first applies melt to the row partitions, but requires a slow reordering step afterwards (the reordering could become faster once sort_values is implemented in a scalable way). The second applies melt to only a subset of partitions (those holding the columns needed for the computation), but requires materializing the columns specified in id_vars so that they can be broadcast to every partition.

A discussion of these two approaches, along with time measurements for both implementations, can be found here.
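
To make the trade-off concrete, here is a rough sketch of both ideas in plain pandas. It is not the code from #1689: the function names `melt_by_row_partitions` and `melt_with_broadcast_ids` are hypothetical, row chunks and single columns only stand in for real partitions, and it assumes a non-empty frame with default `variable`/`value` column names.

```python
import numpy as np
import pandas as pd


def melt_by_row_partitions(df, id_vars, n_parts=2):
    """Approach 1 (sketch): melt each row chunk, then reorder the result."""
    # Contiguous row chunks stand in for row partitions (assumption).
    step = max(1, -(-len(df) // n_parts))  # ceiling division
    parts = [df.iloc[i:i + step] for i in range(0, len(df), step)]
    out = pd.concat([p.melt(id_vars=id_vars) for p in parts])

    # The concatenated rows are grouped per chunk, while pandas groups them
    # per value column. A stable sort on the column position restores the
    # pandas order; this is the reordering step that is expensive at scale.
    value_vars = [c for c in df.columns if c not in id_vars]
    position = out["variable"].map({c: i for i, c in enumerate(value_vars)})
    order = np.argsort(position.to_numpy(), kind="stable")
    return out.iloc[order].reset_index(drop=True)


def melt_with_broadcast_ids(df, id_vars):
    """Approach 2 (sketch): broadcast id columns to each value column."""
    value_vars = [c for c in df.columns if c not in id_vars]
    ids = df[id_vars]  # the id columns must be materialized up front
    pieces = []
    for col in value_vars:  # each column stands in for a column partition
        piece = ids.copy()
        piece["variable"] = col
        piece["value"] = df[col].to_numpy()
        pieces.append(piece)
    # Pieces are already in pandas order, so no reordering is needed.
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({"id": ["a", "b", "c"], "x": [1, 2, 3], "y": [4, 5, 6]})
print(melt_by_row_partitions(df, ["id"]))
print(melt_with_broadcast_ids(df, ["id"]))
```

On this small frame both functions should return the same rows, in the same order, as `df.melt(id_vars=["id"])`; the difference is only in where the extra cost lands (reordering versus materializing the id columns).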

@devin-petersohn
Collaborator

@dchigarev I have left a comment in #1689 to use row partitioning for performance and scalability. Does that make sense?

dchigarev added a commit to dchigarev/modin that referenced this issue Jul 24, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
dchigarev added a commit to dchigarev/modin that referenced this issue Jul 24, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
dchigarev added a commit to dchigarev/modin that referenced this issue Jul 24, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
devin-petersohn pushed a commit that referenced this issue Jul 24, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
aregm pushed a commit to aregm/modin that referenced this issue Sep 16, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>

3 participants