Corrwith when "other" has different columns #9823

Open
@olgabot

Description

It would be great if corrwith could calculate correlations between dataframes that have different column names, but the same index.

For example, take the two (10, 5) df1 and df2 dataframes below.

import pandas as pd
import numpy as np
import string

nrow = 10
ncol = 5

axis = 0

index = list(string.ascii_lowercase[:nrow])
columns = list(string.ascii_uppercase[:ncol])

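# df1 has columns 'A'..'E'; df2 gets the default integer columns 0..4,
# so the two frames share an index but no column labels.
df1 = pd.DataFrame(np.random.randn(nrow, ncol), index=index, columns=columns)
df2 = pd.DataFrame(np.random.randn(nrow, ncol), index=index)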

df1 and df2 have different columns, and I'd like to create a 5x5 matrix in which each entry is the correlation between one column of df1 and one column of df2, computed over the rows they share.
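To make the desired output concrete, here is a minimal sketch of one way to compute it. The function name `cross_corr` is hypothetical (it is not part of pandas, and not necessarily what the linked stopgap does). It assumes Pearson correlation, all-numeric columns, no missing values, and no constant columns.

```python
import numpy as np
import pandas as pd


def cross_corr(df1, df2):
    """Pearson correlation of every column of df1 with every column of df2.

    Hypothetical helper, not a pandas API. Assumes numeric data without NaNs.
    """
    # Keep only the rows present in both frames, in the same order.
    left, right = df1.align(df2, join="inner", axis=0)

    a = left.to_numpy(dtype=float)
    b = right.to_numpy(dtype=float)

    # Standardize each column (population std, ddof=0).
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)

    # The mean product of standardized columns is the Pearson correlation.
    corr = a.T @ b / a.shape[0]
    return pd.DataFrame(corr, index=left.columns, columns=right.columns)
```

Calling `cross_corr(df1, df2)` on the frames above should give a 5x5 DataFrame whose rows are labelled with df1's columns and whose columns are labelled with df2's columns.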

I've implemented a stopgap measure here: https://github.com/YeoLab/flotilla/blob/d9e53c219320c5d5dbbbfa41769abb2ab6f25574/flotilla/compute/generic.py#L429

Is this a planned feature for future releases?

Probably also related to the method issue: #9490
