Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: _flex_binary_moment() doesn't preserve column order or handle multiple columns with the same label #7738

Merged
merged 1 commit into from
Jul 25, 2014

Conversation

seth-p
Copy link
Contributor

@seth-p seth-p commented Jul 12, 2014

Closes #7542.

@seth-p
Copy link
Contributor Author

seth-p commented Jul 12, 2014

I left the following unchanged for two distinct DataFrames with pairwise=False, as it's not obvious to me what the desired behavior is in this case.

            if pairwise is False:
                ...
                res_columns = arg1.columns.union(arg2.columns)

results[i][j] = f(*_prep_binary(arg1[k1], arg2[k2]))
p = Panel.from_dict(results).swapaxes('items', 'major')
p.major_axis = arg1.columns[p.major_axis]
p.minor_axis = arg2.columns[p.minor_axis]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reindex instead here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, reindex() will rearrange existing rows/columns (labels and values), whereas what I'm doing here is renaming the row/columns without changing the values. Note that I construct results to be indexed by column index (rather than label), so that the order is maintained, and then I simply apply the labels without touching the values. I think if I use reindex() here I will just end up with all NaNs.

@seth-p
Copy link
Contributor Author

seth-p commented Jul 13, 2014

I updated the code to properly deal with multiple columns with the same label (except for the case of two distinct DataFrames and pairwise=False, where it's not obvious to me what the desired behavior would be).

@seth-p seth-p changed the title BUG: _flex_binary_moment() doesn't preserve column order BUG: _flex_binary_moment() doesn't preserve column order or handle multiple columns with the same label Jul 13, 2014
@jreback jreback added this to the 0.15.0 milestone Jul 14, 2014
@@ -873,6 +873,122 @@ def test_expanding_corr_pairwise_diff_length(self):
assert_frame_equal(result3, expected)
assert_frame_equal(result4, expected)

def test_pairwise_stats_column_names_order(self):
# GH 7542
df1_columns_ordered = DataFrame([[2,4],[1,2],[5,2],[8,1]], columns=[0,1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the ordering (in the current impl) has to do with the dtype mix. So can you do tests with say mixed int/floats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added additional tests. Let me know if that's not what you had in mind.

@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

need a release note for v0.15.0 in bug fixes

@@ -278,7 +278,10 @@ Bug Fixes
- Bug in ``DataFrame.plot`` with ``subplots=True`` may draw unnecessary minor xticks and yticks (:issue:`7801`)
- Bug in ``StataReader`` which did not read variable labels in 117 files due to difference between Stata documentation and implementation (:issue:`7816`)


- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, ``rolling_cov``, ``ewmcov``, and ``ewmcorr``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say fixes the issue where...... and non-unique columns (instead of multiple columsn with the same name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Better now?

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

can't use a context manager under. 2.6 like that

iirc just create a function
thrn call
self.assertRaises(ValueError, f)

@seth-p
Copy link
Contributor Author

seth-p commented Jul 24, 2014

How do I test the message in the ValueError? The following works for me on 3.4, but per https://docs.python.org/2.6/search.html?q=assertRaisesRegex&check_keywords=yes&area=default I suspect it doesn't work on 2.6.

                    self.assertRaisesRegex(ValueError, "'arg1' columns are not unique", f, df, df2)
                    self.assertRaisesRegex(ValueError, "'arg2' columns are not unique", f, df2, df)

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

their are lots of examples in the code, e.g. : https://github.com/pydata/pandas/blob/master/pandas/tests/test_index.py#L52 (and you can use a context manger). don't use the self.assertRaisesRegex, rather the tm.assertRaisesRegex (which is fixed for 2.6)....

hmm...maybe should fix that

@seth-p
Copy link
Contributor Author

seth-p commented Jul 24, 2014

OK, updated to use tm.assertRaisesRegexp. Thanks for the help...

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

@seth-p I added them to the TestCase anyhow (so when you do self.assertRaisesRegexp will work (for the future)

@seth-p
Copy link
Contributor Author

seth-p commented Jul 24, 2014

... tm.assertRaisesRegexp worked.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

ok, looks good....ping when green

@seth-p
Copy link
Contributor Author

seth-p commented Jul 24, 2014

One of the Travis builds failed for no discernible-to-me reason: https://travis-ci.org/pydata/pandas/jobs/30798802.

On Jul 24, 2014, at 7:11 PM, jreback notifications@github.com wrote:

ok, looks good....ping when green


Reply to this email directly or view it on GitHub.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

sometimes this happens np (Travis is flaky )

@seth-p
Copy link
Contributor Author

seth-p commented Jul 25, 2014

Anything else I need to do, then?

jreback added a commit that referenced this pull request Jul 25, 2014
BUG: _flex_binary_moment() doesn't preserve column order or handle multiple columns with the same label
@jreback jreback merged commit 99b7c8c into pandas-dev:master Jul 25, 2014
@jreback
Copy link
Contributor

jreback commented Jul 25, 2014

thanks!

@seth-p seth-p deleted the flex_binary_moment_column_order branch September 10, 2014 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: _flex_binary_moment() doesn't preserve column order or handle multiple columns with the same label
2 participants