BUG: _flex_binary_moment() doesn't preserve column order or handle multiple columns with the same label #7738

seth-p · 2014-07-12T18:57:33Z

Closes #7542.

seth-p · 2014-07-12T19:05:44Z

I left the following unchanged for two distinct DataFrames with pairwise=False, as it's not obvious to me what the desired behavior is in this case.

            if pairwise is False:
                ...
                res_columns = arg1.columns.union(arg2.columns)
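
A rough illustration of why this case is ambiguous. As far as I understand it, a set-style union of two column indexes yields unique labels, typically in sorted order, so neither input's column order survives. The sketch below is a toy model in plain Python: lists stand in for column indexes, and `sorted_union` and `ordered_union` are hypothetical helpers, not pandas functions.

```python
# Toy model only: plain lists stand in for DataFrame column labels.

def sorted_union(cols1, cols2):
    """Set-style union: unique labels, returned in sorted order."""
    return sorted(set(cols1) | set(cols2))


def ordered_union(cols1, cols2):
    """Union that keeps first-seen order (repeated labels are still dropped)."""
    seen = set()
    out = []
    for label in list(cols1) + list(cols2):
        if label not in seen:
            seen.add(label)
            out.append(label)
    return out


cols1 = ["b", "a", "c"]
cols2 = ["c", "d", "a"]

print(sorted_union(cols1, cols2))   # ['a', 'b', 'c', 'd']
print(ordered_union(cols1, cols2))  # ['b', 'a', 'c', 'd']
```

Either result is defensible for two distinct frames, which is why the behavior was left unchanged here.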

jreback · 2014-07-12T20:19:05Z

pandas/stats/moments.py

+                            results[i][j] = f(*_prep_binary(arg1[k1], arg2[k2]))
+                p = Panel.from_dict(results).swapaxes('items', 'major')
+                p.major_axis = arg1.columns[p.major_axis]
+                p.minor_axis = arg2.columns[p.minor_axis]


reindex instead here

If I understand it correctly, reindex() rearranges existing rows/columns (labels and values together), whereas what I'm doing here is renaming the rows/columns without changing the values. Note that I construct results to be indexed by column position (rather than label), so that the order is maintained, and then I simply apply the labels without touching the values. I think if I use reindex() here I will just end up with all NaNs.
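
The distinction can be sketched without pandas. In the toy model below, a "frame" is just a list of `(label, values)` pairs; `toy_reindex` and `toy_relabel` are hypothetical stand-ins, and `None` plays the role that NaN would play in pandas.

```python
# Toy model: a "frame" is a list of (label, values) pairs in column order.

def toy_reindex(frame, new_labels):
    """Look up each new label among the existing labels; misses become None."""
    lookup = dict(frame)
    return [(label, lookup.get(label)) for label in new_labels]


def toy_relabel(frame, new_labels):
    """Keep the values where they are and swap in new labels by position."""
    return [(new, values) for new, (_, values) in zip(new_labels, frame)]


# Results were built with positional keys 0 and 1, so the order is preserved.
results = [(0, [1.0, 0.5]), (1, [0.5, 1.0])]
labels = ["b", "a"]  # the original column labels, in their original order

print(toy_reindex(results, labels))  # [('b', None), ('a', None)]
print(toy_relabel(results, labels))  # [('b', [1.0, 0.5]), ('a', [0.5, 1.0])]
```

Because none of the new labels exist among the positional keys, the lookup-based version finds nothing, while the positional relabel keeps every value.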

seth-p · 2014-07-13T05:41:00Z

I updated the code to properly deal with multiple columns with the same label (except for the case of two distinct DataFrames and pairwise=False, where it's not obvious to me what the desired behavior would be).
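
A minimal sketch of the idea, assuming plain Python lists in place of DataFrame columns: compute each pairwise statistic keyed by column position, then attach the labels afterwards, so repeated labels never collide. The names `cov` and `pairwise_by_position` are illustrative only and are not the functions in pandas/stats/moments.py.

```python
def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pairwise_by_position(labels, columns, func):
    """Apply func to every pair of columns, keyed by position (i, j)."""
    n = len(columns)
    by_position = {}
    for i in range(n):
        for j in range(n):
            by_position[(i, j)] = func(columns[i], columns[j])
    # Labels are attached only at the end; duplicates survive because
    # the intermediate keys are positions, not labels.
    return [
        [(labels[i], labels[j], by_position[(i, j)]) for j in range(n)]
        for i in range(n)
    ]


labels = ["x", "x"]  # two columns sharing one label
columns = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]

table = pairwise_by_position(labels, columns, cov)
for row in table:
    print(row)
# [('x', 'x', 1.0), ('x', 'x', 2.0)]
# [('x', 'x', 2.0), ('x', 'x', 4.0)]
```

Keying a dict by label instead would have kept only one entry for `"x"` and silently dropped the other column.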

jreback · 2014-07-21T12:14:48Z

pandas/stats/tests/test_moments.py

@@ -873,6 +873,122 @@ def test_expanding_corr_pairwise_diff_length(self):
        assert_frame_equal(result3, expected)
        assert_frame_equal(result4, expected)

+    def test_pairwise_stats_column_names_order(self):
+        # GH 7542
+        df1_columns_ordered = DataFrame([[2,4],[1,2],[5,2],[8,1]], columns=[0,1])


I suspect the ordering (in the current impl) has to do with the dtype mix. So can you do tests with say mixed int/floats?

I added additional tests. Let me know if that's not what you had in mind.

jreback · 2014-07-21T12:15:04Z

need a release note for v0.15.0 in bug fixes

jreback · 2014-07-24T18:22:02Z

doc/source/v0.15.0.txt

@@ -278,7 +278,10 @@ Bug Fixes
 - Bug in ``DataFrame.plot`` with ``subplots=True`` may draw unnecessary minor xticks and yticks (:issue:`7801`)
 - Bug in ``StataReader`` which did not read variable labels in 117 files due to difference between Stata documentation and implementation (:issue:`7816`)

-
+- Bug in ``expanding_cov``, ``expanding_corr``, ``rolling_cov``, ``rolling_corr``, ``ewmcov``, and ``ewmcorr``


say it fixes the issue where...... and say non-unique columns (instead of multiple columns with the same name)

Updated. Better now?

jreback · 2014-07-24T22:00:54Z

can't use a context manager under 2.6 like that

iirc just create a function,
then call
self.assertRaises(ValueError, f)
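
For reference, the callable form looks roughly like this. It is a self-contained sketch: `check_unique` is a hypothetical stand-in for the code under test, and the test class name is made up for the example.

```python
import unittest


def check_unique(columns):
    """Hypothetical stand-in for the code under test."""
    if len(set(columns)) != len(columns):
        raise ValueError("'arg1' columns are not unique")


class TestNonUniqueColumns(unittest.TestCase):
    def test_raises(self):
        # Wrap the failing call in a function and pass the callable,
        # rather than writing `with self.assertRaises(ValueError):`.
        def f():
            check_unique(["a", "a"])

        self.assertRaises(ValueError, f)


# Run the case explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNonUniqueColumns)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

The callable form of `assertRaises` predates the context-manager form, which is why it is the safer choice when older interpreters are in the test matrix.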

seth-p · 2014-07-24T22:28:16Z

How do I test the message in the ValueError? The following works for me on 3.4, but per https://docs.python.org/2.6/search.html?q=assertRaisesRegex&check_keywords=yes&area=default I suspect it doesn't work on 2.6.

                    self.assertRaisesRegex(ValueError, "'arg1' columns are not unique", f, df, df2)
                    self.assertRaisesRegex(ValueError, "'arg2' columns are not unique", f, df2, df)

jreback · 2014-07-24T22:33:06Z

there are lots of examples in the code, e.g.: https://github.com/pydata/pandas/blob/master/pandas/tests/test_index.py#L52 (and you can use a context manager). don't use self.assertRaisesRegex; use tm.assertRaisesRegexp instead (which is fixed for 2.6)....

hmm...maybe should fix that

seth-p · 2014-07-24T22:46:52Z

OK, updated to use tm.assertRaisesRegexp. Thanks for the help...
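
For anyone without a regex-aware assertion to hand, the same check can be written by hand with `re.search`. This is only a sketch of the general technique, not the pandas helper itself: `assert_raises_regexp` and the toy function `f` below are hypothetical, and only the two error messages are taken from the thread.

```python
import re


def assert_raises_regexp(exc_type, pattern, func, *args, **kwargs):
    """Fail unless func raises exc_type with a message matching pattern."""
    try:
        func(*args, **kwargs)
    except exc_type as err:
        if not re.search(pattern, str(err)):
            raise AssertionError("%r does not match %r" % (str(err), pattern))
    else:
        raise AssertionError("%s not raised" % exc_type.__name__)


def f(arg1, arg2):
    """Toy function that rejects repeated labels in either argument."""
    if len(set(arg1)) != len(arg1):
        raise ValueError("'arg1' columns are not unique")
    if len(set(arg2)) != len(arg2):
        raise ValueError("'arg2' columns are not unique")


unique = ["a", "b"]
repeated = ["a", "a"]

assert_raises_regexp(ValueError, "'arg1' columns are not unique", f, repeated, unique)
assert_raises_regexp(ValueError, "'arg2' columns are not unique", f, unique, repeated)
```

Passing the callable and its arguments separately mirrors the call shape used in the snippet above, so swapping between the two is a one-line change.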

jreback · 2014-07-24T23:04:55Z

@seth-p I added them to the TestCase anyhow (so self.assertRaisesRegexp will work in the future)

seth-p · 2014-07-24T23:05:35Z

... tm.assertRaisesRegexp worked.


jreback · 2014-07-24T23:10:55Z

ok, looks good....ping when green

seth-p · 2014-07-24T23:46:05Z

One of the Travis builds failed for no discernible-to-me reason: https://travis-ci.org/pydata/pandas/jobs/30798802.


jreback · 2014-07-24T23:51:12Z

sometimes this happens, np (Travis is flaky)

seth-p · 2014-07-25T01:43:52Z

Anything else I need to do, then?


jreback · 2014-07-25T14:34:11Z

thanks!

jreback reviewed Jul 12, 2014

seth-p changed the title ~~BUG: _flex_binary_moment() doesn't preserve column order~~ BUG: _flex_binary_moment() doesn't preserve column order or handle multiple columns with the same label Jul 13, 2014

jreback added the Bug label Jul 14, 2014

jreback added this to the 0.15.0 milestone Jul 14, 2014

jreback reviewed Jul 21, 2014

jreback reviewed Jul 24, 2014

BUG: _flex_binary_moment() doesn't preserve column order or handle no…

34d2910

…n-unique columns

jreback added a commit that referenced this pull request Jul 25, 2014

Merge pull request #7738 from seth-p/flex_binary_moment_column_order

99b7c8c

BUG: _flex_binary_moment() doesn't preserve column order or handle multiple columns with the same label

jreback merged commit 99b7c8c into pandas-dev:master Jul 25, 2014

seth-p mentioned this pull request Aug 21, 2014

BUG: Inconsistencies in calculating second moments of a single value #7900

Closed

seth-p deleted the flex_binary_moment_column_order branch September 10, 2014 00:12
