Open
Description
I found what seems to me a behavior that might be wrong when joining a simple index df with a multiindex df.
I'm admittedly not an SQL expert and I'm not sure this is really a bug.
Consider the case:
X = pd.DataFrame([[2, 3], [5, 7]], columns=['a','p']).set_index('a')
Y = pd.DataFrame([[1, 2, 3], [3, 4, 8], [5, 6, 9]],
columns=['a','b','c']).set_index(['a','b'])
X
# p
# a
#2 3
#5 7
Y
# c
# a b
#1 2 3
#3 4 8
#5 6 9
X.join(Y, how='left')
# p c
# a b
#5 6 7 9
I see the rational for not returning the line indexed by a=2
in X
: since there is no value for the level b
to associate to this line it is discarded.
But I'm wondering if this output would not be also reasonable:
p c
a b
2 Nan 3 Nan
5 6 7 9
In SQL the result of (where I replace X
by X.reset_index()
and similarly for Y
):
select *
from X, Y
where Y.a = X.a (+)
would contain the line
a b p c
2 Nan 3 Nan
Any thought?
For completeness, the same issue is present for how='outer'
.