Description
Consider two DataFrames:
julia> df = DataFrame(a = [1, 2, missing, missing])
4×1 DataFrame
 Row │ a
     │ Int64?
─────┼─────────
   1 │       1
   2 │       2
   3 │ missing
   4 │ missing
julia> df2 = DataFrame(a = [1, 3, missing, missing], b = rand(4))
4×2 DataFrame
 Row │ a        b
     │ Int64?   Float64
─────┼───────────────────
   1 │       1  0.459054
   2 │       3  0.649346
   3 │ missing  0.875563
   4 │ missing  0.709856
Currently, to join them I need to set matchmissing = :equal, which produces duplicates:
julia> leftjoin(df, df2, on = :a, matchmissing = :equal)
6×2 DataFrame
 Row │ a        b
     │ Int64?   Float64?
─────┼─────────────────────────
   1 │       1        0.459054
   2 │ missing        0.875563
   3 │ missing        0.875563
   4 │ missing        0.709856
   5 │ missing        0.709856
   6 │       2  missing
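The six rows follow from the match counts: with matchmissing = :equal, each of the two missing keys on the left pairs with each of the two missing keys on the right (2 × 2 = 4 rows), and the two non-missing left rows contribute one row each. A minimal sketch that checks this, assuming fixed values in column b in place of rand(4) so the run is reproducible:

```julia
using DataFrames

df = DataFrame(a = [1, 2, missing, missing])
# Assumption: fixed values replace rand(4) so the example is reproducible.
df2 = DataFrame(a = [1, 3, missing, missing], b = [0.1, 0.2, 0.3, 0.4])

joined = leftjoin(df, df2, on = :a, matchmissing = :equal)

n_left_missing = count(ismissing, df.a)
n_right_missing = count(ismissing, df2.a)

# Every missing key on the left pairs with every missing key on the right.
@assert count(ismissing, joined.a) == n_left_missing * n_right_missing
# Four missing-key rows plus one row for each non-missing left key.
@assert nrow(joined) == n_left_missing * n_right_missing + 2
```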
I would like an option matchmissing = :ignore (or any other name) that preserves the left table exactly and only adds information from the right table where non-missing key values match. Currently I think this would be achieved via
julia> leftjoin(df, dropmissing(df2), on = :a, matchmissing = :equal)
4×2 DataFrame
 Row │ a        b
     │ Int64?   Float64?
─────┼─────────────────────────
   1 │       1        0.459054
   2 │       2  missing
   3 │ missing  missing
   4 │ missing  missing
which is a bit counterintuitive from an API perspective (I need to set matchmissing to :equal even though I don't want to match missings!), and might also be suboptimal from an efficiency perspective, since dropmissing creates a filtered copy of the right table before the join.
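Until such an option exists, the workaround can be wrapped in a small helper. The sketch below is only an illustration: leftjoin_ignoremissing is a hypothetical name, not part of DataFrames.jl, and unlike the dropmissing(df2) call above it drops rows only where the key column is missing, so missing values in other right-hand columns are kept. It also uses fixed values in column b instead of rand(4).

```julia
using DataFrames

# Hypothetical helper (not a DataFrames.jl function): a left join in which
# missing keys never match anything.
function leftjoin_ignoremissing(left::AbstractDataFrame, right::AbstractDataFrame; on::Symbol)
    # Drop right-hand rows whose key is missing so they cannot match.
    right_nomissing = dropmissing(right, on)
    # :equal is still needed so that missing keys in `left` do not raise an error;
    # with no missing keys left on the right, they simply stay unmatched.
    return leftjoin(left, right_nomissing; on = on, matchmissing = :equal)
end

df = DataFrame(a = [1, 2, missing, missing])
# Assumption: fixed values replace rand(4) so the example is reproducible.
df2 = DataFrame(a = [1, 3, missing, missing], b = [0.1, 0.2, 0.3, 0.4])

res = leftjoin_ignoremissing(df, df2; on = :a)
```

The result has one row per row of df, and b is filled in only for the key 1. This still pays for the intermediate copy made by dropmissing, which is the efficiency concern raised above.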