Change Series.to_dummies to never include a "dummy null" column in the output #19325
TL;DR: change the to_dummies behaviour so that it never creates a "dummy null" column - it's very easy for users to add it themselves later if they need it

Problem statement
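Say you see a to_dummies output containing a column named 'a_null'.
There's currently no way of knowing whether this was produced by a Series with missing values or by one containing the string 'null',
as they produce the same output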
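The ambiguity can be illustrated with a small pure-Python model of the column naming. This is only a sketch: the helper `dummy_column_names` is hypothetical and is not Polars' implementation. It assumes that each distinct value becomes a column named `f"{name}{separator}{value}"` and that missing values are rendered as `null`.

```python
def dummy_column_names(name, values, separator="_"):
    """Hypothetical model of dummy-column naming (not Polars' own code).

    One column per distinct value, named f"{name}{separator}{value}",
    with missing values (None) rendered as the text "null".
    """
    labels = {"null" if v is None else str(v) for v in values}
    return sorted(f"{name}{separator}{label}" for label in labels)


# A Series with a missing value ...
with_missing = dummy_column_names("a", ["x", None, "x"])
# ... and a Series containing the literal string 'null'.
with_string = dummy_column_names("a", ["x", "null", "x"])

print(with_missing)
print(with_string)
print(with_missing == with_string)
```

Under these assumptions both inputs yield the same column names, so the names alone cannot say whether the data held missing values or the string 'null'.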
Furthermore, to_dummies may produce duplicate column names (#19096)

Possible solutions
1. Make separator required (and not allowed to be empty), and use a protected separator for null values. This addresses the known issues, but I think there are some downsides:
   - it would disallow separator='' (which is needed to match pandas' output, so it would make migrating harder)
   - if the null column were named 'null' (ignoring the column name), then pl.concat([feature_1.to_dummies(), feature_2.to_dummies()], how='horizontal') would raise
   - if it were named f'{column_name}#null' (so, using '#' as the protected separator), then that would probably not be clear to users
2. Add a dummy_null keyword argument, as suggested in #19095 (dummy_null in Series.to_dummies). Unfortunately, when set to True, it still doesn't solve the problem highlighted in this issue, whereby one cannot tell if a column 'a_null' corresponds to 'null' values or missing values
3. Add a dummy_null keyword argument which behaves like it does in pandas:
   - if True, then it always produces a dummy null column (even if there are no nulls)
   - if False, then it never produces a dummy null column (even if there are nulls)

   This would also be a breaking change, and dummy_null=True could still produce duplicate column names, as in #19096 (Series.to_dummies can create duplicate column names)
4. Never produce a dummy null column, and let users add it themselves with .with_columns(s.is_null().cast(pl.UInt8).alias(f'{s.name}_null')). Users would be free to name the dummy null column how they want, so the responsibility would be on them to choose a name which wouldn't produce duplicate column names

I think all solutions would represent a breaking change for at least some users, and I think solution 4 is the simplest
Pinging @coastalwhite as we'd discussed this, curious if you have further thoughts here - thanks 🙏