Align should consider the dtype when creating new columns

Currently, when `align` has to create a column that exists in only one of the two objects, it creates a new float64 column filled with NaN, regardless of the dtype that column has in the other object.

```python
In [1]: import pandas as pd

In [2]: a = pd.DataFrame({"A": [1, 2], "B": [pd.Timestamp('2000'), pd.NaT]})

In [3]: b = pd.DataFrame({"A": [1, 2]})

In [4]: a.align(b)[1].dtypes
Out[4]:
A      int64
B    float64
dtype: object
```

I think it'd be more useful for each new column to take the dtype of the matching column in the other object.

```python
# proposed behavior
In [4]: a.align(b)[1].dtypes
Out[4]:
A             int64
B    datetime64[ns]
dtype: object
```

The newly created `B` column has dtype `datetime64[ns]`, the same as `a.B`.
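For comparison, the proposed result can be approximated today by fixing up the new columns after aligning. This is only a rough sketch of a user-side workaround, not how `align` would implement it; `align_keep_dtypes` is a hypothetical helper name.

```python
import pandas as pd


def align_keep_dtypes(left, right):
    """Align two DataFrames, then give each all-NA column created by the
    alignment the dtype of the matching column in the other frame."""
    left_al, right_al = left.align(right)
    # Columns that only `left` had are new in `right_al`, and vice versa.
    for col in right_al.columns.difference(right.columns):
        right_al[col] = pd.Series(index=right_al.index, dtype=left[col].dtype)
    for col in left_al.columns.difference(left.columns):
        left_al[col] = pd.Series(index=left_al.index, dtype=right[col].dtype)
    return left_al, right_al


a = pd.DataFrame({"A": [1, 2], "B": [pd.Timestamp("2000"), pd.NaT]})
b = pd.DataFrame({"A": [1, 2]})

_, b_aligned = align_keep_dtypes(a, b)
print(b_aligned.dtypes)
```

Dtypes with no native missing value, such as int64, are not handled by this sketch.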

This proposal would make the `fill_value` keyword a bit more complex.

1. The default of `np.nan` would change to `None`, which means "the right NA value for the dtype".
2. We might need to accept a Mapping so users could specify a fill value per column.
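To make the two points above concrete, here is a rough sketch of how a fill value could be resolved per column. `resolve_fill_value` is a hypothetical helper, not part of pandas, and the dtype-to-NA mapping it uses is an assumption about what "the right NA value" would mean.

```python
from collections.abc import Mapping

import numpy as np
import pandas as pd


def resolve_fill_value(column, dtype, fill_value=None):
    """Pick the fill value for a new column (hypothetical helper)."""
    # A Mapping supplies per-column values; missing keys fall back to None.
    if isinstance(fill_value, Mapping):
        fill_value = fill_value.get(column)
    if fill_value is not None:
        return fill_value
    # None means "the NA value that fits the dtype".
    if isinstance(dtype, pd.api.extensions.ExtensionDtype):
        return dtype.na_value
    if dtype.kind in "mM":  # datetime64 / timedelta64
        return pd.NaT
    return np.nan


print(resolve_fill_value("B", np.dtype("datetime64[ns]")))
print(resolve_fill_value("A", np.dtype("int64"), {"A": 0}))
```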

I think this would make the workaround in https://github.com/pandas-dev/pandas/pull/31679 unnecessary, as we'd have the correct dtype going into the operation.

---

If we think this is a good idea, it's probably an API-breaking change. We *might* be able to deprecate the current behavior cleanly by (ab)using `fill_value`: we would warn whenever new columns are created and `fill_value` was not passed explicitly.

```python
if new_columns and fill_value is no_default:
    warnings.warn(
        "Creating new float64 columns filled with NaN. In the future... "
        "Specify fill_value=None to accept the future behavior now.",
        FutureWarning,
    )
    fill_value = np.nan  # keep the current behavior until the deprecation is enforced
```

Unfortunately, the warning would also fire in the background during binops, since those align their operands implicitly. I'm not sure how to get around that, aside from instructing users to explicitly align first.
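Explicitly aligning first would look roughly like the following. The frames are made up for illustration, and passing `fill_value=np.nan` stands in for whatever explicit value would silence the hypothetical warning.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"A": [1, 2], "B": [1.5, 2.5]})
b = pd.DataFrame({"A": [10, 20]})

# Align up front and state the fill value, instead of relying on the
# implicit alignment inside `a + b`.
a_aligned, b_aligned = a.align(b, fill_value=np.nan)
result = a_aligned + b_aligned
print(result)
```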
