Description
Introduction
The `pandas.read_*` methods and constructors are awesome at assigning the most specific dtype that can hold every value of a column. But that functionality is lacking for DataFrames created by other methods (`stack` and `unstack` are prime examples).
There has been a lot of discussion about dtypes here (ref. #9216, #5902 and especially #9589), and I understand it is a well-rehearsed topic with no general consensus. An unfortunate result of those discussions was the deprecation of the `.convert_objects` method for being too forceful. However, the undercurrent in those discussions (IMHO) points to, and my needs often require, a DataFrame and Series method which intelligently assigns the least general dtype based on the data.
The method could optionally take a list of dtypes, or a dictionary mapping column names to dtypes, to assign user-specified dtypes. Note that I am proposing this in addition to the existing `to_*` methods. The following example illustrates the problem:
```python
In [1]: import numpy as np
   ...: import pandas as pd
   ...: df = pd.DataFrame({'c1': list('AAABBBCCC'),
   ...:                    'c2': list('abcdefghi'),
   ...:                    'c3': np.random.randn(9),
   ...:                    'c4': np.arange(9)})
   ...: df.dtypes
Out[1]:
c1     object
c2     object
c3    float64
c4      int64
dtype: object

In [2]: df = df.stack().unstack()
   ...: df.dtypes
Out[2]:
c1    object
c2    object
c3    object
c4    object
dtype: object
```
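For completeness, the inference half of this can be approximated today, although clumsily, by re-parsing every column. A minimal sketch, relying on `pd.to_numeric` with `errors='ignore'` (which hands a column back unchanged when parsing fails):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('AAABBBCCC'),
                   'c2': list('abcdefghi'),
                   'c3': np.random.randn(9),
                   'c4': np.arange(9)})
df = df.stack().unstack()               # every column is now object

# Re-parse each column; columns that cannot be parsed as numbers
# come back unchanged because of errors='ignore'.
recovered = df.apply(pd.to_numeric, errors='ignore')
print(recovered.dtypes)                 # c1/c2 object, c3 float64, c4 int64
```

This recovers the numeric columns, but it says nothing about forcing user-specified dtypes, which is the other half of the proposal.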
Expected Output
Define a method `.set_dtypes` which does the following:

- Either takes a boolean keyword argument `infer` to infer and reset each column's dtype to the least general dtype such that no values are lost.
- Or takes a list or dictionary of dtypes to force each column into user-specified dtypes, with an optional `errors` keyword argument to handle casting errors.

As illustrated below:
```python
In [3]: df.set_dtypes(infer=True).dtypes
Out[3]:
c1     object
c2     object
c3    float64
c4      int64
dtype: object

In [4]: df.set_dtypes(types=[np.int64]*4, errors='coerce').dtypes  # coerced NaNs force c1/c2 to float64
Out[4]:
c1    float64
c2    float64
c3      int64
c4      int64
dtype: object
```
```python
In [5]: df.set_dtypes(types=[np.int64]*4, errors='coerce')  # note the loss of data
Out[5]:
    c1   c2  c3  c4
0  NaN  NaN   1   0
1  NaN  NaN   1   1
2  NaN  NaN   0   2
3  NaN  NaN   0   3
4  NaN  NaN   0   4
5  NaN  NaN   0   5
6  NaN  NaN   2   6
7  NaN  NaN   0   7
8  NaN  NaN   1   8

In [6]: df.set_dtypes(types=[np.int64]*4, errors='ignore').dtypes
Out[6]:
c1    object
c2    object
c3    object
c4     int64
dtype: object
```
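For what it's worth, here is a rough sketch of how such a method might behave, built only on existing casting tools. Everything about `set_dtypes` below is hypothetical (it is not a pandas API), and the rule that `errors='ignore'` also skips lossy casts is my reading of Out[6], not an established semantic:

```python
import numpy as np
import pandas as pd

def set_dtypes(df, types=None, infer=False, errors='raise'):
    """Hypothetical sketch of the proposed method.

    types  : list of dtypes (one per column) or dict of {column: dtype}
    infer  : if True, infer the least general dtype for each column
    errors : 'raise', 'coerce' (failed values become NaN), or
             'ignore' (leave a column unchanged on failure or data loss)
    """
    out = df.copy()
    if infer:
        # Best-effort numeric parsing; unparseable columns are
        # handed back unchanged by errors='ignore'.
        return out.apply(pd.to_numeric, errors='ignore')
    if isinstance(types, list):
        types = dict(zip(out.columns, types))
    for col, dtype in (types or {}).items():
        try:
            casted = out[col].astype(dtype)
            if errors == 'ignore' and not (casted == out[col]).all():
                continue                 # lossy cast: skip this column
            out[col] = casted
        except (ValueError, TypeError):
            if errors == 'raise':
                raise
            if errors == 'coerce':
                # Failed values become NaN, so the column falls back
                # to float64 (int64 cannot hold NaN).
                out[col] = pd.to_numeric(out[col], errors='coerce')
            # under 'ignore', leave the column untouched
    return out
```

With `df` from In [2] above, `set_dtypes(df, infer=True)` reproduces Out[3], and `set_dtypes(df, types=[np.int64]*4, errors='coerce')` reproduces Out[4] and Out[5].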
Additional Notes
I understand that date and time types will be a little difficult to infer. However, following the logic powering `pandas.read_*`, date and time types would not be automatically inferred, but explicitly passed by the user.
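For instance, with today's tools a datetime conversion is already such an explicit request, much like passing `parse_dates` to `pandas.read_csv`; a minimal illustration:

```python
import pandas as pd

s = pd.Series(['2015-01-01', '2015-02-01', '2015-03-01'])
# Nothing is guessed here: the user explicitly asks for datetimes.
print(pd.to_datetime(s).dtype)    # datetime64[ns]
```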
It would be a one-size-fits-all solution if users were also allowed to pass `True` and `False`, in addition to dtypes, when specifying dtypes per column: `True` would indicate "infer automatically (set the best dtype)", while `False` would indicate "exclude this column from conversion", as sketched below.
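A purely illustrative spelling of that convention using existing primitives (the `spec` mapping and its `True`/`False` shorthand are hypothetical, not a pandas API):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('AAABBBCCC'),
                   'c2': list('abcdefghi'),
                   'c3': np.random.randn(9),
                   'c4': np.arange(9)}).stack().unstack()

# Per-column specification: True = infer, False = skip, dtype = force.
spec = {'c1': False, 'c2': False, 'c3': True, 'c4': np.int64}

for col, how in spec.items():
    if how is False:
        continue                                   # leave the column untouched
    elif how is True:
        df[col] = pd.to_numeric(df[col], errors='ignore')  # infer best dtype
    else:
        df[col] = df[col].astype(how)              # force the requested dtype

print(df.dtypes)    # c1 object, c2 object, c3 float64, c4 int64
```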