[SPIKE] Investigate whether Woodwork can be expanded to handle incoming `string` dtypes #1617

Currently, when a user creates a pandas DataFrame and passes it to Woodwork, pandas has already inferred dtypes for many of the columns, which makes Woodwork's logical type inference significantly easier. However, there may be cases where all incoming data is text and every column has a dtype of `string`.

For a dataframe initialized like this:
```
import numpy as np
import pandas as pd

df = pd.DataFrame()
df["ints"] = list(range(100))
df["floats"] = [i * 1.1 for i in range(100)]
df["bools"] = [True, False, False, True, False] * 20
df["bools_nan"] = [True, False, False, True, pd.NA] * 20  # booleans with missing values
df["strings"] = [str(i) for i in range(100)]  # numeric-looking strings
df["categoricals"] = np.random.choice(["Yellow", "Blue", "Red"], 100)
```

Initializing Woodwork on this DataFrame (`df.ww.init()`) yields the expected logical types:
![Screen Shot 2023-01-13 at 4 03 12 PM](https://user-images.githubusercontent.com/29577911/212418699-75ff064e-9d6c-4508-b4e6-384444b06218.png)


However, converting every column to the `string` dtype before Woodwork initialization
```
for col in df.columns:
    df[col] = df[col].astype("string")
```
yields this:
![Screen Shot 2023-01-13 at 4 03 21 PM](https://user-images.githubusercontent.com/29577911/212418815-28c96272-a6e9-44c5-8768-4c43a98b62ec.png)

This spike investigates what solution(s) exist for this case and how, and in what order, they should be tackled: one logical type at a time, or with a single approach that handles all of them at once.
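
As a starting point for discussion, here is a minimal sketch of one possible "by logical type" approach: try to parse each `string` column back into a narrower pandas dtype before type inference runs. It uses only pandas and is not Woodwork's implementation. The helper name `recover_dtype` and the order of the checks (booleans first, then numbers) are assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def recover_dtype(series: pd.Series) -> pd.Series:
    """Hypothetical helper: parse a `string`-dtype column into a narrower dtype.

    Assumes `series` was produced by `.astype("string")`. Returns the column
    unchanged when no narrower dtype fits.
    """
    non_null = series.dropna()
    if non_null.empty:
        return series

    # Booleans: every non-null value spells "true" or "false" (any casing).
    lowered = series.str.lower()
    if lowered.dropna().isin(["true", "false"]).all():
        # Comparing a `string` column keeps missing values as <NA>.
        return (lowered == "true").astype("boolean")

    # Numbers: pd.to_numeric raises if any value cannot be parsed.
    try:
        numeric = pd.to_numeric(series)
    except (ValueError, TypeError):
        return series  # leave as text (e.g. categories or free text)

    # Use nullable dtypes so missing values survive the conversion.
    target = "Int64" if is_integer_dtype(numeric) else "Float64"
    return numeric.astype(target)


# Small demonstration frame, converted to `string` as in the example above.
demo = pd.DataFrame(
    {
        "ints": list(range(5)),
        "floats": [i * 1.1 for i in range(5)],
        "bools_nan": [True, False, False, True, pd.NA],
        "categoricals": ["Yellow", "Blue", "Red", "Blue", "Red"],
    }
).astype("string")

recovered = pd.DataFrame({col: recover_dtype(demo[col]) for col in demo.columns})
print(recovered.dtypes)
```

One limitation of any value-based approach like this: after the conversion, the `ints` and `strings` columns from the example hold identical values, so they cannot be told apart. Categorical detection (for instance by cardinality) is also left out of this sketch.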


