ENH: Add parameter to read_html() that disables the _remove_whitespace() function

### Feature Type

- [ ] Adding new functionality to pandas

- [X] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

I have an HTML file that lists question names from a surveillance survey and shows how they have changed over the years. I use it to merge multiple files whose question names differ slightly from year to year. I need to read this file with the text exactly as it appears in the HTML so that the question names map exactly to the ones I extract from a PDF. The _remove_whitespace() function replaces all extra whitespace with single spaces, but some of these names contain errors in the source, such as an accidental double space, and I need the text to match exactly so I can properly clean the other files in the dataset.
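A minimal sketch of the mismatch, assuming the normalization behaves roughly like "strip the ends, then collapse each run of whitespace to one space". The helper `collapse_whitespace` and the question name are hypothetical stand-ins for illustration, not the pandas internals or real survey data:

```python
import re


def collapse_whitespace(text):
    # Hypothetical stand-in for the normalization described above:
    # strip both ends, then replace each run of whitespace with one space.
    return re.sub(r"\s+", " ", text.strip())


# Made-up question name containing an accidental double space.
pdf_name = "Q1  Example question"
html_name = collapse_whitespace(pdf_name)

# The normalized name no longer equals the original, so an exact-match
# lookup between the two sources fails.
names_match = html_name == pdf_name
```

Here `names_match` ends up `False`, which is the situation that makes an exact join on question names impossible.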

### Feature Description

Add a new parameter to the read_html() function that can disable the _remove_whitespace() function.

### Alternative Solutions

The file I am using was originally Markdown, which I converted to HTML because I didn't think there was a way to read Markdown into pandas. I have since found a way, so instead of converting to HTML and reading from that, I now read straight from the Markdown file. However, if my original file had been HTML, I would probably write a similar solution: drop down to read_table() and apply the cleaning I want manually.

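```
import pandas as pd

def read_md(filename):
    # Read the Markdown table as pipe-delimited text. The leading and trailing
    # pipes produce all-empty columns, which dropna() removes.
    table = pd.read_table(filename, sep='|').dropna(axis=1, how='all')
    # Drop the |---|---| separator row that follows the header.
    table = table.iloc[1:].copy()
    # Strip only the padding around each name and value; inner whitespace is kept.
    table.columns = table.columns.str.strip()
    for col in table.columns:
        table[col] = table[col].str.strip()
    return table.reset_index(drop=True)
```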
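For a file that really is HTML, a comparable workaround could be sketched with the standard library's `html.parser`, which passes cell text through unchanged. This is only an illustration under simplifying assumptions: a single flat table, explicit closing tags, and no rowspan/colspan handling. `VerbatimTableParser` and `read_table_rows` are hypothetical names, not part of pandas:

```python
from html.parser import HTMLParser


class VerbatimTableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell without touching whitespace."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. indentation between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def read_table_rows(html_text):
    # Returns the rows as lists of strings; the first row is the header.
    parser = VerbatimTableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

The rows can then be turned into a DataFrame with `pd.DataFrame(rows[1:], columns=rows[0])`, keeping double spaces and leading or trailing whitespace exactly as they appear in the file.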

### Additional Context

_No response_

ENH: Add parameter to read_html() that disables the _remove_whitespace() function #59827
