ENH: Implement DataFrame.select to select columns

Add a new method `DataFrame.select` to select columns from a DataFrame. The exact specs are still open to discussion; here is a draft of what the method could look like.

Basic case: select columns. Personally, I think passing the columns either as a list or as multiple parameters with `*args` should both be supported for convenience:

```python
df.select("column1", "column2")
df.select(["column1", "column2"])
```
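To make the two call styles concrete, here is a minimal sketch written as a standalone function rather than a method. The name `select` and the way a single list argument is unpacked are assumptions for illustration, not an existing pandas API or an agreed design:

```python
import pandas as pd


def select(df, *columns):
    """Hypothetical helper: accept columns as *args or as a single list."""
    # select(df, ["a", "b"]) arrives as one list argument; unpack it so
    # both call styles go through the same path.
    if len(columns) == 1 and isinstance(columns[0], list):
        columns = columns[0]
    return df[list(columns)]


df = pd.DataFrame({"column1": [1, 2], "column2": [3, 4], "column3": [5, 6]})

by_args = select(df, "column1", "column2")
by_list = select(df, ["column1", "column2"])
```

Both calls return the same two-column DataFrame in this sketch.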

Cases to consider:

**What if a provided column doesn't exist?** I assume we want to raise a `ValueError`.

**What if a column is duplicated?** I assume we want to return the column twice.
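A rough sketch of the two behaviours assumed above. The helper name `select_strict` and the error message are hypothetical. Note that plain `df[[...]]` indexing currently raises a `KeyError` for missing labels, so the `ValueError` here is an explicit check added on top:

```python
import pandas as pd


def select_strict(df, columns):
    """Hypothetical helper: raise ValueError on missing columns."""
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise ValueError(f"Columns not found: {missing}")
    # Selecting with a list that repeats a label returns that column
    # once per occurrence.
    return df[list(columns)]


df = pd.DataFrame({"column1": [1, 2], "column2": [3, 4]})

duplicated = select_strict(df, ["column1", "column1"])

try:
    select_strict(df, ["column1", "does_not_exist"])
    raised = False
except ValueError:
    raised = True
```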

**How to select with a wildcard or regex?** Some options:

1. Not support them (users can do anything fancy with `df.columns` themselves).
2. Assume the column is a regex if the name starts with `^` and ends with `$`. For wildcards, I guess it could be OK, if `column*` is provided, to first check whether a column literally named `column*` exists; if it does, return it, otherwise treat the star as a wildcard.
3. Accept callables, so users can do `df.select(lambda col: col.startswith("column"))`
4. Have an extra parameter `regex`, like `df.select(regex=r"column\d")`
5. Same as 2, but make users enable it explicitly with a flag: `df.select(r"column\d", regex=True)`
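For reference, this is roughly what option 1 leaves users to do with what pandas already offers: a comprehension over `df.columns` for the callable-style filter, and the existing `DataFrame.filter(regex=...)` for patterns. The column names are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({"column1": [1, 2], "column2": [3, 4], "other": [5, 6]})

# Predicate-style selection, equivalent in spirit to passing a callable.
via_predicate = df[[col for col in df.columns if col.startswith("column")]]

# Regex selection with the existing DataFrame.filter, which matches
# column labels using re.search.
via_regex = df.filter(regex=r"^column\d$")
```

Both expressions select `column1` and `column2` here, so not supporting patterns in `select` would not block these use cases.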

Personally, I'd start with 1, not supporting anything fancy, and decide later. It's way easier to add something later than to remove something we don't like once it's released.

**What to do with MultiIndex?** I guess if a list of strings is provided, they should select from the first level of the MultiIndex. Should we support the elements being tuples to select multiple levels at once? I haven't worked much with MultiIndex myself for a while. @Dr-Irv, maybe you have an idea of what the expectation should be.

Can anyone think of anything else non-trivial about implementing this?
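As a point of comparison, here is how plain `df[[...]]` indexing behaves today with MultiIndex columns, which `select` could mirror. This only illustrates existing behaviour on a made-up frame, not a decision about what `select` should do:

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "x")])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# A list of strings selects from the first level and keeps both levels.
first_level = df[["a"]]

# A list of tuples addresses every level at once.
by_tuple = df[[("a", "x"), ("b", "x")]]
```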

ENH: Implement DataFrame.select to select columns #61522
