Aliases for column names #11723
Related to #10349. I suppose this is possible. It would be fairly easy to implement, but it would require a good number of test cases to ensure it's propagating correctly (e.g. this is analogous to the …). Further, it would require an audit of the indexing code so that an alias is truly synonymous with its label (e.g. you can use the alias wherever you could use the actual label). So while this is interesting, it would require a pull request from the community to jump-start it.
I'll have a go at this when I get a chance. It also occurred to me that these aliases may be useful when dealing with …
no
I'm not a big fan of including this feature in pandas itself, because it would make the pandas data model significantly more complex. Maybe this could be implemented in some sort of add-on package that wraps pandas DataFrames? Another option would be a DataFrame subclass.
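The add-on route can be sketched with pandas' documented accessor hook, `pd.api.extensions.register_dataframe_accessor`, which lets a separate package attach a namespace to every DataFrame without changing pandas itself. This is a minimal sketch, not an existing package: the accessor name `short`, the method `define`, and storing the mapping under the `attrs` key `"aliases"` are all hypothetical choices made for illustration.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("short")
class ShortNames:
    """Hypothetical add-on: resolve short aliases to full column names.

    Aliases are kept in df.attrs["aliases"] (a key chosen for this sketch),
    so they travel with the frame wherever pandas propagates attrs.
    """

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def define(self, **aliases: str) -> pd.DataFrame:
        # Refuse aliases that point at columns the frame does not have.
        missing = [col for col in aliases.values() if col not in self._df.columns]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        self._df.attrs.setdefault("aliases", {}).update(aliases)
        return self._df

    def __getattr__(self, name: str) -> pd.Series:
        # Only called when normal lookup fails; private names never resolve
        # as aliases, which also avoids recursion before __init__ has run.
        if name.startswith("_"):
            raise AttributeError(name)
        aliases = self._df.attrs.get("aliases", {})
        if name in aliases:
            return self._df[aliases[name]]
        raise AttributeError(f"no alias named {name!r}")
```

Usage would be `df.short.define(tos="Time of Sale")` followed by `df.short.tos`. The extra `.short` hop is the price of leaving DataFrame's own attribute namespace and indexing code untouched.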
There are certainly risks in adding aliasing, but wouldn't a straightforward strategy be to augment the logic in …? For example:

# 1. works today:
df['Time of Sale']
# 2. fails today (AttributeError):
df.time_of_sale
# 3. could work in the future:
df.alias = dict(time_of_sale='Time of Sale')
df.time_of_sale

Or maybe I misunderstand and 2. is already possible today. If so, could someone point me in the right direction toward documentation? I too would find this quite useful.
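Option 3 can be approximated today without changing pandas by using the subclass route mentioned earlier. The sketch below follows the subclassing pattern from the pandas docs (`_metadata` plus `_constructor`) and checks the alias dict before falling back to pandas' normal column-as-attribute lookup. The class name `AliasFrame` is hypothetical, and the attribute name `alias` is taken from the proposal above; it is not a pandas API.

```python
import pandas as pd


class AliasFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass supporting df.alias = {short: full}."""

    # Attributes listed in _metadata are copied to results by __finalize__,
    # which many (but not all) pandas operations call.
    _metadata = ["alias"]

    @property
    def _constructor(self):
        # Results of operations stay AliasFrame instead of plain DataFrame.
        return AliasFrame

    def __getattr__(self, name):
        # Runs only when normal attribute lookup fails: check aliases first,
        # then fall back to pandas' own column-as-attribute lookup.
        aliases = self.__dict__.get("alias") or {}
        if name in aliases:
            return self[aliases[name]]
        return super().__getattr__(name)


df = AliasFrame({"Time of Sale": [9.5, 13.0], "Price": [10, 12]})
df.alias = dict(time_of_sale="Time of Sale")
print(df.time_of_sale.tolist())  # [9.5, 13.0]
```

Because `alias` is listed in `_metadata`, operations such as column selection should carry the mapping along. Any operation that does not call `__finalize__` will silently drop it, which is part of the propagation risk raised earlier in the thread.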
In order to do 2., you would have to rename the column, possibly using http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename.html. Then, when you'd like to print or plot it, you'd rename it back to the original version. I too think this would still be a good addition for interactive work. To make things even more interesting, I would alias "Time of Sale" to "tos", so I can work with the data as …
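The rename round trip described here looks roughly like the following minimal sketch. The `SHORT`/`FULL` mappings and the sample data are invented for illustration. `DataFrame.rename(columns=...)` returns a new frame, so the original keeps its full names throughout.

```python
import pandas as pd

# Mapping chosen for this example: full descriptive name -> short working name.
SHORT = {"Time of Sale": "tos"}
FULL = {short: full for full, short in SHORT.items()}  # inverse mapping

df = pd.DataFrame({"Time of Sale": [9.5, 13.0, 17.25], "Price": [10, 12, 11]})

work = df.rename(columns=SHORT)     # short names for interactive work
late = work[work.tos > 12]          # attribute access works on the short name
report = late.rename(columns=FULL)  # full names again for printing/plotting
print(report)
```

Keeping both directions in one place limits the drift, but every print or plot still needs the extra `rename`, which is the friction this issue is about.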
I'd also like to see such a feature. For me, the favourite use case would be nice, legible axis labels (with units) in seaborn plots. I know one can set the axis labels manually, but I find this error-prone and too verbose, and it leads to code duplication. If you ask me, the easier way would be to keep the current name in the role of an alias, as @bbirand proposes, and to add another field for a longer name, which defaults to the "normal" name if none is explicitly given.
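The "short name plus optional long name that defaults to the short name" idea can be sketched outside pandas with a small descriptor object. The class `Col` and its fields are hypothetical and only illustrate the defaulting rule. The short name is the real column label, and `label` supplies the text for axes.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass(frozen=True)
class Col:
    """Hypothetical column descriptor: short name plus optional long label."""

    name: str                        # short name, used as the actual column label
    long_name: Optional[str] = None  # descriptive label, e.g. with units

    @property
    def label(self) -> str:
        # Fall back to the short name when no long name was given.
        return self.long_name or self.name


TOS = Col("tos", "Time of sale [h]")
PRICE = Col("price")

df = pd.DataFrame({TOS.name: [9.5, 13.0], PRICE.name: [10, 12]})
```

With seaborn's axes-level functions this could be used as `ax = sns.scatterplot(data=df, x=TOS.name, y=PRICE.name)` followed by `ax.set(xlabel=TOS.label, ylabel=PRICE.label)`, so each label is written exactly once.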
Any update on this feature?
@luisfelipe18 - Actually, for aggregation you already have aliasing in pandas, see here (I'd recommend reading through the entire post). The current issue refers to aliasing existing columns, regardless of aggregation.
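The linked post is not reproduced here. One concrete form of aliasing during aggregation is pandas' named aggregation (available since 0.25), where each keyword becomes an output column name. The data and output names below are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Store": ["A", "A", "B"],
    "Price": [10, 12, 11],
})

# Named aggregation: output_name=(input column, aggregation function).
summary = df.groupby("Store").agg(
    total_sales=("Price", "sum"),
    mean_price=("Price", "mean"),
)
print(summary)
```

This only names the *results* of an aggregation. It does not give an existing column a second name, which is the distinction drawn in the comment above.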
IMO, we shouldn't add this to pandas itself. Indexing is complicated enough without aliases. We'd be better served by adopting or defining a convention (similar to how xarray uses CF conventions) for mapping column names to descriptive names. These could be stored in the …
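As one possible shape for such a convention, the mapping could live in `DataFrame.attrs`, a free-form metadata dict that pandas documents as experimental. The key `"long_name"` is borrowed loosely from the CF attribute of that name. Both the key and the helper functions are assumptions for this sketch, not an established pandas convention.

```python
import pandas as pd

df = pd.DataFrame({"tos": [9.5, 13.0], "price": [10, 12]})

# Convention assumed for this sketch, loosely modeled on the CF "long_name"
# attribute: df.attrs["long_name"] maps column label -> descriptive name.
df.attrs["long_name"] = {"tos": "Time of sale [h]", "price": "Sale price"}


def long_name(frame: pd.DataFrame, column: str) -> str:
    # Fall back to the column label when no descriptive name is recorded.
    return frame.attrs.get("long_name", {}).get(column, column)


def with_long_names(frame: pd.DataFrame) -> pd.DataFrame:
    # Display copy with descriptive headers; the original frame is unchanged.
    return frame.rename(columns=lambda col: long_name(frame, col))


print(with_long_names(df).describe())
```

Indexing stays exactly as it is today (short labels only). The descriptive names are applied at display time, which keeps the complexity out of the indexing code.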
I'd like to echo @KeithWM's point about there being a "long form" for a column's contents, i.e. something that … I understand that this "long form" is not a general scheme for creating aliases (that is, a many-to-one correspondence), and it could make sense to clarify what the main use case is and, perhaps, open a new thread.
If I understood this issue correctly, it is about retaining the (potentially long and verbose) original column names for everything except accessing the columns in code. As I see it, we can already get exactly that without any modification to pandas at all. Just define some constants and then use those to access your columns in your code:

import pandas as pd

class Columns:
    colA = "My tediously long name for column A"
    colB = "Yet another long column name"
    colC = r"Some column with $\emph{special}$ symbols in it"  # raw string: "\e" is not a valid escape

df = pd.read_csv(...)
print(df[Columns.colA])

Using a separate class to create a namespace for the column constants is of course optional, and you can omit it if you prefer. If I did not miss anything, this seems to fit all scenarios in which one would want to use aliases, unless you are trying to alias some columns to allow something like column duck typing. But I guess that would probably get messy really quickly anyway 🤔
When I work with pandas DataFrames, I prefer to keep the full column names for clarity, so when I print out the head or use describe, I get a meaningful table. However, this also means I have column names like "Time of Sale" that become annoying to type out.

A nice compromise would be to have short "aliases" for column names. For instance, I could define the alias tos for the above, perhaps like so: …

Then the __getattr__ method could look up aliases in addition to column names, so I could refer to that column simply as df.tos. For all other purposes, the column's name would still be the descriptive full name.

Would this make sense?