why #618

Closed
kasrllc opened this issue Jun 9, 2014 · 10 comments

Comments

@kasrllc

kasrllc commented Jun 9, 2014

Perhaps this is more of a question than an issue. Why does DataFrames now disallow column names which are not valid identifiers? The column name is a visual label for the column. If I want it to be "Return standard deviation", I think it should be allowed.

@StefanKarpinski
Member

Dup of #617.

@nalimilan
Member

The idea is that in the future it will (likely) be possible to use the df.column syntax as a shorthand for df[:column]. Enforcing valid identifiers reduces the possible problems users may face when trying to access columns with that syntax. But as @StefanKarpinski said in the other report, one may also consider that users are adults and that they can manage.

Anyway, for column names like "Return standard deviation", I think column labels will be more useful (#35).
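
To illustrate the distinction, here is a minimal sketch using only Base Julia. It does not call any DataFrames function, and the names `label` and `plain` are hypothetical. It only shows that a `Symbol` can hold arbitrary text, while a bare `df.column`-style access would need a valid identifier:

```julia
# Sketch using only Base Julia; no DataFrames API is assumed here.
label = Symbol("Return standard deviation")  # a Symbol can hold any string
plain = :return_sd                           # hypothetical identifier-style name

# Base.isidentifier reports whether a name could be written bare, as in df.name
Base.isidentifier(plain)   # true
Base.isidentifier(label)   # false: the name contains spaces
```

A name like `label` could still be used through indexing with an explicit `Symbol`, but it could not be typed after a dot.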

@johnmyleswhite
Contributor

I agree with Milan: column labels are more useful than allowing names that aren't valid identifiers.

@StefanKarpinski
Member

The argument isn't that it's useful to have column names with spaces (or names that are otherwise not valid identifiers), but that preventing them when the user explicitly asks for them is annoying and unnecessary.

@johnmyleswhite
Contributor

As I said in #617, if the ability to do this exists, someone is going to do this in library code. Then you'll get broken names when you didn't ask for them.

If this really matters, we can change this restriction. But the absence of this restriction infuriates me when using R.

@kasrllc
Author

kasrllc commented Jun 9, 2014

How about an optional named argument “allownonidentifiers”?

@johnmyleswhite
Contributor

I would rather either allow this or not, rather than add in a keyword argument.

@StefanKarpinski
Member

My position is that DataFrames should carefully avoid generating column names that aren't valid identifiers, to make the default experience as smooth as possible, but should make no assumptions about the column names and accept any symbols, valid identifiers or not, as names.

@johnmyleswhite
Contributor

Just to be clear: the experience is never going to be completely smooth when you use a symbol that's not an identifier, because that symbol won't parse correctly in things like formulas. But I'll stick to the deal proposed in #617.

@simonster
Contributor

Closing as a dup of #617.
