Stop printing row numbers in show(io, df)? #864
Comments
I prefer printing the row numbers since you're allowed to index using them at the moment, but I could be convinced to change that if enough of the current committers agree that the row numbers are a problem. If we ever remove the row numbers (which I'd like to do), this issue would just go away. I've retitled this issue since the term "index" is misleading. |
Well, the row numbers are useful for simple observations, but I think they become redundant when the user needs a real index to keep track of some parts of the data:
For the front-ends, the printed row numbers are very cumbersome. When the user defines a DataFrame with a single column, its HTML representation is actually a two-column DataFrame. Usually, this is not a big deal because most front-ends only display the data. However, I noticed the difference when I wanted to receive user-defined DataFrames using the datatables library with virtual scrolling and there was a mismatch in the number of columns. Furthermore, I think people who use Python and R frequently want real row numbers. In any case, if we remove the row numbers, what would be offered as a replacement? A real column working as an index? |
We'll deal with those kinds of user interface issues when the appropriate time comes for dealing with them. Right now, making progress on DataFrames is blocked on finalizing the NullableArrays package in time for Julia 0.4. |
That's okay. I just wanted to know whether this decision had already been taken or whether some of these issues were likely to get fixed. Thanks. |
I think that, for printing, a lot could be gained by just not printing the vertical bars left of the row number, and the identifier 'row'. That would be like
|
Good idea, @mkborregaard. |
+1 for @mkborregaard suggestion. |
+1 for @mkborregaard too |
I also like @mkborregaard's idea, but I'm not sure it addresses the use of a real row of numbers (instead of a printed representation). @alyst I'm not sure if this is what you mean, but pandas keeps the index column even if the number of columns is too large to fit on the screen:
|
@rsmith31415 Yes, thank you, that's what I meant. |
I think potentially there are two issues here - whether to print row numbers, and whether to automatically associate a row index to DataFrames that has special properties when printing. Note that this behaviour in R is not necessarily intuitive:
is intuitive, but it is not necessarily intuitive when slicing:
In the last case, the row names only make sense if they actually mean something. The way I read @alyst 's comment, the suggestion is to allow the DataFrames to have a custom key associated that has special properties when printing. I like that, to me that seems user-friendly and intuitive. Is some of that functionality already in the NamedArray package? |
@mkborregaard I meant only specifying column(s) at print time. Column annotations within a data frame are a different story. It's quite some effort to implement keys, but even for simple annotations we would have to figure out how they should behave under data transformations (e.g. joining or grouping), which might not be so universally intuitive in the end. |
OK, I get it now. I think that idea is nice! |
@mkborregaard's proposal looks a lot like what Hadley Wickham's tibble does: https://github.com/tidyverse/tibble/blob/master/README.md |
duplicate of #592 |
@quinnj I think this is not a duplicate. The purpose was to propose a real index, not to hide the row numbers. |
If this issue is about adding a concept similar to row names in R or Pandas, I think it can be closed as this probably won't happen. What we could envisage is marking a specific column as being an index like in SQL databases. |
@nalimilan I think your suggestion is very reasonable, but let me point out that it is quite similar (or even equivalent) to using "row names". The main purpose is to have an index, so even if this index is not printed by default, it will still be useful. |
It's quite similar, but the advantage is that it wouldn't force you to have a useless column of row names when you don't use it. The problem with row names is that they are often redundant with an ID column which already exists in the data, but since row names don't behave like a standard column they are annoying to work with. |
Sure. I understand your point. It looks like this is a very subjective issue because I often work with datasets that don't have an ID column, so the additional column is very useful. In any case, I think we can agree that an optional index would be a nice feature. |
For whatever it's worth, I actually disagree that an optional index would be beneficial. In my opinion, if an index or some other set of names is significant in your data, it should be stored as a column of the dataset. |
I think that's because you don't use indexes. Regardless of their different behavior, indexes are also useful to increase speed. |
DataFrames is now pretty agnostic to the columns under the hood, so it would be totally possible to create an |
may we have something like |
What do you need that it doesn't do? |
This is largely resolved with the PrettyTables.jl backend now. |
I was confused about the column called "Row" that is printed in all DataFrames since it doesn't keep track of indexes after slicing. For example:
As this column "Row" is printed, the index starts again with 1 instead of 20. There was an issue ( #187 ) a couple of years ago, but I think the idea was to not rely on indexes and use them only for speed. Since it has been a while, I'd like to know what the current consensus is regarding this issue.
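
The behavior described in this report can be sketched roughly as follows. This is a minimal illustration, not the original example from the post: it assumes a recent DataFrames.jl release, and the column names `id` and `value` and the row range `20:25` are made up for the sketch.

```julia
using DataFrames

# Hypothetical data: `id` is an explicit column holding the original row position.
df = DataFrame(id = 1:30, value = (1:30) .^ 2)

# Slice out rows 20 to 25 (this creates a new, independent data frame).
sub = df[20:25, :]

# The printed "Row" label is only the position within `sub`,
# so it restarts at 1 rather than continuing from 20.
show(sub)

# An explicit column, by contrast, keeps the original positions after slicing.
println()
println(sub.id)
```

Storing the identifier as an ordinary column, as several commenters suggest, is what lets it survive slicing; the printed row label is recomputed for every data frame.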