Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change ⍰ to ? when showing a DataFrame with missing eltype #2140

Merged
merged 8 commits into from
Mar 9, 2020

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Mar 4, 2020

Fixes #2139

@bkamins bkamins added the non-breaking The proposed change is not breaking label Mar 4, 2020
@bkamins bkamins added this to the 1.x milestone Mar 4, 2020
docs/src/man/getting_started.md Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Mar 8, 2020

OK - this should be good for a final check (if you agree with the description I have given for the reimplementation of compacttype).

# handle the case when after removing Missing union type name is short
length(sT) ≤ 8 && return sT * suffix
suffix = "?"
maxwidth -= 1 # we will add "?" at the end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we didn't include in the count, right? I think we'd better not include it, as it means that narrow columns allowing and not allowing for missing values will print a different abbreviation for the type, e.g. DataFrame(x=BigFloat[1, 2], y=Union{BigFloat, Missing}[1, 2]) will use "BigFloa" and "BigFlo" (IIUC).

Also, by taking into account ? and the number of characters used for the type is now 6 instead of 8 when it's too long and allows for missing values. Previously BigFloat would always be printed in full, and now it isn't. Is this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well - it was intentional (to normalize everything to 8), but I can change it. I will switch not to count "?" when determining width (which means sometimes it will be 9 like for BigFloat?).

maxwidth -= 1 # we will add "…" at the end

if T <: Union{CategoricalString, CategoricalValue}
return (maxwidth ≥ 11 ? "Categorical…" : "Cat…") * suffix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the column is wider than 17 characters, can we print CategoricalValue/CategoricalString in full? For simplicity, we can use the same threshold for both, anyway the latter is going away soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there is one nuance that T can also be Union{CategoricalString, CategoricalValue} (probably a very improbable case). I will fix it then and we can do what you propose.

@bkamins
Copy link
Member Author

bkamins commented Mar 9, 2020

@nalimilan - your requests are fixed. As usual (and now I remember why I had problems last time) - things were more problematic than expected. I had to distinguish the phase of initial width counting and display phase to ignore "?" (the problem was when actually the type name is the widest entry in the column and it contains missing). Things should be good now.

@nalimilan
Copy link
Member

Yeah, sorry for derailing the PR. These things are quite tricky.

I hope the choice to special-case CategoricalValue will turn out to be the right one. The alternative would be to use Cat by default, but print as many characters as possible beyond that, like for other types.

@bkamins bkamins merged commit 7456a4c into JuliaData:master Mar 9, 2020
@bkamins bkamins deleted the change_display_question branch March 9, 2020 13:25
@bkamins
Copy link
Member Author

bkamins commented Mar 9, 2020

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-breaking The proposed change is not breaking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

⍰ character in header output
2 participants