-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for rich display of Markdown cells #2346
Conversation
Thank you for this PR. It seems OK - I will have to experiment with it before commenting. Also since in the long term we hope to change the default backend to PrettyTables.jl - @ronisbr can you please comment how you see this change from the perspective of that package? |
@bkamins IIUC, the idea is to support markdown notation inside the cells. In this case, this support must be added inside PrettyTables.jl. Currently, we can add all kind of decorations to the cell, but not add formats to individual words. The biggest problem (in Text backend) is to check the printed size of the string, but it should be possible using |
@ronisbr I've submitted a pull to PrettyTables with similar code. It really is a lot of bang for the buck - particularly for URLs and images. |
Looks good to me apart from the comment I have left and of course please make sure that CI passes before I do a final review.
It works for me. Are you sure you are using |
I think that the code is ready for review. I've resolved all the Travis errors in my new test cases. I will note that Markdown => latex is the weakest. There are markdown items like headers that don't translate in a well in a LaTeX tabular environment. If we want to fix this, we'd remap the #, ##, ### etc to font sizes rather then \section{} etc but this would involve reimplemening the markdown => latex code custom for this edge case. It could be done but... |
I would leave it as is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from a small comment looks good. Thank you!
Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
@nalimilan - OK to merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, sounds nice! Just a few comments.
src/abstractdataframe/io.jl
Outdated
@@ -417,7 +423,12 @@ function printtable(io::IO, | |||
elseif isnothing(df[i, j]) | |||
print(io, nothingstring) | |||
else | |||
if ! (etypes[j] <: Real) | |||
if df[i,j] isa Markdown.MD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since accessing df[i, j]
is relatively slow, better store it in cell = df[i, j]
and use that afterwards as in the HTML method (not sure why we didn't do this here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it, nor the other changes you mention below. :-)
src/abstractdataframe/io.jl
Outdated
if ! (etypes[j] <: Real) | ||
if df[i,j] isa Markdown.MD | ||
print(io, quotemark) | ||
r=repr(df[i,j]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r=repr(df[i,j]) | |
r = repr(df[i,j]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/abstractdataframe/show.jl
Outdated
function ourshow(io::IO, x::Markdown.MD) | ||
r = repr(x) | ||
len = min(length(r, 1, something(findfirst(==('\n'), r), lastindex(r)+1)-1), 32) | ||
return print(io, len < length(r) - 1 ? first(r, len)*"…" : first(r, len)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using chars should give a slightly more efficient code:
return print(io, len < length(r) - 1 ? first(r, len)*"…" : first(r, len)) | |
return print(io, len < length(r) - 1 ? first(r, len)*'…' : first(r, len)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
test/io.jl
Outdated
\t4 & 4 & \\textbackslash{}\\textbackslash{}alpha & S & 3.0 & d & \\emph{\\#undef} \\\\ | ||
\\end{tabular} | ||
""" | ||
str = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep the triple quoted form? It's more readable. Same below.
@@ -54,6 +54,11 @@ ourshow(io::IO, x::Symbol) = ourshow(io, string(x)) | |||
ourshow(io::IO, x::Nothing; styled::Bool=false) = ourshow(io, "", styled=styled) | |||
ourshow(io::IO, x::SHOW_TABULAR_TYPES; styled::Bool=false) = | |||
ourshow(io, summary(x), styled=styled) | |||
function ourshow(io::IO, x::Markdown.MD) | |||
r = repr(x) | |||
len = min(length(r, 1, something(findfirst(==('\n'), r), lastindex(r)+1)-1), 32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't abbreviate in other formats, so why do it for Markdown? Regarding newlines, I would just print them as \n
just like we do for strings.
We need a general mechanism to abbreviate too long cells, but this should be applied systematically disregarding the column type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep it here for the time being as md"
is likely to be overlong. Then maybe just add TODO to update it when we have a general mechanism.
In general maybe we will switch to PrettyTables.jl which actually handles this already.
These changes will be possible to implement after 1.0 release if needed, as display changes are not considered to be breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's do that. Though do you think it would make sense to apply this 32-char limit to other types as well? I guess it can be wasteful when there are only a few columns and that more space is available, but it's hard to handle. How does PrettyTables handle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how wide terminal displays do you expect people have?
I have 210 characters width in my terminal, but having a column wider than 32 chars makes it very hard to follow what is in the column anyway. So I would say the rule - no column wider than 32 characters makes sense. If someone wants to check such a wide column in detail in REPL probably extracting it to a vector and investigating is the way to go anyway.
PrettyTables.jl behaves like this:
julia> df = DataFrame(a="a"^20, b="b"^20)
1×2 DataFrame. Omitted printing of 1 columns
│ Row │ a │
│ │ String │
├─────┼──────────────────────┤
│ 1 │ aaaaaaaaaaaaaaaaaaaa │
julia> pretty_table(df)
┌──────────────────────┬─────────────────── ⋯
│ a │ ⋯
│ String │ Stri ⋯
├──────────────────────┼─────────────────── ⋯
│ aaaaaaaaaaaaaaaaaaaa │ bbbbbbbbbbbbbbbbbb ⋯
└──────────────────────┴─────────────────── ⋯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, currently we can set the size of a column or its minimum size. However, I think it will be nice to have an option to set the maximum size also. In this case, we can set it to 32. Thus, columns that can be displayed with less characters will be, but columns larger than this will be cropped just like this:
julia> pretty_table(df, alignment = :l, columns_width = [10,10])
┌────────────┬────────────┐
│ a │ b │
│ String │ String │
├────────────┼────────────┤
│ aaaaaaaaa… │ bbbbbbbbb… │
└────────────┴────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! It is in master
:
julia> df = DataFrame(a="a"^5, b="b"^10, c="c"^15, d = "d"^20);
julia> pretty_table(df, alignment = :l, maximum_columns_width = 10)
┌────────┬────────────┬────────────┬────────────┐
│ a │ b │ c │ d │
│ String │ String │ String │ String │
├────────┼────────────┼────────────┼────────────┤
│ aaaaa │ bbbbbbbbbb │ ccccccccc… │ ddddddddd… │
└────────┴────────────┴────────────┴────────────┘
Off-topic: I think I just need to add 2 more features before starting to testing good configurations to print DataFrames!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then maybe we should set a limit to 32 characters for all types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NicholasWMRitchie - would you be willing to make this change everywhere in this PR or we leave it for a follow up PR (I can do it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's the conclusion on this?
test/io.jl
Outdated
] | ||
) | ||
@test sprint(show, "text/plain", df) == """ | ||
8×2 DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add 8-characters indentation. Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one still applies.
@NicholasWMRitchie - following our discussion I have written down https://bkamins.github.io/julialang/2020/08/13/strings.html. I hope you will find it useful. |
@NicholasWMRitchie - I would like to finalize this PR for 0.22 release if you have time to work on it. Thank you! |
@bkamins I've been on vacation for 2 weeks but I'll get back to this ASAP. |
Thank you!. In the meantime we have merge conflicts, but they should be simple to resolve 😄. |
I have pushed merge conflicts resolution to your branch (also a small suggestion - it is safer not to make a from a |
@nalimilan - I would merge this PR and leave a decision to limit printing to 32 everywhere to another PR, I have opened an issue #2389. |
NEWS.md
Outdated
@@ -55,3 +55,5 @@ | |||
|
|||
* Documentation is now available also in *Dark* mode | |||
([#2315](https://github.com/JuliaData/DataFrames.jl/pull/2315)) | |||
* add rich display support for Markdown cell entries in HTML, Markdown and LaTeX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean here by ", Markdown"? (we do not test shipping out data frame in Markdown anywhere - right?)
Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
unfortunately you have performed a merge in a way that confused git (now some of the changes shown are not your changes + there are some duplicates). Can you please try fixing it? If you do not have much experience with git the easiest is to:
|
You are still mixing-in changes from other PRs. I do not want to merge it as I am not 100% sure if it would not confuse git (since it seems it confuses it now as these changes should not be shown). |
Yes, I'm struggling a bit with git. I fear that I'm only making things worse. I suspect that my mental model of what git is doing is flawed. While I can rebase my master against your master, I can't squash the result so... However, this isn't on the branch that I used to submit the pull request. |
Thank you for caring so much. |
Closing and will reopen as a new PR |
With a fairly small change to the HTML, LaTeX and text display code, it is possible to add rich text output support to cells within a DataFrame. Markdown strings are readily constructed using the default Markdown package and the @md_str macro. It already represents a generalized format definition language that most Julia users are familiar with.
It is already possible to place Markdown.MD cells in DataFrames. However, the display of these cells doesn't represent the richness of which Markdown is capable. With markdown, it is possible to output bold, italicized, headers, links and even images into DataFrame cell outputs.
For me personally, the biggest win is the ability to display functional URLs in a table. This is particularly useful when DataFrames are used in report-type documents like generated by Weave or other literate programming packages. I don't think there is another way to produce functional URL links in DataFrame outputs and this is often useful to detail the provenance of the data in a particular row.
The one shortcoming I've run into has been displaying LaTeX encoded in the markdown string in HTML. While the LaTeX displays correctly outside the table, it does not seem to display correctly in the HTML output table. It seems fine in the LaTeX table.