Line-by-line conversion to HTML #59

Closed
krlmlr opened this issue Jan 13, 2019 · 5 comments

krlmlr commented Jan 13, 2019

In the tibble 2.0.1 blog post, converting each line of output separately helped work around how the blackfriday markdown -> HTML converter treats newlines embedded in HTML: tidyverse/tidyverse.org@32d6829. Perhaps conversion to HTML should treat newlines separately? I'm not sure if and where this change should happen, though.

In the reprex below, applying strsplit() just seems to work even if the newline is inside a block. I don't understand why.

options(crayon.enabled = TRUE)
text <- crayon::blue("a\nb")
fansi::sgr_to_html(text)
#> [1] "<span style='color: #0000BB;'>a\nb</span>"

text_split <- unlist(strsplit(text, "\n", fixed = TRUE))
fansi::sgr_to_html(text_split)
#> [1] "<span style='color: #0000BB;'>a</span>"
#> [2] "<span style='color: #0000BB;'>b</span>"

Created on 2019-01-13 by the reprex package (v0.2.1)

Owner

brodieG commented Jan 13, 2019

Interesting, I'll need to investigate to see what's going on, because clearly there are still newlines at the end of the lines. It may be blackfriday only replaces internal ones, e.g. because it does something equivalent to readLines (in whatever language it's written in) that creates vectors of line strings without the terminating newline, and then does the equivalent of writeLines which puts them back in. Probably the main thing to look at is where the internal newlines were coming from.

I believe the above works because sgr_to_html closes the span elements at the end of each CHARSXP to ensure it produces valid HTML (going from memory here, could be wrong). It was easier to do it this way than track if there were any open HTML tags at the beginning or end of each line.

It seems though this does not need to be resolved immediately. Please let me know if you're looking for some changes in the near term, otherwise this will probably sit here for a while. I also don't think (not sure) that fansi should take too much initiative in splitting by newlines, as in many cases you want to preserve the newlines so things render correctly inside PRE blocks. Not sure. Another possibility is that this could be done as part of the hook scripts as you effectively did (i.e. the built in fansi hook script could do this).

Author

krlmlr commented Jan 13, 2019

Thanks. No need to rush here. I'm also not sure where this change belongs. It doesn't seem like the primary concern of sgr_to_html(); on the other hand, it would be great if hooks just worked out of the box. Maybe a simple wrapper that calls strsplit() and then map_chr(..., paste, collapse = "\n")?

I agree that it might be worth looking at the origin of the internal newlines too.
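
The wrapper suggested above could look roughly like the following. This is only a sketch: `convert_by_line` is a hypothetical name, not a fansi function, and it uses base R's `vapply()` in place of `map_chr()`. It relies on the behavior shown in the reprex, where `sgr_to_html()` applied to the split lines wraps each line in its own SPAN.

```r
# Hypothetical helper (not part of fansi): convert each line separately, then
# glue the converted lines of every input element back together with "\n".
# `convert` defaults to fansi::sgr_to_html but any vectorized function works.
convert_by_line <- function(x, convert = fansi::sgr_to_html) {
  # strsplit() returns a list with one character vector of lines per element
  lines <- strsplit(x, "\n", fixed = TRUE)
  vapply(
    lines,
    function(l) paste(convert(l), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

# Stand-in converter, used here only to show the shape of the result
wrap_span <- function(l) paste0("<span>", l, "</span>")
convert_by_line("a\nb", convert = wrap_span)
#> [1] "<span>a</span>\n<span>b</span>"
```

One caveat: `strsplit()` drops a trailing empty string, so an input that ends in a newline loses that final newline in this sketch.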

brodieG added this to the 0.4.1 milestone Jan 19, 2019
Owner

brodieG commented Jan 4, 2020

Note to self, related to: tidyverse/tidyverse.org#266

Owner

brodieG commented Jan 4, 2020

> Interesting, I'll need to investigate to see what's going on, because clearly there are still newlines at the end of the lines. It may be blackfriday only replaces internal ones, e.g. because it does something equivalent to readLines (in whatever language it's written in) that creates vectors of line strings without the terminating newline, and then does the equivalent of writeLines which puts them back in. Probably the main thing to look at is where the internal newlines were coming from.

After further investigation, it appears that blackfriday replaces newlines that are inside "SPAN" tags (and maybe others too?). By default sgr_to_html allows newlines inside SPANs: if it is applying a particular SGR style and a newline is encountered, the style remains unchanged, as that is the correct semantic interpretation and produces more compact strings. So outputs such as:

<span ...>line 1\n
line 2</span>

are perfectly okay as far as fansi is concerned, and in fact, better than:

<span ...>line 1</span>\n
<span ...>line 2</span>

If we want to produce the latter output we can split by \n as @krlmlr notes, as fansi automatically closes and re-opens SPANs across STRSXP elements (but not within a CHARSXP). I think this was done to ensure no vector element produces invalid HTML (or laziness, or some combination of the two). We'll need to ensure this behavior remains.

brodieG closed this as completed in 7ccb892 Jan 9, 2020
Owner

brodieG commented Jan 9, 2020

Just for completeness, the solution to this particular issue is to set the split.nl parameter of set_knit_hooks to TRUE, which will internally do roughly what @krlmlr's workaround above does.
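
For anyone landing here, a setup chunk along these lines should enable it in an R Markdown document. Treat it as a sketch under assumptions: it assumes fansi 0.4.1 or later and knitr are installed, and that the hooks object passed in is `knitr::knit_hooks`. I believe the fansi documentation also suggests running the call in a chunk with `results = "asis"`, since it writes accompanying style information to the output.

```r
# Sketch of an R Markdown setup chunk; assumes fansi (>= 0.4.1) and knitr.
options(crayon.enabled = TRUE)  # have crayon emit SGR sequences while knitting

# Install the fansi output hook and ask it to split output on newlines before
# converting to HTML, roughly equivalent to the strsplit() workaround above.
fansi::set_knit_hooks(knitr::knit_hooks, split.nl = TRUE)
```

Since `split.nl` is opt-in, output destined for PRE blocks that depends on embedded newlines is unaffected unless the option is set.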

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 5, 2021
# fansi Release Notes

## v0.5.0

* [#65](brodieG/fansi#65): `sgr_to_html` optionally
  converts CSI SGR to classes instead of inline styles (h/t @hadley).
* [#69](brodieG/fansi#69): `sgr_to_html` is more
  disciplined about emitting unnecessary HTML (h/t @hadley).
* New functions:
    * `sgr_256`: Display all 256 8-bit colors.
    * `in_html`: Easily output HTML in a web page.
    * `make_styles`: Easily produce CSS that matches 8-bit colors.
* Adjust for changes to `nchar(..., type='width')` for C0-C1 control characters
  in R 4.1.
* Restore tests bypassed in 0.4.2.

## v0.4.2

* Temporarily bypass tests due to R bug introduced in R-devel 79799.

## v0.4.1

* Correctly define/declare global symbols as per WRE 1.6.4.1, (h/t Professor
  Ripley, Joshua Ulrich for example fixes).
* [#59](brodieG/fansi#59): Provide a `split.nl` option
  to `set_knit_hooks` to mitigate white space issues when using blackfriday for
  the markdown->html conversion (@krlmlr).