fix(opinions): Add filter linebreaksbr to html #3378

flooie · 2023-11-15T20:01:12Z

Fix missing line breaks that sporadically affect our HTML with citations. Adding Django's linebreaksbr filter where newlines exist cleans up our HTML beautifully.

Before

After

This actually resolves the issue with bad or missing newlines based on the document

cl/opinion_page/templates/opinion.html

mlissner · 2023-11-16T21:03:40Z

Yeah, I don't know, Bill. You don't want to put zillions of <br> tags into the page. It'll make it slower, but even if that were negligible, it would also just make it uglier? Is there a better way to do this?

flooie · 2023-11-17T18:33:56Z

@mlissner

The problem we are seeing is that the HTML is not rendering correctly when newlines are inside <pre> tags, and maybe other tags too. I'm sorry that I wasn't clear about what is happening here, so let me back up and take a second stab at it.

The above example I used for illustration has a stretch of text that looks like this on our website.

liability. See Burlington Industries, Inc. v. Ellerth, 524 U.S. 742(1998); Faragher v. Boca Raton, 524 U.S. 775 (1998). Liability

All one line, no spacing. If you look at the PDF, it should look like this.

liability. See Burlington Industries, Inc. v. Ellerth, 524 U.S. 742
(1998); Faragher v. Boca Raton, 524 U.S. 775 (1998). Liability

with a break after the citation and a new line starting at (1998).

I'm not an expert in how the HTML with citations is generated, but it's clear what is happening here. Let's look at the html_with_citations content (I'm going to add new lines for ease of reading).

...\r\n\r\nliability. See Burlington Industries, Inc. v. Ellerth, </pre>
<span class="citation" data-id="118244">
<a href="/opinion/118244/burlington-industries-inc-v-ellerth/">524 U.S. 742</a></span>
<pre class="inline">\r\n(1998); 
Faragher v. Boca Raton, 
</pre><span class="citation" data-id="118245">
<a href="/opinion/118245/faragher-v-boca-raton/">524 U.S. 775</a>
</span><pre class="inline"> (1998). 
Liability\r\nunder Monell v. New York City Department of Social Services,\r\n</pre>

See the <pre class="inline">\r\n(1998); <- those returns are being dropped, which is what causes our run-on lines.

mlissner · 2023-11-17T23:20:02Z

OK, sorry, I definitely misunderstood how linebreaksbr works. Somehow I thought it was going to convert all whitespace to br's.
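
For reference, here is a minimal sketch of what the filter does to the fragment quoted above. It is a stdlib-only stand-in, not Django's actual code: `linebreaksbr_sketch` is a hypothetical name, it assumes the `\r\n` sequences in the stored HTML are real carriage-return/line-feed characters, and it skips the autoescaping the real `linebreaksbr` filter applies to strings not marked safe.

```python
import re


def linebreaksbr_sketch(value: str) -> str:
    """Rough stand-in for Django's linebreaksbr (hypothetical helper).

    Normalizes \r\n and \r to \n, then replaces each newline with <br>.
    Spaces and tabs are left untouched. Autoescaping is omitted here.
    """
    normalized = re.sub(r"\r\n|\r|\n", "\n", value)
    return normalized.replace("\n", "<br>")


# Fragment modeled on the html_with_citations excerpt in this thread.
fragment = '<pre class="inline">\r\n(1998); Faragher v. Boca Raton, </pre>'
converted = linebreaksbr_sketch(fragment)
print(converted)

# Whitespace other than newlines passes through unchanged.
unchanged = linebreaksbr_sketch("a  b\tc")
```

Only the line return becomes a `<br>`; the spaces inside the fragment are preserved as they were.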

I'm a bit concerned though because usually HTML is supposed to ignore line returns and white space. Have you tried this on a bunch of different content types and sources to make sure it doesn't have any weird effects?

flooie · 2023-11-21T16:13:18Z

I've looked at maybe a dozen or so examples, but there are more than that. Though I suspect that was enough to get a sense. I saw nothing other than it nicely processing the HTML. Do you want me to do a much more extensive review?

My expectation is that this is only for scraped opinions.

mlissner

So the first change you made is focused on things that have a source of C. That seems mostly OK, because those are mostly PDFs with bad plain text, but even some of those come in as HTML where we don't control the line returns.

The second and third changes below are not from the scrapers and worry me. My concern is that if we scrape some HTML and it has a bunch of line returns in it, we'll convert those line returns to <br> tags and make the content really long without meaning to. In general line returns in HTML should be ignored, but this does the opposite.

So I think:

The first one might be fine. At least, for PDFs it should be.
The second and third ones worry me and seem like they could cause trouble.
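
The concern above can be illustrated with a small sketch. It assumes a made-up scraped fragment whose line returns are only source formatting, and it reuses a hypothetical stdlib stand-in for the filter (`linebreaksbr_sketch`), not Django's actual implementation.

```python
import re


def linebreaksbr_sketch(value: str) -> str:
    """Hypothetical stand-in for linebreaksbr: newlines become <br> tags."""
    normalized = re.sub(r"\r\n|\r|\n", "\n", value)
    return normalized.replace("\n", "<br>")


# Made-up example of pretty-printed HTML as a scraper might receive it.
# A browser would normally collapse these line returns.
scraped_html = (
    "<div>\n"
    "  <p>\n"
    "    First paragraph.\n"
    "  </p>\n"
    "  <p>\n"
    "    Second paragraph.\n"
    "  </p>\n"
    "</div>\n"
)

converted = linebreaksbr_sketch(scraped_html)
added = converted.count("<br>")
print(f"{added} <br> tags added to a two-paragraph fragment")
```

Every formatting newline turns into a visible break, which is the lengthening effect described above for HTML sources.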

flooie · 2023-11-22T18:53:56Z

Great. So let's drop the second and third ones and leave the first, which was my primary focus at the start, and just keep a watchful eye. It's still a very easy and quick revert, @mlissner, yes?

mlissner · 2023-11-22T19:00:20Z

Sounds good.

flooie · 2023-11-28T15:47:17Z

@mlissner I think this is ready for your approval. It failed a webhook test, but I am rerunning it, and I'd suggest that it's likely unrelated and comes from some other bug in webhooks?

mlissner · 2023-11-28T17:18:04Z

Cool. Let's see how this goes.

fix(opinions): Add filter linebreaksbr to html

b441967

This actually resolves the issue with bad or missing newlines based on the document

flooie requested a review from mlissner November 15, 2023 20:01

semgrep-app bot reviewed Nov 15, 2023

cl/opinion_page/templates/opinion.html (outdated)

semgrep-app bot reviewed Nov 15, 2023

cl/opinion_page/templates/opinion.html (outdated)

semgrep-app bot reviewed Nov 15, 2023

cl/opinion_page/templates/opinion.html

Merge branch 'main' into 3377-fix-text-width

b574670

Merge branch 'main' into 3377-fix-text-width

500f2a9

Merge branch 'main' into 3377-fix-text-width

82759c1

mlissner reviewed Nov 21, 2023

Merge branch 'main' into 3377-fix-text-width

dad2235

fix(opinion.html): Shrink linebreaksbr usage

3ffc9b3

Merge branch 'main' into 3377-fix-text-width

86f9d06

mlissner merged commit 24191e5 into main Nov 28, 2023
12 of 13 checks passed

mlissner deleted the 3377-fix-text-width branch November 28, 2023 17:17

ERosendo mentioned this pull request Dec 5, 2023

Text escapes div on some opinion content #3377

Open
