Fix #8780: long words in narrow columns may not be hyphenated #8781
This is done by adding '\hskip0pt\relax\n' whenever a paragraph starts.
sphinx/writers/latex.py
Outdated
    self.body.append('\n')
    # the \hskip0pt\relax is to allow hyphenation of first word of
    # a paragraph in narrow contexts such as in a table cell
    self.body.append('\\hskip0pt\\relax\n')
It is crucial that this is never inserted before "vertical" material, in particular before a list, or else it will create extra vertical space. I hope that visit_paragraph() will always correspond to starting an actual paragraph in the TeX sense of the word, but with substitutions I am not sure.
Adding this at the start of each list item instead, as I first considered (due to its relevance to the hlist of #8779), cannot work: the item might itself be a list, and, as I mentioned, the insertion would then create extra whitespace by entering horizontal mode for no reason, only to leave it immediately for a "vertical" structure.
It took me some time to change all the test files, but if there is any doubt here, it is better not to merge this and to find another way.
Sorry, this has a bug: it suppresses in the output a blank line which TeX needs to recognize paragraphs. I am correcting this stupidity in the next commit, which will abstract the modification into \sphinxAtStartPar.
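The blank-line issue described above can be illustrated with a toy sketch. It is not the Sphinx writer: `render`, `start_paragraph_first_version` and `start_paragraph_corrected` are hypothetical names, and the sketch assumes that the first version replaced the plain `'\n'` append and that leaving a paragraph appends a single `'\n'`.

```python
# Toy model of paragraph emission; all names here are hypothetical.

def start_paragraph_first_version(body):
    # Assumed first version: the plain '\n' append is replaced, so no
    # blank line is emitted before the paragraph.
    body.append('\\hskip0pt\\relax\n')


def start_paragraph_corrected(body):
    # Corrected form quoted later in this conversation: the leading '\n'
    # is kept, then the macro sits on its own line.
    body.append('\n\\sphinxAtStartPar\n')


def render(start_paragraph, paragraphs):
    body = []
    for text in paragraphs:
        start_paragraph(body)
        body.append(text)
        body.append('\n')  # assumed behaviour when leaving a paragraph
    return ''.join(body)


first_version = render(start_paragraph_first_version, ['First.', 'Second.'])
corrected = render(start_paragraph_corrected, ['First.', 'Second.'])

# TeX separates paragraphs at blank lines, i.e. two consecutive newlines.
print('\n\n' in first_version)
print('\n\n' in corrected)
```

In this toy model only the corrected variant leaves a blank line between the two paragraphs, which is what TeX needs in order to see them as separate paragraphs.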
    table having …
    \begin{itemize}
    \item {}
    \item {} \hskip0pt\relax
I thought about modifying visit_list_item, but as the item might be a nested sublist, the insertion must come from an actual paragraph. Notice here that the blanks in the sequence {} \hskip0pt\relax introduce no extra space at all in the PDF output.
Now \hskip0pt\relax is replaced by \sphinxAtStartPar, which goes on the next line, so the \item {}<space> line remains exactly as in current Sphinx. This will therefore not change anything for those who parse the output file for this specific line, for whatever purpose.
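As a small illustration of that point, the sketch below builds the output for one list item and checks that the `\item {} ` line is unchanged. `render_list_item` is a hypothetical helper, and the exact string emitted when a list item opens is an assumption taken from the test output quoted above.

```python
# Toy illustration; render_list_item is a hypothetical helper.

def render_list_item(paragraph_text):
    body = []
    body.append('\\item {} ')              # assumed list-item opening
    body.append('\n\\sphinxAtStartPar\n')  # paragraph start as in this PR
    body.append(paragraph_text)
    body.append('\n')
    return ''.join(body)


lines = render_list_item('some text').split('\n')
for line in lines:
    print(repr(line))
```

Because the macro is preceded by a newline, it lands on its own line and the `\item {} ` line keeps its trailing space and nothing else.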
Define it to do nothing with lualatex engine
The To review this, the thing to check is whether this insertion of
Here is updated test file With current Sphinx it produces this With this PR it will give this Here, the note: some captions in the screenshots say that lists in table cells trigger usage of
Relates to #3042 but does not fix it.
Else, a non-hyphenatable long word as first word in a narrow column in a
longtable/tabular (with column type e.g. p{1cm} from tabularcolumns
directive) gets shifted downwards vertically in PDF output.
Memo:
1. I did not find other cases where such a vertical shift may occur (I
tried with deeply nested lists and artificial words such as 'A'*32) with
LaTeX mark-up produced by Sphinx,
2. but with the support of hlist directive via PR sphinx-doc#8779 using multicols
environment, there is again this situation of downwards shift of
non-hyphenatable long first words. But it occurs whether or not
\sphinxAtStartPar is used (\nobreak does not modify this).
tk0miya
left a comment
Very cool!! +1 for merging. If your worry came from the vertical material of LaTeX, you don't need to care about it, because the children of a paragraph node are all inline nodes (basically).
    self.body.append('\n')
    # the \sphinxAtStartPar is to allow hyphenation of first word of
    # a paragraph in narrow contexts such as in a table cell
    self.body.append('\n\\sphinxAtStartPar\n')
The children of a paragraph node are regularly text nodes or inline nodes. This is defined in the doctree specification.
https://docutils.sourceforge.io/docs/ref/doctree.html
Almost all of the inline nodes are decorated text. But some of them are not; for example, math, image, problematic and raw. Additionally, some extensions might generate a broken doctree under the paragraph node.
A math node is fine (via the math role), and an image (via a substitution) is no problem either. As for raw LaTeX, also inserted by a substitution: if the raw LaTeX starts some list environment, it will be shifted one line down, but doing this seems to be a user error; the raw directive should have been used directly to insert the LaTeX "display" material in place of "inline" material. I cannot think of a natural example causing a problem; after all, the docutils "paragraph" concept matches the LaTeX "paragraph" concept pretty well, I hope.
Agreed. From my perspective, docutils' paragraph model and LaTeX's horizontal mode are similar. (But I did not mention that yesterday because I'm not familiar with LaTeX's mode :-p)
    in a narrow context (like a table cell). For ``'lualatex'`` which
    does not need the trick, the `\sphinxAtStartPar` does nothing.

    .. versionadded:: 3.5.0
I don't know whether this change will break PDF generation. But it would be better to put a big change into the next major release. It's planned for this April.
To the best of my knowledge the change cannot break PDF generation. As far as I can tell, at worst it could add an extra blank line if the first child is implemented via a "vertical" environment, typically a list, but this seems excluded: in my understanding a list cannot be the first child of a paragraph node. An image is not a problem if it is not embedded into a "figure" (for example an image inserted from a substitution). I tested that math such as equation or align does not seem to be affected. At any rate, I can see no way this can break the PDF build, whatever nodes follow. The only thing it can cause is this extra blank line, and I cannot construct an example from rst source creating it.
"To the best of my knowledge the change can not break PDF generation."
Okay. I understand and agree that this does not break PDF generation. Let us see what will be reported.
"(for example an image inserted from a substitution)"
Fortunately, reST's substitution only allows inline elements. Nobody can insert non-"inline" elements under the paragraph node without hacks (by raw node or extensions). The reStructuredText specification says:
"A substitution definition block contains an embedded inline-compatible directive (without the leading ".. "), such as "image" or "replace"."
https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#substitution-definitions
Thank you @tk0miya always for reviewing and advice! What this code does is simply to add in LaTeX output at the start of (most) paragraphs a macro I think this can not be breaking change, thus now merging into 3.x!
🎉
…inx-doc#8781 This is cosmetic as the blank line starting varwidth environment used for merged table cells in latex output changed nothing to PDF. Nevertheless I extended a unit test to have a multi-paragraph merged cell using varwidth. What is important is that \sphinxAtStartPar line itself is never followed by blank line.
…r_8781 Let latex writer line trimming from depart_entry() work as before #8781
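The interaction between the paragraph-start output and line trimming can be sketched as follows. This is only a guess at the kind of logic involved, not the code of depart_entry(), which is not shown on this page: `trim_leading_blank_items` is a hypothetical helper, and the assumption is that trimming drops body items that are exactly `'\n'` from the front of a cell.

```python
# Toy model of blank-line trimming at the top of a table cell.
# trim_leading_blank_items is hypothetical; the real depart_entry()
# logic may differ.

def trim_leading_blank_items(body):
    # Drop items that are exactly a newline from the front of the body.
    trimmed = list(body)
    while trimmed and trimmed[0] == '\n':
        trimmed.pop(0)
    return trimmed


# Paragraph start appended as a single string: the newline is glued to
# the macro, so an item-by-item comparison with '\n' never matches.
one_item = ['\n\\sphinxAtStartPar\n', 'cell text', '\n']

# Paragraph start appended as two items: the leading '\n' is a separate
# item and can be trimmed as before.
two_items = ['\n', '\\sphinxAtStartPar\n', 'cell text', '\n']

print(trim_leading_blank_items(one_item))
print(trim_leading_blank_items(two_items))
```

Under these assumptions, keeping the newline as its own body item lets the trimming step behave as it did before the macro was introduced, while the \sphinxAtStartPar line itself is still never followed by a blank line.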
Relates
I am removing a screenshot which contained an inaccurate assertion. See updated screenshot in next comment #8781 (comment)
As explained (in an edit) at #8780, the hyphenation in the first table worked because it used tabulary. The second table also uses tabulary, but with the cell containing a list it fails without extra intervention. Tabular and longtable do not hyphenate, without intervention, the first word of a paragraph in a table cell.
This phenomenon is not related to tables, but to narrow width. So it will also occur with #8779 adding support for hlist's in LaTeX, which, a bit like tables, have columns of sometimes narrow width.
This being said, I am hesitant about this PR. It seems costly to add the \hskip0pt\relax at the start of each paragraph. (I didn't use \hspace{0pt} because of the need to get rid of the influence of a possible space token in the source after it; in the case of an item list, for example, this can cause extra whitespace.) I had to change many test files to reflect the change.
Also, for this PR to be safe it must be certain that the \hskip0pt\relax is really always followed by words: if, for example, it is first in a list item and then comes a list, that nested list will suffer a vertical shift.
Besides, perhaps we should not merge this at all but only mention the work-around that I had used to fix a problem with our PDF docs. The trick is simply to define a substitution this way:
and use it manually where needed, like this
I have no strong opinion. And I hesitated over whether to ask for a merge into the 3.x or the master branch.