@jfbu jfbu commented Jan 28, 2021

This is done by adding '\hskip0pt\relax\n' whenever a paragraph starts.

Relates

Edited: I am removing a screenshot which contained an inaccurate assertion. See the updated screenshot in the comment at #8781 (comment).

As explained (in an edit) at #8780, the hyphenation in the first table worked because it used tabulary. The second table also uses tabulary, but because the cell contains a list, hyphenation fails without extra intervention.

Without intervention, tabular and longtable do not hyphenate the first word of a paragraph in a table cell.

This phenomenon is not tied to tables as such, but to narrow widths. So it will also occur with #8779, which adds LaTeX support for hlists: a bit like tables, they have columns that are sometimes narrow.

That being said, I am hesitant about this PR. It seems costly to add \hskip0pt\relax at the start of each paragraph. (I did not use \hspace{0pt} because I needed to neutralize any space token following it in the source, which, in a list item for example, can cause extra whitespace.) I had to change many test files to reflect the change.

Also, for this PR to be safe, it must be certain that \hskip0pt\relax is always followed by words: if, for example, it comes first in a list item and is followed by a list, that nested list will suffer a vertical shift.

Besides, perhaps we should not merge this at all but only mention the work-around that I had used to fix a problem with our pdf docs. The trick is simply to define a substitution this way:

.. |LaTeXHyphenate| raw:: latex

                    \hspace{0pt}

and use it manually where needed, like this

|LaTeXHyphenate|\ longwordatstartofaparagraphinnarrowwidthcontext
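
One possible way to make this substitution available in every source file, as a minimal sketch: Sphinx's `rst_prolog` setting in `conf.py` is prepended to each reST source. Only the substitution itself comes from the comment above; placing it in `rst_prolog` is an assumption of this sketch, not something this PR does.

```python
# conf.py (sketch): define the work-around substitution once for all files.
# A raw string keeps the LaTeX backslash intact.
rst_prolog = r"""
.. |LaTeXHyphenate| raw:: latex

   \hspace{0pt}
"""
```

Any source file can then write `|LaTeXHyphenate|\ longword` at the start of a paragraph in a narrow context, as shown above.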

I have no strong opinion. And I hesitated over whether to ask for a merge into the 3.x or the master branch.

This is done by adding '\hskip0pt\relax\n' whenever a paragraph starts.
self.body.append('\n')
# the \hskip0pt\relax is to allow hyphenation of first word of
# a paragraph in narrow contexts such as in a table cell
self.body.append('\\hskip0pt\\relax\n')
Contributor Author

It is crucial that this is never inserted before "vertical" material, in particular before a list. Otherwise it will create extra vertical space. I hope that visit_paragraph() will always correspond to starting an actual paragraph in the TeX sense of the word. But with substitutions I am not sure.

Adding this instead at the start of each list item, as I first considered (because of its relevance to the hlist support of #8779), cannot work: the item might itself be a list, and adding this would then, as I mentioned, create extra whitespace by entering horizontal mode for no reason, only to leave it immediately for a "vertical" structure.

It took me some time to change all the test files, but if there is any doubt here, it is better not to merge this and to find another way.

Contributor Author

Sorry, this has a bug: it suppresses from the output a blank line that TeX needs in order to recognize paragraphs. I am correcting this stupidity in the next commit, which will abstract the modification into \sphinxAtStartPar.

table having …
\begin{itemize}
\item {}
\item {} \hskip0pt\relax
Contributor Author

I thought about modifying visit_list_item, but as the item might be a nested sublist, the insertion must come from an actual paragraph. Notice here that the blank in the sequence {} \hskip0pt\relax introduces no extra space at all in the PDF output.

Contributor Author

@jfbu jfbu Jan 29, 2021

Now \hskip0pt\relax has been changed into \sphinxAtStartPar, and here it would be on the next line, so the \item {}<space> line remains exactly as in current Sphinx. So this will not change anything for those who parse the output file for this specific line, for whatever purpose.

Contributor Author

jfbu commented Jan 29, 2021

The \sphinxAtStartPar line is added as the first line of almost all text paragraphs. It is not added in footnotes, but those already have \sphinxAtStartFootnote, which indirectly also allows TeX to hyphenate the first word.

To review this, the thing to check is whether this insertion of \sphinxAtStartPar really only happens before things that start a paragraph in the TeX sense. Indeed, it makes TeX enter "LR mode" (LaTeX vocabulary) or "horizontal mode" (TeXbook), and if it is immediately followed by, for example, a list environment, TeX will finish the horizontal mode and thus output a blank line in the PDF, resulting in extra vertical space. Also, if a blank line in the tex file follows \sphinxAtStartPar, this will cause a blank line in the output.
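
The check described above could be approximated on a generated .tex file with a small script. This is only a sketch: the function name and the list of "vertical" environments are assumptions made for illustration, and it only looks at the single line following each \sphinxAtStartPar.

```python
from typing import List

# Environments treated as "vertical" material in this sketch (an assumption,
# not an exhaustive list of what Sphinx can emit).
VERTICAL_ENVS = ("itemize", "enumerate", "description")


def find_suspect_par_starts(tex: str) -> List[int]:
    """Return 1-based numbers of macro lines followed by a blank line,
    by the start of a list environment, or by nothing at all."""
    lines = tex.splitlines()
    suspects = []
    for i, line in enumerate(lines):
        if line.strip() != r"\sphinxAtStartPar":
            continue
        # An absent next line is treated like a blank one.
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        starts_list = any(
            nxt.startswith(r"\begin{%s}" % env) for env in VERTICAL_ENVS
        )
        if nxt == "" or starts_list:
            suspects.append(i + 1)
    return suspects
```

An empty result does not prove the output is safe; it only means none of the patterns this sketch knows about were found.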

Contributor Author

jfbu commented Jan 29, 2021

Here is the updated test file:
index.rst.txt

With current Sphinx it produces this:

[Screenshot 2021-01-29 at 11 14 05]

With this PR it will give this:

[Screenshot 2021-01-29 at 11 16 06]

and

[Screenshot 2021-01-29 at 11 16 15]

Here, the "hor-" fragment is still too long to fit, so it gets shifted down vertically from its position as first list item, but this is an extreme situation and we can't do much about it. If the column had been a bit wider, it would have fit:

[Screenshot 2021-01-29 at 11 20 41]

Note: some captions in the screenshots say that lists in table cells trigger the use of tabular or longtable rather than tabulary. However, with the tabularcolumns directive one can still force the use of tabulary even with a list in a cell.

Contributor Author

jfbu commented Jan 29, 2021

Relates to #3042 but does not fix it.

Else, a non-hyphenatable long word as first word in a narrow column in a
longtable/tabular (with column type e.g. p{1cm} from tabularcolumns
directive) gets shifted downwards vertically in PDF output.

Memo:

1. I did not find other cases where such a vertical shift may occur (I
tried with deeply nested lists and artificial words such as 'A'*32) with
LaTeX mark-up produced by Sphinx,

2. but with the support of hlist directive via PR sphinx-doc#8779 using multicols
environment, there is again this situation of downwards shift of
non-hyphenatable long first words.  But it occurs whether or not
\sphinxAtStartPar is used (\nobreak does not modify this).
Member

@tk0miya tk0miya left a comment

Very cool!! +1 for merging. If your worry is about LaTeX's vertical material, you don't need to be concerned, because the children of a paragraph node are all inline nodes (basically).

self.body.append('\n')
# the \sphinxAtStartPar is to allow hyphenation of first word of
# a paragraph in narrow contexts such as in a table cell
self.body.append('\n\\sphinxAtStartPar\n')
Member

The children of a paragraph node are normally text nodes or inline nodes. This is defined in the doctree specification.
https://docutils.sourceforge.io/docs/ref/doctree.html

Almost all inline nodes are decorated text. But some of them are not: for example math, image, problematic and raw. Additionally, some extensions might generate a broken doctree under the paragraph node.

Contributor Author

A math node is fine (via the math role), and an image (via a substitution) is no problem either. As for raw LaTeX, also inserted through a substitution: if the raw LaTeX starts a list environment, it will be shifted one line down. But doing that seems to be a user error; the raw directive should have been used directly to insert LaTeX "display" material rather than "inline" material. I cannot think of a natural example that causes a problem. After all, the docutils "paragraph" concept matches the LaTeX "paragraph" concept pretty well, I hope.

Member

Agreed. From my perspective, docutils' paragraph model and LaTeX's horizontal mode are similar. (But I did not mention that yesterday because I'm not familiar with LaTeX's mode :-p)

in a narrow context (like a table cell). For ``'lualatex'`` which
does not need the trick, the `\sphinxAtStartPar` does nothing.

.. versionadded:: 3.5.0
Member

I don't know whether this change will break PDF generation. But it would be better to put a big change into the next major release, which is planned for this April.

Contributor Author

To the best of my knowledge the change can not break PDF generation. As far as I can tell, at worst it could add an extra blank line if the first child is implemented via a "vertical" environment, typically a list, but this seems excluded: in my understanding a list cannot be the first child of a paragraph node. An image is not a problem if it is not embedded in a "figure" (for example an image inserted from a substitution). I tested that math environments like equation or align do not seem to be affected. At any rate, I see no way this can break the PDF build, whatever nodes follow. The only thing it can cause is this extra blank line, and I cannot construct an example from rst source that creates it.

Member

To the best of my knowledge the change can not break PDF generation.

Okay. I understand and agree that this does not break PDF generation. Let us see what gets reported.

(for example an image inserted from a substitution)

Fortunately, reST substitutions only allow inline elements. Nobody can insert non-"inline" elements under a paragraph node without hacks (a raw node or extensions).

A substitution definition block contains an embedded inline-compatible directive (without the leading ".. "), such as "image" or "replace".
https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#substitution-definitions

Contributor Author

jfbu commented Jan 29, 2021

Thank you @tk0miya, as always, for the review and advice! What this code does is simply add to the LaTeX output, at the start of (most) paragraphs, a macro \sphinxAtStartPar which does almost nothing (it inserts horizontal glue of zero width). This is a trick to allow hyphenation of the first word, especially in tables. (Tables rendered by tabulary already get this same trick from the tabulary package's own LaTeX code, but if the user has employed tabularcolumns with some p{} specifier, it would be lost.) With lualatex the added macro does nothing, because the lualatex engine does not have this limitation of original (La)TeX.
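
As a rough illustration of the output shape, here is a self-contained toy writer. The class and its helper methods are hypothetical stand-ins, not Sphinx's actual LaTeX translator; only the string appended in `visit_paragraph` is taken from the diff in this PR.

```python
class ToyLaTeXWriter:
    """Hypothetical stand-in for a LaTeX translator (not Sphinx's class)."""

    def __init__(self):
        self.body = []

    def visit_paragraph(self, node=None):
        # Blank line first so TeX sees a paragraph break, then the macro on
        # its own line, directly followed by the paragraph text.
        self.body.append('\n\\sphinxAtStartPar\n')

    def add_text(self, text):
        self.body.append(text)

    def depart_paragraph(self, node=None):
        self.body.append('\n')

    def astext(self):
        return ''.join(self.body)


writer = ToyLaTeXWriter()
for text in ("First paragraph.", "Second paragraph."):
    writer.visit_paragraph()
    writer.add_text(text)
    writer.depart_paragraph()
output = writer.astext()
```

In this toy output each macro line is immediately followed by text and never by a blank line, which is the property the review above is concerned with.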

I think this cannot be a breaking change, so I am now merging into 3.x!

@jfbu jfbu merged commit 879bf54 into sphinx-doc:3.x Jan 29, 2021
@jfbu jfbu deleted the latex_hyphenation_of_first_word branch January 29, 2021 20:08
Member

tk0miya commented Jan 30, 2021

🎉

jfbu added a commit to jfbu/sphinx that referenced this pull request Jan 30, 2021
…inx-doc#8781

This is cosmetic as the blank line starting varwidth environment used
for merged table cells in latex output changed nothing to PDF.

Nevertheless I extended a unit test to have a multi-paragraph merged
cell using varwidth. What is important is that \sphinxAtStartPar line
itself is never followed by blank line.
jfbu added a commit that referenced this pull request Jan 30, 2021
…r_8781

Let latex writer line trimming from depart_entry() work as before #8781
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 16, 2021