LaTeX: optionally apply a second forceful wrapping of long code lines #8854

Merged
merged 9 commits into from
Feb 9, 2021

Conversation

jfbu

@jfbu jfbu commented Feb 8, 2021

Closes #8849

To try it out:

latex_elements = {
    'sphinxsetup': "verbatimforcewraps",
}

This is a bit experimental, as I need to confirm that the Pygments LaTeXFormatter output, as we use it, always complies with the assumptions summarized in the LaTeX code comments.

For this input:

DryGASCON128k56:

.. code-block:: shell

   $ python3 -m drysponge.drygascon128_aead e 000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F3031323334353637 000102030405060708090A0B0C0D0E0F "" ""
   28830FE67DE9772201D254ABE4C9788D

.. code-block:: shell

   $ python3 -m drysponge.drygascon128_aead e "000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F3031323334353637" 000102030405060708090A0B0C0D0E0F "" ""
   28830FE67DE9772201D254ABE4C9788D

the following output is obtained in PDF:

[Screenshot, 2021-02-08 at 23:15:03]

Non-highlighted strings, as in the first example, can break at every character; highlighted strings can break only every four characters. It is luck that the break here leaves neither one or two characters in the margin nor one or two spaces at the end of the line. Here is the result after removing one character earlier on the same line; the line now breaks one character short of the line end.

[Screenshot, 2021-02-08 at 23:20:53]

Actually, I could give even highlighted strings a breakpoint at each character. But I am reluctant to burden the computer with that much extra work.
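
The effect of the two breakpoint granularities can be modelled with a small Python sketch. It is only an illustration, not the LaTeX implementation: `hard_wrap` and its `step` parameter are hypothetical names, with `step=1` standing for plain text (a breakpoint after every character) and `step=4` for highlighted strings (a breakpoint after every group of four). The model also ignores any text that precedes the string on the same line.

```python
from typing import List


def hard_wrap(text: str, width: int, step: int = 1) -> List[str]:
    """Split text into rows of at most `width` characters.

    Breaks are only allowed after a multiple of `step` characters,
    mimicking breakpoints placed every `step` tokens.
    """
    if step < 1 or width < step:
        raise ValueError("need 1 <= step <= width")
    # Largest row length that is both <= width and a multiple of step.
    row = (width // step) * step
    return [text[i:i + row] for i in range(0, len(text), row)]


hex_string = "000102030405060708090A0B0C0D0E0F"  # 32 characters

plain = hard_wrap(hex_string, width=10, step=1)
highlighted = hard_wrap(hex_string, width=10, step=4)

print([len(r) for r in plain])        # rows use the full width of 10
print([len(r) for r in highlighted])  # rows stop at 8, short of the width
```

With a width that is not a multiple of four, the highlighted rows end up to three characters short of the available width, which matches the "one character short" case shown in the second screenshot.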

jfbu commented Feb 8, 2021

This breaks with Unicode in code-blocks. Building our own sphinx.pdf in a test configuration where the "force wrap" method is always used, I got a failure caused by the examples with directory trees. It did work with latex_engine = 'lualatex': all code-blocks of our own docs were forced through the experimental code and no error was raised. (Even with the feature activated, not a single code-block in our PDF docs triggers the experimental code by itself, because all of them already wrap fine where needed.)

@jfbu jfbu force-pushed the latex_longstringsincodeblocks branch from 7772b2a to 0076ad3 Compare February 8, 2021 23:01
\def\spx@PYGspec{{#1}}%
\spx@PYG#2\@empty\@empty\@empty\@empty\relax
}%
\def\spx@PYG#1#2#3#4{%

@jfbu jfbu Feb 8, 2021

The idea here is that we split the argument of the highlighting macro (which could, for example, do \colorbox) into small pieces of 4 tokens, so that highlighting and breakpoints can coexist.

But with Unicode code points and latex_engine set to "pdflatex", this will break, because each of #1, #2, #3, #4 is an octet, so we may cut a multi-byte Unicode character apart. Making it work would be quite complicated.

However, if Unicode code points are frequent in the source, it is better to use 'lualatex', 'xelatex', or 'uplatex' anyway. I tested with 'lualatex' and it works fine (although much more slowly than pdflatex).
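
The octet problem can be reproduced outside TeX. The following Python sketch is only an analogy for what happens when pdflatex grabs four octets at a time; `split_every` and `decodable` are hypothetical helper names, and the sample string is made up for the illustration.

```python
from typing import List


def split_every(data: bytes, n: int = 4) -> List[bytes]:
    """Cut a byte string into consecutive pieces of n octets."""
    return [data[i:i + n] for i in range(0, len(data), n)]


def decodable(chunk: bytes) -> bool:
    """True when the chunk is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "abc\u03b1\u03b2\u03b3"       # "abc" followed by Greek alpha, beta, gamma
octets = text.encode("utf-8")        # each Greek letter takes two octets

# Grouping by octets, as an 8-bit engine would see the input.
byte_chunks = split_every(octets, 4)
print([decodable(c) for c in byte_chunks])

# Grouping by characters, as a Unicode engine would see the input.
char_chunks = [text[i:i + 4].encode("utf-8") for i in range(0, len(text), 4)]
print([decodable(c) for c in char_chunks])
```

Every group of four octets here either ends in the middle of a Greek letter or starts in the middle of one, while groups of four characters stay intact.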

@sebastien-riou

@jfbu
Thanks, that seems to fix it indeed. I cannot really try it on my side (I lack the skills to modify my Sphinx installation). Looking forward to seeing it in a release.

This needs special coding only for pdflatex, Unicode TeX engines already
handle Unicode characters as one token.

The utf8x LaTeX input encoding is not supported, only utf8.

jfbu commented Feb 9, 2021

I have added Unicode support for the 'pdflatex' engine, as well as a way to customize when the forceful algorithm is triggered. In particular, all code lines can be systematically hard-wrapped at the available width, if desired.

Now, the output looks like this (I removed letters/digits to change length, so examples are only typographical):

[Screenshot, 2021-02-09 at 11:31:00]

In the above I added some non-ASCII letters to check with the 'pdflatex' engine. I used

latex_elements = {
    'fontenc': "\\usepackage[X2,LGR,T1]{fontenc}",
    'sphinxsetup': "verbatimforcewraps",
}

to allow Greek and Cyrillic. As the font changes, so does the character width, and as a result one letter sticks slightly into the margin, but this is exceptional. If we typeset the same input with a Unicode engine, everything is exactly aligned:

[Screenshot, 2021-02-09 at 11:31:43]

and I used for this

latex_engine = 'xelatex'

latex_elements = {
#    'fontenc': "\\usepackage[X2,LGR,T1]{fontenc}",
    'sphinxsetup': "verbatimforcewraps",
}

Even with 'pdflatex', everything stays aligned for non-ASCII letters outside the Greek and Cyrillic alphabets; only those two alphabets trigger a font substitution.

To hard-wrap all code-blocks at the maximal width:

latex_elements = {
    'sphinxsetup': "verbatimforcewraps, verbatimmaxunderfull=0",
}

% box does not store in an accessible way what was the maximal
% line-width during paragraph building.
%
% If the max width exceed the linewidth by at least 4 character

The 3 in 4=3+1 is in fact the value of the verbatimmaxoverfull key of 'sphinxsetup'.
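
One reading of the diff comment and the remark above, sketched in Python: the forceful algorithm is triggered when the widest line exceeds the linewidth by at least verbatimmaxoverfull + 1 characters. The function name `exceeds_overfull_limit` is hypothetical, widths are counted in characters, and the real test is done in TeX on box dimensions, so this is a model rather than the actual code.

```python
def exceeds_overfull_limit(max_width: int, linewidth: int,
                           maxoverfull: int = 3) -> bool:
    """Model of the overfull test, with all widths in characters.

    Returns True when the widest line overshoots the linewidth by at
    least maxoverfull + 1 characters (4 = 3 + 1 with the value 3
    mentioned in the comment).
    """
    return max_width - linewidth >= maxoverfull + 1


print(exceeds_overfull_limit(84, 80))     # overshoot of 4 characters
print(exceeds_overfull_limit(83, 80))     # overshoot of 3 characters
print(exceeds_overfull_limit(81, 80, 0))  # stricter limit of 0
```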

jfbu commented Feb 9, 2021

The CHANGES entry in this PR only indicates that #8849 is fixed; so far it makes no mention of the new sphinxsetup keys verbatimforcewraps, verbatimmaxoverfull, verbatimmaxunderfull. Perhaps a line should be added to "Features added" saying something like "possibility to hard-wrap code lines at the linewidth (see verbatimforcewraps)". Otherwise people who read only the change log of a new release might think #8849 is fixed without any user setting, whereas, to be safe, I have set verbatimforcewraps to be inactive by default. (It will probably break for people who add extra decoration to code-blocks, such as raw LaTeX mark-up inside the code-blocks, for example in comments, using Pygments texcomments.)

jfbu commented Feb 9, 2021

@jfbu
Thanks, that seems to fix it indeed. I cannot really try it on my side (I lack the skills to modify my Sphinx installation). Looking forward to seeing it in a release.

@sebastien-riou
To test, you can start from some project and go to the LaTeX build directory. Overwrite the sphinx.sty there with the sphinx.sty from this PR. You also need to open the main .tex document and replace the line \usepackage{sphinx} with the line \usepackage[verbatimforcewraps]{sphinx}. Finally, from the command line in this directory, run make all-pdf or directly pdflatex <name of main tex file>.
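
The edit to the main .tex document can also be scripted. The sketch below only handles the string replacement described above; `enable_forcewraps` is a hypothetical helper name, and the sample preamble is invented for the demonstration. Reading and writing the real file is left out, since its name depends on the project.

```python
def enable_forcewraps(tex_source: str) -> str:
    """Load sphinx.sty with the verbatimforcewraps option.

    Replaces the first occurrence of the plain package line and
    raises if that line is not present.
    """
    old = r"\usepackage{sphinx}"
    new = r"\usepackage[verbatimforcewraps]{sphinx}"
    if old not in tex_source:
        raise ValueError("no plain sphinx package line found")
    return tex_source.replace(old, new, 1)


# Made-up stand-in for the preamble of the generated main .tex file.
sample = "\n".join([
    r"% preamble of the main .tex file",
    r"\usepackage{sphinx}",
    r"\begin{document}",
])

patched = enable_forcewraps(sample)
print(patched)
```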

Because it is potentially fragile, verbatimforcewraps will not be enabled by default in the 3.5.0 release. Once that release is out, you will need to add this to your conf.py:

latex_elements['sphinxsetup'] = "verbatimforcewraps"
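
Note that the assignment above assumes latex_elements is already defined in conf.py and replaces any existing 'sphinxsetup' value. If other keys are already set there, the new key can be appended instead, since 'sphinxsetup' is a comma-separated list. A minimal sketch, in which `add_sphinxsetup_key` is a hypothetical helper and values that themselves contain commas inside braces are not handled:

```python
def add_sphinxsetup_key(latex_elements: dict, key: str) -> dict:
    """Append `key` to latex_elements['sphinxsetup'] if it is missing.

    Naive comma split: values such as {a,b} inside braces would be
    cut apart, so this only suits simple key lists.
    """
    current = latex_elements.get("sphinxsetup", "")
    entries = [e.strip() for e in current.split(",") if e.strip()]
    if key not in entries:
        entries.append(key)
    latex_elements["sphinxsetup"] = ", ".join(entries)
    return latex_elements


# Example: a conf.py that already disables the code-block frame.
latex_elements = {"sphinxsetup": "verbatimwithframe=false"}
add_sphinxsetup_key(latex_elements, "verbatimforcewraps")
print(latex_elements["sphinxsetup"])
```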

@tk0miya tk0miya left a comment

LGTM with nits.

CHANGES Outdated
@@ -131,6 +131,7 @@ Bugs fixed
* #8780: LaTeX: long words in narrow columns may not be hyphenated
* #8788: LaTeX: ``\titleformat`` last argument in sphinx.sty should be
bracketed, not braced (and is anyhow not needed)
* #8849: LaTex: code-block printed out of margin

As you noted yourself, it would be better to mention the new configuration keys here.

Thanks for reviewing, done. I will merge after testing completes. The behaviour is opt-in, so nothing changes for users who do not enable the feature.

Successfully merging this pull request may close these issues.

[LaTex] code-block printed out of margin
3 participants