Description
Bug report
When running `pygettext --docstrings file.py` on Python 3.7 and above, the module docstring is not extracted.
Reproduction steps:
- Create `repro.py` with the following contents (everything except the first three lines can be omitted):
"""
Module docstring
"""
class X:
"""class docstring"""
def method(self):
"""method docstring"""
def function():
"""function docstring"""
- Run:
```sh
python pygettext.py --docstrings repro.py
```
- Open the generated `messages.pot` and note that it does not contain the module docstring:
```
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2022-08-06 00:54+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"

#: repro.py:6
#, docstring
msgid "class docstring"
msgstr ""

#: repro.py:9
#, docstring
msgid "method docstring"
msgstr ""

#: repro.py:13
#, docstring
msgid "function docstring"
msgstr ""
```
The cause appears to be that pygettext does not account for the `token.ENCODING` token type, which was added in Python 3.7.
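To see why this matters, the snippet below tokenizes the first three lines of `repro.py` from bytes. It assumes (as appears to be the case) that pygettext feeds the file to `tokenize.tokenize()`, which always yields an `ENCODING` token before any token from the source text, so the docstring is not the first token the module-docstring check sees.

```python
import io
import token
import tokenize

# The first three lines of repro.py, as bytes (assumption: pygettext reads bytes).
source = b'"""\nModule docstring\n"""\n'

# tokenize.tokenize() takes a bytes readline and yields an ENCODING token
# before any token that comes from the source text itself.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
for tok in tokens[:2]:
    print(token.tok_name[tok.type], repr(tok.string))
```

The first line printed is `ENCODING 'utf-8'`; the docstring only arrives as the second token.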
A simple fix would be to also skip `tokenize.ENCODING` in the module-docstring check in `cpython/Tools/i18n/pygettext.py`, lines 338 to 340 (as of commit 29650fe).
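A minimal sketch of that check as a standalone helper, not a patch to pygettext itself. `find_module_docstring` and `SKIPPABLE` are hypothetical names; the skip set is assumed to mirror the comment and blank-line tokens the existing check tolerates, with `tokenize.ENCODING` added.

```python
import ast
import io
import tokenize
from typing import Optional

# Token types that may precede a module docstring without ruling it out.
# tokenize.ENCODING is the suggested addition: without it, the very first
# token ends the search before the docstring is ever reached.
SKIPPABLE = (tokenize.ENCODING, tokenize.COMMENT, tokenize.NL)


def find_module_docstring(source: bytes) -> Optional[str]:
    """Hypothetical helper: return the module docstring text, or None."""
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if tok.type in SKIPPABLE:
            continue
        if tok.type == tokenize.STRING:
            try:
                value = ast.literal_eval(tok.string)
            except ValueError:  # e.g. an f-string literal on Python < 3.12
                return None
            return value if isinstance(value, str) else None  # ignore bytes
        return None  # first significant token is not a string: no docstring
    return None
```

A leading comment or blank line still allows a docstring, while a module that starts with `class` yields `None`.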
Skipping `tokenize.ENCODING`, however, reveals another bug, caused by the `return` on line 340: when a module does not start with a docstring, the token that ends the module-docstring check is swallowed instead of being handled. For example, given code like this:
```python
class X:
    """class docstring"""
```
`pygettext` will not extract the docstring of class `X` once the fix above is applied, unless care is taken. I'm mentioning this so that the fix gets tested with both cases.
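One way both cases could be handled is sketched below, assuming a token-driven state machine similar in spirit to pygettext's. This is a simplified, hypothetical extractor, not pygettext's code; `extract_docstrings`, `_str_value` and the state names are made up for illustration. The key point is that leaving the module-docstring check falls through to normal handling instead of returning, so a leading `class` token still starts the search for the class docstring.

```python
import ast
import io
import tokenize
from typing import List, Optional, Tuple

# Tokens allowed before a module docstring (ENCODING included, per the first fix).
MODULE_SKIPPABLE = (tokenize.ENCODING, tokenize.COMMENT, tokenize.NL)
# Tokens allowed between a class/def header's ':' and its docstring.
SUITE_SKIPPABLE = (tokenize.NEWLINE, tokenize.INDENT, tokenize.COMMENT, tokenize.NL)


def _str_value(literal: str) -> Optional[str]:
    """Return the value of a plain str literal, or None for bytes/f-strings."""
    try:
        value = ast.literal_eval(literal)
    except ValueError:
        return None
    return value if isinstance(value, str) else None


def extract_docstrings(source: bytes) -> List[Tuple[int, str]]:
    """Hypothetical extractor: (line, docstring) pairs for module/class/def."""
    found: List[Tuple[int, str]] = []
    fresh_module = True
    state = "waiting"  # "waiting" -> "header" -> "suite" -> "waiting"
    depth = 0  # bracket depth inside a class/def header
    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if fresh_module:
            if tok.type in MODULE_SKIPPABLE:
                continue
            fresh_module = False
            if tok.type == tokenize.STRING:
                value = _str_value(tok.string)
                if value is not None:
                    found.append((tok.start[0], value))
                continue  # the string token was fully handled here
            # Deliberately no `continue`: fall through so that a token such as
            # `class` still reaches the state machine below. Returning here
            # instead is what makes the class docstring go missing.
        if state == "suite":
            if tok.type in SUITE_SKIPPABLE:
                continue
            state = "waiting"
            if tok.type == tokenize.STRING:
                value = _str_value(tok.string)
                if value is not None:
                    found.append((tok.start[0], value))
                continue
            # Not a docstring: fall through so this token is handled as usual.
        if state == "waiting":
            if tok.type == tokenize.NAME and tok.string in ("class", "def"):
                state, depth = "header", 0
        elif state == "header" and tok.type == tokenize.OP:
            if tok.string in ("(", "[", "{"):
                depth += 1
            elif tok.string in (")", "]", "}"):
                depth -= 1
            elif tok.string == ":" and depth == 0:
                state = "suite"
    return found
```

Run on the full `repro.py` from the reproduction steps, this sketch returns entries for lines 1, 6, 9 and 13; on the two-line `class X` example it returns the class docstring from line 2. Both inputs are worth keeping as regression tests for the real fix.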
Your environment
- CPython versions tested on: 3.7.13 (installed from deadsnakes ppa), 3.10.4 (default Python on my system)
- Operating system and architecture: Ubuntu 22.04 LTS
The `pygettext.py` script was taken directly from this repository; I'm not sure my distro even ships a package containing it.