pygettext --docstrings doesn't actually extract module docstring due to tokenize returning ENCODING token #95731

Closed
@Jackenmen

Bug report

When running pygettext --docstrings file.py on Python 3.7 and above, the module docstring does not get extracted.

Reproduction steps:

  1. Create repro.py with the following contents (actually you can omit everything but the first three lines):
"""
Module docstring
"""

class X:
    """class docstring"""

    def method(self):
        """method docstring"""


def function():
    """function docstring"""
  2. Try running: python pygettext.py --docstrings repro.py
  3. Look at the messages.pot that was created and see that it doesn't contain the module docstring:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2022-08-06 00:54+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"


#: repro.py:6
#, docstring
msgid "class docstring"
msgstr ""

#: repro.py:9
#, docstring
msgid "method docstring"
msgstr ""

#: repro.py:13
#, docstring
msgid "function docstring"
msgstr ""

The reason for this appears to be that pygettext doesn't account for token.ENCODING, which was added in Python 3.7 and is the first token that tokenize returns.
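
To make the cause concrete, here is a minimal sketch (not taken from pygettext itself) that prints the first two token types that tokenize.tokenize() yields for the first three lines of repro.py. The helper name first_token_names is made up for illustration.

```python
import io
import tokenize

# The first three lines of repro.py, as bytes (tokenize.tokenize reads bytes).
source = b'"""\nModule docstring\n"""\n'


def first_token_names(data, count=2):
    """Return the names of the first `count` token types in `data`."""
    tokens = tokenize.tokenize(io.BytesIO(data).readline)
    return [tokenize.tok_name[tok.type] for _, tok in zip(range(count), tokens)]


print(first_token_names(source))  # ['ENCODING', 'STRING']
```

The docstring is therefore the second token, so any check that only inspects the very first token of a module never sees it.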

A simple solution for this would be to skip tokenize.ENCODING here:

elif ttype not in (tokenize.COMMENT, tokenize.NL):
    self.__freshmodule = 0
return

This actually reveals another bug, caused by the return on line 340: while pygettext is still looking for a module docstring, the token that ends that search is swallowed without being handled. This means that for code like this:

class X:
    """class docstring"""

pygettext will not extract the docstring of class X once the solution above is applied, unless the swallowed token is also handled. I'm mentioning this so that the fix gets tested with both of these cases.
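
Both cases can be checked with a small standalone model. The sketch below is not pygettext's code: extract_docstrings, its state names, and the SKIPPED tuple are hypothetical, and the handling of def/class headers is deliberately naive (it treats the first ":" after the keyword as the end of the header, so annotated parameters would confuse it). It only illustrates the two points above: ENCODING is skipped while waiting for the module docstring, and the token that ends that wait is still processed rather than dropped.

```python
import ast
import io
import tokenize

# Tokens that may precede a module docstring (assumption: ENCODING added here).
SKIPPED = (tokenize.ENCODING, tokenize.COMMENT, tokenize.NL)
# Tokens that may sit between a "class"/"def" header and its docstring.
BEFORE_SUITE_DOC = (tokenize.NEWLINE, tokenize.INDENT,
                    tokenize.COMMENT, tokenize.NL)


def extract_docstrings(source):
    """Hypothetical, simplified model of docstring extraction."""
    found = []
    fresh_module = True   # still looking for the module docstring
    state = "waiting"     # waiting -> suite_seen -> suite_docstring

    for tok in tokenize.tokenize(io.BytesIO(source).readline):
        if fresh_module:
            if tok.type == tokenize.STRING:
                found.append(ast.literal_eval(tok.string))
                fresh_module = False
                continue
            if tok.type in SKIPPED:
                continue
            # Any other token ends the search, but it is NOT swallowed:
            # execution falls through so "class"/"def" is still seen.
            fresh_module = False

        if state == "suite_docstring":
            if tok.type == tokenize.STRING:
                found.append(ast.literal_eval(tok.string))
                state = "waiting"
                continue
            if tok.type in BEFORE_SUITE_DOC:
                continue
            state = "waiting"  # no docstring; fall through with this token
        if state == "suite_seen":
            if tok.type == tokenize.OP and tok.string == ":":
                state = "suite_docstring"
            continue
        # state == "waiting"
        if tok.type == tokenize.NAME and tok.string in ("class", "def"):
            state = "suite_seen"
    return found


# Second case from this report: no module docstring, class comes first.
print(extract_docstrings(b'class X:\n    """class docstring"""\n'))
# ['class docstring']
```

Replacing the fall-through with an early continue (the equivalent of the return mentioned above) makes the second case return an empty list, which is the regression to watch for.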

Your environment

  • CPython versions tested on: 3.7.13 (installed from deadsnakes ppa), 3.10.4 (default Python on my system)
  • Operating system and architecture: Ubuntu 22.04 LTS
    The pygettext.py script was taken directly from this repository; I'm not sure that my distro even has a package that ships it.

    Labels

    type-bug: An unexpected behavior, bug, or error
