
Field names are not resolved properly #1860

Closed as not planned
@TimoVFX

Description


Hey,
I'm trying to use pypdf to fill in some templates. We have now moved to testing and noticed that some of the form fields are not filled correctly because of field naming issues. I dug a bit and found that field names containing a period (.) cause the problem.

In our use case we have multiple categories, each of which includes a 1_Name field. When inspecting the PDF in Acrobat, I found that the fields are actually named 1_Name, 2.1_Name, and so on. get_form_text_fields only returns the first field, while get_fields returns two fields: one named 1_Name and a second named just 2 (see the output further down).
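
For context: in PDF forms, a name like 2.1_Name is normally a fully qualified name. Under the PDF specification's field-naming rules, it is built by joining the partial names (/T entries) of a field and all of its ancestors with periods. The get_fields output further down fits this: the 1_Name field has a /Parent whose /T is 2. The sketch below rebuilds fully qualified names by walking /Kids. The helper names qualified_field_names and _resolve are hypothetical and not part of pypdf. The code assumes each field behaves like a dict with "/T" and "/Kids" keys, which holds for pypdf's DictionaryObject and for the plain dicts used in the test.

```python
from typing import Any, Dict, Iterable, List, Optional


def _resolve(obj: Any) -> Any:
    # pypdf stores references as IndirectObject; get_object() dereferences them.
    # Plain dicts (hypothetical test data) have no get_object and are returned as-is.
    return obj.get_object() if hasattr(obj, "get_object") else obj


def qualified_field_names(fields: Iterable[Any], parent: Optional[str] = None) -> List[str]:
    """Return fully qualified names of terminal fields, in document order.

    A fully qualified name joins the partial names (/T) of a field and all of
    its ancestors with periods, e.g. parent "2" + kid "1_Name" -> "2.1_Name".
    """
    found: Dict[str, None] = {}  # dict used as an insertion-ordered set
    for raw in fields:
        node = _resolve(raw)
        if "/T" in node:
            partial = str(node["/T"])
            name = partial if parent is None else f"{parent}.{partial}"
        else:
            # Widget annotations without /T share their parent field's name.
            name = parent
        if "/Kids" in node:
            for child in qualified_field_names(node["/Kids"], name):
                found[child] = None
        elif name is not None:
            found[name] = None
    return list(found)


# Possible usage with pypdf (assumed 3.x, not executed here):
#   from pypdf import PdfReader
#   reader = PdfReader("tests/field_sample.pdf")
#   print(qualified_field_names(reader.trailer["/Root"]["/AcroForm"]["/Fields"]))
```

With data shaped like the sample PDF (a top-level 1_Name field plus a parent 2 with a 1_Name kid), this returns ["1_Name", "2.1_Name"]. Those are the names Acrobat and pdftk display.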

When checking with pdftk, the field names are retrieved correctly.

Since I can't share the original PDF I was testing with, I created a small sample PDF with just two text fields, named 1_Name and 2.1_Name.

To confirm that the period is indeed the cause, I renamed the 2.1_Name field to 2_1_Name, which works as expected.

As a side effect, all fields named x.1_Name are filled with the same value. When running update_page_form_field_values with "1_Name": "test", both fields are filled with test. I did not include this in the example, as I expect it to be fixed once the names are resolved correctly.
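
This side effect appears consistent with fields being matched on their partial name (the last component) rather than on the fully qualified name. If that is the case, any partial name shared by several fully qualified names is ambiguous. The sketch below groups qualified names by their last component and reports the collisions. The function name ambiguous_partial_names is hypothetical, and the input is assumed to be a list of fully qualified names such as those Acrobat shows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def ambiguous_partial_names(qualified_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group fully qualified field names by their last partial name.

    A partial name that maps to more than one fully qualified name cannot be
    told apart by a lookup keyed only on the partial name.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in qualified_names:
        # rpartition returns ("", "", name) when there is no period,
        # so element [2] is always the terminal partial name.
        partial = name.rpartition(".")[2]
        groups[partial].append(name)
    return {partial: names for partial, names in groups.items() if len(names) > 1}
```

For the names described above, 1_Name, 2.1_Name, and 3.1_Name all collide under 1_Name. After the 2_1_Name rename workaround, nothing collides.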

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-13.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf.__version__)"
3.1.0

Initially I was running version 2.10; I updated to 3.1.0 with the same result.

Code + PDF

This is a minimal, complete example that shows the issue:

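```python
from pypdf import PdfReader

template = "tests/field_sample.pdf"
reader = PdfReader(template)
# All form fields, keyed by field name
form = reader.get_fields()
# Only text fields (/FT /Tx), mapped to their current values
textfields = reader.get_form_text_fields()
print('TextFields: ', textfields)
print('FormFields: ', form)
```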

field_sample.pdf

Output

```
TextFields:  {'1_Name': None}
FormFields:  {'1_Name': {'/T': '1_Name', '/FT': '/Tx', '/Parent': {'/Kids': [IndirectObject(25, 0, 4342149280)], '/T': '2'}}, '2': {'/T': '2', '/Kids': [IndirectObject(25, 0, 4342149280)]}}
```

Renaming all fields so they don't contain a period is something I'd like to avoid, since we have quite a number of templates.
I hope this is an easy fix on your side. Could you let me know whether you are looking into this issue and what timeframe to expect for a possible fix?

Best,

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
