Description
Hey,
I'm trying to use pypdf to fill some templates. We have now moved to testing and noticed that some of the form fields are not filled properly because of a field naming issue. I dug a bit and figured out that field names containing a period (.) are causing the problem.
In our use case we have multiple categories, each of which includes a 1_Name field. When inspecting the PDF in Acrobat, I found that the fields are actually named 1_Name, 2.1_Name, ... and so on. With get_form_text_fields I only get back the first field, while get_fields returns two fields, one named 1_Name and the second just 2. (See the output further down.)
When I check with pdftk, the field names are retrieved properly.
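For context, the PDF specification builds a field's fully qualified name by joining the partial names (/T) of its ancestors with periods, so a field shown as 2.1_Name is normally a field with /T 1_Name whose parent has /T 2. The sketch below illustrates that rule only. It uses plain dictionaries as hypothetical stand-ins for the field objects, and qualified_name is an illustrative helper, not a pypdf function.

```python
def qualified_name(field):
    """Join the /T entries from the root ancestor down to this field with '.'."""
    parts = []
    node = field
    while node is not None:
        if "/T" in node:
            parts.append(node["/T"])
        node = node.get("/Parent")
    return ".".join(reversed(parts))


# Hypothetical stand-ins for the two text fields in the sample PDF.
group = {"/T": "2"}
first = {"/T": "1_Name", "/FT": "/Tx"}
second = {"/T": "1_Name", "/FT": "/Tx", "/Parent": group}

print(qualified_name(first))   # 1_Name
print(qualified_name(second))  # 2.1_Name
```

Both fields share the same partial name, so any lookup that uses /T alone cannot tell them apart.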
Since I can't share the original PDF I was testing with, I created a small sample PDF with just two text fields named 1_Name and 2.1_Name. To confirm that the period is indeed causing the problem, I renamed the 2.1_Name field to 2_1_Name, which works as expected.
As a side effect of this problem, all fields named x.1_Name receive the same value when the form is filled. When running update_page_form_field_values with "1_Name": "test", both fields are filled with test. I did not include this in the example, as I think it will be fixed once the names are resolved correctly.
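The side effect can be reproduced in miniature. The sketch below is not pypdf's implementation. It assumes, as a guess at the cause, that values are matched against the partial name /T, and compares that with matching on the fully qualified name. The helpers fill, qualified_name and make_fields are hypothetical, and plain dictionaries stand in for the field objects.

```python
def qualified_name(field):
    """Join /T entries from the root ancestor down to the field with '.'."""
    parts = []
    while field is not None:
        parts.append(field["/T"])
        field = field.get("/Parent")
    return ".".join(reversed(parts))


def fill(fields, values, name_of):
    """Set /V on every field whose name, as computed by name_of, is in values."""
    for field in fields:
        name = name_of(field)
        if name in values:
            field["/V"] = values[name]


def make_fields():
    """Two text fields: top-level 1_Name and 1_Name under a parent named 2."""
    group = {"/T": "2"}
    return [
        {"/T": "1_Name", "/FT": "/Tx"},
        {"/T": "1_Name", "/FT": "/Tx", "/Parent": group},
    ]


by_partial = make_fields()
fill(by_partial, {"1_Name": "test"}, name_of=lambda f: f["/T"])

by_qualified = make_fields()
fill(by_qualified, {"1_Name": "test"}, name_of=qualified_name)

print([f.get("/V") for f in by_partial])    # ['test', 'test']
print([f.get("/V") for f in by_qualified])  # ['test', None]
```

Matching on the partial name fills both fields, which is the behaviour described above. Matching on the qualified name fills only the top-level 1_Name.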
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
macOS-13.1-arm64-arm-64bit
$ python -c "import pypdf;print(pypdf.__version__)"
3.1.0
Initially I was running version 2.10; after updating to 3.1.0 I get the same result.
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfReader

# Sample PDF with two text fields named 1_Name and 2.1_Name
template = "tests/field_sample.pdf"
reader = PdfReader(template)

# All form fields, and only the text fields
form = reader.get_fields()
textfields = reader.get_form_text_fields()

print('TextFields: ', textfields)
print('FormFields: ', form)
Output
TextFields: {'1_Name': None}
FormFields: {'1_Name': {'/T': '1_Name', '/FT': '/Tx', '/Parent': {'/Kids': [IndirectObject(25, 0, 4342149280)], '/T': '2'}}, '2': {'/T': '2', '/Kids': [IndirectObject(25, 0, 4342149280)]}}
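One plausible reading of this output is that the fields are indexed by their partial name /T, so the child 1_Name (the one with a /Parent whose /T is 2) overwrites the top-level 1_Name. The sketch below reproduces that key pattern with plain dictionaries. The traversal order and the helpers index_by and qualified_name are assumptions made for illustration, not pypdf's actual code.

```python
def qualified_name(field):
    """Join /T entries from the root ancestor down to the field with '.'."""
    parts = []
    while field is not None:
        parts.append(field["/T"])
        field = field.get("/Parent")
    return ".".join(reversed(parts))


def index_by(fields, name_of):
    """Build a name -> field mapping; later entries overwrite earlier ones."""
    index = {}
    for field in fields:
        index[name_of(field)] = field
    return index


# Stand-ins for the objects in the sample PDF, in an assumed traversal order.
top = {"/T": "1_Name", "/FT": "/Tx"}
group = {"/T": "2"}
child = {"/T": "1_Name", "/FT": "/Tx", "/Parent": group}
fields = [top, group, child]

by_partial = index_by(fields, lambda f: f["/T"])
by_qualified = index_by(fields, qualified_name)

print(list(by_partial))                  # ['1_Name', '2']
print("/Parent" in by_partial["1_Name"])  # True: the child replaced the top-level field
print(list(by_qualified))                # ['1_Name', '2', '2.1_Name']
```

Keyed by partial name, the mapping has the same two keys as the FormFields output; keyed by qualified name, both text fields survive.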
Renaming all fields so that they do not include a period is an option I would like to avoid, since we have quite a number of templates.
I hope this is an easy fix on your side. Could you let me know whether you are looking into this issue and what timeframe to expect for a possible fix?
Best,