PDFPlumberParser error: 'list' object has no attribute "name" #26528

ZaraP-NSTARX · 2024-09-16T15:39:41Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community.document_loaders import PDFPlumberLoader
from PIL import ImageFile
import pickle

# Configuring Libraries

ImageFile.LOAD_TRUNCATED_IMAGES = True

# Main Program Code

files = ["./docs/cayenne.pdf",
         "./docs/cullinan.pdf",
         "./docs/aventador.pdf",
         "./docs/performante.pdf"]

loaders = []
for file in files:
    loaders.append(PDFPlumberLoader(file, extract_images=True))

docs = []
for loader in loaders:
    docs.extend(loader.load())

with open("./docs_processed/pdfplumber_docs.txt", "wb") as file:
    pickle.dump(docs, file)
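To narrow down which brochure and page trigger the error, it can help to inspect the image streams with pdfplumber directly, outside LangChain. The sketch below is an illustrative helper, not part of LangChain; `scan_pdf` and `filter_shape` are hypothetical names. It assumes pdfplumber is installed and that each entry of `page.images` carries the underlying pdfminer stream under the `"stream"` key, as the traceback's `img["stream"]["Filter"]` suggests.

```python
from collections import Counter
from typing import Any, Dict, List


def filter_shape(filter_value: Any) -> str:
    """Classify a stream's /Filter entry as 'missing', 'single', or 'array[N]'."""
    if filter_value is None:
        return "missing"  # no /Filter entry: uncompressed image data
    if isinstance(filter_value, list):
        return f"array[{len(filter_value)}]"  # a chain of filters
    return "single"  # one name (indirect references are not resolved here)


def scan_pdf(path: str) -> Dict[str, Any]:
    """Count /Filter shapes for every image in a PDF (hypothetical helper).

    Returns the shape counts plus the 1-based page numbers whose images
    use an array-valued /Filter, which is the case where `.name` fails.
    """
    import pdfplumber  # imported lazily so filter_shape() works without it

    shapes: Counter = Counter()
    array_pages: List[int] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for img in page.images:
                shape = filter_shape(img["stream"].get("Filter"))
                shapes[shape] += 1
                if shape.startswith("array") and page.page_number not in array_pages:
                    array_pages.append(page.page_number)
    return {"shapes": dict(shapes), "array_pages": array_pages}
```

For example, `for path in files: print(path, scan_pdf(path))` reuses the `files` list from the script above; any file that reports a non-empty `array_pages` list is a likely candidate for reproducing the error on its own.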

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/home/work/Local Work Files/langchain-doc-loaders/process_docs.py", line 21, in <module>
    docs.extend(loader.load())
                ^^^^^^^^^^^^^
  File "/home/work/.pyenv/versions/docloader-venv/lib/python3.12/site-packages/langchain_community/document_loaders/pdf.py", line 644, in load
    return parser.parse(blob)
           ^^^^^^^^^^^^^^^^^^
  File "/home/work/.pyenv/versions/docloader-venv/lib/python3.12/site-packages/langchain_core/document_loaders/base.py", line 126, in parse
    return list(self.lazy_parse(blob))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.pyenv/versions/docloader-venv/lib/python3.12/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 397, in lazy_parse
    + self._extract_images_from_page(page),
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.pyenv/versions/docloader-venv/lib/python3.12/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 428, in _extract_images_from_page
    if img["stream"]["Filter"].name in _PDF_FILTER_WITHOUT_LOSS:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'name'

Description

I'm using LangChain's PDFPlumber integration (PDFPlumberLoader with extract_images=True) to extract image information from a set of car brochure PDFs. The error is raised in _extract_images_from_page in the langchain_community/document_loaders/parsers/pdf.py file, on line 428.

I was able to fix this issue by deleting ".name" from the conditional on line 428. I'm opening this issue so I can create a PR with the fix.
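For context, the PDF format allows a stream's /Filter entry to be either a single name or an array of names (a filter chain), which is why `img["stream"]["Filter"]` is sometimes a list. Deleting `.name` stops the AttributeError, but the membership test then compares the raw objects against the string constants, so it may no longer match single-name filters. A more defensive option is to normalize the entry to a list of strings before checking it. The sketch below assumes name objects expose their string via `.name` (as pdfminer's `PSLiteral` does); `NameLiteral`, `filter_names`, `classify_filters`, and the two filter sets are hypothetical stand-ins, not LangChain's actual constants or API.

```python
from dataclasses import dataclass
from typing import Any, List


# Hypothetical stand-in for pdfminer's PSLiteral so the sketch runs on its own.
@dataclass(frozen=True)
class NameLiteral:
    name: str


# Illustrative filter groups (assumption: not LangChain's exact lists).
LOSSLESS_FILTERS = {"FlateDecode", "Fl", "LZWDecode", "LZW", "RunLengthDecode", "RL"}
LOSSY_FILTERS = {"DCTDecode", "DCT", "JPXDecode"}


def filter_names(filter_value: Any) -> List[str]:
    """Return a /Filter entry as a list of plain strings, whether single or array."""
    if filter_value is None:
        return []
    items = filter_value if isinstance(filter_value, list) else [filter_value]
    names: List[str] = []
    for item in items:
        name = getattr(item, "name", item)  # unwrap name objects; keep strings as-is
        if isinstance(name, bytes):
            name = name.decode("latin-1")
        names.append(str(name))
    return names


def classify_filters(filter_value: Any) -> str:
    """Label a filter chain as 'raw', 'lossy', 'lossless', or 'unknown'."""
    names = filter_names(filter_value)
    if not names:
        return "raw"  # no filter: the stream holds uncompressed samples
    if any(n in LOSSY_FILTERS for n in names):
        return "lossy"  # e.g. [FlateDecode, DCTDecode] still ends in JPEG data
    if all(n in LOSSLESS_FILTERS for n in names):
        return "lossless"
    return "unknown"
```

In the parser's loop, the `.name` membership checks could then become a single call such as `classify_filters(img["stream"]["Filter"])`, so both the single-name and the array case take the same code path.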

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Thu, 12 Sep 2024 17:21:02 +0000
Python Version: 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805]

Package Information

langchain_core: 0.2.40
langchain: 0.3.0
langchain_community: 0.3.0
langsmith: 0.1.120
langchain_text_splitters: 0.3.0
langchain_unstructured: 0.1.2

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.5
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.1
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
SQLAlchemy: 2.0.34
tenacity: 8.5.0
typing-extensions: 4.12.2
unstructured-client: 0.24.1
unstructured[all-docs]: Installed. No version info available.

ZaraP-NSTARX added a commit to ZaraP-NSTARX/langchain that referenced this issue Sep 16, 2024

Fixed issue langchain-ai#26528, PDFPlumberParser error

869a6fa

ZaraP-NSTARX added a commit to ZaraP-NSTARX/langchain that referenced this issue Sep 16, 2024

Missed another instance of the same issue, issue langchain-ai#26528 should be resolved now.

0fee721

dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Sep 16, 2024