TypeError: DocumentIntelligenceClientOperationsMixin.begin_analyze_document() missing 1 required positional argument: 'body' when trying to use the AzureAIDocumentIntelligenceLoader with the bytes_source parameter #28948

@hiroci

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

The following code, which uses the bytes_source parameter, raises TypeError: missing 1 required positional argument: 'body'.

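from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

endpoint = ""  # Azure AI Document Intelligence endpoint (redacted)
key = ""  # API key (redacted)
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, mode='single',
    bytes_source=b'%PDF-1.7\n...%',  # raw PDF bytes (truncated here)
)

loader.load()  # raises TypeError: ... missing 1 required positional argument: 'body'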

The error seems to originate in the parse_bytes method of /langchain_community/document_loaders/parsers/doc_intelligence.py, line 116.

All of the other parsers in this file pass the second argument to self.client.begin_analyze_document positionally, without a keyword name.

Example of a working parser:

def parse_url(self, url: str) -> Iterator[Document]:
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    poller = self.client.begin_analyze_document(
        self.api_model,
        AnalyzeDocumentRequest(url_source=url),
        # content_type="application/octet-stream",
        output_content_format="markdown" if self.mode == "markdown" else "text",
    )
    result = poller.result()
    ...

Parser that does NOT work:

def parse_bytes(self, bytes_source: bytes) -> Iterator[Document]:
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    poller = self.client.begin_analyze_document(
        self.api_model,
        # Passed by keyword: this name does not match the SDK's `body` parameter
        analyze_request=AnalyzeDocumentRequest(bytes_source=bytes_source),
        # content_type="application/octet-stream",
        output_content_format="markdown" if self.mode == "markdown" else "text",
    )

The parse_bytes method fails because its second argument is passed as analyze_request=... . It should be passed as body=... instead, or positionally without a keyword name, as the other parsers do.
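To illustrate why the keyword name matters, here is a minimal, self-contained sketch that does not call Azure. FakeDocumentIntelligenceClient, FakeOlderClient and AnalyzeDocumentRequestStandIn are hypothetical stand-ins; their signatures are assumptions modeled on the traceback (a required parameter named body plus **kwargs) and on the analyze_request name that parse_bytes still uses. The model ID "prebuilt-layout" is only illustrative. Because the method accepts **kwargs, the unrecognized analyze_request keyword is silently collected there, so Python reports body as missing instead of complaining about an unexpected keyword.

```python
from typing import Any, Dict


class AnalyzeDocumentRequestStandIn:
    """Hypothetical stand-in for AnalyzeDocumentRequest(bytes_source=...)."""

    def __init__(self, bytes_source: bytes) -> None:
        self.bytes_source = bytes_source


class FakeDocumentIntelligenceClient:
    """Hypothetical client whose signature mirrors the traceback: a required `body`."""

    def begin_analyze_document(self, model_id: str, body: Any, **kwargs: Any) -> Dict[str, Any]:
        # Unknown keywords (e.g. analyze_request=...) are collected into **kwargs,
        # so Python reports the required `body` argument as missing.
        return {"model_id": model_id, "body": body, "options": kwargs}


class FakeOlderClient:
    """Hypothetical client assuming an older signature that named the parameter `analyze_request`."""

    def begin_analyze_document(self, model_id: str, analyze_request: Any, **kwargs: Any) -> Dict[str, Any]:
        return {"model_id": model_id, "body": analyze_request, "options": kwargs}


def call_with_keyword(client: Any, request: Any) -> Dict[str, Any]:
    # Mirrors the failing parse_bytes call: the request is passed by keyword name.
    return client.begin_analyze_document(
        "prebuilt-layout", analyze_request=request, output_content_format="text"
    )


def call_positionally(client: Any, request: Any) -> Dict[str, Any]:
    # Mirrors parse_url and the proposed fix: the request is passed positionally.
    return client.begin_analyze_document(
        "prebuilt-layout", request, output_content_format="text"
    )


if __name__ == "__main__":
    request = AnalyzeDocumentRequestStandIn(bytes_source=b"%PDF-1.7")
    try:
        call_with_keyword(FakeDocumentIntelligenceClient(), request)
    except TypeError as exc:
        print("keyword call failed:", exc)
    for client in (FakeDocumentIntelligenceClient(), FakeOlderClient()):
        result = call_positionally(client, request)
        print(type(client).__name__, "positional call ok:", result["body"] is request)
```

Passing the request positionally works regardless of whether the parameter is named body or analyze_request, which is consistent with parse_url and the other parsers being unaffected.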

Error Message and Stack Trace (if applicable)

  File "/home/projects/intelligent_chat-be/server/routers/v1/conversation/file_loader.py", line 114, in _load_azure
    document = loader.load()
               ^^^^^^^^^^^^^
  File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_core/document_loaders/base.py", line 31, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/doc_intelligence.py", line 105, in lazy_load
    yield from self.parser.parse_bytes(self.bytes_source)
  File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/parsers/doc_intelligence.py", line 116, in parse_bytes
    poller = self.client.begin_analyze_document(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: DocumentIntelligenceClientOperationsMixin.begin_analyze_document() missing 1 required positional argument: 'body'

Description

I'm trying to use the Azure AI Document Intelligence loader from LangChain (AzureAIDocumentIntelligenceLoader) to process a document supplied as a sequence of bytes.

System Info

System Information

OS: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python Version: 3.12.8 (main, Dec 4 2024, 08:54:12) [GCC 11.4.0]

Package Information

langchain_core: 0.3.28
langchain: 0.3.13
langchain_community: 0.3.13
langsmith: 0.2.4
langchain_openai: 0.2.14
langchain_qdrant: 0.2.0
langchain_text_splitters: 0.3.4
langgraph_sdk: 0.1.48

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.11.11
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
fastembed: Installed. No version info available.
httpx: 0.27.2
httpx-sse: 0.4.0
jsonpatch: 1.33
langsmith-pyo3: Installed. No version info available.
numpy: 2.1.2
openai: 1.58.1
orjson: 3.10.12
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.6.1
PyYAML: 6.0.2
qdrant-client: 1.12.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.36
tenacity: 9.0.0
tiktoken: 0.8.0
typing-extensions: 4.12.
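The dependency list above does not include azure-ai-documentintelligence, the package that defines begin_analyze_document. Since LangChain still passes analyze_request while the installed SDK expects body, the parameter name presumably differs between SDK releases (an assumption inferred from the error, not confirmed here), so that version is worth reporting. A minimal sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # azure-ai-documentintelligence defines begin_analyze_document;
    # azure-core provides the tracing decorator seen in the stack trace.
    for pkg in ("azure-ai-documentintelligence", "azure-core", "langchain-community"):
        print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```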

Labels: bug