poller.continuation_token() crashes if the initial request uses a file stream as input #38713

Open
@anatolip

Description

  • Package Name: azure-ai-documentintelligence
  • Package Version: 1.0.0b4
  • Operating System: Windows
  • Python Version: 3.12.7

Describe the bug
poller.continuation_token() crashes if the input file is passed as an octet-stream to the initial begin_analyze_document call.

The exception is below:

Traceback (most recent call last):
  File "C:\temp\SDK_issue_report.py", line 25, in <module>
    poller_continuation_token = poller.continuation_token() # get continuation token FAILS
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anatolip\AppData\Local\anaconda3\envs\py12\Lib\site-packages\azure\core\polling\_poller.py", line 224, in continuation_token
    return self._polling_method.get_continuation_token()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\anatolip\AppData\Local\anaconda3\envs\py12\Lib\site-packages\azure\core\polling\base_polling.py", line 651, in get_continuation_token
    return base64.b64encode(pickle.dumps(self._initial_response)).decode("ascii")
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'BufferedReader' instances
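
The failure can be reproduced without the SDK. The sketch below is not azure-core code: it uses a plain dict as a hypothetical stand-in for the stored initial response and applies the same base64.b64encode(pickle.dumps(...)) step shown in the traceback. The names make_token and demo are made up for illustration. It assumes the stored response keeps a reference to the request body, which is what the traceback suggests.

```python
import base64
import os
import pickle
import tempfile


def make_token(initial_response):
    # Same encoding step as the failing line in the traceback.
    return base64.b64encode(pickle.dumps(initial_response)).decode("ascii")


def demo():
    # Create a small temporary file to act as the input document.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"%PDF-1.4 sample")

        # Body held as bytes: the stand-in response pickles without error.
        with open(path, "rb") as f:
            bytes_token = make_token({"body": f.read()})

        # Body held as an open file object: pickle raises TypeError.
        with open(path, "rb") as f:
            try:
                make_token({"body": f})
                stream_error = None
            except TypeError as exc:
                stream_error = str(exc)
    finally:
        os.remove(path)
    return bytes_token, stream_error
```

Calling demo() returns a usable token for the bytes case and the TypeError message for the stream case, which matches the behavior described in this report.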

To Reproduce
The code below shows that continuation_token() works if the file is passed to begin_analyze_document as a base64 string, but fails if the file is passed as an octet-stream.

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
import base64, pickle, os

endpoint = os.environ["AZURE_DI_ENDPOINT"]
key = os.environ["AZURE_DI_KEY"]

path_to_sample_documents = r'c:\temp\SampleInvoice26andLineItems.pdf'

client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

with open(path_to_sample_documents, "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", AnalyzeDocumentRequest(bytes_source=f.read()))
poller_continuation_token = poller.continuation_token() # get continuation token WORKS
decoded_token = pickle.loads(base64.b64decode(poller_continuation_token))
resultUrl = decoded_token.http_response.headers.get('Operation-Location')
print(f"Results URL from continuation_token: {resultUrl} \n")

with open(path_to_sample_documents, "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", f, content_type="application/octet-stream")
print("The next continuation_token() call raises TypeError:\n")
poller_continuation_token = poller.continuation_token() # get continuation token FAILS
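
Until the stream case is fixed, one possible caller-side workaround is to read the stream into bytes first and use the bytes_source path that works in the first half of the repro. The helper below is a minimal sketch; the name materialize is hypothetical and not part of the SDK, and the whole document is loaded into memory.

```python
import io


def materialize(source):
    """Return the document as bytes, fully reading file-like inputs."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    # Assumes a binary file-like object opened in "rb" mode.
    return source.read()


# Hypothetical usage with the client from the repro above:
# with open(path_to_sample_documents, "rb") as f:
#     poller = client.begin_analyze_document(
#         "prebuilt-layout",
#         AnalyzeDocumentRequest(bytes_source=materialize(f)),
#     )
# token = poller.continuation_token()
```

This only avoids the problem on the caller's side; it does not change how the poller builds the continuation token.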

Labels: Azure.Core, Client, bug, customer-reported, needs-team-attention
