azure-ai-documentintelligence samples for uploading base64 #33901

Closed

Description

Is your feature request related to a problem? Please describe.

I want to analyze base64 encoded documents through the new azure-ai-documentintelligence package.
It seems like I should do something like this:

from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeResult, AnalyzeDocumentRequest

client = DocumentIntelligenceClient("my_endpoint", AzureKeyCredential("my_key"))

my_base64 = b"..."
poller = client.begin_analyze_document(
    "prebuilt-layout",
    analyze_request=AnalyzeDocumentRequest(base64_source=my_base64),
    content_type="application/octet-stream",
)

But no matter what I do I'm greeted with the following error:

{
    "code": "InvalidContent",
    "message": "The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats."
}

Since the documentation for AnalyzeDocumentRequest says that base64_source is of type Optional[bytes], I thought it would work like this.

However, all of the samples call this method in one of two ways:

  1. Passing a _io.BufferedReader like so:
with open("path/to/local/file", "rb") as f:
    poller = client.begin_analyze_document(
        "prebuilt-layout",
        analyze_request=f,
        content_type="application/octet-stream",
    )
  2. Passing a URL like so:
url = "https://some.url/Invoice_1.pdf"
poller = client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source=url),
)

Describe the solution you'd like

I'd like examples or tutorials covering all the different ways to upload a document to Document Intelligence through the newly released package.
I'm especially interested in uploading a base64 bytes object.

Describe alternatives you've considered

I tried using from io import BytesIO to create a buffer (with BytesIO(my_base64) as buf), but this didn't work either.
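
One possible explanation for the BytesIO attempt, offered as an assumption rather than a confirmed diagnosis: with content_type="application/octet-stream" the stream's bytes are sent as the document itself, so a buffer that still holds base64 text would not look like a PDF or an image. The sketch below uses only the standard library to show the difference. to_raw_stream is a hypothetical helper and the sample bytes are a stand-in for a real file; nothing here has been run against the service.

```python
import base64
import io


def to_raw_stream(b64_payload: bytes) -> io.BytesIO:
    """Decode base64 text into the original file bytes and wrap them in a stream.

    Hypothetical helper, not part of azure-ai-documentintelligence.
    """
    # validate=True raises binascii.Error on characters outside the base64 alphabet.
    raw = base64.b64decode(b64_payload, validate=True)
    return io.BytesIO(raw)


# Stand-in for a real document: the PDF signature plus filler bytes.
original = b"%PDF-1.7\n% placeholder body\n"
my_base64 = base64.b64encode(original)

# The base64 text does not begin with the PDF signature...
print(my_base64.startswith(b"%PDF"))  # False

# ...but the decoded stream does.
stream = to_raw_stream(my_base64)
print(stream.read(4))  # b'%PDF'
stream.seek(0)  # rewind so a later consumer reads from the start
```

If this assumption holds, a stream decoded this way could be passed as analyze_request in the same way as the open file in the first sample above.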

Labels

Document Intelligence, customer-reported, needs-team-attention, question