Description
- Package Name: azure-ai-documentintelligence
- Package Version: 1.0.0b4
- Operating System: Windows
- Python Version: 3.10
Describe the bug
I want to confirm the proper way to stream large files. Does using AnalyzeDocumentRequest create a JSON payload (which would be less efficient)?
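from azure.ai.documentintelligence.aio import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import (
    AnalyzeDocumentRequest,
    AnalyzeResult,
    ContentFormat,
)
from azure.core.credentials import AzureKeyCredential

# Method of a class that holds the endpoint and key as attributes.
async def get_analyze_result(self, document_data: bytes) -> AnalyzeResult:
    """
    Get markdown of a document
    """
    document_intelligence_client = DocumentIntelligenceClient(
        endpoint=self.document_intelligence_endpoint,
        credential=AzureKeyCredential(key=self.document_intelligence_key),
    )
    async with document_intelligence_client:
        poller = await document_intelligence_client.begin_analyze_document(
            analyze_request=AnalyzeDocumentRequest(bytes_source=document_data),
            model_id="prebuilt-layout",
            output_content_format=ContentFormat.MARKDOWN,
        )
        analyze_result = await poller.result()
        return analyze_result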
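To put a rough number on the "less efficient" concern, here is a minimal sketch that measures how much larger a document becomes if it is base64-encoded into a JSON body. That the SDK builds its request this way is an assumption behind the question, not something confirmed here, and the field name `base64Source` and the helper `json_payload_size` are illustrative only.

```python
import base64
import json


def json_payload_size(document_data: bytes) -> int:
    """Size in bytes of a JSON body carrying the document as base64.

    The field name "base64Source" is an assumption for illustration.
    """
    encoded = base64.b64encode(document_data).decode("ascii")
    return len(json.dumps({"base64Source": encoded}).encode("utf-8"))


raw = bytes(3_000_000)  # stand-in for a 3 MB document
# Print the raw size next to the size of the JSON body that would carry it.
print(len(raw), json_payload_size(raw))
```

Base64 emits 4 characters for every 3 input bytes, so the body is about a third larger than the raw file, and the whole document has to be held in memory to build it.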
Does the following code stream the file without blocking the thread? (I don't think a BufferedReader has async methods.) What is the chunk size?
with open(path_to_sample_documents, "rb") as f:
    poller = await document_intelligence_client.begin_analyze_document(
        model_id=model_id, analyze_request=f, content_type="application/octet-stream"
    )
    result: AnalyzeResult = await poller.result()
Expected behavior
I was expecting to be able to pass an AsyncBufferedReader, so that reading the file does not block the current thread and I don't have to create other threads.
import aiofiles

async with aiofiles.open('t.pdf', mode='rb') as f:  # AsyncBufferedReader
    content = await f.read()
I intend to use it with FastAPI's UploadFile, which has an await file.read(size) method. A protocol may be needed so that it works with both AsyncBufferedReader and UploadFile.
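A minimal sketch of the protocol idea, to make the request concrete. The names `AsyncByteReader`, `iter_chunks` and `InMemoryReader` are hypothetical, and the 64 KiB chunk size is an arbitrary choice. Whether `begin_analyze_document` accepts an async iterator of bytes as the request body is not confirmed here. The sketch only shows that a single structural type can cover any object with an awaitable `read(size)`, which is the method shape both aiofiles file objects and `UploadFile` expose.

```python
import asyncio
from typing import AsyncIterator, Protocol


class AsyncByteReader(Protocol):
    """Any object with an awaitable read(size) returning bytes (hypothetical name)."""

    async def read(self, size: int = -1) -> bytes: ...


async def iter_chunks(
    reader: AsyncByteReader, chunk_size: int = 64 * 1024
) -> AsyncIterator[bytes]:
    """Yield chunks until read() returns an empty bytes object."""
    while True:
        chunk = await reader.read(chunk_size)
        if not chunk:
            break
        yield chunk


class InMemoryReader:
    """Stand-in reader used only to exercise iter_chunks."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    async def read(self, size: int = -1) -> bytes:
        end = len(self._data) if size < 0 else self._pos + size
        chunk = self._data[self._pos:end]
        self._pos += len(chunk)
        return chunk


async def main() -> list[int]:
    reader = InMemoryReader(b"x" * 150_000)
    # Collect the length of each chunk the generator yields.
    return [len(chunk) async for chunk in iter_chunks(reader)]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because `Protocol` is structural, neither aiofiles nor FastAPI would need to inherit from anything. A client method typed against `AsyncByteReader` would accept both.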