
Azure Document Intelligence big files issue  #31025

Open

Description

I am using Azure Document Intelligence to read (OCR) my PDF files. My code works for files with fewer than 100 pages and under 200 MB, but when a file goes beyond that I get this error:

Unexpected error: RestError: Error reading response as text: aborted
{
"name": "RestError",
"code": "PARSE_ERROR",
"message": "Error reading response as text: aborted"
}

I have also checked the Document Intelligence limits for my subscription tier: it supports files up to 500 MB and 2,000 pages, and my files are within those limits.
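Because files under 100 pages analyze successfully in this setup, one workaround worth trying (an assumption, not a confirmed fix) is to analyze a large PDF in page ranges of at most 100 pages and merge the partial results. The hypothetical `pageRanges` helper below only builds the 1-based range strings. The commented usage assumes they can be passed through the `pages` option of `beginAnalyzeDocumentFromUrl`, and that `totalPages` is known in advance (for example from a PDF library, not shown).

```typescript
// Hypothetical helper (not part of any Azure SDK): split a document of
// `totalPages` pages into 1-based, inclusive page-range strings of at most
// `chunkSize` pages each, e.g. "1-100", "101-200".
function pageRanges(totalPages: number, chunkSize: number): string[] {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new RangeError('totalPages must be a positive integer');
  }
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges: string[] = [];
  for (let start = 1; start <= totalPages; start += chunkSize) {
    const end = Math.min(start + chunkSize - 1, totalPages);
    // A single-page chunk is written as "N" rather than "N-N".
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}

// Usage sketch (assumes the `pages` option accepts a range string such as
// "101-200"; `client`, `blobUrl` and `totalPages` are placeholders):
//
// for (const pages of pageRanges(totalPages, 100)) {
//   const poller = await client.beginAnalyzeDocumentFromUrl('prebuilt-read', blobUrl, { pages });
//   const partial = await poller.pollUntilDone();
//   // ...merge partial.pages / partial.content into the overall result
// }
```

Each request then returns a smaller result, which makes it possible to check whether the failure depends on the size of a single response.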

I am using Node.js 20.16.0 on Windows 10, with these packages:
@azure/ai-form-recognizer 5.0.0
@azure/storage-blob 12.17.0

Here is my code:

import {
  DocumentAnalysisClient,
  AzureKeyCredential,
} from '@azure/ai-form-recognizer';
import {
  BlobSASPermissions,
  BlobServiceClient,
  ContainerClient,
  StorageSharedKeyCredential,
  generateBlobSASQueryParameters,
} from '@azure/storage-blob';
// The Document Intelligence client throws RestError from @azure/core-rest-pipeline.
// The copy imported from @azure/storage-blob may fail the instanceof check below,
// which would explain why the error above was logged as "Unexpected error".
import { RestError } from '@azure/core-rest-pipeline';
import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import * as fs from 'fs/promises';

@Injectable()
export class DocumentInteligenceService {
  private documentAnalysisClient: DocumentAnalysisClient;
  private endpoint: string;
  private apiKey: string;
  private readonly connectionString: string;
  private readonly containerName: string;
  private readonly blobServiceClient: BlobServiceClient;
  private readonly storageAccountName: string;
  private readonly storageAccountKey: string;
  private readonly containerClient: ContainerClient;

  constructor(private configService: ConfigService) {
    this.endpoint = this.configService.get<string>('AzureFormRecognizer.Endpoint');
    this.apiKey = this.configService.get<string>('AzureFormRecognizer.ApiKey');

    this.documentAnalysisClient = new DocumentAnalysisClient(
      this.endpoint,
      new AzureKeyCredential(this.apiKey),
    );

    this.connectionString = this.configService.get<string>(
      'AzureStorageAccount.ConnectionString',
    );
    this.containerName = this.configService.get<string>(
      'AzureStorageAccount.ContainerName',
    );
    this.storageAccountName = this.configService.get<string>(
      'AzureStorageAccount.StorageAccountName',
    );
    this.storageAccountKey = this.configService.get<string>(
      'AzureStorageAccount.StorageAccountKey',
    );

    this.blobServiceClient = BlobServiceClient.fromConnectionString(
      this.connectionString,
    );
    this.containerClient = this.blobServiceClient.getContainerClient(
      this.containerName,
    );
  }

  async analyzeDocumentLayout(blobUrl: string): Promise<void> {
    try {
      const poller = await this.documentAnalysisClient.beginAnalyzeDocumentFromUrl(
        'prebuilt-read',
        blobUrl,
        {
          // Log each polling status update (e.g. "running", "succeeded").
          onProgress: (state) => console.log(`Status: ${state.status}`),
        },
      );
      const result = await poller.pollUntilDone();

      const resultText = JSON.stringify(result, null, 2);

      await fs.writeFile('test.json', resultText, 'utf-8');
      console.log('The results have been saved to test.json.');
    } catch (error) {
      if (error instanceof RestError) {
        console.error('Error:', error.message);
        // Add logic to retry or handle specific RestError scenarios.
      } else {
        console.error('Unexpected error:', error);
      }
    }
  }
}
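The catch block leaves a placeholder for retry logic. A minimal sketch of that, assuming (not confirmed in this issue) that the aborted response read is transient: the hypothetical `withRetry` helper below is not an Azure SDK API. It retries only errors whose `code` is in a given set, defaulting to `PARSE_ERROR` (the code in the error output above), with exponential backoff, and rethrows everything else immediately.

```typescript
// Hypothetical retry helper (not part of any Azure SDK): re-runs `operation`
// when it fails with an error whose `code` is in `retryableCodes`, waiting
// `baseDelayMs * 2^attempt` between attempts. Other errors are rethrown at once.
async function withRetry<T>(
  operation: () => Promise<T>,
  retryableCodes: ReadonlySet<string> = new Set(['PARSE_ERROR']),
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const code = (error as { code?: unknown } | null)?.code;
      const retryable = typeof code === 'string' && retryableCodes.has(code);
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!retryable || attempt + 1 >= maxAttempts) {
        throw error;
      }
      const delayMs = baseDelayMs * 2 ** attempt;
      console.warn(`Attempt ${attempt + 1} failed with ${code}; retrying in ${delayMs} ms`);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage sketch inside analyzeDocumentLayout (placeholder, not tested against Azure):
//
// const result = await withRetry(async () => {
//   const poller = await this.documentAnalysisClient.beginAnalyzeDocumentFromUrl('prebuilt-read', blobUrl);
//   return poller.pollUntilDone();
// });
```

If every attempt fails the same way for large files, retrying will not help, and that outcome would itself suggest the problem is not transient.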


Labels: Client, Cognitive - Form Recognizer, Service Attention, customer-reported, needs-team-attention, question
