page_type | languages | products | name | description | azureDeploy | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|
sample |
|
|
Using Azure OpenAI GPT-4o to extract structured JSON data from PDF documents |
This sample demonstrates how to use GPT-4o to extract structured JSON data from PDF documents using Azure OpenAI. |
This sample demonstrates how to use GPT-4o to extract structured JSON data from PDF documents, such as invoices, using the Azure OpenAI Service.
This approach takes advantage of the GPT-4o model's ability to understand the structure of a document and extract the relevant information using vision capabilities. This approach provides the following advantages:
- No requirement to train a custom model: GPT-4o is a pre-trained model that can be used to extract structured data from PDF documents without the need to train a custom model for your specific document types. This can save time and resources, especially for organizations that need to process a wide variety of document types.
- Extraction by prompt engineering: GPT-4o can extract structured data from documents with a defined JSON schema provided as a one-shot learning technique. This instructs the model to extract data is a defined format, providing a high level of accuracy for downstream processing.
- Ability to extract data from complex documents: GPT-4o can extract structured data from complex visual elements in documents, such as invoices, that contain tables, images, and other non-standard elements.
Important
GPT-4o accrues token-based charges like other Azure OpenAI models. Images are converted into tokens by converting your high resolution images into separate 512px tiled images. For more information, see the Azure OpenAI image token overview.
- Azure OpenAI Service, a managed service for OpenAI GPT models that exposes a REST API.
- GPT-4o (2024-05-13) model deployment
- Note: The GPT-4o model is not available in all Azure OpenAI regions. For more information, see the Azure OpenAI Service documentation.
- Azure Bicep, used to create a repeatable infrastructure deployment for the Azure resources.
Note
This sample comes prepared with a Invoice_1.pdf file that you can use to test the GPT-4o model. You can also use your own PDF files to test the model.
To deploy the infrastructure and test PDF data extraction using GPT-4o, you need to:
- Install the latest .NET SDK.
- Install PowerShell Core.
- Install the Azure CLI.
- Install Visual Studio Code with the Polyglot Notebooks extension.
The Sample.ipynb notebook contains all the necessary steps to deploy the infrastructure using Azure Bicep, and make requests to the deployed Azure OpenAI API to test the GPT-4o model with the provided PDF file.
Note
The sample uses the Azure CLI to deploy the infrastructure from the main.bicep file, and PowerShell commands to test the deployed Azure OpenAI API.
The notebook is split into multiple parts including:
- Login to Azure and set the default subscription.
- Deploy the Azure resources using Azure Bicep.
- Create image assets from the provided PDF file.
- Making requests to the deployed Azure OpenAI API to test the GPT-4o model with the PDF images to return structured JSON data.
Each steps is documented in the notebook with additional information and links to the relevant documentation.
After you have finished testing the GPT-4o model, you can clean up the resources using the following steps:
- Run the
az group delete
command to delete the resource group and all the resources within it.
az group delete --name <resource-group-name> --yes --no-wait
The <resource-group-name>
is the name of the resource group that can be found as the AZURE_RESOURCE_GROUP_NAME environment variable in the config.env file.