Skip to content

Set of useful Azure Functions for doing advanced text processing such as Text Summarization and OCR on unstructured documents using Cognitive Services and Azure Functions

License

Notifications You must be signed in to change notification settings

Yahnoosh/AzureSearch-AzureFunctions-CognitiveServices

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Enriching Unstructured Documents in an Azure Search Index using Azure Functions

Set of useful Azure Functions for doing advanced text processing such as Text Summarization and OCR on unstructured documents using Cognitive Services and Azure Functions

The goal of these Azure Functions is to allow you to do additional processing over unstructured contents stored in Azure Blob's such as PDF and Office Docs. The results of the processing allows you to enrich your Azure Search index with additional content. For example, if there are images in your PDF, you can leverage Azure Cognitive Services to extract the text from the images which are then added to the Azure Search index to be searchable. Below are the examples currently available.

##Requirements

##OCR - Optical Character Recognition This includes an Azure Functions that leverages the Azure Blob Trigger that calls an Azure Function whenever a file is added to the configured blob store. This function extracts the images from the PDF and then calls the Azure Cognitive Services Vision API to extract the text from them images which is then added to your Azure Serach index. Currently it only supports PDF processing, but could easily be extended for other file types.

##Text Summarization This includes an Azure Functions that leverages the Azure Blob Trigger that calls an Azure Function whenever a file is added to the configured blob store. This function extracts the text from the PDF and then calls the Azure Cognitive Services Text Analytics API to extract the key phrases from the text. The top phrases are then used to find sentences in the text which best represent the overall text. This summarized test is then added to your Azure Serach index. Currently it only supports PDF processing, but could easily be extended for other file types.

About

Set of useful Azure Functions for doing advanced text processing such as Text Summarization and OCR on unstructured documents using Cognitive Services and Azure Functions

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C# 100.0%