Source Code Repository for the Cognitive Search based Doctor Notes with Text Analytics Search App
If you simply want to show this code in a running instance, feel free to use https://doctornotessearchpoc.azurewebsites.net/. Otherwise, you can follow the setup instructions below to recreate your own instance in your Azure subscription.
Give doctors the ability to extract and find meaningful patient data from their notes, whether to get a broader view of a patient, to find patterns, or for research. How can we use AI to achieve this goal? In this code, we take a sample set of fake doctor notes and apply several machine learning techniques (named entity recognition of medical terms, finding semantically similar words, and knowledge graphs), providing medical professionals a better way to find and make sense of the research they need.
This repository contains the following assets and code:
- InvokeHealthEntityExtraction: An Azure Function that calls the Text Analytics for Health container and is invoked as a custom skill in an Azure Cognitive Search skillset
- Azure SQL Database
- AzureCognitiveSearchService: Jupyter notebook that will create data source, index, skillsets and indexer used by Azure Cognitive Search
- Web Application
- GitHub Actions configuration to deploy the web application
If you are new or new-ish to Azure, at the end of this project you will have a better understanding of the following concepts:
- Azure Storage Accounts
- Azure Cognitive Services
- Azure SQL Server
- Azure Functions
- Azure App Services
- Advanced Azure Cognitive Search
- Azure Container Instances
- Jupyter Notebooks
- GitHub Actions
Data is pulled from an Azure SQL Database. The main indexer runs the data, in JSON format, through a skillset that reshapes it and extracts medical entities, then puts the enriched data in the search index. It also saves the Text Analytics JSON output to the database, where it is used to render marked-up text.
Listed below are the services needed for this solution. If you don't have an Azure subscription, you can create a free one. If you already have a subscription, please make sure that your administrator has granted access to the services below:
- Azure Subscription
- Azure SQL Serverless
- Cognitive Services
- Azure Container Instances
- Text Analytics for Health
- Azure Functions
- Storage Account
- Azure Cognitive Search
- Azure App Services
Programming Tools needed:
- VS Code to edit Azure Functions
- Visual Studio to edit web-app (this is only if you want to customize the application)
This project should take about 4 hours to complete.
Before you begin, fork this repository to your own GitHub account, then download it to your local drive.
- Azure account - login or create one
- Create a resource group
- Import database package
- Create a Storage Account
- Implement Text Analytics For Health
- Deploy InvokeHealthEntityExtraction Azure function
- Create Azure search service
- Run Notebook to configure Indexes and Data for Azure Search
- Deploy Website
First, you will need an Azure account. If you don't already have one, you can start a free trial of Azure here.
Log into the Azure Portal using your credentials
If you are new to Azure, a resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group. See the Azure documentation to learn how to create a resource group.
Write the name of your resource group in a text file; we will need it later.
Upload the file doctor-note-poc-bacpac located under the folder data-files to a storage account in your subscription. Then import the database package into a serverless database; the Azure documentation on importing a database package describes this in more detail.
If you have never done this, follow the detailed steps below:
Click on create new resource and search for SQL Server (logical server) and select that option
Click the create button
Select the resource group you previously created
Enter a name for the server and a location that matches the location of your resource group. Select use both SQL and Azure AD authentication, and add yourself as Azure AD admin. Enter a user name and password for the server that are not easy to guess. Click Networking
Under firewall rules select Allow Azure Services and resources to access this server. Click Review + create
Verify all information is correct, click on "Create"
Once your server is created, navigate to your new SQL Server and click on Import Database
On the Import database screen, select backup
Select the storage account where you uploaded the database file and navigate to the file. Click Select
Next click configure database
Under compute tier, select serverless, then click ok
Enter a database name, select SQL server authentication and enter the user name and password you defined for the SQL Server, then click ok
Navigate to your SQL server and select import/export history to see the progress of your import. Once it has completed, navigate to databases to look at your newly imported database
Once on your imported database, select Query editor and enter your user credentials. Login will fail at first because your IP address has not yet been granted access. Click on Allow IP server and then log in
Once on the query screen, copy and paste this SQL statement and click Run to verify that the data was imported
Select * from DoctorNotes
Write the name of your SQL server, database, username and password in a text file; we will need them later
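The same verification can be run from a script instead of the portal's Query editor. The sketch below is only an illustration and is not part of this repository: the function names are hypothetical, it assumes the third-party pyodbc package and the "ODBC Driver 18 for SQL Server" driver are installed, and it assumes your client IP has been allowed through the server firewall.

```python
def build_sql_connection_string(server_name, database, username, password):
    """Build an ODBC connection string for an Azure SQL database.

    server_name is the logical server name without the domain suffix.
    Passwords containing ';' or '}' would need extra escaping (not handled here).
    """
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server=tcp:{server_name}.database.windows.net,1433;"
        f"Database={database};"
        f"Uid={username};"
        f"Pwd={password};"
        "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
    )


def count_doctor_notes(connection_string):
    """Return the row count of the DoctorNotes table (needs pyodbc and network access)."""
    import pyodbc  # third-party; imported lazily so the builder works without it

    with pyodbc.connect(connection_string) as conn:
        row = conn.cursor().execute("SELECT COUNT(*) FROM DoctorNotes").fetchone()
        return row[0]
```

Calling `count_doctor_notes(build_sql_connection_string(...))` with the values from your text file should return a number greater than zero if the import succeeded.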
Create a storage account and get the connection string; you will need this connection string for the next steps. If you have never done that, the Azure documentation explains how to do it.
Once your storage account is created, navigate to the storage account and create a container named doctor-notes-search
Write the name of your storage account, its connection string and its access key in the same text file; we will need them later
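The connection string already contains the account name and the access key, so both can be read out of it rather than copied separately. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and not part of this repository.

```python
def parse_storage_connection_string(connection_string):
    """Split a storage connection string into a dict of its key=value segments."""
    parts = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def container_url(parts, container="doctor-notes-search"):
    """Build the blob container address from the parsed connection string."""
    suffix = parts.get("EndpointSuffix", "core.windows.net")
    return f"https://{parts['AccountName']}.blob.{suffix}/{container}"
```

`parts["AccountName"]` and `parts["AccountKey"]` give the two values to note down, and `container_url(parts)` gives the address of the doctor-notes-search container.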
Our implementation uses the Text Analytics for Health container for medical entity extraction. Once you have received access, you will need to set up the container as instructed in their README.
Write your container name in a text file; you will need it later
Then, you will need to update the InvokeHealthEntityExtraction Azure function with the location of your running container. You will also need to download a file umls_concept_dict.pickle that is too big to host on GitHub, which will allow lookup of UMLS entities.
Specifically, in the InvokeHealthEntityExtraction\InvokeHealthEntityExtraction folder:
- Download the umls_concept_dict.pickle file and save it to the directory InvokeHealthEntityExtraction\InvokeHealthEntityExtraction (the same directory as __init__.py) so that it deploys with the Azure function.
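Before deploying, it can be worth confirming that the file is in the right place and loads. The sketch below is an assumption-laden illustration: `load_concept_dict` is a hypothetical helper, and the internal structure of the pickle is not documented here, so the code only loads the object and does not assume anything about its keys or values. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path


def load_concept_dict(function_dir):
    """Load umls_concept_dict.pickle from the function's own directory."""
    path = Path(function_dir) / "umls_concept_dict.pickle"
    if not path.is_file():
        raise FileNotFoundError(f"Expected the UMLS lookup file at {path}")
    with path.open("rb") as handle:
        return pickle.load(handle)
```

For example, `load_concept_dict("InvokeHealthEntityExtraction/InvokeHealthEntityExtraction")` run from the repository root should return the lookup object without raising.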
After this action is complete, you can deploy the InvokeHealthEntityExtraction Azure function. One easy way to deploy an Azure function is using Visual Studio Code. You can install VS Code and then follow some of the instructions at this link:
- Install the Azure Functions extension for Visual Studio Code
- Sign in to Azure
- Publish the function to Azure
After the function is deployed, you need to update the function's configuration parameters and get the value of the function URL. Follow these steps:
To update the function's configuration parameters, navigate to your Azure function app in the Azure portal. Under settings click "configuration", then under "Application settings" click "New application setting".
Add the following parameters and their corresponding values:
text_analytics_container_url: YOUR_CONTAINER_URL
cognitive_services_enpoint: YOUR_ALL_IN_ONE_COGNITIVE_SERVICES_END_POINT
cognitive_services_key: YOUR_ALL_IN_ONE_COGNITIVE_SERVICES_END_API_KEY
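Application settings on a function app are exposed to the function code as environment variables. The sketch below shows one way such settings could be read and validated at startup; it is not necessarily how the InvokeHealthEntityExtraction function reads them, and `load_settings` is a hypothetical helper. The setting names are copied exactly as listed above, including the spelling of `cognitive_services_enpoint`.

```python
import os

# Names exactly as listed in the application settings above.
REQUIRED_SETTINGS = (
    "text_analytics_container_url",
    "cognitive_services_enpoint",
    "cognitive_services_key",
)


def load_settings(environ=None):
    """Return the required settings, failing fast if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise KeyError("Missing application settings: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}
```

Failing fast on a missing setting tends to give a clearer error than a failed HTTP call to an empty URL later on.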
Next, click "Functions" in the left-hand sidebar. Then click on the function name and click "Get Function Url" at the top of the page.
Copy the value of the function URL to the text file; you will need it later.
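Because the function is invoked as a custom skill, Azure Cognitive Search sends it a JSON body with a `values` array of records, each carrying a `recordId` and a `data` object, and expects the same `recordId` values back alongside `data`, `errors` and `warnings`. The sketch below builds such a request body and summarizes a response, which can help when testing the function URL by hand. The input field name `text` is an assumption for illustration; the real input names are whatever skillset.json defines for this skill.

```python
import json


def build_skill_request(texts):
    """Wrap each text in the record envelope used by custom Web API skills."""
    return {
        "values": [
            # "text" is a hypothetical input name; check skillset.json for the real one.
            {"recordId": str(index), "data": {"text": text}}
            for index, text in enumerate(texts)
        ]
    }


def summarize_skill_response(response_body):
    """Index a custom skill response (JSON string) by recordId."""
    payload = json.loads(response_body)
    results = {}
    for record in payload.get("values", []):
        results[record["recordId"]] = {
            "data": record.get("data") or {},
            "errors": record.get("errors") or [],
            "warnings": record.get("warnings") or [],
        }
    return results
```

The dictionary from `build_skill_request` can be serialized with `json.dumps` and posted to the function URL with any HTTP client.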
Create a new Azure search service using the Azure portal at https://portal.azure.com/#create/Microsoft.Search. Select your Azure subscription. Use the previously created resource group. You will need a globally-unique URL as the name of your search service (try something like "doctonotes-search-" plus your name, organization, or numbers). Finally, choose a nearby location to host your search service - please remember the location that you chose, as your Cognitive Services instance will need to be based in the same location. Click "Review + create" and then (after validation) click "Create" to instantiate and deploy the service.
Copy the Azure Search service URL, service name and service key to the text file; you will need them later.
After deployment of Azure Search service is complete, click "Go to resource" to navigate to your new search service. We will need some information about your search service to fill in the "Azure Search variables" section in the SetupAzureCognitiveSearchService.ipynb notebook, which is in the AzureCognitiveSearchService directory. Open the notebook for details on how to do this and copy those values into the first code cell, but don't run the notebook yet (you will need to update skillset.json first).
Before running the notebook, you will also need to change the TODOs in the skillset.json (which is also located in the AzureCognitiveSearchService folder). Open skillset.json, search for "TODO", and replace each instance with the following:
- Invoke TA Health Extraction custom skill URI: this value should be "https://" plus the value from the "Get Function Url" for the InvokeHealthEntityExtraction function that you noted down earlier
- Cognitive Services key: create a new Cognitive Services key in the Azure portal using the same subscription, location, and resource group that you did for your Azure search service. Click "Create" and after the resource is ready, click it. Click "Keys and Endpoint" in the left-hand sidebar. Copy the Key 1 value into this TODO.
- Knowledge Store connection string: use the value that you noted down earlier of the connection string to the knowledgeStore container in your Azure blob storage. It should be of the format "DefaultEndpointsProtocol=https;AccountName=YourValueHere;AccountKey=YourValueHere;EndpointSuffix=core.windows.net".
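A quick way to confirm that no placeholder was missed is to scan the edited file for remaining "TODO" strings. This is a small sketch using only the standard library; the helper names are hypothetical and it makes no assumption about the layout of skillset.json beyond it being valid JSON.

```python
import json


def find_todos(node, path="$"):
    """Return the JSON paths of all string values that still contain 'TODO'."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_todos(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_todos(value, f"{path}[{index}]"))
    elif isinstance(node, str) and "TODO" in node:
        hits.append(path)
    return hits


def check_skillset_file(file_path):
    """Load a skillset definition and list the paths that still need values."""
    with open(file_path, encoding="utf-8") as handle:
        return find_todos(json.load(handle))
```

An empty list from `check_skillset_file("AzureCognitiveSearchService/skillset.json")` suggests every placeholder has been replaced.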
Finally, you are all set to go into the SetupAzureCognitiveSearchService.ipynb notebook and run it. This notebook will call REST endpoints on the search service that you have deployed in Azure to set up the search data sources, index, indexers, and skillset.
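For orientation, the REST calls involved follow a common pattern: a PUT to `https://<service>.search.windows.net/<collection>/<name>` with an `api-key` header and a JSON definition as the body, where the collection is one of `datasources`, `indexes`, `skillsets` or `indexers`. The sketch below only builds such a request with the standard library and does not send it. The helper name is hypothetical, and the API version shown is an assumption; the notebook may use a different version or HTTP client.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumption; use the version the notebook specifies


def build_put_request(service_name, api_key, collection, name, definition):
    """Build (but do not send) a create-or-update request for a search resource."""
    url = (
        f"https://{service_name}.search.windows.net/"
        f"{collection}/{name}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; for example, `build_put_request(service, key, "indexes", "azuresql-index", index_definition)` targets the index name used later by the web application.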
To deploy the web application you will need the following steps:
- Create an Azure App Service
- Update Web App Settings file
- Create GitHub Secret and Update GitHub Actions File
- Commit changes to your repository
This repository includes a workflow to publish the web application. But first you need to create an App Service with the following configuration:
- Unique name for your application like DoctorNotesApp
- Publish: Code
- Runtime stack: .NET 6 (LTS)
- Operating System: Windows
- Region: the same region you selected for your resource group
- Create a new Windows plan if you don't have one
You can change the default size of your plan to a development plan if you want to, but performance will be slower.
Once the App service is provisioned, navigate to the App and download the publish profile
Open the file and copy the content to a text file
Navigate to the web-app/Cognitive.UI folder and open the appsettings.json file and change the following parameters:
"SearchServiceName": "YOUR_COGNITIVE_SEARCH_SERVICE_NAME",
"SearchApiKey": "YOUR_COGNITIVE_SEARCH_SERVICE_API_KEY",
"SearchIndexName": "azuresql-index",
"SearchIndexerName": "azure-sql-indexer",
"StorageAccountName": "YOUR_STORAGE_ACCOUNT_NAME",
"StorageAccountKey": "YOUR_STORAGE_ACCOUNT_KEY",
"StorageContainerAddress": "https://YOUR_STORAGE_ACCOUNT_NAME.blob.core.windows.net/doctor-notes-search"
Please make sure the index and indexer names match those created on your Cognitive Search Service
Next, navigate to your GitHub repository secrets and change the value of the secret named DoctorNotesSearchPoc_A28D: copy and paste the content of the publish profile you just downloaded into the value box.
If the secret does not exist, please create it.
Next, navigate to the workflow file located at .github/workflows/DoctorNotesSearchPoc.yml and change the value of the variable AZURE_WEBAPP_NAME to match the name of the Azure App Service you just created.
In your GitHub repository, navigate to Actions, select the workflow "Build and Deploy .Net app...." and click on enable this workflow option to your right
Commit your changes to the main branch of the forked GitHub repository, then navigate to Actions to confirm that the application has been published.
This project was enhanced and changed from the Covid-19 Search repository by Liam Cavanagh.
Markup text for healthcare analytics code was provided by Oren Barnea.