In this demo, a solution named Databoss is used to connect and integrate several Azure data services.
This is the high-level design with the main components and the data flow:
This project is implemented almost entirely within a private network architecture, using Private Link and Service Endpoints to connect securely to resources.
Copy the .auto.tfvars template:
cp templates/template.tf .auto.tfvars
Check your public IP address so you can add it to the firewall allow rules:
dig +short myip.opendns.com @resolver1.opendns.com
Add your public IP address to the public_ip_address_to_allow variable.
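Alternatively, Terraform also reads TF_VAR_-prefixed environment variables, so you can skip editing the file and set the value straight from the dig output (a small convenience, not part of the original template):
export TF_VAR_public_ip_address_to_allow="$(dig +short myip.opendns.com @resolver1.opendns.com)"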
Initialize Terraform and apply the configuration to create the Azure infrastructure:
terraform init
terraform apply -auto-approve
Pause the Synapse SQL pool to avoid costs while setting up the infrastructure:
az synapse sql pool pause -n pool1 --workspace-name synw-databoss -g rg-databoss
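If you want to confirm the pool state at any point, the same CLI can report it (an optional check, not one of the original steps):
az synapse sql pool show -n pool1 --workspace-name synw-databoss -g rg-databoss --query status -o tsv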
Once the apply phase is complete, approve the managed private endpoints for ADF:
bash scripts/approveManagedPrivateEndpoints.sh
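The script automates the approval step; done by hand it would look roughly like this for a single target resource (both IDs below are placeholders):
az network private-endpoint-connection list --id <target-resource-id>
az network private-endpoint-connection approve --id <connection-id> --description "Approved for ADF"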
💡 A single connection to Databricks is required to create the access policies on Azure Key Vault.
If everything is OK, proceed to the next section.
Upload some test data:
bash scripts/uploadFilesToDataLake.sh
bash scripts/uploadFilesToExternalStorage.sh
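For reference, a single file can also be uploaded to an ADLS Gen2 file system directly with the storage CLI (a sketch; account, file-system, and paths are illustrative):
az storage fs file upload --account-name <datalake-account> -f <filesystem> -s ./sample.csv -p raw/sample.csv --auth-mode login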
Run the ADF pipeline to import data from the external storage into the data lake:
az datafactory pipeline create-run \
--resource-group rg-databoss \
--name Adfv2CopyExternalFileToLake \
--factory-name adf-databoss
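The create-run command prints a runId; if you want to poll the pipeline instead of watching the portal, pass it to pipeline-run show (the run ID below is a placeholder):
az datafactory pipeline-run show -g rg-databoss --factory-name adf-databoss --run-id <run-id> --query status -o tsv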
If you've stopped the Synapse pool, resume it:
az synapse sql pool resume -n pool1 --workspace-name synw-databoss -g rg-databoss
Create the template scripts in Synapse:
bash scripts/createSynapseSQLScripts.sh
Now, connect to the Synapse web UI or directly to the SQL endpoint and execute the scripts.
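If you prefer a terminal over the web UI, the dedicated SQL endpoint follows the <workspace>.sql.azuresynapse.net pattern and can be reached with sqlcmd (a sketch using Azure AD interactive authentication; adjust to your login method):
sqlcmd -S synw-databoss.sql.azuresynapse.net -d pool1 -G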
The previous Azure run should have created the databricks/.auto.tfvars file to configure Databricks.
Apply the Databricks configuration:
💡 If you haven't yet, you need to log in to Databricks, which will create the Key Vault access policies.
terraform -chdir="databricks" init
terraform -chdir="databricks" apply -auto-approve
Check the workspace files, run the test notebooks, and make sure that connectivity works end to end.
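One quick check is listing the imported files with the Databricks CLI, assuming it is installed and configured against the workspace:
databricks workspace list /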
Deployment command:
func azure functionapp publish <FunctionAppName>
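After publishing, you can optionally tail the live logs with the same Core Tools to confirm the app starts cleanly:
func azure functionapp logstream <FunctionAppName>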
Create the virtual environment:
python -m venv venv
. venv/bin/activate
pip install -r requirements.txt
deactivate
Start the function locally:
func start
Get the Service Bus connection string:
az servicebus namespace authorization-rule keys list -n RootManageSharedAccessKey --namespace-name bus-databoss -g rg-databoss
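To grab only the value needed for the settings file below, filter the same command with a JMESPath query:
az servicebus namespace authorization-rule keys list -n RootManageSharedAccessKey --namespace-name bus-databoss -g rg-databoss --query primaryConnectionString -o tsv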
Create the local.settings.json file:
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsFeatureFlags": "EnableWorkerIndexing",
    "AzureWebJobsStorage": "",
    "AzureWebJobsServiceBusConnectionString": ""
  }
}
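Fill AzureWebJobsServiceBusConnectionString with the value retrieved above, and AzureWebJobsStorage with a storage account connection string, for example (the account name is a placeholder):
az storage account show-connection-string -n <storage-account> -g rg-databoss --query connectionString -o tsv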
- Consume IP addresses
- Internal runtime
- Code repository
- AD permissions
- Azure Monitor (Logs, Insights)
- Enable IR interactive authoring
Delete the Databricks configuration:
terraform -chdir="databricks" destroy -auto-approve
Delete the Azure infrastructure:
terraform destroy -auto-approve
- Tutorial: ADLSv2, Azure Databricks & Spark
- ADF Private Endpoints
- Integration runtime in Azure Data Factory
- Connect to Azure Data Lake Storage Gen2 and Blob Storage
- Azure Databricks: Manage service principals
- Azure Databricks: Query data in Azure Synapse Analytics
- Azure Synapse: Azure Private Link Hubs