Deploy a real Databricks workspace on Azure in 20 minutes. No guesswork, no fluff.
Stop wrestling with fragmented docs and broken tutorials. Get hands-on experience with Databricks + Terraform + GitHub Actions for Databricks Asset Bundles that actually works. Learn Databricks Asset Bundles through the deployed example job. Features production-grade security patterns including VNet injection, managed identity authentication, and private subnets. Perfect for bootcamp students, freelance consultants, data engineers learning the stack, and anyone who needs to spin up secure Databricks infrastructure without corporate IT delays.
Give us a star if this saves you hours of setup time!
Found a bug? Have questions? Create an issue.
Ready for production? This free tier gives you the fundamentals. When you're ready for enterprise features and private networking that deploy in 2 hours instead of weeks, explore our production deployment patterns.
Works with a free Azure subscription - perfect for portfolio projects and client prototypes.
- Learning Databricks & Terraform - Get real hands-on experience, not just theory
- Building a portfolio project - Showcase actual cloud infrastructure on GitHub/LinkedIn
- Freelancing or consulting - Validate Databricks for client work without enterprise overhead
- Working around corporate IT - Spin up prototypes in your own Azure account
- In a bootcamp or course - Practice with real infrastructure that mirrors production
- Enterprise SSO integration (this uses personal access tokens)
- Advanced network isolation (basic VNet setup only)
- Fine-grained governance (development-level permissions)
- Production workloads (missing enterprise features)
- Real Infrastructure as Code - Terraform scripts that deploy actual Azure resources
- VNet Security Architecture - VNet injection, managed identity, private subnets, and network security groups
- Unity Catalog Setup - Pre-configured data catalog with sample data and jobs
- CI/CD Pipeline - GitHub Actions that deploy your DAB changes automatically
- Working Examples using DABs - Databricks Asset Bundles deploy notebooks and jobs that actually run and process data
- One-Click Cleanup - Destroy everything to stop charges instantly
What gets deployed (click to expand)
Infrastructure (Terraform):
- Azure resource group with your own naming
- Virtual network with Databricks subnet delegation
- Databricks workspace with Unity Catalog enabled
- Storage account with hierarchical namespace
- Managed identity for secure data access
Data Platform (Databricks Asset Bundle):
- Unity Catalog with sample database and tables
- Automated SQL job that processes real data
- Interactive notebooks for Python and SQL development
- CI/CD pipeline that deploys your changes
Working Examples:
- Sample CSV data automatically loaded into Unity Catalog
- SQL transformations that create analytics tables
- Schedulable jobs you can modify and extend
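
Once deployed, you can trigger the sample job straight from the CLI. A minimal sketch — the resource key `unity_catalog_sql_job` is hypothetical; check the bundle's `resources/` definitions for the actual name:

```bash
# Run the bundle-defined job by its resource key (key name is an assumption —
# inspect databricks/bundles/resources/ for the real one).
databricks bundle run unity_catalog_sql_job
```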
The difference: This isn't another "hello world" tutorial. You get real infrastructure with serious security that you can build on, understand, and showcase.
For detailed step-by-step setup instructions with troubleshooting, see SETUP_GUIDE.md.
The easiest way to get started is with a devcontainer. All tools are pre-installed.
Requirements:
- Docker Desktop (for local development)
- VS Code + Dev Containers extension
- Or use GitHub Codespaces (no local Docker needed)
Step-by-step instructions (click to expand)
- Install Docker Desktop and ensure it's running
- Clone the repository:
  ```bash
  git clone https://github.com/formyron/formyron-free.git
  cd formyron-free
  ```
- Open in VS Code:
  ```bash
  code .
  ```
- Open in devcontainer:
  - VS Code will show a popup: "Folder contains a Dev Container configuration file"
  - Click "Reopen in Container"
  - Or use Command Palette (F1): `Dev Containers: Reopen in Container`
- Wait for container build (first time takes 2-3 minutes)
- You're ready! All tools are pre-installed.
What you get automatically:
- ✅ Python 3.10 (exact version required for DABs)
- ✅ Azure CLI
- ✅ Terraform >= 1.4
- ✅ Databricks CLI
- ✅ All Python dependencies pre-installed
- ✅ Proper virtual environment setup
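
To confirm the container came up correctly, you can check the tool versions yourself. A quick sanity check (not part of the repo's scripts):

```bash
# Verify the pre-installed toolchain inside the devcontainer.
python --version      # should report 3.10.x
terraform version     # should be >= 1.4
az version
databricks --version
```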
Then just:
- Log in to Azure:
  ```bash
  az login --use-device-code
  ```
- Copy the example Terraform configuration and edit it with your own values (see the sketch after this list):
  ```bash
  cp infra/terraform/terraform.tfvars.example infra/terraform/terraform.tfvars
  ```
- Run setup:
  ```bash
  ./setup.sh
  ```
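
If you want a head start on the tfvars edit, here is a minimal sketch of plausible values. The variable names `prefix`, `stage`, and `location` are assumptions inferred from the resource-group naming used later in this README — check `terraform.tfvars.example` for the authoritative list:

```bash
# Hypothetical variable names and values — verify against terraform.tfvars.example.
cat > infra/terraform/terraform.tfvars <<'EOF'
prefix   = "mycompany"
stage    = "dev"
location = "westeurope"
EOF
```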
Why devcontainer? No version conflicts, no manual installs, consistent environment for everyone. Perfect for learning and collaboration!
Setup script output example:
Prefer manual setup? See the Manual Setup Guide at the bottom of this page.
Unlike basic tutorials that skip security, this implements real enterprise security patterns you can learn from:
- VNet Injection - Databricks compute runs in a private Azure virtual network you control
- Private Subnets - Worker nodes isolated in dedicated subnet (outbound HTTPS allowed for package downloads)
- Managed Identity - No stored credentials or connection strings to manage or leak
- Network Security Groups - Minimal required rules, following security best practices
- Unity Catalog - Secure data access with fine-grained permissions
- Encryption - TLS 1.2+ for all communication, Azure storage encryption at rest
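
If you want to see these controls in your subscription after deployment, the Azure CLI can list the subnets and NSG rules. A sketch with hypothetical resource names — substitute whatever your `prefix`/`stage` combination produces:

```bash
# Inspect the delegated subnets and the NSG rules Terraform created.
# Resource names below are assumptions — look them up in the Azure Portal first.
az network vnet subnet list \
  --resource-group mycompany-dev-rg \
  --vnet-name mycompany-dev-vnet -o table

az network nsg rule list \
  --resource-group mycompany-dev-rg \
  --nsg-name mycompany-dev-nsg -o table
```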
Security Level: This implements foundational security patterns suitable for development, learning, and prototyping. The managed identity approach eliminates credential theft risks. However, production workloads require additional hardening including private endpoints for storage, advanced network isolation, and comprehensive monitoring. Storage accounts use public endpoints with managed identity authentication (no private endpoint), and worker nodes have controlled internet egress for package management.
The benefit: You understand what each security layer does and why it matters for production deployments.
Worried about surprise Azure bills? One command destroys everything:
```bash
# Windows: scripts\teardown.bat | Linux/Mac: ./scripts/teardown.sh
./scripts/teardown.sh
```
What gets deleted:
- Databricks workspace and all content
- Storage account and data
- Virtual network and subnets
- All Azure resources
What this means:
- ✅ Zero ongoing costs after teardown
- ✅ Can redeploy anytime with the same commands
- ✅ Perfect for learning without budget concerns
- ✅ Great for demos - spin up, show, tear down
Tip: Use this for portfolio projects. Include the architecture diagrams and Terraform code to show you understand infrastructure, then teardown to avoid charges.
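
To double-check that teardown really left nothing billable behind, you can query for leftover resource groups. A sketch assuming your prefix was `mycompany`:

```bash
# Should return an empty table once teardown has finished.
az group list --query "[?contains(name, 'mycompany')].name" -o table
```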
```
formyron-free/
├── infra/terraform/        # Infrastructure as Code
│   ├── main.tf             # Core Azure resources
│   ├── networking.tf       # VNet and security groups
│   ├── workspace.tf        # Databricks workspace setup
│   ├── storage.tf          # Unity Catalog storage
│   └── terraform.tfvars    # Your configuration
├── databricks/bundles/     # Data platform setup
│   ├── databricks.yml      # Bundle configuration
│   ├── notebooks/          # SQL and Python examples
│   └── resources/          # Job definitions
├── scripts/                # Automation scripts
│   ├── deploy.sh           # Deploy infrastructure
│   └── teardown.sh         # Clean up everything
├── .github/workflows/      # CI/CD pipeline
└── data/                   # Sample datasets
```
Why this structure matters: Each folder teaches you a different aspect of modern data platform engineering, from infrastructure to deployment to data processing with Databricks Asset Bundles.
How does this compare to other options?
| Feature | Formyron Free | Databricks Community Edition | Official Databricks Docs | Azure Quickstarts |
|---|---|---|---|---|
| Cost | ~$2-5/hour (teardown to $0) | Free forever | Varies | Varies |
| Time to Deploy | 20 minutes | 5 minutes | 2-4 hours | 1-2 hours |
| Azure Integration | ✅ Full Azure native | ❌ Hosted by Databricks | ✅ Your subscription | ✅ Your subscription |
| VNet Injection | ✅ Included | ❌ Not available | Manual setup required | Often skipped |
| Unity Catalog | ✅ Pre-configured | ❌ Not available | Complex setup | Not included |
| Managed Identity | ✅ Zero credentials | ❌ Not available | Service principals | Keys/secrets |
| Infrastructure as Code | ✅ Complete Terraform | ❌ No IaC | Partial examples | Basic ARM templates |
| CI/CD Pipeline | ✅ GitHub Actions | ❌ Not provided | Not provided | Not provided |
| One-Command Teardown | ✅ `./teardown.sh` | Just delete | Manual cleanup | Manual cleanup |
Bottom line:
- Databricks Community Edition - Best for learning Databricks features without Azure costs
- Formyron Free - Best for learning enterprise security patterns on Azure with real infrastructure
💰 How much does this cost?
- During deployment: ~$2-5/hour for active workspace and compute
- After teardown: $0 (everything is deleted)
- Recommendation: Use it, learn from it, tear down when done
Costs include:
- Databricks workspace (Premium tier)
- Serverless SQL warehouse (when running queries)
- Storage account (negligible for sample data)
- VNet resources (minimal)
⏱️ How long does deployment take?
- Infrastructure: 15-20 minutes (Terraform)
- Data Platform: 3-5 minutes (Databricks Asset Bundle)
- Total: ~20-25 minutes end-to-end
🏭 Is this production-ready?
- For learning/prototyping: Yes ✅
- For production workloads: Not out-of-the-box ❌
This template uses production-grade patterns but lacks enterprise features:
- Missing: Azure AD SSO, private endpoints, remote state, advanced monitoring
- Included: VNet injection, managed identity, Unity Catalog, proper networking
See Formyron Core for production-ready templates.
🔧 Can I customize the infrastructure?
Absolutely! Everything is defined in `infra/terraform/`:
- Change regions, SKUs, and naming in `terraform.tfvars`
- Modify networking in `networking.tf`
- Adjust compute in `compute.tf`
- All Terraform is standard and extensible
📦 What gets deployed in my Azure subscription?
3 resource groups:
- `{prefix}-{stage}-rg` - Your main resources
- `databricks-rg-{prefix}-{stage}-rg` - Databricks-managed resources (automatic)
- `NetworkWatcherRG` - Azure network monitoring (regional, may pre-exist)
Main resources:
- 1 Databricks workspace (Premium)
- 1 Storage account
- 1 Virtual network (with 2 subnets)
- 1 Managed identity (Access Connector)
- Network security groups
- Unity Catalog metastore
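
To enumerate what actually landed in your subscription, a quick check against the main resource group (the name is an assumption based on the `{prefix}-{stage}-rg` pattern above):

```bash
# List every resource Terraform created in the main resource group.
az resource list --resource-group mycompany-dev-rg -o table
```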
🗑️ How do I completely remove everything?
One command:
```bash
./scripts/teardown.sh   # Linux/Mac
scripts\teardown.bat    # Windows
```
This destroys:
- All Databricks assets (jobs, notebooks, catalogs)
- All Azure resources
- All data (permanent deletion!)
Cost stops immediately after deletion.
☁️ Can I use this with an existing Azure subscription?
Yes! Requirements:
- Owner or Contributor role on subscription
- Ability to create resource groups
- No conflicting IP ranges (if you have existing VNets)
The template is isolated - it creates new resource groups and won't affect existing resources.
🌍 What Azure region should I use?
Recommended regions (Databricks available + lower cost):
- `eastus` (US)
- `westeurope` (Europe)
- `southeastasia` (Asia-Pacific)
Check availability: Databricks regions
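
You can also ask Azure directly which regions support Databricks workspaces. A sketch using the resource-provider metadata:

```bash
# List all regions where the Databricks workspace resource type is available.
az provider show --namespace Microsoft.Databricks \
  --query "resourceTypes[?resourceType=='workspaces'].locations | [0]" -o table
```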
🎓 Do I need Databricks experience?
No! This template is designed for learning. You'll get:
- Working examples to explore
- Documentation explaining each component
- Safe environment to experiment (easy teardown)
Recommended learning path:
- Deploy the template
- Run the sample queries
- Modify the notebooks
- Examine the Terraform to understand infrastructure
- Teardown and redeploy with changes
🤝 Can I contribute improvements?
Yes! Contributions welcome:
- Bug fixes
- Documentation improvements
- New example notebooks
- Infrastructure enhancements
See CONTRIBUTING.md for guidelines.
We're here to help:
- Found an issue? Create a GitHub issue with details
- Have questions? Start a GitHub Discussion for help and community support
- Enjoyed this? Give us a star and share with others learning Databricks!
Common quick fixes:
- Python version issues: Make sure you have Python 3.10 installed
- Azure login problems: Try `az login --use-device-code` if behind a corporate firewall
- Terraform errors: Check our Troubleshooting section below
Contributing: Found a way to improve this? Pull requests welcome! Help make it easier for the next person learning Databricks.
Formyron Free was created by David, a data enthusiast passionate about making enterprise-grade infrastructure accessible to everyone.
If this project helped you learn or build something cool, I'd love to hear about it! Connect with me on LinkedIn or star this repo to show your support.
Click to expand manual setup instructions
- Azure Account - Free Azure account required. Sign up here if you don't have one
- Python 3.10 - CRITICAL REQUIREMENT for Databricks Asset Bundles (other versions will fail)
- Terraform >= 1.4 - Install from: https://www.terraform.io/downloads
- Azure CLI (`az login --use-device-code`) - Install from: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli; remember to restart your IDE afterwards
- Databricks CLI >= 2.0 (see below for installation)
Python Version Warning: Databricks Asset Bundles only work with Python 3.10. Versions 3.9, 3.11, and 3.12+ are not supported and will cause deployment failures.
Azure CLI Login Tip: Device code login is the easiest method. If it doesn't work in your environment, you can also try the regular `az login` command.
Mac:
```bash
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
export PATH="$HOME/.databricks/bin:$PATH"
echo 'export PATH="$HOME/.databricks/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
```
Windows:
```bash
# Download from https://github.com/databricks/cli/releases/tag/v0.258.0
# Extract to tools/databricks/databricks.exe
export PATH="$PATH:/c/path/to/formyron-free/tools/databricks"
```
Create and activate a virtual environment:
```bash
python -m venv .venv

# Windows (Command Prompt)
.\.venv\Scripts\activate.bat

# Windows (PowerShell)
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process; .\.venv\Scripts\Activate.ps1

# Mac/Linux
source .venv/bin/activate
```
Note: The DAB configuration expects `.venv` in the project root. Update `venv_path` in `databricks/bundles/databricks.yml` if you move it.
1. Authenticate with Azure:
   ```bash
   az login --use-device-code
   ```
2. Configure Terraform:
   ```bash
   cd infra/terraform
   cp terraform.tfvars.example terraform.tfvars
   # Edit terraform.tfvars with your values
   ```
   Storage Account Naming: Azure storage accounts only allow lowercase letters and numbers. Your `prefix` will be automatically cleaned (e.g., "my-company" becomes "mycompany" for storage).
3. Deploy Infrastructure:
   ```bash
   terraform init
   terraform plan
   terraform apply
   ```
   Note: This will create 3 resource groups:
   - `{prefix}-{stage}-rg` - Your main resources (VNet, Storage, etc.)
   - `databricks-rg-{prefix}-{stage}-rg` - Databricks managed resources (clusters, etc.)
   - `NetworkWatcherRG` - Azure network monitoring (regional, may already exist)
4. Configure Notebook Catalog (Recommended):
   Before deploying Databricks assets, update the SQL notebook with your specific catalog name.
   Windows:
   ```bash
   scripts\update_notebook.bat
   ```
   Linux/Mac:
   ```bash
   ./scripts/update_notebook.sh
   ```
   What this does: Automatically updates the SQL notebook with your Unity Catalog name based on your Terraform workspace configuration. This ensures the sample SQL job uses the correct catalog without manual editing.
   Note: This step is optional - you can also manually edit `databricks/bundles/notebooks/unity_catalog_sql_job.sql` and replace `<YOUR_CATALOG_NAME>` with your catalog name.
5. Deploy Databricks Assets:
   Linux/Mac:
   ```bash
   cd ../../databricks/bundles
   export DATABRICKS_HOST="your-workspace-url"
   export DATABRICKS_TOKEN="your-token"
   databricks bundle validate
   databricks bundle deploy
   ```
   Windows:
   ```bash
   cd ..\..\databricks\bundles
   set DATABRICKS_HOST=your-workspace-url
   set DATABRICKS_TOKEN=your-token
   ..\..\tools\databricks\databricks.exe bundle validate
   ..\..\tools\databricks\databricks.exe bundle deploy
   ```
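
Rather than pasting the workspace URL by hand, you can pull it from Terraform. A sketch assuming the config exposes an output named `workspace_url` — run `terraform output` in `infra/terraform` to see the actual output names in your deployment:

```bash
# Derive DATABRICKS_HOST from Terraform state (output name is an assumption).
cd infra/terraform
export DATABRICKS_HOST="https://$(terraform output -raw workspace_url)"
cd ../../databricks/bundles
databricks bundle validate
```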
To completely destroy all resources and stop Azure costs:

1. Remove Databricks Assets (Optional):
   Linux/Mac:
   ```bash
   cd databricks/bundles
   export DATABRICKS_HOST="your-workspace-url"
   export DATABRICKS_TOKEN="your-token"
   databricks bundle destroy --auto-approve
   ```
   Windows:
   ```bash
   cd databricks\bundles
   set DATABRICKS_HOST=your-workspace-url
   set DATABRICKS_TOKEN=your-token
   ..\..\tools\databricks\databricks.exe bundle destroy --auto-approve
   ```
2. Destroy Azure Infrastructure:
   ```bash
   cd ../../infra/terraform   # Linux/Mac
   cd ..\..\infra\terraform   # Windows
   terraform destroy
   ```
   WARNING: This permanently deletes ALL data, notebooks, and configurations!

Automated Alternative: Use the provided teardown scripts for guided destruction:
- Windows: `scripts\teardown.bat`
- Linux/Mac: `./scripts/teardown.sh`
Add these secrets in your repository settings for automated CI/CD:
- `DATABRICKS_HOST` - Your workspace URL (e.g., `adb-12345.10.azuredatabricks.net`)
- `DATABRICKS_TOKEN` - Personal access token
Important:
- The CI/CD pipeline gracefully handles missing secrets (for external PRs)
- Full DAB validation only runs when secrets are available
- Use the exact URL format from your Terraform output
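
If you use the GitHub CLI, the secrets can be set without touching the web UI. A sketch with placeholder values:

```bash
# Placeholder values — substitute your workspace URL and a real PAT.
gh secret set DATABRICKS_HOST --body "adb-12345.10.azuredatabricks.net"
gh secret set DATABRICKS_TOKEN --body "dapi-your-token-here"
```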
- Infrastructure deployment: ~15 minutes (including Databricks workspace VNet injection)
- DAB setup: ~5 minutes
Interrupted Deployment (terraform apply cancelled):
```bash
cd infra/terraform
terraform refresh
terraform plan    # Review current state
terraform apply   # Resume deployment
```
State Synchronization Issues:
```bash
# If Terraform can't find existing resources
terraform import azurerm_subnet.private /subscriptions/YOUR_SUB_ID/resourceGroups/RG_NAME/providers/Microsoft.Network/virtualNetworks/VNET_NAME/subnets/SUBNET_NAME

# Or clean slate approach
terraform destroy
rm terraform.tfstate*
terraform init
terraform apply
```
Storage Account Access Denied (403 errors):
- Check network rules in Azure Portal
- Ensure your IP is allowed or temporarily set to "Allow from all networks"
- Service principal permissions may need time to propagate (wait 5-10 minutes)
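
As a temporary workaround you can open the storage firewall from the CLI, then revert once permissions have propagated. The account and group names below are hypothetical:

```bash
# Temporarily allow all networks (hypothetical names) — revert to Deny afterwards.
az storage account update \
  --name mycompanydevsa \
  --resource-group mycompany-dev-rg \
  --default-action Allow
```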
Resource Group Already Exists:
```bash
# Import existing resource group
terraform import azurerm_resource_group.main /subscriptions/YOUR_SUB_ID/resourceGroups/YOUR_RG_NAME
terraform plan
terraform apply
```
Databricks Workspace VNet Injection Timeout:
- VNet injection can take 15-20 minutes
- If a timeout occurs, run `terraform apply` again
- Check Azure Portal for workspace status
Service Principal Issues:
```bash
# Reset credentials manually if they don't work
az ad sp credential reset --id YOUR_SP_ID --query "{appId:appId, password:password, tenant:tenant}" -o json
```
- Wait 5-10 minutes for permission propagation
- Verify role assignment: Storage Blob Data Contributor
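
To confirm the role assignment landed, list what the service principal actually holds. A sketch — substitute your SP's object or app ID:

```bash
# Look for "Storage Blob Data Contributor" in the output.
az role assignment list --assignee YOUR_SP_ID \
  --query "[].{role:roleDefinitionName, scope:scope}" -o table
```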
General Recovery (Complete Reset):
```bash
# Destroy everything and start fresh
./scripts/teardown.sh   # or teardown.bat
./setup.sh              # or setup.bat
```
If resources seem to exist but Terraform can't find them:
- Azure resource deletion can take time
- Wait 5-10 minutes and retry
- Check Azure Portal to confirm resources are fully deleted
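
A simple way to confirm deletion has finished before retrying, assuming the main resource-group name from earlier:

```bash
# Prints "true" while the group still exists; retry once it prints "false".
az group exists --name mycompany-dev-rg
```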

