Azure SQL and Databricks samples and best practices for loading data quickly and efficiently

Azure-Samples/azure-sql-db-databricks

page_type: sample
languages: tsql, sql, scala
products: azure, azure-databricks, azure-blob-storage, azure-key-vault, azure-sql-database
description: Fast Data Loading in Azure SQL DB using Azure Databricks
urlFragment: azure-sql-db-databricks

Fast Data Loading in Azure SQL DB using Azure Databricks

Azure Databricks and Azure SQL Database work very well together. This repo will help you use the latest connector to load data into Azure SQL as fast as possible, using table partitions, columnstore indexes, and all the known best practices.

Samples

All the samples start from a partitioned Parquet file, created with data generated from the well-known TPC-H benchmark. Free tools are available on the TPC-H website to generate a dataset of the size you want:

http://www.tpc.org/tpch/

Once the Parquet file is available, the samples will guide you through the most common scenarios. All samples also show how to correctly load a table that already has indexes, or that uses a columnstore index in Azure SQL.
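
As a rough illustration of why the target table's shape matters, here is a minimal sketch of how write options might differ between a heap, a table with rowstore indexes, and a columnstore table. It assumes the Apache Spark connector for SQL Server and Azure SQL and its `tableLock` and `batchsize` options. The object, method names, and the 100,000-row batch size are hypothetical choices for illustration, not values taken from the notebooks.

```scala
// Illustrative sketch: object and method names are hypothetical, not from the repo.
object BulkWriteOptions {
  // Data source name of the Apache Spark connector for SQL Server and Azure SQL
  // (assumed to be the connector in use).
  val ConnectorFormat: String = "com.microsoft.sqlserver.jdbc.spark"

  sealed trait TargetTable
  case object Heap extends TargetTable            // no indexes at all
  case object RowstoreIndexed extends TargetTable // clustered and/or nonclustered indexes
  case object Columnstore extends TargetTable     // clustered columnstore index

  // Maximum number of rows in a single columnstore rowgroup.
  val RowgroupMaxRows: Int = 1048576

  // Builds a JDBC URL for an Azure SQL logical server (server and database names are placeholders).
  def jdbcUrl(server: String, database: String): String =
    s"jdbc:sqlserver://$server.database.windows.net:1433;databaseName=$database;encrypt=true"

  def forTarget(target: TargetTable, url: String, table: String): Map[String, String] = {
    val (tableLock, batchSize) = target match {
      // A table lock on a heap lets several writers bulk load in parallel.
      case Heap            => (true, 100000)
      // With indexes, a table lock would serialize the parallel writers.
      case RowstoreIndexed => (false, 100000)
      // Large batches let rows land directly in compressed rowgroups.
      case Columnstore     => (false, RowgroupMaxRows)
    }
    Map(
      "url"       -> url,
      "dbtable"   -> table,
      "tableLock" -> tableLock.toString,
      "batchsize" -> batchSize.toString
    )
  }
}

// In a notebook, assuming `df` holds the Parquet data and `user` / `password`
// were read from a secret scope:
//   val opts = BulkWriteOptions.forTarget(BulkWriteOptions.Columnstore, url, "dbo.LINEITEM")
//   df.write.format(BulkWriteOptions.ConnectorFormat).mode("append")
//     .options(opts).option("user", user).option("password", password).save()
```

Treat the batch sizes as starting points to tune against your own data and service tier, not as recommended values.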

Bonus Samples: Reading data as fast as possible

Though this repo focuses on writing data as fast as possible into Azure SQL, I understand that you may also want to know how to do the opposite: how to read data as fast as possible from Azure SQL into Apache Spark / Azure Databricks. For this reason, in the folder notebooks/read-from-azure-sql you will find two samples that show how to do exactly that.
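
One common way to speed up reads, which may or may not match what the notebooks do, is to let Spark open several JDBC connections in parallel by splitting the table on a numeric column. The sketch below only builds the option map for Spark's built-in JDBC data source (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`). The object name, table, and column are hypothetical.

```scala
// Illustrative sketch: object and method names are hypothetical, not from the repo.
object ParallelReadOptions {
  // Options for Spark's built-in JDBC source to read one table over several connections.
  // Note: the bounds only decide how the column range is split into strides;
  // they do not filter rows, so values outside the bounds are still read.
  def partitioned(
      url: String,
      table: String,
      column: String,
      lower: Long,
      upper: Long,
      partitions: Int): Map[String, String] = {
    require(partitions > 0, "numPartitions must be positive")
    require(lower <= upper, "lowerBound must not exceed upperBound")
    Map(
      "url"             -> url,
      "dbtable"         -> table,
      "partitionColumn" -> column,
      "lowerBound"      -> lower.toString,
      "upperBound"      -> upper.toString,
      "numPartitions"   -> partitions.toString
    )
  }
}

// In a notebook, assuming `url`, `user` and `password` are already defined:
//   val opts = ParallelReadOptions.partitioned(url, "dbo.LINEITEM", "L_ORDERKEY", 1L, 6000000L, 8)
//   val df = spark.read.format("jdbc")
//     .options(opts).option("user", user).option("password", password).load()
```

The partition column should be numeric, date, or timestamp, and ideally spread evenly over the chosen bounds so that each connection reads a similar amount of data.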

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
