Welcome to this Microsoft solutions Lab on the architecture of SQL Server Big Data Clusters. As part of a larger, complete Workshop, you'll experiment with SQL Server Big Data Clusters (BDC) and see how you can use them to implement large-scale data processing and machine learning.
This Lab assumes you have a full understanding of the concepts of big data analytics, the technologies that you will use throughout the Lab (such as containers, Kubernetes, Spark, HDFS, and machine learning), and the architecture of a BDC. If you are not familiar with these topics, you can take a complete course here.
In this Lab you'll learn how to create external tables over other data sources to unify your data, and how to use Spark to run big queries over your data in HDFS or do data preparation. You'll review a complete solution for an end-to-end scenario, with a focus on how to extrapolate what you have learned to create other solutions for your organization.
This Lab expects that you understand data structures and have experience working with SQL Server and computer networks. It does not expect you to have any prior data science knowledge, but a basic knowledge of statistics and data science is helpful in the Data Science sections. Knowledge of SQL Server, Azure Data and AI services, Python, and Jupyter Notebooks is recommended. AI techniques are implemented in Python packages. Solution templates are implemented using Azure services, development tools, and SDKs. You should have a basic understanding of working with the Microsoft Azure Platform.
▶ You need to have all of the prerequisites completed before taking this Lab.
▶ You need a full Big Data Cluster for SQL Server up and running, and you need to have identified the connection endpoints and all security parameters. You can find out how to do that here.
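Once you have identified the endpoint and credentials for the Master Instance, it can help to keep them in one place. The following is a minimal sketch, not part of the Lab materials: `build_connection_string` is a hypothetical helper, and the host, port, user, and password shown in the usage note are placeholder assumptions that you would replace with the values from your own cluster. It only assembles an ODBC-style connection string; it does not open a connection.

```python
def build_connection_string(host, port, user, password,
                            database="master",
                            driver="ODBC Driver 17 for SQL Server"):
    """Assemble an ODBC connection string for a SQL Server endpoint.

    Hypothetical helper for illustration only. All arguments are
    assumptions supplied by the caller, not values from the Lab.
    """
    def quote(value):
        # ODBC allows a value to be wrapped in braces so that characters
        # such as ';' are taken literally; a '}' inside is doubled.
        return "{" + value.replace("}", "}}") + "}"

    parts = [
        f"DRIVER={quote(driver)}",
        f"SERVER={host},{port}",   # SQL Server uses "host,port"
        f"DATABASE={database}",
        f"UID={quote(user)}",
        f"PWD={quote(password)}",
    ]
    return ";".join(parts)
```

For example, `build_connection_string("10.0.0.4", 31433, "admin", "<your password>")` returns a string you could pass to an ODBC client library. The address and port here are placeholders; use the Master Instance endpoint you identified for your own cluster.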
You will work through six Jupyter Notebooks using the Azure Data Studio tool. Download them and open them in Azure Data Studio, running only one cell at a time.
| Notebook | Topics |
|---|---|
| bdc_tutorial_00.ipynb | Overview of the Lab and Setup of the source data, problem space, solution options and architectures |
| bdc_tutorial_01.ipynb | In this tutorial you will learn how to run standard SQL Server Queries against the Master Instance (MI) in a SQL Server big data cluster. |
| bdc_tutorial_02.ipynb | In this tutorial you will learn how to create and query Virtualized Data in a SQL Server big data cluster. |
| bdc_tutorial_03.ipynb | In this tutorial you will learn how to create and query a Data Mart using Virtualized Data in a SQL Server big data cluster. |
| bdc_tutorial_04.ipynb | In this tutorial you will learn how to work with Spark Jobs in a SQL Server big data cluster. |
| bdc_tutorial_05.ipynb | In this tutorial you will learn how to work with Spark Machine Learning Jobs in a SQL Server big data cluster. |
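Because the notebooks are meant to be run one cell at a time, you may want to preview the code cells before opening a notebook in Azure Data Studio. The sketch below is an optional aid, not part of the Lab: `code_cells` is a hypothetical helper that relies only on the fact that an `.ipynb` file is a JSON document with a `cells` list, where each cell has a `cell_type` and a `source`.

```python
import json


def code_cells(notebook_text):
    """Return the source of each code cell, in document order.

    `notebook_text` is the raw JSON text of an .ipynb file.
    Hypothetical helper for illustration only.
    """
    notebook = json.loads(notebook_text)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source either as one string
        # or as a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        cells.append(source)
    return cells
```

To use it, read a downloaded notebook as text (for instance `open("bdc_tutorial_01.ipynb", encoding="utf-8").read()`) and pass the result to `code_cells`; each returned string corresponds to one cell you would run in turn.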