Lab: SQL Server Big Data Clusters - Architecture

A Microsoft Course from the SQL Server team

About this Lab

Welcome to this Microsoft solutions Lab on the architecture of SQL Server Big Data Clusters. As part of a larger complete Workshop, you'll experiment with SQL Server Big Data Clusters (BDC) and learn how you can use them to implement large-scale data processing and machine learning.

This Lab assumes you have a full understanding of the concepts of big data analytics, the technologies that you will use throughout the Lab (such as containers, Kubernetes, Spark and HDFS, and machine learning), and the architecture of a BDC. If you are not familiar with these topics, you can take a complete course here.

In this Lab you'll learn how to create external tables over other data sources to unify your data, and how to use Spark to run big queries over your data in HDFS or do data preparation. You'll review a complete solution for an end-to-end scenario, with a focus on how to extrapolate what you have learned to create other solutions for your organization.

Before Taking this Lab

This Lab expects that you understand data structures and are comfortable working with SQL Server and computer networks. It does not expect you to have any prior data science knowledge, but a basic knowledge of statistics and data science is helpful in the Data Science sections. Knowledge of SQL Server, Azure Data and AI services, Python, and Jupyter Notebooks is recommended. AI techniques are implemented in Python packages. Solution templates are implemented using Azure services, development tools, and SDKs. You should have a basic understanding of working with the Microsoft Azure Platform.

▶ You need to have all of the prerequisites completed before taking this Lab.

▶ You need a full SQL Server Big Data Cluster up and running, and you must have identified its connection endpoints and all security parameters. You can find out how to do that here.
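Once you have identified the endpoint and credentials, it can help to assemble them into a single connection string before starting the notebooks. The following is a minimal sketch, not part of the Lab itself: the function name `build_connection_string`, the host, the port, the user name, and the driver name are all illustrative assumptions. Substitute the values your own cluster reports.

```python
def build_connection_string(host, port, user, password, database="master",
                            driver="ODBC Driver 17 for SQL Server"):
    """Return an ODBC-style connection string for a SQL Server endpoint.

    All arguments are placeholders (assumptions for illustration); use the
    endpoint address, port, and credentials reported by your own cluster.
    """
    parts = [
        f"DRIVER={{{driver}}}",      # driver name wrapped in literal braces
        f"SERVER={host},{port}",     # SQL Server expects "host,port" (comma, not colon)
        f"DATABASE={database}",
        f"UID={user}",
        f"PWD={password}",
    ]
    return ";".join(parts)


# Hypothetical values for illustration only.
example = build_connection_string("10.0.0.4", 31433, "admin", "secret")
print(example.replace("secret", "******"))  # avoid echoing the password
```

If the `pyodbc` package and a matching ODBC driver are installed, a string of this shape can be passed to `pyodbc.connect`. Keeping the string construction in one function makes it easier to check the host, port, and credentials separately from any connection errors.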

Lab Notebooks

You will work through six Jupyter Notebooks using the Azure Data Studio tool. Download them and open them in Azure Data Studio, running only one cell at a time.

Notebook	Topics
bdc_tutorial_00.ipynb	Overview of the Lab and Setup of the source data, problem space, solution options and architectures
bdc_tutorial_01.ipynb	In this tutorial you will learn how to run standard SQL Server queries against the Master Instance (MI) in a SQL Server Big Data Cluster.
bdc_tutorial_02.ipynb	In this tutorial you will learn how to create and query Virtualized Data in a SQL Server Big Data Cluster.
bdc_tutorial_03.ipynb	In this tutorial you will learn how to create and query a Data Mart using Virtualized Data in a SQL Server Big Data Cluster.
bdc_tutorial_04.ipynb	In this tutorial you will learn how to work with Spark Jobs in a SQL Server Big Data Cluster.
bdc_tutorial_05.ipynb	In this tutorial you will learn how to work with Spark Machine Learning Jobs in a SQL Server Big Data Cluster.
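
Because the notebooks listed above are meant to be run one cell at a time, it can be useful to preview the cells of a downloaded notebook before opening it. The sketch below relies only on the fact that an `.ipynb` file is a JSON document with a top-level `cells` list. The helper names `summarize_cells` and `summarize_file` are hypothetical and are not part of the Lab or of Azure Data Studio.

```python
import json


def summarize_cells(notebook):
    """Return (index, cell_type, first_line) for each cell of a parsed notebook."""
    summary = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # The notebook format stores source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        first_line = text.splitlines()[0] if text.strip() else ""
        summary.append((index, cell.get("cell_type", "unknown"), first_line))
    return summary


def summarize_file(path):
    """Load a notebook file from disk and summarize its cells."""
    with open(path, encoding="utf-8") as handle:
        return summarize_cells(json.load(handle))


# A tiny in-memory notebook, used here instead of a real file.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "source": "SELECT 1;\nSELECT 2;"},
    ]
}
for row in summarize_cells(sample):
    print(row)
```

To preview one of the Lab notebooks, call `summarize_file("bdc_tutorial_00.ipynb")` from the folder where you downloaded it.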
