In this workshop you'll use a process and various platform components to create a SQL Server Big Data Clusters (BDC) solution you can deploy on premises, in the cloud, or in a hybrid architecture. Each module includes additional references you can follow up on to learn more. Also watch for links within the text - click each one to explore that topic.
(Make sure you check out the prerequisites page before you start. You'll need all of the items loaded there before you can proceed with the workshop.)
You'll cover the following topics in this Module:
Authentication is the process of verifying the identity of a user or service and ensuring they are who they claim to be. Authorization is the granting or denying of access to specific resources based on the requesting user's identity. This step is performed after a user is identified through authentication.
NOTE: Security will change prior to the General Availability (GA) Release. Active Directory integration is planned for production implementations.
There are three endpoints that serve as entry points to the BDC:
Endpoint | Description |
---|---|
HDFS/Spark (Knox) gateway | An HTTPS-based endpoint that proxies other endpoints. The HDFS/Spark gateway is used for accessing services like webHDFS and Livy. Wherever you see references to Knox, this is the endpoint |
Controller endpoint | The endpoint for the BDC management service that exposes REST APIs for managing the cluster. Some tools, such as Azure Data Studio, access the system using this endpoint |
Master Instance | The Tabular Data Stream (TDS) endpoint for the SQL Server master instance. Use this endpoint to connect with tools such as SQL Server Management Studio or Azure Data Studio and run Transact-SQL queries against the cluster |
You can see these endpoints in this diagram:
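The following sketch shows how each of the three endpoints is typically reached. The addresses, ports, and the mssqlctl CLI shown here are assumptions based on pre-GA releases - substitute the values for your own cluster:

```shell
# Placeholder address and port for the HDFS/Spark (Knox) gateway -- assumptions, not fixed values
GATEWAY_HOST=10.0.0.4
KNOX_PORT=30443

# Compose the webHDFS URL proxied through Knox and list the HDFS root directory
WEBHDFS_URL="https://${GATEWAY_HOST}:${KNOX_PORT}/gateway/default/webhdfs/v1/?op=LISTSTATUS"
echo "$WEBHDFS_URL"
# curl -sk -u root:"$KNOX_PASSWORD" "$WEBHDFS_URL"

# Controller endpoint: log in with the cluster admin CLI (uncomment against a live cluster;
# the controller address and port are placeholders)
# mssqlctl login --controller-endpoint "https://<controller-ip>:30080" --controller-username "$CONTROLLER_USERNAME"

# SQL Server master instance: a standard TDS connection (address and port are placeholders)
# sqlcmd -S "<master-ip>,31433" -U sa -P "$MSSQL_SA_PASSWORD" -Q "SELECT @@VERSION"
```

Note that the Knox gateway speaks HTTPS, the controller speaks REST over HTTPS, and the master instance speaks TDS - three different protocols, which is why each needs its own client tool.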
When you create the cluster, a number of logins are created. Some of these logins are for services to communicate with each other, and others are for end users to access the cluster. Non-SQL Server end-user passwords are currently set using environment variables. These are passwords that cluster administrators use to access services:
Use | Variable |
---|---|
Controller username | CONTROLLER_USERNAME=controller_username |
Controller password | CONTROLLER_PASSWORD=controller_password |
SQL Master SA password | MSSQL_SA_PASSWORD=controller_sa_password |
Password for accessing the HDFS/Spark endpoint | KNOX_PASSWORD=knox_password |
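These variables are exported in the shell that runs the deployment tooling before you deploy. A minimal sketch - the values below are illustrative placeholders, not defaults:

```shell
# Set the end-user credentials the deployment will bake into the cluster.
# All values here are placeholders for illustration -- choose strong passwords.
export CONTROLLER_USERNAME=admin
export CONTROLLER_PASSWORD='S0me_Strong_P@ss'
export MSSQL_SA_PASSWORD='S0me_Strong_P@ss'
export KNOX_PASSWORD='S0me_Strong_P@ss'

# Confirm the variables are visible to the deployment process
echo "Controller user: $CONTROLLER_USERNAME"
```

Any tool launched from this shell (and any child process it spawns) inherits these variables, which is how the deployment picks them up.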
Intra-cluster authentication

Upon deployment of the cluster, a number of SQL logins are created:
A special system-managed SQL login with the sysadmin role is created in the Controller SQL instance. The password for this login is captured as a Kubernetes secret. A sysadmin login, owned and managed by the Controller, is created in every SQL instance in the cluster. The Controller requires it to perform administrative tasks, such as HA setup or upgrades, on these instances. These logins are also used for intra-cluster communication between SQL instances, such as the SQL master instance communicating with a data pool.
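Because the Controller login's password is captured as a Kubernetes secret, retrieving it follows the usual kubectl pattern. The secret and namespace names below are assumptions for illustration; Kubernetes stores secret data base64-encoded, so a decode step is needed:

```shell
# Against a live cluster (secret and namespace names are assumptions, substitute your own):
# kubectl get secret <controller-login-secret> -n <cluster-namespace> \
#   -o jsonpath='{.data.password}' | base64 -d

# The base64 decode step itself, demonstrated with sample data:
ENCODED=$(printf 'example-password' | base64)
DECODED=$(printf '%s' "$ENCODED" | base64 -d)
echo "$DECODED"
```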
Note: In the current release, only basic authentication is supported. Fine-grained access control to HDFS objects and to the BDC compute and data pools is not yet available.
Intra-cluster communication with non-SQL services within the BDC, such as Livy to Spark or Spark to the storage pool, is secured using certificates. All SQL Server to SQL Server communication is secured using SQL logins.
Activity: Review Security Endpoints
In this activity, you will review the endpoints exposed on the cluster.
Steps
Open this reference, and read the information in the Service Endpoints section. This shows the addresses and ports exposed to end users.
Congratulations! You have completed this workshop on SQL Server Big Data Clusters architecture. You now have the tools, assets, and processes you need to apply this information to other applications.