feat: Initial feature-engineering-on-fabric single-tech sample check-in #652

Merged
merged 27 commits into from
Nov 17, 2023
Changes from 19 commits
32f162c
feat: initial feature-eng-on-fabric single-tech sample check-in
thurstonchen Nov 9, 2023
78f56b3
doc: resize screenshots with minor contents updates
thurstonchen Nov 10, 2023
0db5811
update for data source landing
Nick287 Nov 10, 2023
05d9dc0
code: update model training notebook
cchenshu Nov 10, 2023
747a039
update for data loading base url and relative path
Nick287 Nov 13, 2023
44ac774
Apply suggestions from code review (Nov. 13th)
thurstonchen Nov 13, 2023
81ecab0
remove App service code and update images
Nick287 Nov 14, 2023
32bc03d
for simplicity remove option 1 and send it as a footnote info no details
Nick287 Nov 14, 2023
64823b6
fix: use Fabric workspace & lakehouse id in Purview qualified names, …
thurstonchen Nov 14, 2023
a0f74b6
Fixing some linking errors
promisinganuj Nov 14, 2023
e5094f2
Updated introduction and architecture description
promisinganuj Nov 15, 2023
08fc3ac
Updated environment setup details
promisinganuj Nov 15, 2023
2469bdf
Updated 'Source Dataset' section
promisinganuj Nov 15, 2023
7f531f5
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
4769fb9
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
8525ae5
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
e7623dc
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
7bb0749
doc: add contents on verifying lineage in Purview
thurstonchen Nov 15, 2023
2b2225f
doc: add missed bullet to Contents table
thurstonchen Nov 15, 2023
e02b924
Updating Lineage section
promisinganuj Nov 16, 2023
786922d
Updating Lineage section
promisinganuj Nov 16, 2023
597e774
Updating Lineage section
promisinganuj Nov 16, 2023
4cc2d21
Updating Lineage section
promisinganuj Nov 16, 2023
0b4f621
Updating Lineage section
promisinganuj Nov 16, 2023
b5d1586
Updating 'Required resources' header
promisinganuj Nov 16, 2023
4622bb2
Fixing URL checks
promisinganuj Nov 16, 2023
2382f59
Fixing URL checks
promisinganuj Nov 16, 2023
4 changes: 2 additions & 2 deletions README.md
@@ -18,7 +18,7 @@ description: "Code samples showcasing how to apply DevOps concepts to the Modern

# DataOps for the Modern Data Warehouse

-This repository contains numerous code samples and artifacts on how to apply DevOps principles to data pipelines built according to the [Modern Data Warehouse (MDW)](https://azure.microsoft.com/en-au/solutions/architecture/modern-data-warehouse/) architectural pattern on [Microsoft Azure](https://azure.microsoft.com/en-au/).
+This repository contains numerous code samples and artifacts on how to apply DevOps principles to data pipelines built according to the [Modern Data Warehouse (MDW)](https://azure.microsoft.com/en-au/solutions/architecture/modern-data-warehouse/) architectural pattern on Microsoft Azure.

The samples are either focused on a single azure service (**Single Tech Samples**) or showcases an end to end data pipeline solution as a reference implementation (**End to End Samples**). Each sample contains code and artifacts relating one or more of the following

@@ -54,7 +54,7 @@ The samples are either focused on a single azure service (**Single Tech Samples*
- [**Temperature Events Solution**](e2e_samples/temperature_events) - This demonstrate a high-scale event-driven data pipeline with a focus on how to implement Observability and Load Testing.
![Architecture](e2e_samples/temperature_events/images/temperature-events-architecture.png?raw=true "Architecture")
- [**Dataset Versioning Solution**](e2e_samples/dataset_versioning) - This demonstrates how to use DataFactory to Orchestrate DataFlow, to do DeltaLoads into DeltaLake On DataLake(DoDDDoD).
-- [**MDW Data Governance and PII data detection**](e2e_samples/mdw_governance) - This sample demonstrates how to deploy the Infrastructure of an end-to-end MDW Pipeline using [Azure DevOps pipelines](https://azure.microsoft.com/en-au/services/devops/pipelines/) along with a focus around Data Governance and PII data detection.
+- [**MDW Data Governance and PII data detection**](e2e_samples/mdw_governance) - This sample demonstrates how to deploy the Infrastructure of an end-to-end MDW Pipeline using [Azure DevOps pipelines](https://azure.microsoft.com/product/devops/pipelines/) along with a focus around Data Governance and PII data detection.
- *Technology stack*: Azure DevOps, Azure Data Factory, Azure Databricks, Azure Purview, [Presidio](https://github.com/microsoft/presidio)

## Contributing
2 changes: 1 addition & 1 deletion e2e_samples/dataset_versioning/databricks/README.md
@@ -37,7 +37,7 @@ The [official doc](https://docs.microsoft.com/en-us/azure/databricks/security/da
1. Navigating to your Storage account in the Azure Portal then clicking on `containers` -> `container(datalake)` -> `Manage ACL`
2. Add your READ and EXECUTE permission and click save.
3. [**Optional**] In case you have any existing files in the Data Lake container, you may need to propogate ACL permissions.
-1. Open up [Microsoft Azure Storage Explorer](https://azure.microsoft.com/en-us/features/storage-explorer/)
+1. Open up [Microsoft Azure Storage Explorer](https://azure.microsoft.com/products/storage/storage-explorer/)
2. Navigate to the storage account and right click on container to select **propagate access control list**.
> Propagate access control list cannot be found: Try updating azure storage explorer to the latest version.

2 changes: 1 addition & 1 deletion e2e_samples/deployment_stamps/README.md
@@ -67,7 +67,7 @@ The same follow applies to Stamp2 and Stamp3 too. In this current version of sam
### Prerequisites

1. [Azure DevOps account](https://dev.azure.com/)
-2. [Azure Account](https://azure.microsoft.com/en-us/free/)
+2. [Azure Account](https://azure.microsoft.com/free/)
*Permissions needed*: ability to create and deploy to an azure resource group, a service principal, and grant the collaborator role to the service principal over the resource group; ability to manage Azure AD to create App registration, Users, Groups and Enterprise App Registration.

### Setup and Deployment
26 changes: 13 additions & 13 deletions e2e_samples/mdw_governance/README.md
@@ -33,18 +33,18 @@ The following shows the architecture of the solution.

### Technologies used

-- [Azure Purview](https://azure.microsoft.com/en-au/services/devops/)
-- [Azure Data Factory](https://azure.microsoft.com/en-au/services/data-factory/)
-- [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)
-- [Azure Databricks](https://docs.microsoft.com/en-us/azure/databricks/)
-- [Azure Key Vault](https://azure.microsoft.com/en-us/services/key-vault/)
-- [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview)
-- [Office365 API Connection](https://docs.microsoft.com/en-us/azure/connectors/connectors-create-api-office365-outlook)
-- [Azure Virtual Network](https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-overview)
-- [Private Endpoint](https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-overview)
-- [Azure Function](https://docs.microsoft.com/en-us/azure/azure-functions/)
-- [Azure Logic App](https://azure.microsoft.com/en-us/services/logic-apps/)
-- [Azure Private DNS](https://docs.microsoft.com/en-us/azure/dns/private-dns-overview)
+- [Azure Purview](https://azure.microsoft.com/products/purview/)
+- [Azure Data Factory](https://azure.microsoft.com/products/data-factory/)
+- [Azure Data Lake Gen2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction)
+- [Azure Databricks](https://docs.microsoft.com/azure/databricks/)
+- [Azure Key Vault](https://azure.microsoft.com/products/key-vault/)
+- [Application Insights](https://docs.microsoft.com/azure/azure-monitor/app/app-insights-overview)
+- [Office365 API Connection](https://docs.microsoft.com/azure/connectors/connectors-create-api-office365-outlook)
+- [Azure Virtual Network](https://docs.microsoft.com/azure/virtual-network/virtual-networks-overview)
+- [Private Endpoint](https://docs.microsoft.com/azure/private-link/private-endpoint-overview)
+- [Azure Function](https://docs.microsoft.com/azure/azure-functions/)
+- [Azure Logic App](https://azure.microsoft.com/products/logic-apps/)
+- [Azure Private DNS](https://docs.microsoft.com/azure/dns/private-dns-overview)

## Key Learnings

@@ -81,7 +81,7 @@ Each environment has an identical set of resources
### Prerequisites

1. [Azure DevOps account](https://dev.azure.com/)
-2. [Azure Account](https://azure.microsoft.com/en-au/free/search/?&ef_id=Cj0KCQiAr8bwBRD4ARIsAHa4YyLdFKh7JC0jhbxhwPeNa8tmnhXciOHcYsgPfNB7DEFFGpNLTjdTPbwaAh8bEALw_wcB:G:s&OCID=AID2000051_SEM_O2ShDlJP&MarinID=O2ShDlJP_332092752199_azure%20account_e_c__63148277493_aud-390212648371:kwd-295861291340&lnkd=Google_Azure_Brand&dclid=CKjVuKOP7uYCFVapaAoddSkKcA)
+2. [Azure Account](https://azure.microsoft.com/free/)
- *Permissions needed*: ability to create and deploy to an azure [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview), a [service principal](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals), and grant the [collaborator role](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to the service principal over the resource group.

### Setup and Deployment
14 changes: 7 additions & 7 deletions e2e_samples/parking_sensors/README.md
@@ -58,7 +58,7 @@ The sample demonstrate how DevOps principles can be applied end to end Data Pipe

## Solution Overview

-The solution pulls near realtime [Melbourne Parking Sensor data](https://www.melbourne.vic.gov.au/about-council/governance-transparency/open-data/Pages/on-street-parking-data.aspx) from a publicly available REST api endpoint and saves this to [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction). It then validates, cleanses, and transforms the data to a known schema using [Azure Databricks](https://azure.microsoft.com/en-au/services/databricks/). A second Azure Databricks job then transforms these into a [Star Schema](https://en.wikipedia.org/wiki/Star_schema) which are then loaded into [Azure Synapse Analytics (formerly SQLDW)](https://azure.microsoft.com/en-au/services/synapse-analytics/) using [Polybase](https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-guide?view=sql-server-ver15). The entire pipeline is orchestrated with [Azure Data Factory](https://azure.microsoft.com/en-au/services/data-factory/).
+The solution pulls near realtime [Melbourne Parking Sensor data](https://www.melbourne.vic.gov.au/about-council/governance-transparency/open-data/Pages/on-street-parking-data.aspx) from a publicly available REST api endpoint and saves this to [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction). It then validates, cleanses, and transforms the data to a known schema using [Azure Databricks](https://azure.microsoft.com/products/databricks/). A second Azure Databricks job then transforms these into a [Star Schema](https://en.wikipedia.org/wiki/Star_schema) which are then loaded into [Azure Synapse Analytics (formerly SQLDW)](https://azure.microsoft.com/products/synapse-analytics/) using [Polybase](https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-guide?view=sql-server-ver15). The entire pipeline is orchestrated with [Azure Data Factory](https://azure.microsoft.com/products/data-factory/).

### Architecture

@@ -82,11 +82,11 @@ See [here](#build-and-release-pipeline) for details.

It makes use of the following azure services:

-- [Azure Data Factory](https://azure.microsoft.com/en-au/services/data-factory/)
-- [Azure Databricks](https://azure.microsoft.com/en-au/services/databricks/)
+- [Azure Data Factory](https://azure.microsoft.com/products/data-factory/)
+- [Azure Databricks](https://azure.microsoft.com/products/databricks/)
- [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)
-- [Azure Synapse Analytics (formerly SQLDW)](https://azure.microsoft.com/en-au/services/synapse-analytics/)
-- [Azure DevOps](https://azure.microsoft.com/en-au/services/devops/)
+- [Azure Synapse Analytics (formerly SQLDW)](https://azure.microsoft.com/products/synapse-analytics/)
+- [Azure DevOps](https://azure.microsoft.com/products/devops/)
- [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview)
- [PowerBI](https://powerbi.microsoft.com/en-us/)

@@ -212,9 +212,9 @@ More resources:
### Prerequisites

1. [Github account](https://github.com/)
-2. [Azure Account](https://azure.microsoft.com/en-au/free/search/?&ef_id=Cj0KCQiAr8bwBRD4ARIsAHa4YyLdFKh7JC0jhbxhwPeNa8tmnhXciOHcYsgPfNB7DEFFGpNLTjdTPbwaAh8bEALw_wcB:G:s&OCID=AID2000051_SEM_O2ShDlJP&MarinID=O2ShDlJP_332092752199_azure%20account_e_c__63148277493_aud-390212648371:kwd-295861291340&lnkd=Google_Azure_Brand&dclid=CKjVuKOP7uYCFVapaAoddSkKcA)
+2. [Azure Account](https://azure.microsoft.com/free/)
- *Permissions needed*: ability to create and deploy to an azure [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview), a [service principal](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals), and grant the [collaborator role](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to the service principal over the resource group.
-3. [Azure DevOps Project](https://azure.microsoft.com/en-us/services/devops/)
+3. [Azure DevOps Project](https://azure.microsoft.com/products/devops/)
- *Permissions needed*: ability to create [service connections](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml), [pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops&tabs=yaml) and [variable groups](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml).

#### Software pre-requisites if you don't use dev container<!-- omit in toc -->
8 changes: 4 additions & 4 deletions e2e_samples/parking_sensors_synapse/README.md
@@ -59,9 +59,9 @@ See [here](#build-and-release-pipeline) for details.

It makes use of the following azure services:

-- [Azure Synapse Analytics](https://azure.microsoft.com/en-au/services/synapse-analytics/)
+- [Azure Synapse Analytics](https://azure.microsoft.com/products/synapse-analytics/)
- [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)
-- [Azure DevOps](https://azure.microsoft.com/en-au/services/devops/)
+- [Azure DevOps](https://azure.microsoft.com/products/devops/)
- [PowerBI](https://powerbi.microsoft.com/en-us/)
- [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview)
- [Log Analytics](https://docs.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-overview)
@@ -166,9 +166,9 @@ Please check the details [here](docs/observability.md).
### Prerequisites

1. [Github account](https://github.com/)
-2. [Azure Account](https://azure.microsoft.com/en-au/free/search/?&ef_id=Cj0KCQiAr8bwBRD4ARIsAHa4YyLdFKh7JC0jhbxhwPeNa8tmnhXciOHcYsgPfNB7DEFFGpNLTjdTPbwaAh8bEALw_wcB:G:s&OCID=AID2000051_SEM_O2ShDlJP&MarinID=O2ShDlJP_332092752199_azure%20account_e_c__63148277493_aud-390212648371:kwd-295861291340&lnkd=Google_Azure_Brand&dclid=CKjVuKOP7uYCFVapaAoddSkKcA)
+2. [Azure Account](https://azure.microsoft.com/free/)
- *Permissions needed*: ability to create and deploy to an azure [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview), a [service principal](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals), and grant the [collaborator role](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to the service principal over the resource group.
-3. [Azure DevOps Project](https://azure.microsoft.com/en-us/services/devops/)
+3. [Azure DevOps Project](https://azure.microsoft.com/products/devops/)
- *Permissions needed*: ability to create [service connections](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml), [pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops&tabs=yaml) and [variable groups](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml). Ability to install Azure DevOps extensions (unless the required [Synapse extension](https://marketplace.visualstudio.com/items?itemName=AzureSynapseWorkspace.synapsecicd-deploy) is already installed).

#### Software pre-requisites if you don't use dev container<!-- omit in toc -->
4 changes: 2 additions & 2 deletions e2e_samples/temperature_events/README.md
@@ -68,7 +68,7 @@ It makes use of the following azure services:
- [Azure Event Hubs](https://azure.microsoft.com/en-us/services/event-hubs/)
- [Azure Functions](https://azure.microsoft.com/en-us/services/functions/)
- [Azure IoT Device Telemetry Simulator](https://github.com/Azure-Samples/Iot-Telemetry-Simulator/)
-- [Azure DevOps](https://azure.microsoft.com/en-au/services/devops/)
+- [Azure DevOps](https://azure.microsoft.com/products/devops/)
- [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview)
- [Terraform](https://www.terraform.io/)

@@ -90,7 +90,7 @@ There are 3 major steps to running the sample. Follow each sub-page in order:
- [Github account](https://github.com/) [Optional]
- [Azure Account](https://azure.microsoft.com/en-au/free/)
- *Permissions needed*: ability to create and deploy to an azure [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview), a [service principal](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals), and grant the [collaborator role](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to the service principal over the resource group.
-- [Azure DevOps Project](https://azure.microsoft.com/en-us/services/devops/)
+- [Azure DevOps Project](https://azure.microsoft.com/products/devops/)
- *Permissions needed*: ability to create [service connections](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml), [pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops&tabs=yaml) and [variable groups](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml).
- Software
- [Azure CLI 2.18+](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest)
6 changes: 3 additions & 3 deletions single_tech_samples/azuresql/README.md
@@ -1,6 +1,6 @@
# Azure SQL Database

-[Azure SQL Database](https://azure.microsoft.com/en-au/services/sql-database/) is a relational database commonly used in the MDW architecture, typically in the serving layer. The following samples demonstrates how you might build CI/CD pipelines to deploy changes to
+[Azure SQL Database](https://azure.microsoft.com/products/azure-sql/database/) is a relational database commonly used in the MDW architecture, typically in the serving layer. The following samples demonstrates how you might build CI/CD pipelines to deploy changes to
Azure SQL Database.

## Contents
@@ -76,9 +76,9 @@ The following are some sample [Azure DevOps](https://docs.microsoft.com/en-us/az
### Prerequisites

1. [Github account](https://github.com/)
-2. [Azure Account](https://azure.microsoft.com/en-au/free/search/?&ef_id=Cj0KCQiAr8bwBRD4ARIsAHa4YyLdFKh7JC0jhbxhwPeNa8tmnhXciOHcYsgPfNB7DEFFGpNLTjdTPbwaAh8bEALw_wcB:G:s&OCID=AID2000051_SEM_O2ShDlJP&MarinID=O2ShDlJP_332092752199_azure%20account_e_c__63148277493_aud-390212648371:kwd-295861291340&lnkd=Google_Azure_Brand&dclid=CKjVuKOP7uYCFVapaAoddSkKcA)
+2. [Azure Account](https://azure.microsoft.com/free)
- *Permissions needed*: ability to create and deploy to an azure [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview), a [service principal](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals), and grant the [collaborator role](https://docs.microsoft.com/en-us/azure/role-based-access-control/overview) to the service principal over the resource group.
-3. [Azure DevOps Account](https://azure.microsoft.com/en-us/services/devops/)
+3. [Azure DevOps Account](https://azure.microsoft.com/products/devops/)
- *Permissions needed*: ability to create [service connections](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml) and [pipelines](https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/pipelines-get-started?view=azure-devops&tabs=yaml).

#### Software Prerequisites
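
Many of the link changes across these README files drop the locale segment (`en-au`, `en-us`) from `azure.microsoft.com` and `docs.microsoft.com` URLs. A minimal Python sketch of that one rewrite is given here for illustration. The function name `strip_locale`, the host list, and the locale pattern are assumptions and are not part of this PR. The sketch does not attempt the `services/` to `products/` path changes or the removal of tracking query strings, which the changes above handle case by case.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these two hosts are rewritten, matching the links touched above.
HOSTS = {"azure.microsoft.com", "docs.microsoft.com"}

# Assumption: a locale is a leading path segment of the form xx-yy (e.g. en-au, en-us).
# Any other leading segment with that shape would be stripped as well.
LOCALE_SEGMENT = re.compile(r"^/[a-z]{2}-[a-z]{2}(?=/|$)", re.IGNORECASE)


def strip_locale(url: str) -> str:
    """Return url without a leading locale path segment, for known hosts only."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in HOSTS:
        return url  # leave links on other hosts untouched
    # Remove the locale segment; fall back to "/" if nothing is left of the path.
    path = LOCALE_SEGMENT.sub("", parts.path, count=1) or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for link in (
        "https://azure.microsoft.com/en-au/free/",
        "https://docs.microsoft.com/en-us/azure/databricks/",
        "https://powerbi.microsoft.com/en-us/",
    ):
        print(strip_locale(link))
```

Query strings and fragments are passed through unchanged, so a link such as the Polybase guide keeps its `?view=...` parameter.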