feat: Initial feature-engineering-on-fabric single-tech sample check-in #652

Merged: 27 commits, Nov 17, 2023
32f162c
feat: initial feature-eng-on-fabric single-tech sample check-in
thurstonchen Nov 9, 2023
78f56b3
doc: resize screenshots with minor contents updates
thurstonchen Nov 10, 2023
0db5811
update for data source landing
Nick287 Nov 10, 2023
05d9dc0
code: update model training notebook
cchenshu Nov 10, 2023
747a039
update for data loading base url and relative path
Nick287 Nov 13, 2023
44ac774
Apply suggestions from code review (Nov. 13th)
thurstonchen Nov 13, 2023
81ecab0
remove App service code and update images
Nick287 Nov 14, 2023
32bc03d
for simplicity remove option 1 and send it as a footnote info no details
Nick287 Nov 14, 2023
64823b6
fix: use Fabric workspace & lakehouse id in Purview qualified names, …
thurstonchen Nov 14, 2023
a0f74b6
Fixing some linking errors
promisinganuj Nov 14, 2023
e5094f2
Updated introduction and architecture description
promisinganuj Nov 15, 2023
08fc3ac
Updated environment setup details
promisinganuj Nov 15, 2023
2469bdf
Updated 'Source Dataset' section
promisinganuj Nov 15, 2023
7f531f5
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
4769fb9
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
8525ae5
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
e7623dc
Updated 'Data Activity' section
promisinganuj Nov 15, 2023
7bb0749
doc: add contents on verifying lineage in Purview
thurstonchen Nov 15, 2023
2b2225f
doc: add missed bullet to Contents table
thurstonchen Nov 15, 2023
e02b924
Updating Lineage section
promisinganuj Nov 16, 2023
786922d
Updating Lineage section
promisinganuj Nov 16, 2023
597e774
Updating Lineage section
promisinganuj Nov 16, 2023
4cc2d21
Updating Lineage section
promisinganuj Nov 16, 2023
0b4f621
Updating Lineage section
promisinganuj Nov 16, 2023
b5d1586
Updating 'Required resources' header
promisinganuj Nov 16, 2023
4622bb2
Fixing URL checks
promisinganuj Nov 16, 2023
2382f59
Fixing URL checks
promisinganuj Nov 16, 2023
Fixing URL checks
promisinganuj committed Nov 16, 2023
commit 2382f59ebd17bca8350ac101316f1586e3a64c6e
12 changes: 12 additions & 0 deletions .markdownlinkcheck.json
@@ -11,6 +11,18 @@
},
{
"pattern": "^https://stmdwpublic.blob.core.windows.net/"
+},
+{
+"pattern": "^https://azure.microsoft.com/en-us/free/"
+},
+{
+"pattern": "^https://azure.microsoft.com/en-us/products/data-factory/"
+},
+{
+"pattern": "^https://dev.azure.com"
+},
+{
+"pattern": "^https://azure.microsoft.com/en-us/free/databricks/"
}
]
}
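
Reviewer note: a minimal sketch of how prefix patterns like these decide which links the checker skips. It assumes the entries sit under an `ignorePatterns` key (the hunk does not show the enclosing key) and that each pattern is applied as a regular expression against the full URL. The helper names `matching_patterns` and `is_ignored` are hypothetical and are not part of any link-checking tool.

```python
import json
import re

# Patterns copied from the hunk above. The enclosing "ignorePatterns" key is an
# assumption, since the diff does not show it.
CONFIG = json.loads("""
{
  "ignorePatterns": [
    {"pattern": "^https://stmdwpublic.blob.core.windows.net/"},
    {"pattern": "^https://azure.microsoft.com/en-us/free/"},
    {"pattern": "^https://azure.microsoft.com/en-us/products/data-factory/"},
    {"pattern": "^https://dev.azure.com"},
    {"pattern": "^https://azure.microsoft.com/en-us/free/databricks/"}
  ]
}
""")


def matching_patterns(url, config=CONFIG):
    """Return every configured pattern that matches the URL (hypothetical helper)."""
    return [
        entry["pattern"]
        for entry in config["ignorePatterns"]
        if re.search(entry["pattern"], url)
    ]


def is_ignored(url, config=CONFIG):
    """A URL is skipped when at least one pattern matches it."""
    return bool(matching_patterns(url, config))


if __name__ == "__main__":
    for url in (
        "https://azure.microsoft.com/en-us/products/data-factory/",
        "https://azure.microsoft.com/en-us/products/key-vault/",
        "https://azure.microsoft.com/en-us/free/databricks/",
    ):
        print(url, "->", matching_patterns(url))
```

Under these assumptions, the `/en-us/free/databricks/` URL matches two entries, because the shorter `/en-us/free/` prefix already covers it. The Key Vault URLs rewritten elsewhere in this commit match none of the patterns, so they would still be checked.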
2 changes: 1 addition & 1 deletion e2e_samples/mdw_governance/README.md
@@ -37,7 +37,7 @@ The following shows the architecture of the solution.
- [Azure Data Factory](https://azure.microsoft.com/en-us/products/data-factory/)
- [Azure Data Lake Gen2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction)
- [Azure Databricks](https://docs.microsoft.com/azure/databricks/)
-- [Azure Key Vault](https://azure.microsoft.com/products/key-vault/)
+- [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/)
- [Application Insights](https://docs.microsoft.com/azure/azure-monitor/app/app-insights-overview)
- [Office365 API Connection](https://docs.microsoft.com/azure/connectors/connectors-create-api-office365-outlook)
- [Azure Virtual Network](https://docs.microsoft.com/azure/virtual-network/virtual-networks-overview)
6 changes: 3 additions & 3 deletions e2e_samples/parking_sensors/README.md
@@ -58,7 +58,7 @@ The sample demonstrate how DevOps principles can be applied end to end Data Pipe

## Solution Overview

-The solution pulls near realtime [Melbourne Parking Sensor data](https://www.melbourne.vic.gov.au/about-council/governance-transparency/open-data/Pages/on-street-parking-data.aspx) from a publicly available REST api endpoint and saves this to [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction). It then validates, cleanses, and transforms the data to a known schema using [Azure Databricks](https://azure.microsoft.com/products/databricks/). A second Azure Databricks job then transforms these into a [Star Schema](https://en.wikipedia.org/wiki/Star_schema) which are then loaded into [Azure Synapse Analytics (formerly SQLDW)](https://azure.microsoft.com/products/synapse-analytics/) using [Polybase](https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-guide?view=sql-server-ver15). The entire pipeline is orchestrated with [Azure Data Factory](https://azure.microsoft.com/en-us/products/data-factory/).
+The solution pulls near realtime [Melbourne Parking Sensor data](https://www.melbourne.vic.gov.au/about-council/governance-transparency/open-data/Pages/on-street-parking-data.aspx) from a publicly available REST api endpoint and saves this to [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction). It then validates, cleanses, and transforms the data to a known schema using [Azure Databricks](https://azure.microsoft.com/en-us/products/databricks/). A second Azure Databricks job then transforms these into a [Star Schema](https://en.wikipedia.org/wiki/Star_schema) which are then loaded into [Azure Synapse Analytics (formerly SQLDW)](https://azure.microsoft.com/products/synapse-analytics/) using [Polybase](https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-guide?view=sql-server-ver15). The entire pipeline is orchestrated with [Azure Data Factory](https://azure.microsoft.com/en-us/products/data-factory/).

### Architecture

@@ -83,12 +83,12 @@ See [here](#build-and-release-pipeline) for details.
It makes use of the following azure services:

- [Azure Data Factory](https://azure.microsoft.com/en-us/products/data-factory/)
-- [Azure Databricks](https://azure.microsoft.com/products/databricks/)
+- [Azure Databricks](https://azure.microsoft.com/en-us/products/databricks/)
- [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)
- [Azure Synapse Analytics (formerly SQLDW)](https://azure.microsoft.com/products/synapse-analytics/)
- [Azure DevOps](https://azure.microsoft.com/en-us/products/devops/)
- [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview)
-- [PowerBI](https://powerbi.microsoft.com/en-us/)
+- [PowerBI](https://www.microsoft.com/en-us/power-platform/products/power-bi/)

For a detailed walk-through of the solution and key concepts, watch the following video recording:

2 changes: 1 addition & 1 deletion e2e_samples/parking_sensors_synapse/README.md
@@ -62,7 +62,7 @@ It makes use of the following azure services:
- [Azure Synapse Analytics](https://azure.microsoft.com/products/synapse-analytics/)
- [Azure Data Lake Gen2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)
- [Azure DevOps](https://azure.microsoft.com/en-us/products/devops/)
-- [PowerBI](https://powerbi.microsoft.com/en-us/)
+- [PowerBI](https://www.microsoft.com/en-us/power-platform/products/power-bi/)
- [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview)
- [Log Analytics](https://docs.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-overview)

Original file line number Diff line number Diff line change
@@ -57,7 +57,7 @@ The following technologies are used to build this sample:

- [Azure Databricks](https://azure.microsoft.com/en-us/free/databricks/)
- [Azure Storage](https://azure.microsoft.com/en-us/products/storage/data-lake-storage/)
-- [Azure Key Vault](https://azure.microsoft.com/products/key-vault/)
+- [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/)
- [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/)
- [Azure Resource Manager](https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/overview)

Original file line number Diff line number Diff line change
@@ -76,7 +76,7 @@ The following technologies are used to build this sample:

- [Azure Databricks](https://azure.microsoft.com/en-us/free/databricks/)
- [Azure Storage](https://azure.microsoft.com/en-us/products/storage/data-lake-storage/)
-- [Azure Key Vault](https://azure.microsoft.com/products/key-vault/)
+- [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/)
- [Azure Virtual networks](https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-overview)
- [Azure Firewall](https://docs.microsoft.com/en-us/azure/firewall/overview)
- [Azure Route tables](https://docs.microsoft.com/en-us/azure/virtual-network/manage-route-table)
2 changes: 1 addition & 1 deletion single_tech_samples/datafactory/sample1_cicd/README.md
@@ -53,7 +53,7 @@ The following shows the overall CI/CD process as built with Azure DevOps Pipelin

- [Azure Data Factory](https://azure.microsoft.com/en-us/products/data-factory/)
- [Azure Data Lake Gen2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction)
-- [Azure Key Vault](https://azure.microsoft.com/products/key-vault/)
+- [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/)
- [Azure DevOps](https://azure.microsoft.com/en-us/products/devops/)
- [pytest-adf](https://github.com/devlace/pytest-adf)

2 changes: 1 addition & 1 deletion single_tech_samples/purview/README.md
@@ -44,7 +44,7 @@ The following shows the simple architecture of the Azure Purview connected to fe
- [Azure Purview](https://azure.microsoft.com/en-us/products/devops/)
- [Azure Data Factory](https://azure.microsoft.com/en-us/products/data-factory/)
- [Azure Data Lake Gen2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-introduction)
-- [Azure Key Vault](https://azure.microsoft.com/products/key-vault/)
+- [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault/)

## Key Learnings
