All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Added SQLiteQuery task
- Added CloudForCustomers source
- Added CloudForCustomersToDF and CloudForCustomersToCSV tasks
- Added CloudForCustomersToADLS flow
- Added support for parquet in CloudForCustomersToDF
- Added style guidelines to the README
- Added local setup and commands to the README
- Changed CI/CD algorithm
  - the latest Docker image is now only updated on release and is the same exact image as the latest release
  - the dev image is released only on pushes and PRs to the dev branch (so dev branch = dev image)
- Modified ADLSToAzureSQL - read_sep and write_sep parameters added to the flow.
- Fixed ADLSToAzureSQL breaking in "append" mode if the table didn't exist (#145).
- Fixed ADLSToAzureSQL breaking in promotion path for csv files.
- Added flows library docs to the references page
- Moved task library docs page to topbar
- Updated docs for tasks and flows
- Added start and end_date parameters to SupermetricsToADLS flow
- Added a tutorial on how to pull data from Supermetrics
- Added documentation (both docstrings and MkDocs docs) for multiple tasks
- Added start_date and end_date parameters to the SupermetricsToAzureSQL flow
- Added a temporary workaround df_to_csv_task task to the SupermetricsToADLS flow to handle mixed dtype columns not handled automatically by DataFrame's to_parquet() method
- Modified RunGreatExpectationsValidation task to use the built-in support for evaluation parameters added in Prefect v0.15.3
- Modified SupermetricsToADLS and ADLSGen1ToAzureSQLNew flows to align with this recipe for reading the expectation suite JSON. The suite now has to be loaded before flow initialization in the flow's Python file and passed as an argument to the flow's constructor.
- Modified RunGreatExpectationsValidation's expectations_path parameter to point to the directory containing the expectation suites instead of the Great Expectations project directory, which was confusing. The project directory is now only used internally and not exposed to the user.
- Changed the logging of docs URL for RunGreatExpectationsValidation task to use GE's recipe from the docs
- Added a test for SupermetricsToADLS flow
- Added a test for AzureDataLakeList task
- Added PR template for new PRs
- Added a write_to_json util task to the SupermetricsToADLS flow. This task dumps the input expectations dict to the local filesystem as is required by Great Expectations. This allows the user to simply pass a dict with their expectations and not worry about the project structure required by Great Expectations
- Added Shapely and imagehash dependencies required for full visions functionality (installing visions[all] breaks the build)
- Added more parameters to control CSV parsing in the ADLSGen1ToAzureSQLNew flow
- Added keep_output parameter to the RunGreatExpectationsValidation task to control Great Expectations output to the filesystem
- Added keep_validation_output parameter and cleanup_validation_clutter task to the SupermetricsToADLS flow to control Great Expectations output to the filesystem
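
As a rough illustration of the write_to_json entry above, the sketch below dumps an expectations dict to a JSON file on the local filesystem using only the standard library. The function signature, the file layout, and the sample suite contents are assumptions made for illustration; they are not taken from the project's actual task.

```python
import json
import tempfile
from pathlib import Path


def write_to_json(dict_: dict, path: str) -> Path:
    """Dump a dict to a JSON file, creating parent directories if needed.

    Hypothetical signature: the real util task may take different arguments.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as f:
        json.dump(dict_, f, indent=2)
    return target


# Illustrative expectation suite; names and values are made up for this sketch.
suite = {
    "expectation_suite_name": "failure",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "Date"},
        }
    ],
}

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    written_path = write_to_json(
        suite, str(Path(tmp_dir) / "expectations" / "failure.json")
    )
    # Read the file back to confirm the round trip preserves the dict.
    loaded = json.loads(written_path.read_text(encoding="utf-8"))
```

With a helper along these lines, the user only has to pass a dict, and the directory structure is created on their behalf.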
- Removed SupermetricsToAzureSQLv2 and SupermetricsToAzureSQLv3 flows
- Removed geopy dependency
- Added support for parquet in AzureDataLakeToDF
- Added proper logging to the RunGreatExpectationsValidation task
- Added the viz Prefect extra to requirements to allow flow visualization
- Added a few utility tasks in task_utils
- Added geopy dependency
- Tasks:
  - AzureDataLakeList - for listing files in an ADLS directory
- Flows:
  - ADLSToAzureSQL - promoting files to conformed, operations, creating an SQL table and inserting the data into it
  - ADLSContainerToContainer - copying files between ADLS containers
- Renamed ReadAzureKeyVaultSecret and RunAzureSQLDBQuery tasks to match Prefect naming style
- Flows:
  - SupermetricsToADLS - changed csv to parquet file extension. File and schema info are loaded to the RAW container.
- Removed the broken version autobump from CI
- Flows:
  - SupermetricsToADLS - supporting immutable ADLS setup
- A default value for the ds_user parameter in SupermetricsToAzureSQLv3 can now be specified in the SUPERMETRICS_DEFAULT_USER secret
- Updated multiple dependencies
- Fixed "Local run of SupermetricsToAzureSQLv3 skips all tasks after union_dfs_task" (#59)
- Fixed the release GitHub action
- Sources:
  - AzureDataLake (supports gen1 & gen2)
  - SQLite
- Tasks:
  - DownloadGitHubFile
  - AzureDataLakeDownload
  - AzureDataLakeUpload
  - AzureDataLakeToDF
  - ReadAzureKeyVaultSecret
  - CreateAzureKeyVaultSecret
  - DeleteAzureKeyVaultSecret
  - SQLiteInsert
  - SQLiteSQLtoDF
  - AzureSQLCreateTable
  - RunAzureSQLDBQuery
  - BCPTask
  - RunGreatExpectationsValidation
  - SupermetricsToDF
- Flows:
  - SupermetricsToAzureSQLv1
  - SupermetricsToAzureSQLv2
  - SupermetricsToAzureSQLv3
  - AzureSQLTransform
  - Pipeline
  - ADLSGen1ToGen2
  - ADLSGen1ToAzureSQL
  - ADLSGen1ToAzureSQLNew
- Examples:
  - Hello world flow
  - Supermetrics Google Ads extract
- Tasks now use secrets for credential management (azure tasks use Azure Key Vault secrets)
- SQL source now has a default query timeout of 1 hour
- Fix SQLite tests
- Multiple stability improvements with retries and timeouts
- Moved from poetry to pip
- Fix AzureBlobStorage's to_storage() method missing the final upload blob part