From cc19fe8a127a18455bb8822fc73fa8b1afb1c820 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Zawadzki?=
Date: Tue, 27 Aug 2024 14:30:33 +0200
Subject: [PATCH] =?UTF-8?q?=F0=9F=94=A5=20Remove=20the=20changelog=20(#101?=
 =?UTF-8?q?7)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove the changelog, as it's now automated within the release process.
---
 CHANGELOG.md | 475 ---------------------------------------------------
 1 file changed, 475 deletions(-)
 delete mode 100644 CHANGELOG.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
deleted file mode 100644
index e3847c62e..000000000
--- a/CHANGELOG.md
+++ /dev/null
@@ -1,475 +0,0 @@
-# Changelog
-
-All notable changes to this project will be documented in this file.
-
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
-and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-
-## [Unreleased]
-
-- Added new version of `Genesys` connector and test files.
-- Added new version of `Outlook` connector and test files.
-- Added new version of `Hubspot` connector and test files.
-- Added `Mindful` connector and test file.
-
-### Added
-
-- Added `sql_server_to_parquet` Prefect flow.
-- Added `sap_to_parquet` Prefect flow.
-- Added `duckdb_to_sql_server`, `duckdb_to_parquet`, `duckdb_transform` Prefect flows.
-- Added `bcp` and `duckdb_query` Prefect tasks.
-- Added `DuckDB` source class.
-- Added `sql_server_to_minio` flow for Prefect.
-- Added `df_to_minio` task for Prefect.
-- Added handling for `DatabaseCredentials` and `Secret` blocks in `prefect/utils.py:get_credentials`.
-- Added `SQLServer` source and tasks `create_sql_server_table`, `sql_server_to_df`, and `sql_server_query`.
-- Added `basename_template` to the `MinIO` source.
-- Added `_empty_column_to_string` and `_convert_all_to_string_type` to convert data types to string.
-- Added `na_values` parameter to the `Sharepoint` class to parse `N/A` values coming from Excel file columns.
-- Added `get_last_segment_from_url` function to the Sharepoint source file.
-- Added `validate` function to `viadot/utils.py`.
-- Fixed `Databricks.create_table_from_pandas()` failing to overwrite a table in some cases even with `replace="True"`.
-- Enabled Databricks Connect in the image. To enable, [follow this guide](./README.md#executing-spark-jobs).
-- Added `Databricks` source.
-- Added `ExchangeRates` source.
-- Added `from_df()` method to the `AzureDataLake` source.
-- Added `SAPRFC` source.
-- Added `S3` source.
-- Added `RedshiftSpectrum` source.
-- Added `upload()` and `download()` methods to the `S3` source.
-- Added `Genesys` source.
-- Fixed a bug in `Databricks.create_table_from_pandas()`: the function that converts column names to snake_case was not used in every case. (#672)
-- Added `howto_migrate_sources_tasks_and_flows.md` document explaining the viadot 1 -> 2 migration process.
-- `RedshiftSpectrum.from_df()` now automatically creates a folder for the table if not specified in `to_path`.
-- Fixed a bug in `Databricks.create_table_from_pandas()`: the function now automatically casts DataFrame types. (#681)
-- Added `close_connection()` to `SAPRFC`.
-- Added `Trino` source.
-- Added `MinIO` source.
-- Added `gen_split()` method to the `SAPRFCV2` class to allow looping over a DataFrame with a generator, which improves performance.
-- Added `adjust_where_condition_by_adding_missing_spaces()` to `SAPRFC`, a function that checks the raw SQL query and modifies it if needed.
-
-### Changed
-
-- Changed the location of `task_utils.py` and removed unused/Prefect-1-related tasks.
-- Changed the way of handling `NA` string values and mapped column types to `str` for the `Sharepoint` source.
-- Added `SQLServerToDF` task
-- Added `SQLServerToDuckDB` flow, which downloads data from a SQLServer table, loads it into a parquet file, and then uploads it to DuckDB
-- Added a complete proxy setup to the `SAPRFC` example (`viadot/examples/sap_rfc`)
-- Added Databricks/Spark setup to the image. See the README for setup & usage instructions
-- Added a rollback feature to the `Databricks` source
-- Changed all Prefect logging instances in the `sources` directory to native Python logging
-- Changed the `rm()`, `from_df()`, and `to_df()` methods in the `S3` source
-- Changed `get_request()` to `handle_api_request()` in `utils.py`
-- Changed the `for` loop in `SAPRFCV2.to_df()` to use a generator
-- Updated the `Dockerfile` to remove the obsolete `adoptopenjdk` and replace it with `temurin`
-
-### Removed
-
-- Removed the `env` param from the `Databricks` source, as users can now store multiple configs for the same source using different config keys
-- Removed the Prefect dependency from the library (Python library, Docker base image)
-- Removed `catch_extra_separators()` from the `SAPRFCV2` class
-
-### Fixed
-
-- Fixed the `bcp` Prefect task to run correctly
-- Fixed a credentials typo in the `SQLServer` source
-
-## [0.4.3] - 2022-04-28
-
-### Added
-
-- Added `adls_file_name` in `SupermetricsToADLS` and `SharepointToADLS` flows
-- Added `BigQueryToADLS` flow class, which enables extracting data from BigQuery
-- Added `Salesforce` source
-- Added `SalesforceUpsert` task
-- Added `SalesforceBulkUpsert` task
-- Added C4C secret handling to the `CloudForCustomersReportToADLS` flow (`c4c_credentials_secret` parameter)
-
-### Fixed
-
-- Fixed `get_flow_last_run_date()` incorrectly parsing the date
-- Fixed C4C secret handling (tasks now correctly read the secret as the credentials, rather than assuming the secret is a container for credentials for all environments and trying to access a specific key inside it). In other words, tasks now assume the secret holds credentials, rather than a dict of the form `{env: credentials, env2: credentials2}`
-- Fixed `utils.gen_bulk_insert_query_from_df()` failing with > 1000 rows due to the INSERT clause limit by chunking the data into multiple INSERTs
-- Fixed `MultipleFlows` behavior when only one flow is passed and when the last flow fails.
-- Fixed an issue with async usage in `Genesys.genesys_generate_exports()` (#669).
-
-## [0.4.2] - 2022-04-08
-
-### Added
-
-- Added `AzureDataLakeRemove` task
-
-### Changed
-
-- Changed the name of the task file from `prefect` to `prefect_date_range`
-
-### Fixed
-
-- Fixed an out-of-range issue in `prefect_date_range`
-
-## [0.4.1] - 2022-04-07
-
-### Changed
-
-- Bumped version
-
-## [0.4.0] - 2022-04-07
-
-### Added
-
-- Added `custom_mail_state_handler` function that sends a mail notification using a custom SMTP server.
-- Added new function `df_clean_column` that cleans DataFrame columns of special characters
-- Added `df_clean_column` util task that removes special characters from a pandas DataFrame
-- Added `MultipleFlows` flow class, which enables running multiple flows in a given order.
-- Added `GetFlowNewDateRange` task to change the date range based on Prefect flows
-- Added `check_col_order` parameter in `ADLSToAzureSQL`
-- Added new source `ASElite`
-- Added KeyVault support in `CloudForCustomers` tasks
-- Added `SQLServer` source
-- Added `DuckDBToDF` task
-- Added `DuckDBTransform` flow
-- Added `SQLServerCreateTable` task
-- Added `credentials` param to `BCPTask`
-- Added `get_sql_dtypes_from_df` and `update_dict` util tasks
-- Added `DuckDBToSQLServer` flow
-- Added `if_exists="append"` option to `DuckDB.create_table_from_parquet()`
-- Added `get_flow_last_run_date` util function
-- Added `df_to_dataset` task util for writing DataFrames to data lakes using `pyarrow`
-- Added retries to Cloud for Customers tasks
-- Added `chunksize` parameter to the `C4CToDF` task to allow pulling data in chunks
-- Added `chunksize` parameter to the `BCPTask` task to allow more control over the load process
-- Added support for SQL Server's custom `datetimeoffset` type
-- Added `AzureSQLToDF` task
-- Added `AzureDataLakeRemove` task
-- Added `AzureSQLUpsert` task
-
-### Changed
-
-- Changed the base class of `AzureSQL` to `SQLServer`
-- `df_to_parquet()` task now creates directories if needed
-- Added several more separators to check for automatically in `SAPRFC.to_df()`
-- Upgraded `duckdb` version to 0.3.2
-
-### Fixed
-
-- Fixed a bug with the `CheckColumnOrder` task
-- Fixed the OpenSSL config for old SQL Servers still using TLS < 1.2
-- `BCPTask` now correctly handles a custom SQL Server port
-- Fixed `SAPRFC.to_df()` ignoring the user-specified separator
-- Fixed the temporary CSV generated by the `DuckDBToSQLServer` flow not being cleaned up
-- Fixed some mappings in `get_sql_dtypes_from_df()` and optimized performance
-- Fixed `BCPTask` for the case when the file path contained a space
-- Fixed the credential evaluation logic (`credentials` is now evaluated before `config_key`)
-- Fixed "$top" and "$skip" values being ignored by the `C4CToDF` task if provided in the `params` parameter
-- Fixed `SQL.to_df()` incorrectly handling queries that begin with whitespace
-
-### Removed
-
-- Removed the `autopick_sep` parameter from `SAPRFC` functions. The separator is now always picked automatically if not provided.
-- Moved the `dtypes_to_json` task to `task_utils.py`
-
-## [0.3.2] - 2022-02-17
-
-### Fixed
-
-- Fixed an issue with schema info within the `CheckColumnOrder` class.
-
-## [0.3.1] - 2022-02-17
-
-### Changed
-
-- `ADLSToAzureSQL` - added `remove_tab` parameter to remove unnecessary tab separators from data.
-
-### Fixed
-
-- Fixed an issue with the returned DataFrame within the `CheckColumnOrder` class.
-
-## [0.3.0] - 2022-02-16
-
-### Added
-
-- New source `SAPRFC` for connecting with SAP using the `pyRFC` library (requires `pyrfc` as well as the SAP NW RFC library, which can be downloaded [here](https://support.sap.com/en/product/connectors/nwrfcsdk.html))
-- New source `DuckDB` for connecting with the `DuckDB` database
-- New task `SAPRFCToDF` for loading data from SAP to a pandas DataFrame
-- New tasks, `DuckDBQuery` and `DuckDBCreateTableFromParquet`, for interacting with DuckDB
-- New flow `SAPToDuckDB` for moving data from SAP to DuckDB
-- Added `CheckColumnOrder` task
-- Documented the C4C connection with `url` and `report_url`
-- `SQLiteInsert` now checks if the DataFrame is empty or the object is not a DataFrame
-- KeyVault support in the `SharepointToDF` task
-- KeyVault support in `CloudForCustomers` tasks
-
-### Changed
-
-- Pinned the Prefect version to 0.15.11
-- `df_to_csv` now creates dirs if they don't exist
-- `ADLSToAzureSQL` - removes unnecessary "\t" separators when present in CSV column data
-
-### Fixed
-
-- Fixed an issue with `duckdb` calls seeing the initial DB snapshot instead of the updated state (#282)
-- Optimized the C4C connection with `url` and `report_url`
-- Fixed the column mapper in the C4C source
-
-## [0.2.15] - 2022-01-12
-
-### Added
-
-- New option for the `ADLSToAzureSQL` flow - `if_exists="delete"`
-- `SQL` source: `create_table()` already handled `if_exists`; it now handles a new `if_exists` option
-- `C4CToDF` and `C4CReportToDF` tasks are provided as classes instead of functions
-
-### Fixed
-
-- Fixed an appending issue within the `CloudForCustomers` source
-- Fixed an early-return bug in the `to_df` method of `UKCarbonIntensity`
-
-## [0.2.14] - 2021-12-01
-
-### Fixed
-
-- Fixed an authorization issue within the `CloudForCustomers` source
-
-## [0.2.13] - 2021-11-30
-
-### Added
-
-- Added support for file path to the `CloudForCustomersReportToADLS` flow
-- Added `flow_of_flows` list handling
-- Added support for JSON files in `AzureDataLakeToDF`
-
-### Fixed
-
-- `Supermetrics` source: `to_df()` now correctly handles `if_empty` in case of empty results
-
-### Changed
-
-- `Sharepoint` and `CloudForCustomers` sources will now provide an informative `CredentialError`, which is also raised early. This will make issues with input credentials immediately clear to the user.
-- Removed `set_key_value` from the `CloudForCustomersReportToADLS` flow
-
-## [0.2.12] - 2021-11-25
-
-### Added
-
-- Added `Sharepoint` source
-- Added `SharepointToDF` task
-- Added `SharepointToADLS` flow
-- Added `CloudForCustomers` source
-- Added `c4c_report_to_df` task
-- Added `c4c_to_df` task
-- Added `CloudForCustomersReportToADLS` flow
-- Added `df_to_csv` task to task_utils.py
-- Added `df_to_parquet` task to task_utils.py
-- Added `dtypes_to_json` task to task_utils.py
-
-## [0.2.11] - 2021-10-30
-
-### Fixed
-
-- `ADLSToAzureSQL` - fixed a CSV path issue.
-- `SupermetricsToADLS` - fixed a local JSON path issue.
-
-## [0.2.10] - 2021-10-29
-
-### Release due to CI/CD error
-
-## [0.2.9] - 2021-10-29
-
-### Release due to CI/CD error
-
-## [0.2.8] - 2021-10-29
-
-### Changed
-
-- CI/CD: the `dev` image is now only published on push to the `dev` branch
-- Docker:
-  - updated registry links to use the new `ghcr.io` domain
-  - `run.sh` now also accepts the `-t` option. When run in standard mode, it will only spin up the `viadot_jupyter_lab` service.
-    When run with `-t dev`, it will also spin up the `viadot_testing` and `viadot_docs` containers.
-
-### Fixed
-
-- `ADLSToAzureSQL` - fixed a path parameter issue.
-
-## [0.2.7] - 2021-10-04
-
-### Added
-
-- Added `SQLiteQuery` task
-- Added `CloudForCustomers` source
-- Added `CloudForCustomersToDF` and `CloudForCustomersToCSV` tasks
-- Added `CloudForCustomersToADLS` flow
-- Added support for parquet in `CloudForCustomersToDF`
-- Added style guidelines to the `README`
-- Added local setup and commands to the `README`
-
-### Changed
-
-- Changed the CI/CD algorithm
-  - the `latest` Docker image is now only updated on release and is the same exact image as the latest release
-  - the `dev` image is released only on pushes and PRs to the `dev` branch (so dev branch = dev image)
-- Modified `ADLSToAzureSQL` - added _read_sep_ and _write_sep_ parameters to the flow
-
-### Fixed

-- Fixed `ADLSToAzureSQL` breaking in `"append"` mode if the table didn't exist (#145).
-- Fixed `ADLSToAzureSQL` breaking in the promotion path for CSV files.
-
-## [0.2.6] - 2021-09-22
-
-### Added
-
-- Added flows library docs to the references page
-
-### Changed
-
-- Moved the task library docs page to the topbar
-- Updated docs for tasks and flows
-
-## [0.2.5] - 2021-09-20
-
-### Added
-
-- Added `start` and `end_date` parameters to the `SupermetricsToADLS` flow
-- Added a tutorial on how to pull data from `Supermetrics`
-
-## [0.2.4] - 2021-09-06
-
-### Added
-
-- Added documentation (both docstrings and MkDocs docs) for multiple tasks
-- Added `start_date` and `end_date` parameters to the `SupermetricsToAzureSQL` flow
-- Added a temporary workaround `df_to_csv_task` task to the `SupermetricsToADLS` flow to handle mixed-dtype columns not handled automatically by DataFrame's `to_parquet()` method
-
-## [0.2.3] - 2021-08-19
-
-### Changed
-
-- Modified the `RunGreatExpectationsValidation` task to use the built-in support for evaluation parameters added in Prefect v0.15.3
-- Modified the `SupermetricsToADLS` and `ADLSGen1ToAzureSQLNew` flows to align with this [recipe](https://docs.prefect.io/orchestration/flow_config/storage.html#loading-additional-files-with-git-storage) for reading the expectation suite JSON. The suite now has to be loaded before flow initialization in the flow's Python file and passed as an argument to the flow's constructor.
-- Modified `RunGreatExpectationsValidation`'s `expectations_path` parameter to point to the directory containing the expectation suites instead of the Great Expectations project directory, which was confusing. The project directory is now only used internally and not exposed to the user.
-- Changed the logging of the docs URL for the `RunGreatExpectationsValidation` task to use GE's recipe from [the docs](https://docs.greatexpectations.io/docs/guides/validation/advanced/how_to_implement_custom_notifications/)
-
-### Added
-
-- Added a test for the `SupermetricsToADLS` flow
-- Added a test for the `AzureDataLakeList` task
-- Added a PR template for new PRs
-- Added a `write_to_json` util task to the `SupermetricsToADLS` flow. This task dumps the input expectations dict to the local filesystem, as is required by Great Expectations.
-  This allows the user to simply pass a dict with their expectations and not worry about the project structure required by Great Expectations.
-- Added `Shapely` and `imagehash` dependencies required for full `visions` functionality (installing `visions[all]` breaks the build)
-- Added more parameters to control CSV parsing in the `ADLSGen1ToAzureSQLNew` flow
-- Added `keep_output` parameter to the `RunGreatExpectationsValidation` task to control Great Expectations output to the filesystem
-- Added `keep_validation_output` parameter and `cleanup_validation_clutter` task to the `SupermetricsToADLS` flow to control Great Expectations output to the filesystem
-
-### Removed
-
-- Removed the `SupermetricsToAzureSQLv2` and `SupermetricsToAzureSQLv3` flows
-- Removed the `geopy` dependency
-
-## [0.2.2] - 2021-07-27
-
-### Added
-
-- Added support for parquet in `AzureDataLakeToDF`
-- Added proper logging to the `RunGreatExpectationsValidation` task
-- Added the `viz` Prefect extra to requirements to allow flow visualization
-- Added a few utility tasks in `task_utils`
-- Added the `geopy` dependency
-- Tasks:
-  - `AzureDataLakeList` - for listing files in an ADLS directory
-- Flows:
-  - `ADLSToAzureSQL` - promoting files to conformed, operations, creating an SQL table and inserting the data into it
-  - `ADLSContainerToContainer` - copying files between ADLS containers
-
-### Changed
-
-- Renamed the `ReadAzureKeyVaultSecret` and `RunAzureSQLDBQuery` tasks to match the Prefect naming style
-- Flows:
-  - `SupermetricsToADLS` - changed the file extension from CSV to parquet. File and schema info are loaded to the `RAW` container.
-
-### Fixed
-
-- Removed the broken version autobump from CI
-
-## [0.2.1] - 2021-07-14
-
-### Added
-
-- Flows:
-  - `SupermetricsToADLS` - supporting immutable ADLS setup
-
-### Changed
-
-- A default value for the `ds_user` parameter in `SupermetricsToAzureSQLv3` can now be
-  specified in the `SUPERMETRICS_DEFAULT_USER` secret
-- Updated multiple dependencies
-
-### Fixed
-
-- Fixed "Local run of `SupermetricsToAzureSQLv3` skips all tasks after `union_dfs_task`" (#59)
-- Fixed the `release` GitHub action
-
-## [0.2.0] - 2021-07-12
-
-### Added
-
-- Sources:
-
-  - `AzureDataLake` (supports gen1 & gen2)
-  - `SQLite`
-
-- Tasks:
-
-  - `DownloadGitHubFile`
-  - `AzureDataLakeDownload`
-  - `AzureDataLakeUpload`
-  - `AzureDataLakeToDF`
-  - `ReadAzureKeyVaultSecret`
-  - `CreateAzureKeyVaultSecret`
-  - `DeleteAzureKeyVaultSecret`
-  - `SQLiteInsert`
-  - `SQLiteSQLtoDF`
-  - `AzureSQLCreateTable`
-  - `RunAzureSQLDBQuery`
-  - `BCPTask`
-  - `RunGreatExpectationsValidation`
-  - `SupermetricsToDF`
-
-- Flows:
-
-  - `SupermetricsToAzureSQLv1`
-  - `SupermetricsToAzureSQLv2`
-  - `SupermetricsToAzureSQLv3`
-  - `AzureSQLTransform`
-  - `Pipeline`
-  - `ADLSGen1ToGen2`
-  - `ADLSGen1ToAzureSQL`
-  - `ADLSGen1ToAzureSQLNew`
-
-- Examples:
-  - Hello world flow
-  - Supermetrics Google Ads extract
-
-### Changed
-
-- Tasks now use secrets for credential management (Azure tasks use Azure Key Vault secrets)
-- The `SQL` source now has a default query timeout of 1 hour
-
-### Fixed
-
-- Fixed `SQLite` tests
-- Multiple stability improvements with retries and timeouts
-
-## [0.1.12] - 2021-05-08
-
-### Changed
-
-- Moved from Poetry to pip
-
-### Fixed
-
-- Fixed `AzureBlobStorage`'s `to_storage()` method missing the final upload blob part