diff --git a/README.md b/README.md
index 1ab6d13..13d7f7e 100644
--- a/README.md
+++ b/README.md
@@ -4,12 +4,25 @@ ETL process to:
 + Extract data about 273 Currencies exchange rates from a public API
 + Organize, transform and store data in two tables (Dollar and Euro based rates) in a SQLite DB
 + Generate a Customized Excel report (decision-making tool)
-+ Write unit tests to ensure data quality and availability
 + Orchestrate a job with Airflow to recurrently run all steps
 
 # Step 1: ETL
 
-Data will be Extract from a public API https://github.com/fawazahmed0/currency-api. The API return a daily updated json with all the exchange rates for the for the selected base currency and date.
+Data is extracted from a public API, https://github.com/fawazahmed0/exchange-api/tree/main. The API returns a daily-updated JSON with all the exchange rates for the selected base currency and date.
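+
+For illustration, here is a minimal sketch of how these rates might be fetched. The endpoint pattern and response shape below follow the API project's documentation, but treat the exact URL and payload as assumptions to verify; this is not code from this repository:
+
+```python
+import requests
+
+# Illustrative only: jsDelivr CDN endpoint pattern documented by the
+# exchange-api project; the exact URL and payload may change.
+url = "https://cdn.jsdelivr.net/npm/@fawazahmed0/currency-api@latest/v1/currencies/eur.json"
+data = requests.get(url, timeout=10).json()
+
+print(data["date"])        # snapshot date
+print(data["eur"]["usd"])  # EUR -> USD exchange rate
+```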
 
 The script [update_currency_exchange.py](update_currency_exchange.py) is responsible for the whole ETL process. In short, this script contains functions to:
 
@@ -34,23 +33,14 @@ it's easier to show than to describe:
 
 ![png](readme_files/report_print.PNG)
 
-# Step 3: Unit Tests
+# Step 3: Orchestrate a Job / Run the Pipeline
 
-Although I'm calling this the "step 3" (just to keep the logical order) it will be the first step to be executed in the pipeline.
+There are two ways to run the pipeline responsible for all stages of the process:
 
-The scrip [test_api_input.py](test_api_input.py) you will run a few tests and save a txt log file in the folder [log](log). Tests are checking:
-+ API connection
-+ DB connection
-+ API latest date (test if API is being updated)
-+ Quantity of currencies the API returned, test id there are new currencies available
+The first and simplest is through the script [main.py](main.py): running it from the CLI, or via Docker with [run.sh](run.sh), will execute all the steps in the pipeline.
 
-# Step 4: Orchestrate a Job/ Run the Pipeline
+The second option is to orchestrate a job with Airflow. The DAG [dag_currency_exchange_etl.py](src/airflow/dag_currency_exchange_etl.py) will also run all the steps in the pipeline; the only requirement is an active Airflow server. A minimal, illustrative DAG sketch is included at the end of this README.
 
-There are two ways to run the pipeline responsible for the three stages of the process (Unit Tests, ETL, Excel report)
-
-The first and simplest is through the script [main.py](main.py) , running it from the CLI will execute all the steps in the pipeline.
-
-The second is option is to orchestrate a job with Airflow. the DAG [dag_currency_exchange_etl.py](dag_currency_exchange_etl.py) will also run all the steps in the pipeline, it will only be necessary to have an active Airflow server.
 
 # Usage
@@ -67,7 +57,62 @@ Linting:
 ./run_linting.sh
 ```
 
-Tests:
+Unit tests:
 ```shell
 ./run_unit_tests.sh
-```
\ No newline at end of file
+```
+Integration tests:
+```shell
+./run_integration_tests.sh
+```
+
+
+# Structure
+
+```bash
+├── coverage
+├── readme_files
+├── src
+│   ├── airflow
+│   ├── database
+│   ├── modules
+│   └── reports
+└── tests
+    ├── integration
+    └── unit
+        └── sample_data
+```
+
+- `coverage` (not present on GitHub) is created when you run the unit tests and contains the HTML code-coverage report; open `coverage/index.html` to view it. 100% coverage is required.
+- [readme_files](readme_files) Images used in this README.
+- [src](src) Contains the application code.
+- [src/airflow](src/airflow/) DAG file to run the app through Airflow.
+- [src/database](src/database) SQLite DB.
+- [src/modules](src/modules/) Modules to run the ETL pipeline and generate the Excel report.
+- [src/reports](src/reports/) Contains the generated Excel reports.
+- [tests](tests) Unit tests, integration tests, and data samples.
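+
+# Example: Airflow DAG Sketch
+
+As a complement to Step 3, here is a minimal sketch of what a DAG for this pipeline could look like (assuming Airflow 2.x). The `dag_id`, schedule, and single-task layout are illustrative assumptions, not the contents of [dag_currency_exchange_etl.py](src/airflow/dag_currency_exchange_etl.py):
+
+```python
+from datetime import datetime
+
+from airflow import DAG
+from airflow.operators.bash import BashOperator
+
+# Illustrative DAG: run the whole pipeline once a day by invoking main.py.
+with DAG(
+    dag_id="currency_exchange_etl",
+    start_date=datetime(2024, 1, 1),
+    schedule="@daily",
+    catchup=False,
+) as dag:
+    # A single task suffices: main.py already chains the ETL steps
+    # and the Excel report generation internally.
+    run_pipeline = BashOperator(
+        task_id="run_pipeline",
+        bash_command="python /path/to/repo/main.py",  # hypothetical path
+    )
+```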