ETL process to:
+ Extract exchange-rate data for 273 currencies from a public API
+ Organize, transform and store the data in two tables (dollar- and euro-based rates) in a SQLite DB
+ Generate a customized Excel report (a decision-making tool)
+ Orchestrate a job with Airflow to run all the steps on a recurring schedule

# Step 1: ETL

Data is extracted from a public API, https://github.com/fawazahmed0/exchange-api/tree/main, which returns a daily updated JSON with all the exchange rates for the selected base currency and date.
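
As a rough illustration (not the project's actual code), fetching the daily rates and loading them into SQLite could look like the sketch below; the endpoint pattern follows the exchange-api docs, and the table schema here is an assumption:

```python
import json
import sqlite3
import urllib.request

# Illustrative endpoint pattern from the exchange-api docs; verify against that repo.
URL = "https://cdn.jsdelivr.net/npm/@fawazahmed0/currency-api@latest/v1/currencies/{base}.json"

def fetch_rates(base: str = "usd") -> dict:
    """Download the daily exchange rates for the given base currency."""
    with urllib.request.urlopen(URL.format(base=base)) as resp:
        payload = json.load(resp)
    return payload[base]  # e.g. {"eur": 0.92, "gbp": 0.79, ...}

def store_rates(rates: dict, table: str, db_path: str = "currency.db") -> None:
    """Insert one row per currency into a SQLite table (hypothetical schema)."""
    con = sqlite3.connect(db_path)
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} (currency TEXT, rate REAL)")
    con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rates.items())
    con.commit()
    con.close()

if __name__ == "__main__":
    store_rates(fetch_rates("usd"), table="usd_rates")
    store_rates(fetch_rates("eur"), table="eur_rates")
```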

The script [update_currency_exchange.py](update_currency_exchange.py) is responsible for the whole ETL process. In short, this script contains functions to:

# Step 2: Excel Report

It's easier to show than to describe:
![png](readme_files/report_print.PNG)
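
For a sense of the mechanics behind a report like this (a minimal sketch, not the actual implementation; the table and file names are assumptions), pandas can dump a query result straight into a spreadsheet:

```python
import sqlite3

import pandas as pd  # assumed dependency for this sketch

# Read the (hypothetical) dollar-based rates table and write it to an Excel sheet.
con = sqlite3.connect("currency.db")
df = pd.read_sql_query("SELECT currency, rate FROM usd_rates", con)
con.close()

with pd.ExcelWriter("report.xlsx", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="USD rates", index=False)
```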


# Step 3: Orchestrate a Job / Run the Pipeline

There are two ways to run the pipeline responsible for all stages of the process:

The first and simplest is through the script [main.py](main.py): running it from the CLI, or via Docker with [run.sh](run.sh), will execute all the steps in the pipeline.
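
For example, assuming a standard local setup (the exact commands may differ):

```shell
python main.py   # run the full pipeline from the CLI
./run.sh         # or run it via Docker
```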

The second option is to orchestrate a job with Airflow. The DAG [dag_currency_exchange_etl.py](src/airflow/dag_currency_exchange_etl.py) will also run all the steps in the pipeline; it only requires an active Airflow server.
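
As an illustration only (the task breakdown and imports below are assumptions, not the contents of the actual DAG file), such a DAG might look like:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical callables standing in for the project's real pipeline steps.
from update_currency_exchange import run_etl
from generate_report import build_excel_report

with DAG(
    dag_id="currency_exchange_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # the source API is updated daily
    catchup=False,
) as dag:
    etl = PythonOperator(task_id="run_etl", python_callable=run_etl)
    report = PythonOperator(task_id="build_report", python_callable=build_excel_report)

    etl >> report
```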


# Usage

Linting:
```shell
./run_linting.sh
```

Unit tests:
```shell
./run_unit_tests.sh
```
Integration tests:
```shell
./run_integration_tests.sh
```


# Structure

```bash
├── coverage
├── readme_files
├── src
│   ├── airflow
│   ├── database
│   ├── modules
│   └── reports
└── tests
    ├── integration
    └── unit
        └── sample_data
```

- `coverage` (not present in the GitHub repo) is created when you run the unit tests and contains the HTML code-coverage report; open `coverage/index.html` to view it. 100% coverage is required.
- [readme_files](readme_files) Images used in this README.
- [src](src) Contains the application code.
- [src/airflow](src/airflow/) DAG file to run the app through Airflow.
- [src/database](src/database) SQLite DB.
- [src/modules](src/modules/) Modules to run the ETL pipeline and generate the Excel report.
- [src/reports](src/reports/) Contains the generated Excel reports.
- [tests](tests) Unit tests, integration tests and data samples.
