
PROJECT: Toronto climate ETL data pipeline that extracts climate data from an API, transforms the data by combining the climate data files using Python and shell scripts, and loads the transformed data into a local output folder

Author: 👤 Joshua Omolewa

1. Business Scenario

The company requires a data engineer to obtain Toronto climate data from the Canadian Climate API, concatenate it into a single file, and generate log files for error tracking. To download the weather data manually, visit https://climate.weather.gc.ca/historical_data/search_historic_data_e.html.

2. Business Requirements

Download the data from the Canadian Climate API. Concatenate the downloaded data files into one final CSV file, called all_years.csv, as output. Upload the scripts and the final all_years.csv to the GitHub repository.

3. Deliverable

Upload the shell script, the Python script, and all_years.csv to the GitHub repository.

Shell script: controls every operation, including downloading the data, setting up logging, and running the Python script.
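A minimal sketch of that orchestration, assuming the input/output/log folder layout described in this README; the file names and the download URL here are illustrative, not the verbatim contents of the repo's script:

```bash
#!/bin/bash
# Sketch of the shell orchestration: download, log, run Python, report.
# Folder names and the endpoint are assumptions; see the repo's script.
set -e
mkdir -p input output log
LOG_FILE="log/pipeline_$(date '+%Y%m%d_%H%M%S').log"

BASE_URL="https://climate.weather.gc.ca/climate_data/bulk_data_e.html"

for year in 2020 2021 2022; do
    # February (Month=2) hourly (timeframe=1) data for station 48549;
    # the Day value is ignored for hourly downloads
    wget -O "input/toronto_${year}_02.csv" \
        "${BASE_URL}?format=csv&stationID=48549&Year=${year}&Month=2&Day=14&timeframe=1&submit=Download+Data" \
        >> "$LOG_FILE" 2>&1
done

# Concatenate the downloaded files into output/all_years.csv
./python_script.py >> "$LOG_FILE" 2>&1

echo "SUCCESS"
```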

Python script: concatenates all the downloaded data into one file.
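A minimal sketch of the concatenation step using pandas, assuming the input/output folder layout above (the actual python_script.py may be implemented differently):

```python
#!/usr/bin/env python3
# Sketch of the concatenation step; folder names follow this README's
# layout, but the real python_script.py may differ.
import glob

import pandas as pd

# Read every downloaded CSV from the input folder, in a stable order
frames = [pd.read_csv(path) for path in sorted(glob.glob("input/*.csv"))]

# Stack the per-year files into one table and write the combined output
all_years = pd.concat(frames, ignore_index=True)
all_years.to_csv("output/all_years.csv", index=False)
```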

all_years.csv: the output file generated by concatenating the downloaded files.

4. Specification Detail

The data required is from station ID 48549, for the years 2020 to 2022, February only, in hourly format. The output file will be named all_years.csv.

Please note the following parameters when using the climate data API (see the shell script):

  • year: the year of the data to download (e.g. 2020, 2021, 2022)
  • month = 2: refers to February
  • Day: day of the month; the value of the "day" variable is not used and can be arbitrary
  • timeframe = 1: hourly data
  • timeframe = 2: daily data
  • timeframe = 3: monthly data
  • stationID: the station ID; for another station, change the value of the stationID variable
  • format = [csv|xml]: the output format; for data in XML format, change the value of the format variable to xml in the URL
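Putting these parameters together, a single February 2020 request for station 48549 might look like the line below. The bulk_data_e.html endpoint shown is the commonly documented Environment Canada bulk-download URL; confirm it against the shell script in this repo:

```bash
# Day=14 is a placeholder; its value is ignored for hourly downloads
curl -o "input/toronto_2020_02.csv" \
  "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=48549&Year=2020&Month=2&Day=14&timeframe=1&submit=Download+Data"
```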

Project Architecture

5. STEPS USED TO COMPLETE THIS PROJECT

  • Download the data with the shell script into the input folder on an Ubuntu virtual machine (VM) and automate the log-generation process
  • Execute the Python script ./python_script.py from the shell script to concatenate the data in the input folder into one file called all_years.csv and store the transformed data in the output folder
  • The shell script prints SUCCESS if all operations complete successfully
  • Upload the files to the GitHub repo using git push (see the git commands below)
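For the final step, the upload amounts to standard git commands. A minimal example, where shell_script.sh is a hypothetical name for the shell script and the branch name may differ:

```bash
# Stage the scripts and the combined output file, then push to GitHub
# (shell_script.sh is an assumed file name, not confirmed by this README)
git add shell_script.sh python_script.py output/all_years.csv
git commit -m "Add ETL scripts and combined climate data"
git push origin main
```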

Note: the pipeline can be automated using a cron job if needed (see the example crontab entry below).
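For example, a crontab entry like this one (the install path is hypothetical) would run the pipeline every day at 06:00 and append its output to a log file:

```bash
# Hypothetical crontab entry: minute hour day-of-month month day-of-week command
0 6 * * * /home/ubuntu/toronto_etl/shell_script.sh >> /home/ubuntu/toronto_etl/cron.log 2>&1
```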

PROJECT FILES

PROJECT BEING EXECUTED ON SHELL

FINAL SCRIPT IMAGE

Follow Me On

Show your support

Give a ⭐️ if this project helped you!
