PROJECT: Toronto climate ETL data pipeline that extracts climate data from the Canadian Climate API, transforms it by combining the downloaded climate data files using Python and a shell script, and loads the transformed data into a local output folder.
The company requires a data engineer to obtain Toronto climate data from the Canadian Climate API, concatenate it into a single file, and generate log files for error tracking. To download the weather data manually, visit https://climate.weather.gc.ca/historical_data/search_historic_data_e.html.
Download the data from the Canadian Climate API. Concatenate the downloaded data files into one final CSV file, all_years.csv, as output. Upload the scripts and the final all_years.csv file to the GitHub repository.
Upload the shell script, the Python script, and all_years.csv to the GitHub repository.
Shell script: The shell script controls every operation, including downloading the data, setting up logging, and running the Python script.
Python script: The Python script concatenates all the data into one file.
all_years.csv: The output file generated after concatenating the files.
The required data is from Station ID = 48549. The year range is 2020 to 2022, and only the data for February is needed. The data will be downloaded in hourly format. The output file will be named all_years.csv.
- year = the year (e.g. 2022, 2023, 2000)
- month = 2: February
- format = [csv|xml]: the output file format. For data in XML format, change the value of the variable format to xml in the URL.
- timeframe = 1: hourly data
- timeframe = 2: daily data
- timeframe = 3: monthly data
- Day = the day of the month. The value of the "day" variable is not used and can be an arbitrary value.
- stationID = the station ID. For another station, change the value of the variable stationID.
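
As a rough illustration of how these parameters fit together, the sketch below builds one download URL per year for February hourly data from station 48549. The endpoint in BASE_URL, the exact query parameter spellings (Year, Month, Day, submit), and the helper name build_url are assumptions that do not appear in this README, so check them against the Canadian Climate API documentation before relying on them. The sketch only constructs the URLs; in this project the download itself is done by the shell script.

```python
from urllib.parse import urlencode

# Assumed bulk-download endpoint; it is not stated in this README.
BASE_URL = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html"


def build_url(year, month=2, station_id=48549, timeframe=1, fmt="csv", day=14):
    """Return a download URL for one station, year, and month (hypothetical helper)."""
    params = {
        "format": fmt,             # csv or xml
        "stationID": station_id,   # 48549 is the station used in this project
        "Year": year,
        "Month": month,            # 2 = February
        "Day": day,                # not used by the API; any value works
        "timeframe": timeframe,    # 1 = hourly, 2 = daily, 3 = monthly
        "submit": "Download Data",
    }
    return f"{BASE_URL}?{urlencode(params)}"


# One URL per year in the 2020 to 2022 range.
urls = [build_url(year) for year in range(2020, 2023)]
for url in urls:
    print(url)
```

Each printed URL could then be passed to a download tool such as wget or curl from the shell script.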
 
- Download the data with the shell script into the input folder on the Ubuntu virtual machine (VM) and automate the log generation process.
- Execute the Python script ./python_script.py from the shell script to concatenate the data in the input folder into one file called all_years.csv, and store the transformed data in the output folder.
- The shell script prints SUCCESS when all operations complete successfully.
- Upload the files to the GitHub repo using git:
  git push
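
The concatenation step could look something like the sketch below. It is not the project's actual python_script.py. It assumes every downloaded file is a CSV with the same header row, uses only the standard library, and the function name concatenate_csv is made up for illustration.

```python
import csv
import glob
import os


def concatenate_csv(input_dir, output_path):
    """Append every CSV in input_dir into output_path, keeping a single header row.

    Returns the number of data rows written (header excluded).
    """
    paths = sorted(glob.glob(os.path.join(input_dir, "*.csv")))
    out_dir = os.path.dirname(output_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            # utf-8-sig strips a byte-order mark if the file starts with one.
            with open(path, newline="", encoding="utf-8-sig") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    # Fail loudly so the shell script can log the error.
                    raise ValueError(f"Unexpected header in {path}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling concatenate_csv("input", "output/all_years.csv") would merge the files from the input folder into the output folder; the folder paths here are assumed from the description above. A mismatched header raises an exception, which gives the script a non-zero exit status, so the shell script can record the failure in its log instead of printing SUCCESS.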
- LinkedIn: @omolewajoshua
- GitHub: @joshua-omolewa
 
Give a ⭐️ if this project helped you!

