Add T015–T017 to clean three ADB ATO data sets #78

Open: wants to merge 4 commits into `main`
Changes from 1 commit
Data cleaning and update for T015_T016_T017 ato dataset
HannaMorde committed Nov 28, 2023
commit dd5a503c0b161391dae754cb113e85b713951ead
25 changes: 25 additions & 0 deletions item/hanna/Summary.md
@@ -0,0 +1,25 @@
# Instructions followed during cleaning of the dataset

## Steps followed while importing and cleaning the input dataset

```
Step 1) Load the "ATO Workbook (TRANSPORT ACTIVITY & SERVICES (TAS))2023.xlsx" file into a DataFrame using pandas.

Step 2) Load the "master dataset.csv" file into a DataFrame using pandas, and create a new output DataFrame with the
columns taken from "master dataset.csv".

Step 3) Extract the following attributes from the upper part of the DataFrame: Vehicle Type, Variable, Unit,
Unit_factor, Rule_id, Mode, Source, Service, Indicator and Sheet_name.

Step 4) Extract the Economy Code and the value for each year in the series from the lower part of the DataFrame, and
remove unwanted columns.

Step 5) Use the country_region_mapping() function to derive the Country and Region for each Economy Code.

Step 6) Transform the values for each year using the Unit_factor obtained from the upper part of the DataFrame.

Step 7) Finally, write the data extracted from both the upper and lower parts of the DataFrame into the output
DataFrame and save it as a CSV file.
```
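Steps 3 and 4 depend on how each ATO sheet is laid out, which this summary does not show. The sketch below assumes a hypothetical layout: `label, value` metadata pairs at the top, a header row whose first cell is `Economy Code`, then one column per year. The sample sheet and the `split_sheet` helper are illustrative only; with the real workbook the raw frame would more likely come from `pd.read_excel(path, sheet_name=name, header=None)`.

```python
import pandas as pd

# Hypothetical sheet layout (an assumption, not the actual ATO format). With the
# real workbook this frame would come from pd.read_excel(..., header=None).
raw = pd.DataFrame(
    [
        ["Indicator", "Road passenger transport", None, None],
        ["Unit", "Passenger-km", None, None],
        ["Unit_factor", "1000000", None, None],
        [None, None, None, None],
        ["Economy Code", 2019, 2020, 2021],
        ["IND", 1.5, 1.2, None],
        ["JPN", 2.0, 1.8, 1.9],
    ],
    dtype=object,
)


def split_sheet(raw: pd.DataFrame) -> tuple[dict, pd.DataFrame]:
    """Split a raw sheet into upper-part metadata and a long lower-part table."""
    # Locate the header row of the lower part by its first cell.
    header_idx = raw.index[raw.iloc[:, 0] == "Economy Code"][0]

    # Upper part: the complete "label, value" pairs above the header row.
    upper = raw.iloc[:header_idx, :2].dropna()
    metadata = dict(zip(upper.iloc[:, 0], upper.iloc[:, 1]))

    # Lower part: use the header row as column names and drop all-empty columns.
    lower = raw.iloc[header_idx + 1 :].copy()
    lower.columns = raw.iloc[header_idx].tolist()
    lower = lower.dropna(axis=1, how="all")

    # Reshape wide (one column per year) to long (one row per economy and year).
    long = lower.melt(id_vars="Economy Code", var_name="Year", value_name="Value")
    long = long.dropna(subset=["Value"]).reset_index(drop=True)
    long["Year"] = long["Year"].astype(int)
    long["Value"] = pd.to_numeric(long["Value"])
    return metadata, long


metadata, values = split_sheet(raw)
print(metadata["Unit"])  # Passenger-km
print(len(values))       # 5 (IND has no 2021 value)
```

Reshaping to one row per economy and year makes the later steps (mapping, scaling, writing) simple column operations.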

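Steps 5 and 6 could look roughly like the following sketch. The summary names `country_region_mapping()` but does not show it, so the `ECONOMY_LOOKUP` table and the `map_and_scale` helper here are hypothetical stand-ins. The summary also does not say whether the unit factor multiplies or divides the raw values; this sketch assumes multiplication.

```python
import pandas as pd

# Hypothetical lookup table; the real mapping used in the PR is not shown.
ECONOMY_LOOKUP = {
    "IND": ("India", "South Asia"),
    "JPN": ("Japan", "East Asia"),
}


def country_region_mapping(economy_code: str) -> tuple:
    """Return (Country, Region) for an economy code, or (None, None) if unknown."""
    return ECONOMY_LOOKUP.get(economy_code, (None, None))


def map_and_scale(values: pd.DataFrame, unit_factor) -> pd.DataFrame:
    """Add Country/Region columns and rescale Value by the sheet's unit factor."""
    out = values.copy()
    # Step 5: derive Country and Region from each Economy Code.
    pairs = [country_region_mapping(code) for code in out["Economy Code"]]
    out["Country"] = [country for country, _ in pairs]
    out["Region"] = [region for _, region in pairs]
    # Step 6: apply the unit factor (multiplication is an assumption).
    out["Value"] = out["Value"] * float(unit_factor)
    return out


values = pd.DataFrame(
    {"Economy Code": ["IND", "JPN", "XXX"], "Year": [2020, 2020, 2020], "Value": [1.2, 1.8, 3.0]}
)
result = map_and_scale(values, "1000000")
print(result[["Economy Code", "Country", "Value"]])
```

Unknown codes such as `XXX` get empty Country and Region values instead of raising, which makes gaps in the mapping easy to spot afterwards.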
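For steps 2 and 7, one possible approach is to read only the header row of `master dataset.csv` to get the output columns, broadcast the upper-part attributes onto every row, and reindex to the master column order before writing the CSV. The in-memory header and the `build_output` helper are hypothetical; the real column list comes from the master file, and the summary does not state the output file name.

```python
import io

import pandas as pd

# Hypothetical stand-in for "master dataset.csv": only its header row is needed.
master_csv = io.StringIO("Economy Code,Country,Region,Indicator,Unit,Year,Value\n")
master_columns = pd.read_csv(master_csv, nrows=0).columns.tolist()


def build_output(values: pd.DataFrame, metadata: dict, columns: list) -> pd.DataFrame:
    """Combine lower-part rows with upper-part attributes in master column order."""
    out = values.copy()
    # Broadcast each upper-part attribute (Indicator, Unit, ...) to every row.
    for key, val in metadata.items():
        out[key] = val
    # reindex keeps the master column order, adds missing columns as NaN and
    # drops any column the master dataset does not define.
    return out.reindex(columns=columns)


values = pd.DataFrame({"Economy Code": ["IND", "JPN"], "Year": [2020, 2020], "Value": [1.2e6, 1.8e6]})
metadata = {"Indicator": "Road passenger transport", "Unit": "Passenger-km", "Unit_factor": "1000000"}
output = build_output(values, metadata, master_columns)

# Write to an in-memory buffer here; with real data this would be a file path.
buffer = io.StringIO()
output.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Using `reindex` rather than assigning columns one by one guarantees that every output file has exactly the master dataset's columns in the same order.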