GitHub - Rafo044/SimpleDataFlow: Data moves from the source, gets cleaned and transformed, then saved in a usable format like CSV for further use.

Introduction

In this project, I built a simple ETL data pipeline that works with different types of datasets. The pipeline extracts the data, transforms it into a clean format, and saves it as CSV files. Every step of the process is written to a log file, so it is easy to trace what happened and fix problems when needed.
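
To make the extract, transform, load, and log steps concrete, here is a minimal sketch of how such a pipeline could look. It is not the code in `src/etl.py`. The function names (`extract_json`, `extract_xml`, `transform`, `load_csv`, `run_pipeline`, `make_logger`), the logger name, the cleaning rule (strip whitespace, drop empty records), and the assumed input shapes (a JSON list of flat records, an XML root with one child element per record) are all assumptions made for illustration.

```python
import csv
import json
import logging
import xml.etree.ElementTree as ET
from pathlib import Path


def make_logger(log_path):
    """Return a logger that appends timestamped lines to log_path (hypothetical helper)."""
    logger = logging.getLogger("simple_data_flow")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def extract_json(path):
    """Assumes the file holds a JSON list of flat records (dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def extract_xml(path):
    """Assumes the root element holds one child per record, one grandchild per field."""
    root = ET.parse(path).getroot()
    return [{field.tag: field.text for field in record} for record in root]


def transform(records):
    """Strip whitespace from string values and drop records that have no values."""
    cleaned = []
    for record in records:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        if any(v not in (None, "") for v in row.values()):
            cleaned.append(row)
    return cleaned


def load_csv(records, path):
    """Write records to a CSV file, using the union of their keys as the header."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def run_pipeline(source, target, log_path):
    """Run extract, transform, and load for one file, logging each step."""
    logger = make_logger(log_path)
    source = Path(source)
    extract = extract_xml if source.suffix == ".xml" else extract_json
    logger.info("Extract started: %s", source)
    records = extract(source)
    logger.info("Transform started: %d records", len(records))
    records = transform(records)
    logger.info("Load started: %s", target)
    load_csv(records, target)
    logger.info("Load finished: %d rows written", len(records))
    return records
```

A call such as `run_pipeline("data/json/source1.json", "data/processed/source1.csv", "data/log/log_file.txt")` would then leave one log line per step, which is what makes it possible to see where a run stopped.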

Data Flow

Folder Structure

SimpleDataFlow/
│
├── data/
│   ├── json/
│   │   ├── source1.json
│   │   ├── source2.json
│   │   └── source3.json
│   │
│   ├── log/
│   │   └── log_file.txt
│   │
│   ├── processed/
│   │   ├── source1.csv
│   │   ├── source2.csv
│   │   └── source3.csv
│   │
│   └── xml/
│       ├── source1.xml
│       ├── source2.xml
│       └── source3.xml
│
├── docs/
│   ├── image/
│   │   └── SimpleDataFlow.png
│   │
│   └── index.md
│
├── src/
│   └── etl.py
│
├── tests/
│   └── test_etl.py
│
├── .gitignore
├── LICENSE
├── README.md
├── mkdocs.yml
└── requirements.txt
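
The layout above suggests that files in `data/json/` and `data/xml/` end up as CSV files in `data/processed/` under the same base name. As a rough sketch of that mapping, the hypothetical helper below (`plan_jobs` is not part of the repository) groups source files by their target CSV path. Treating `source1.json` and `source1.xml` as feeding the same `source1.csv` is an assumption read off the tree, not something the project states.

```python
from pathlib import Path


def plan_jobs(data_dir):
    """Map each target CSV path to the source files that share its base name.

    Hypothetical helper: assumes sources live in <data_dir>/json and <data_dir>/xml
    and that outputs go to <data_dir>/processed, as in the tree above.
    """
    data_dir = Path(data_dir)
    jobs = {}
    for kind in ("json", "xml"):
        for source in sorted((data_dir / kind).glob(f"*.{kind}")):
            target = data_dir / "processed" / f"{source.stem}.csv"
            jobs.setdefault(target, []).append(source)
    return jobs
```

Calling `plan_jobs("data")` from the repository root would return one entry per CSV file, each listing the JSON and XML files with the matching name.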
