Viadot

Documentation: https://dyvenia.github.io/viadot/

Source Code: https://github.com/dyvenia/viadot

A simple data ingestion library for moving data from source systems to destination systems.

Getting Data from a Source

Viadot supports several API and RDBMS sources, both private and public. Among public sources, we currently support the UK Carbon Intensity API, and the examples below are based on it.

from viadot.sources.uk_carbon_intensity import UKCarbonIntensity
ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()
df

Output:

	from	to	forecast	actual	index
0	2021-08-10T11:00Z	2021-08-10T11:30Z	211	216	moderate

The df above is a pandas DataFrame object. It contains data downloaded by viadot from the UK Carbon Intensity API.
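To illustrate how a nested API response can end up as the flat row shown above, here is a minimal sketch using only the standard library. It is not viadot's implementation: the payload is hardcoded, its shape (a "data" list with a nested "intensity" object) is an assumption, and flatten_intensity is a hypothetical helper name.

```python
# Hypothetical sample payload, hardcoded so the sketch runs offline.
# Its shape (a "data" list with a nested "intensity" object) is an assumption.
sample_response = {
    "data": [
        {
            "from": "2021-08-10T11:00Z",
            "to": "2021-08-10T11:30Z",
            "intensity": {"forecast": 211, "actual": 216, "index": "moderate"},
        }
    ]
}


def flatten_intensity(response):
    """Turn each nested record into one flat row (a plain dict)."""
    rows = []
    for record in response["data"]:
        row = {"from": record["from"], "to": record["to"]}
        # Lift the nested intensity fields up to the top level of the row.
        row.update(record["intensity"])
        rows.append(row)
    return rows


rows = flatten_intensity(sample_response)
print(rows[0])
```

A list of flat dicts like this can be passed directly to the pandas DataFrame constructor to get a table with the columns from, to, forecast, actual, and index.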

Loading Data to a Source

Depending on the source, viadot provides different methods of uploading data. For instance, for SQL sources, this would be bulk inserts. For data lake sources, it would be a file upload. We also provide ready-made pipelines including data validation steps using Great Expectations.

An example of loading data into SQLite from a pandas DataFrame using the SQLiteInsert Prefect task:

from viadot.tasks import SQLiteInsert

insert_task = SQLiteInsert()
insert_task.run(table_name=TABLE_NAME, dtypes=dtypes, db_path=database_path, df=df, if_exists="replace")
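The snippet above leaves TABLE_NAME, dtypes, and database_path undefined. The sketch below shows, with the standard sqlite3 module only, roughly what a load with if_exists="replace" amounts to: drop the table, recreate it from a column-to-type mapping, and bulk-insert the rows. It is not the SQLiteInsert implementation; the table name, the dtypes mapping, the sample row, and the replace_and_insert helper are all hypothetical.

```python
import sqlite3

# Hypothetical values standing in for the undefined names in the snippet above.
TABLE_NAME = "carbon_intensity"
dtypes = {
    "from": "TEXT",
    "to": "TEXT",
    "forecast": "INTEGER",
    "actual": "INTEGER",
    "index": "TEXT",
}
rows = [("2021-08-10T11:00Z", "2021-08-10T11:30Z", 211, 216, "moderate")]


def replace_and_insert(conn, table_name, dtypes, rows):
    """Drop the table if it exists, recreate it, and bulk-insert the rows."""
    # Quote identifiers because "from", "to" and "index" are SQL keywords.
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in dtypes.items())
    placeholders = ", ".join("?" for _ in dtypes)
    with conn:  # commits on success, rolls back on error
        conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
        conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
        conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)


# An in-memory database keeps the sketch free of side effects.
conn = sqlite3.connect(":memory:")
replace_and_insert(conn, TABLE_NAME, dtypes, rows)
# Calling it again replaces the table rather than appending to it.
replace_and_insert(conn, TABLE_NAME, dtypes, rows)
stored = conn.execute(f'SELECT * FROM "{TABLE_NAME}"').fetchall()
print(stored)
```

Table and column names are interpolated into the SQL here, so this sketch is only suitable for trusted, hardcoded identifiers; the row values themselves go through parameter placeholders.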

Running tests

To run tests, log into the container and run pytest:

cd viadot/docker
./run.sh
docker exec -it viadot_testing bash
pytest

Running flows locally

You can run the example flows from the terminal. From the viadot/docker directory:

./run.sh
docker exec -it viadot_testing bash
FLOW_NAME=hello_world; python -m viadot.examples.$FLOW_NAME

However, when developing, the easiest way to run flows is to use the provided JupyterLab container, available at http://localhost:9000/.

How to contribute

1. Clone the release branch.
2. Pull the Docker environment by running viadot/docker/update.sh -t dev.
3. Run the environment with viadot/docker/run.sh.
4. Log into the dev container and install viadot in development mode, so that your code changes take effect without reinstalling:

docker exec -it viadot_testing bash
pip install -e .

5. Edit and test your changes with pytest.
6. Submit a PR. The PR should contain the following:

- new/changed functionality
- tests for the changes
- changes added to CHANGELOG.md
- any other relevant resources updated (esp. viadot/docs)

Please follow the standards and best practices used within the library (e.g., when adding tasks, see how other tasks are constructed). For any questions, please reach out to us here on GitHub.

Style guidelines

- Code should be formatted with Black using default settings (the easiest way is to use the VSCode extension).
- Commit messages should:
  - begin with an emoji
  - continue, immediately after the emoji, with one of the following capitalized verbs: "Added", "Updated", "Removed", "Fixed", or "Renamed"; occasionally another verb, such as "Upgraded" or "Downgraded", may suit your situation better
  - contain a useful description of what the commit is doing
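
The commit message rules above can be checked mechanically. The following is a minimal sketch, not a tool shipped with viadot: check_commit_message is a hypothetical helper, and it assumes the emoji is written either as a :shortcode: or as one or more non-ASCII characters, so any other non-ASCII prefix would also pass. Because the guidelines allow other verbs occasionally, a verb warning should be read as a hint rather than a hard failure.

```python
import re

# Verbs named in the guidelines; other verbs are allowed occasionally,
# so a miss here is only a hint, not a hard failure.
COMMON_VERBS = ("Added", "Updated", "Removed", "Fixed", "Renamed", "Upgraded", "Downgraded")

# Assumption: the emoji is either a :shortcode: or a run of non-ASCII characters.
EMOJI_PREFIX = re.compile(r"^(:[a-z0-9_+-]+:|[^\x00-\x7F]+)\s+")


def check_commit_message(message):
    """Return a list of problems found in the first line of a commit message."""
    problems = []
    first_line = message.splitlines()[0] if message.strip() else ""
    match = EMOJI_PREFIX.match(first_line)
    if not match:
        problems.append("does not begin with an emoji")
        rest = first_line
    else:
        # Keep only the text that follows the emoji and its trailing whitespace.
        rest = first_line[match.end():]
    words = rest.split()
    if not words or words[0] not in COMMON_VERBS:
        problems.append("does not start with one of the usual verbs")
    if len(words) < 2:
        problems.append("has no description after the verb")
    return problems


print(check_commit_message(":bug: Fixed dtype handling"))
print(check_commit_message("fixed stuff"))
```

An empty list means the first line follows the convention; otherwise each entry names one rule that the message appears to break.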
