
A Python framework for data processing on GCP.


agnieszkarybak/bigflow


BigFlow

Documentation

  1. What is BigFlow?
  2. Getting started
  3. Installing BigFlow
  4. Help me
  5. BigFlow tutorial
  6. CLI
  7. Configuration
  8. Project structure and build
  9. Deployment
  10. Workflow & Job
  11. Starter
  12. Technologies
  13. Logging
  14. Development

Cookbook

What is BigFlow?

BigFlow is a Python framework for data processing pipelines on GCP.

The main features are:

Getting started

Start by installing BigFlow on your local machine, then go through the BigFlow tutorial.

Installing BigFlow

Prerequisites. Before you start, make sure you have the following software installed:

  1. Python == 3.7
  2. Google Cloud SDK
  3. Docker Engine
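If you want to script this prerequisite check, a minimal Python sketch could look like the one below. It assumes that the Google Cloud SDK and Docker Engine put the `gcloud` and `docker` executables on your `PATH`. The helper name `check_prerequisites` is hypothetical and is not part of BigFlow.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Hypothetical helper: report which BigFlow prerequisites look available."""
    return {
        # BigFlow asks for Python 3.7 exactly.
        "python_3_7": sys.version_info[:2] == (3, 7),
        # The Google Cloud SDK installs the `gcloud` executable.
        "gcloud": shutil.which("gcloud") is not None,
        # Docker Engine provides the `docker` client executable.
        "docker": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Finding the executables only shows that the tools are installed. Whether the Docker daemon is actually running is a separate check (see `docker info` below).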

You can install the bigflow package globally, but we recommend installing it locally in your project's folder, inside a venv:

python -m venv .bigflow_env
source .bigflow_env/bin/activate
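To confirm that the virtual environment is really active before you install anything, you can compare `sys.prefix` with `sys.base_prefix`. Inside a `venv`, `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base interpreter. This is a sketch, and `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created with `python -m venv`."""
    # venv changes sys.prefix to the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active" if in_virtualenv() else "no venv active")
```

Run it with `python` after `source .bigflow_env/bin/activate`. If it prints `no venv active`, the `python` on your `PATH` is not the one from `.bigflow_env`.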

Install the bigflow pip package (the quotes stop shells such as zsh from treating the square brackets as a glob pattern):

pip install 'bigflow[bigquery,dataflow,dataproc,log]'

Test it:

bigflow -h

Read more about BigFlow CLI.
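The `bigflow -h` check can also be automated, for example in a setup script. The sketch below assumes that `bigflow -h` exits with status 0 when the CLI works, as argparse-style CLIs normally do. The function name `cli_responds` and its parameters are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def cli_responds(executable: str = "bigflow", timeout: float = 60.0) -> bool:
    """Hypothetical smoke test: True if `<executable> -h` exits with status 0."""
    path: Optional[str] = shutil.which(executable)
    if path is None:
        # Not on PATH: usually the venv is not active or the package is missing.
        return False
    try:
        completed = subprocess.run(
            [path, "-h"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat a hanging help command as a failure.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    print("bigflow CLI OK" if cli_responds() else "bigflow CLI not working")
```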

To interact with GCP you need to set a default project and log in:

gcloud config set project <your-gcp-project-id>
gcloud auth application-default login
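To check from Python which default project `gcloud config set project` stored, you can call `gcloud config get-value project`. The sketch below assumes that `gcloud` is on your `PATH`. It treats empty output, or an `(unset)` marker, as "no project configured". The names `default_gcp_project` and `run` are hypothetical. The `run` parameter only exists so that the logic can be tested without calling the real SDK.

```python
import subprocess
from typing import Callable, List, Optional

# A runner takes gcloud arguments and returns the command's stdout.
Runner = Callable[[List[str]], str]


def _run_gcloud(args: List[str]) -> str:
    """Run a gcloud command and return its stdout (raises on a non-zero exit)."""
    return subprocess.run(
        ["gcloud"] + args,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=True,
    ).stdout


def default_gcp_project(run: Runner = _run_gcloud) -> Optional[str]:
    """Return the project set with `gcloud config set project`, or None if unset."""
    value = run(["config", "get-value", "project"]).strip()
    # Treat empty output or an "(unset)" marker as "no default project".
    if not value or value == "(unset)":
        return None
    return value
```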

Finally, check that Docker is running:

docker info

Help me

You can ask questions on our Gitter channel or on Stack Overflow.
