Open-source Python ETL tool for seamless data movement across PostgreSQL, MySQL, Redshift, BigQuery, S3, GCS, and CSV files, with YAML/JSON-based configuration.

petaly-labs/petaly

Overview

Petaly is an open-source ETL/ELT (Extract, Transform, Load) tool, created by and for data professionals! Our mission is to simplify data movement across different platforms with a tool that truly understands the needs of the data community.

Key Features

  • Multiple Data Sources and Targets: Support for the following endpoints:

    • PostgreSQL
    • MySQL
    • BigQuery
    • Redshift
    • Google Cloud Storage (GCS Bucket)
    • S3 Bucket
    • Local CSV files
  • Features:

    • Source-to-target schema evaluation and mapping
    • CSV file load with column-type recognition
    • Target table structure generation
    • Configurable type mapping between different databases
    • Full table unload/load in CSV format
  • User-Friendly: No programming knowledge required

  • YAML/JSON Configuration: Easy pipeline setup

  • Cloud Ready: Full support for AWS and GCP

[EXPERIMENTAL]:

Petaly went agentic!
The AI agent can create and run pipelines using natural-language prompts.
If you're interested in exploring, check out the experimental branch: petaly-ai-agent

Feedback is welcome!

Quick Start

  1. Installation
  2. Configuration
  3. Create Pipeline
  4. Run Pipeline

Requirements

System Requirements

  • Python 3.10 - 3.12
  • Operating System:
    • Linux
    • macOS

Note: Petaly may work on other operating systems and Python versions, but these haven't been tested yet.

Installation

Basic Installation

# Create and activate virtual environment
mkdir petaly
cd petaly
python3 -m venv .venv
source .venv/bin/activate

# Install Petaly
python3 -m pip install petaly

Cloud Provider Support

GCP Support

# Install with GCP support (quote the extras to avoid shell globbing, e.g. in zsh)
python3 -m pip install 'petaly[gcp]'

Prerequisites:

  1. Install Google Cloud SDK
  2. Configure access to your Google Project
  3. Set up service account authentication

AWS Support

# Install with AWS support (quote the extras to avoid shell globbing, e.g. in zsh)
python3 -m pip install 'petaly[aws]'

Prerequisites:

  1. Install AWS CLI
  2. Configure AWS credentials

Full Installation

# Install all features, including AWS and GCP support
python3 -m pip install 'petaly[all]'

From Source

# Clone the repository
git clone https://github.com/petaly-labs/petaly.git
cd petaly

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install development dependencies
python3 -m pip install -r requirements.txt

# Install in editable mode (recommended)
python3 -m pip install -e .

# Alternative: Add src to PYTHONPATH
export PYTHONPATH=$PYTHONPATH:$(pwd)/src

Configuration

1. Initialize Configuration

# Create petaly.ini in default location (~/.petaly/petaly.ini)
python3 -m petaly init

# Or specify custom location
python3 -m petaly -c /absolute-path-to-your-config-dir/petaly.ini init

2. Set Environment Variable (Optional)

# Set the environment variable if the folder differs from the default location
export PETALY_CONFIG_DIR=/absolute-path-to-your-config-dir

# Alternatively, pass the config path explicitly on every run with -c
python3 -m petaly -c /absolute-path-to-your-config-dir/petaly.ini [command]
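For illustration only (not petaly's actual implementation), the lookup order described above — environment variable first, then the default `~/.petaly` location — can be sketched in Python like this:

```python
import os

# Sketch (not petaly's own code): prefer PETALY_CONFIG_DIR when set,
# otherwise fall back to the default ~/.petaly config directory.
config_dir = os.environ.get("PETALY_CONFIG_DIR") or os.path.expanduser("~/.petaly")
config_path = os.path.join(config_dir, "petaly.ini")
print(config_path)
```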

3. Initialize Workspace

  1. Configure petaly.ini:
[workspace_config]
pipeline_dir_path=/home/user/petaly/pipelines
logs_dir_path=/home/user/petaly/logs
output_dir_path=/home/user/petaly/output

[global_settings]
logging_mode=INFO
pipeline_format=yaml
  2. Create workspace:
python3 -m petaly init --workspace
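The petaly.ini file above is standard INI syntax, so you can sanity-check your edits with Python's stdlib configparser (illustration only):

```python
import configparser

# Parse the workspace config shown above; values are returned as strings.
ini_text = """
[workspace_config]
pipeline_dir_path=/home/user/petaly/pipelines
logs_dir_path=/home/user/petaly/logs
output_dir_path=/home/user/petaly/output

[global_settings]
logging_mode=INFO
pipeline_format=yaml
"""

config = configparser.ConfigParser()
config.read_string(ini_text)
print(config["workspace_config"]["pipeline_dir_path"])  # /home/user/petaly/pipelines
print(config["global_settings"]["pipeline_format"])     # yaml
```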

Create Pipeline

Initialize a new pipeline:

python3 -m petaly init -p my_pipeline

Follow the wizard to configure your pipeline. For detailed configuration options, see Pipeline Configuration Guide.

Run Pipeline

Execute your pipeline:

python3 -m petaly run -p my_pipeline

Run Specific Operations

# Extract data from source only
python3 -m petaly run -p my_pipeline --source_only

# Load data to target only
python3 -m petaly run -p my_pipeline --target_only

# Run specific objects
python3 -m petaly run -p my_pipeline -o object1,object2

Tutorial: CSV to PostgreSQL

Prerequisites

  • Petaly installed and workspace initialized
  • PostgreSQL server running

Steps

  1. Initialize Pipeline
python3 -m petaly init -p csv_to_postgres
  2. Download Test Data
# Extract the downloaded test files
gunzip options.csv.gz
gunzip stocks.csv.gz
  3. Configure Pipeline
  • Use csv as source
  • Use postgres as target
  • Configure database connection details
  4. Run Pipeline
python3 -m petaly run -p csv_to_postgres

Example Configuration

pipeline:
  pipeline_attributes:
    pipeline_name: csv_to_postgres
    is_enabled: true
  source_attributes:
    connector_type: csv
  target_attributes:
    connector_type: postgres
    database_user: root
    database_password: db-password
    database_host: localhost
    database_port: 5432
    database_name: petalydb
    database_schema: petaly_tutorial
  data_attributes:
    use_data_objects_spec: only
    object_default_settings:
      header: true
      columns_delimiter: ","
      columns_quote: none

Documentation

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

Petaly is licensed under the Apache License 2.0. See the LICENSE file for details.
