Petaly is an open-source ETL/ELT (Extract, Load, "Transform") tool, created by and for data professionals! Our mission is to simplify data movement across different platforms with a tool that truly understands the needs of the data community.
- Multiple Data Sources: Support for various endpoints:
  - PostgreSQL
  - MySQL
  - BigQuery
  - Redshift
  - Google Cloud Storage (GCS Bucket)
  - S3 Bucket
  - Local CSV files
- Features:
  - Source to target schema evaluation and mapping
  - CSV file load with column-type recognition
  - Target table structure generation
  - Configurable type mapping between different databases
  - Full table unload/load in CSV format
- User-Friendly: No programming knowledge required
- YAML/JSON Configuration: Easy pipeline setup
- Cloud Ready: Full support for AWS and GCP
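To give a feel for what column-type recognition on a CSV file involves, here is a minimal sketch using only the Python standard library. It is not Petaly's implementation: the function names `infer_type` and `infer_column_types`, the sample data, and the three-type scheme (integer, float, text) are assumptions made for illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of 'integer', 'float', 'text' that fits all non-empty values."""
    kinds = set()
    for value in values:
        if value is None or value == "":
            continue  # empty cells are treated as NULL and do not affect the type
        try:
            int(value)
            kinds.add("integer")
            continue
        except ValueError:
            pass
        try:
            float(value)
            kinds.add("float")
        except ValueError:
            kinds.add("text")
    if not kinds or "text" in kinds:
        return "text"  # fall back to text for mixed or entirely empty columns
    return "float" if "float" in kinds else "integer"


def infer_column_types(csv_text, delimiter=","):
    """Map each header name to an inferred type (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = list(reader)
    return {name: infer_type([row[name] for row in rows]) for name in reader.fieldnames}


# Made-up sample data, not one of the tutorial files.
sample = "symbol,price,volume\nAAPL,189.5,1200\nMSFT,410,\n"
print(infer_column_types(sample))
```

A real loader would also need to recognize dates, booleans, and database-specific types, which is where a configurable type mapping comes in.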
[EXPERIMENTAL]:
Petaly went agentic!
The AI Agent can create and run pipelines from natural-language prompts.
If you're interested in exploring, check out the experimental branch: petaly-ai-agent
Feedback is welcome!
Requirements:
- Python 3.10 - 3.12
- Operating System:
  - Linux
  - macOS
Note: Petaly may work on other operating systems and Python versions, but these haven't been tested yet.
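As a quick pre-install check, the tested range above (Python 3.10 - 3.12 on Linux or macOS) can be verified from Python itself. This is only a sketch; the helper name `is_tested_environment` is hypothetical and not part of Petaly.

```python
import platform
import sys

TESTED_PYTHON = ((3, 10), (3, 12))    # inclusive range of tested versions
TESTED_SYSTEMS = {"Linux", "Darwin"}  # platform.system() reports macOS as "Darwin"


def is_tested_environment(version=None, system=None):
    """Return True if the Python version and OS fall inside the tested set."""
    version = tuple(version or sys.version_info[:2])[:2]
    system = system or platform.system()
    low, high = TESTED_PYTHON
    return low <= version <= high and system in TESTED_SYSTEMS


if not is_tested_environment():
    print("Warning: this Python version or OS has not been tested with Petaly.")
```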
# Create and activate virtual environment
mkdir petaly
cd petaly
python3 -m venv .venv
source .venv/bin/activate
# Install Petaly
python3 -m pip install petaly
# Install with GCP support (the quotes keep shells such as zsh from expanding the brackets)
python3 -m pip install "petaly[gcp]"
Prerequisites:
- Install Google Cloud SDK
- Configure access to your Google Project
- Set up service account authentication
# Install with AWS support
python3 -m pip install "petaly[aws]"
Prerequisites:
- Install AWS CLI
- Configure AWS credentials
# Install all features including AWS, GCP
python3 -m pip install "petaly[all]"
# Clone the repository
git clone https://github.com/petaly-labs/petaly.git
cd petaly
# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install development dependencies
python3 -m pip install -r requirements.txt
# Install in editable mode (recommended)
python3 -m pip install -e .
# Alternative: Add src to PYTHONPATH
export PYTHONPATH="$PYTHONPATH:$(pwd)/src"
# Create petaly.ini in default location (~/.petaly/petaly.ini)
python3 -m petaly init
# Or specify custom location
python3 -m petaly -c /absolute-path-to-your-config-dir/petaly.ini init
# If the config directory differs from the default location, set this environment variable
export PETALY_CONFIG_DIR=/absolute-path-to-your-config-dir
# Alternatively, pass the config file path with -c on each command
python3 -m petaly -c /absolute-path-to-your-config-dir/petaly.ini [command]
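The three ways of locating petaly.ini described above (the `-c` argument, the `PETALY_CONFIG_DIR` variable, and the default `~/.petaly/petaly.ini`) can be pictured as a simple lookup. The sketch below is an illustration, not Petaly's code: the function name `resolve_config_path` is hypothetical, and the precedence order (explicit `-c` first, then the environment variable, then the default) is an assumption.

```python
import os
from pathlib import Path


def resolve_config_path(cli_path=None, env=None):
    """Pick the petaly.ini location; the precedence order is assumed, not documented."""
    env = os.environ if env is None else env
    if cli_path:
        return Path(cli_path)                   # value given with -c
    config_dir = env.get("PETALY_CONFIG_DIR")
    if config_dir:
        return Path(config_dir) / "petaly.ini"  # directory from the environment
    return Path.home() / ".petaly" / "petaly.ini"  # default location


print(resolve_config_path())
```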
- Configure petaly.ini:
[workspace_config]
pipeline_dir_path=/home/user/petaly/pipelines
logs_dir_path=/home/user/petaly/logs
output_dir_path=/home/user/petaly/output
[global_settings]
logging_mode=INFO
pipeline_format=yaml
- Create workspace:
python3 -m petaly init --workspace
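If you prefer to script the workspace settings rather than edit them by hand, the petaly.ini content shown above can be produced with Python's standard `configparser`. This is a sketch under assumptions: the helper name `build_petaly_ini` is hypothetical, and it only reproduces the sections and keys listed above; Petaly may expect further settings.

```python
import configparser
import io


def build_petaly_ini(base_dir):
    """Return petaly.ini text with the sections and keys shown above."""
    config = configparser.ConfigParser()
    config["workspace_config"] = {
        "pipeline_dir_path": f"{base_dir}/pipelines",
        "logs_dir_path": f"{base_dir}/logs",
        "output_dir_path": f"{base_dir}/output",
    }
    config["global_settings"] = {
        "logging_mode": "INFO",
        "pipeline_format": "yaml",
    }
    buffer = io.StringIO()
    # Write key=value without spaces, matching the layout of the example above.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(build_petaly_ini("/home/user/petaly"))
```

Write the returned text to your petaly.ini location before creating the workspace.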
Initialize a new pipeline:
python3 -m petaly init -p my_pipeline
Follow the wizard to configure your pipeline. For detailed configuration options, see the Pipeline Configuration Guide.
Execute your pipeline:
python3 -m petaly run -p my_pipeline
# Extract data from source only
python3 -m petaly run -p my_pipeline --source_only
# Load data to target only
python3 -m petaly run -p my_pipeline --target_only
# Run specific objects
python3 -m petaly run -p my_pipeline -o object1,object2
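When pipelines are triggered from a scheduler or another Python script, the run commands above can be assembled programmatically. The sketch below uses only the flags shown above (`-p`, `--source_only`, `--target_only`, `-o`); the helper names `build_run_command` and `run_pipeline` are hypothetical, and both the rule that the two "only" flags are mutually exclusive and the reliance on a non-zero exit code for failures are assumptions.

```python
import subprocess
import sys


def build_run_command(pipeline, source_only=False, target_only=False, objects=None):
    """Build the argument list for `python3 -m petaly run` from the flags shown above."""
    if source_only and target_only:
        # Assumption: requesting both at once is treated as a mistake here.
        raise ValueError("choose at most one of source_only and target_only")
    cmd = [sys.executable, "-m", "petaly", "run", "-p", pipeline]
    if source_only:
        cmd.append("--source_only")
    if target_only:
        cmd.append("--target_only")
    if objects:
        cmd += ["-o", ",".join(objects)]  # comma-separated, as in the example above
    return cmd


def run_pipeline(pipeline, **options):
    """Run the pipeline; assumes Petaly is installed and signals failure via exit code."""
    return subprocess.run(build_run_command(pipeline, **options), check=True)


print(" ".join(build_run_command("my_pipeline", objects=["object1", "object2"])))
```

Passing the arguments as a list avoids shell quoting issues with pipeline or object names.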
- Petaly installed and workspace initialized
- PostgreSQL server running
- Initialize Pipeline
python3 -m petaly init -p csv_to_postgres
- Download Test Data
# Download and extract test files
gunzip options.csv.gz
gunzip stocks.csv.gz
- Configure Pipeline
  - Use csv as source
  - Use postgres as target
  - Configure database connection details
- Run Pipeline
python3 -m petaly run -p csv_to_postgres
pipeline:
  pipeline_attributes:
    pipeline_name: csv_to_postgres
    is_enabled: true
  source_attributes:
    connector_type: csv
  target_attributes:
    connector_type: postgres
    database_user: root
    database_password: db-password
    database_host: localhost
    database_port: 5432
    database_name: petalydb
    database_schema: petaly_tutorial
  data_attributes:
    use_data_objects_spec: only
    object_default_settings:
      header: true
      columns_delimiter: ","
      columns_quote: none
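If the run cannot connect to the target, a quick first check is whether anything is listening on the host and port from the configuration above. The sketch below is not part of Petaly: the helper name `target_is_reachable` is hypothetical, and it only tests that a TCP connection opens, not that the user, password, or database name are valid.

```python
import socket

# Values copied from target_attributes in the configuration above.
target_attributes = {
    "database_host": "localhost",
    "database_port": 5432,
}


def target_is_reachable(attrs, timeout=2.0):
    """Return True if a TCP connection to the target host and port succeeds."""
    address = (attrs["database_host"], int(attrs["database_port"]))
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or host not resolvable


if target_is_reachable(target_attributes):
    print("Target port is accepting connections.")
else:
    print("Cannot reach the target; check that PostgreSQL is running.")
```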
We welcome contributions! Please see our Contributing Guide for details.
Petaly is licensed under the Apache License 2.0. See the LICENSE file for details.