Lightweight, production-hardened Python utilities for data engineering pipelines.
- Resilient HTTP client for ETL pipelines with bounded retries and exponential backoff.
- PostgreSQL helper with safe pooling, sessions, and auto-rollback.
- Structured JSON logging with automatic deep secret redaction.
pip install dehelpers

A complete pipeline in under 15 lines:
from dehelpers import ResilientClient, DatabaseManager, get_logger
log = get_logger("my_pipeline", job_id="daily-sync")
client = ResilientClient()
# Connects automatically via DATABASE_URL env var
with DatabaseManager() as db, client:
    users = client.get("https://jsonplaceholder.typicode.com/users").json()
    log.info("Fetched users", extra={"count": len(users)})
    with db.session() as session:
        for user in users:
            session.execute(
                "INSERT INTO users (id, name) VALUES (:id, :name) ON CONFLICT DO NOTHING",
                {"id": user["id"], "name": user["name"]}
            )
log.info("Ingestion complete")

- Documentation: Installation, Getting Started, and FAQ
- API Reference: Full details on every class and function
- Examples: Runnable scripts for HTTP, DB, and Logging
- Medium Article: The story behind building this library
(For an interactive version of this diagram, see the Architecture Docs)
Here is exactly what this package is and what it is not:
| Category / Layer | What this IS | What this IS NOT |
|---|---|---|
| API / HTTP | A retry-protected wrapper around requests.Session with exponential backoff, jitter, and simple pagination. | An asynchronous network library (like aiohttp or httpx), fully-fledged HTTP client replacement, or GraphQL API wrapper. |
| Database | A thread-safe connection manager for PostgreSQL with pooling configuration, automated transaction commits/rollbacks, and lazy DataFrame output. | An Object-Relational Mapper (ORM) (like SQLModel/SQLAlchemy ORM), schema migration engine (like Alembic), or database administration tool. |
| Logging | A zero-dependency structured JSON formatter on top of standard logging with automatic deep secrets redaction. | A log routing system (like Fluentd/Logstash), file logger, metrics exporter, or complex log management server. |
| Execution Context | Designed for batch execution environments like Airflow tasks, ETL scripts, and containerized Docker runtimes. | Suitable for high-throughput, low-latency, real-time web servers or async microservices. |
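The retry behaviour named in the API / HTTP row (exponential backoff with jitter, bounded by an overall time limit) can be pictured with a short sketch. This is not the library's implementation: `call_with_backoff` and all of its parameters and defaults are hypothetical names chosen for illustration, and it uses the common "full jitter" variant.

```python
import random
import time


def call_with_backoff(func, *, max_retries=3, base_delay=0.5, max_delay=8.0,
                      total_timeout=30.0, sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds, retries run out, or the deadline passes.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    deadline = clock() + total_timeout
    attempt = 0
    while True:
        try:
            return func()
        except Exception:  # a real client would catch only transient errors
            attempt += 1
            # Exponential cap: base_delay, 2x, 4x, ... limited by max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount between 0 and the cap.
            delay = random.uniform(0, cap)
            # Give up when retries are exhausted or the sleep would pass the deadline.
            if attempt > max_retries or clock() + delay > deadline:
                raise
            sleep(delay)
```

Injecting `sleep` and `clock` keeps the sketch testable without real waiting; the jitter spreads retries from many workers so they do not hit a recovering API at the same instant.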
How this package compares to a standard DIY setup:
| Feature / Criteria | Standard Setup (requests + logging + psycopg) | dehelpers |
|---|---|---|
| Secret Leakage Protection | Manual / None. Secrets easily print to stdout or appear in exception tracebacks. | Automatic & Deep Recursive: Redacts predefined secrets from nested metadata, logs, and query parameters. |
| Retry & Jitter Strategy | Manual loops or boilerplate urllib3 retry configurations. | Out-of-the-box resilience: Exponential backoff with random jitter and clock-based total_timeout limit. |
| Pagination Handling | Custom pagination loop logic required for every API endpoint. | Next-link strategy Protocol: Yields individual items transparently and safely with validation. |
| Connection Safety | Connection leaks or missed transaction rollbacks when a context manager is forgotten. | Context-managed Session: Engine-pooled with pre-ping checks, pool timeout, and auto-rollback. |
| Dependency Footprint | Heavy setup if installing frameworks like Loguru, Structlog, or heavy database utilities. | Ultra-lightweight: Base dependencies are minimal. Pandas is entirely optional and lazy-loaded. |
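To make the "Pagination Handling" row concrete, here is a rough sketch of a next-link loop that yields individual items and validates each page. It is an illustration under assumptions, not the package's Protocol: `iter_items`, `fetch_page`, the `items`/`next` response keys, and the `max_pages` guard are all hypothetical.

```python
def iter_items(fetch_page, first_url, *, items_key="items", next_key="next",
               max_pages=1000):
    """Yield items one by one, following next links until none is left.

    fetch_page(url) is assumed to return a dict parsed from a JSON response.
    """
    url = first_url
    seen = set()
    while url:
        # Validation: refuse to loop forever on a self-referencing API.
        if url in seen:
            raise ValueError(f"pagination loop detected at {url!r}")
        if len(seen) >= max_pages:
            raise RuntimeError(f"stopped after {max_pages} pages")
        seen.add(url)

        page = fetch_page(url)
        items = page.get(items_key, [])
        if not isinstance(items, list):
            raise TypeError(f"expected a list under {items_key!r}")

        yield from items
        url = page.get(next_key)  # None or a missing key ends the loop
```

Because it is a generator, callers can write `for item in iter_items(...)` and only one page is held in memory at a time.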
| Parameter | Default | Description |
|---|---|---|
| DATABASE_URL (env var) | (none) | PostgreSQL connection string (fallback when dsn is not passed) |
| pool_size | 5 | Persistent connections in the pool |
| max_overflow | 2 | Extra connections beyond pool_size |
| pool_recycle | 1800 | Seconds before connection recycling |
| pool_pre_ping | True | Health-check connections before use |
| pool_timeout | 30 | Seconds to wait for a pool connection |
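The parameter names in this table match keyword arguments accepted by SQLAlchemy's `create_engine`. The sketch below shows how such defaults and the DATABASE_URL fallback could be wired together; it assumes SQLAlchemy is the underlying engine, and `POOL_DEFAULTS`, `resolve_dsn`, and `make_engine` are hypothetical names, not part of dehelpers.

```python
import os

# Defaults copied from the table above.
POOL_DEFAULTS = {
    "pool_size": 5,         # persistent connections kept open
    "max_overflow": 2,      # temporary extras, so at most 5 + 2 = 7 connections
    "pool_recycle": 1800,   # replace connections older than 30 minutes
    "pool_pre_ping": True,  # test a connection before handing it out
    "pool_timeout": 30,     # seconds to wait for a free connection
}


def resolve_dsn(dsn=None, environ=os.environ):
    """Prefer an explicit dsn; otherwise fall back to DATABASE_URL."""
    url = dsn or environ.get("DATABASE_URL")
    if not url:
        raise ValueError("pass dsn or set the DATABASE_URL environment variable")
    return url


def make_engine(dsn=None, **overrides):
    """Build a pooled engine; keyword overrides replace individual defaults."""
    # Imported lazily so the helpers above work without SQLAlchemy installed.
    from sqlalchemy import create_engine

    return create_engine(resolve_dsn(dsn), **{**POOL_DEFAULTS, **overrides})
```

With these defaults a single process opens at most seven connections, which is worth multiplying by the number of concurrent workers when sizing PostgreSQL's `max_connections`.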
The logger and API client automatically redact values for these keys in log output:
password, secret, token, api_key, authorization, dsn, connection_string, credential, passphrase, private_key, client_secret
Matching is by case-insensitive substring, e.g. db_password matches password.
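As a rough illustration of that matching rule applied recursively to nested data, consider the sketch below. It is not the code in dehelpers._redact: `redact_sketch`, `is_sensitive`, and the `MASK` placeholder are assumptions made for this example; only the key list is taken from above.

```python
SENSITIVE_KEYS = frozenset({
    "password", "secret", "token", "api_key", "authorization", "dsn",
    "connection_string", "credential", "passphrase", "private_key",
    "client_secret",
})
MASK = "***REDACTED***"  # placeholder text is an assumption


def is_sensitive(key, extra=frozenset()):
    """True if any sensitive fragment occurs in the lower-cased key."""
    lowered = str(key).lower()
    fragments = SENSITIVE_KEYS | {fragment.lower() for fragment in extra}
    return any(fragment in lowered for fragment in fragments)


def redact_sketch(value, extra=frozenset()):
    """Return a copy of value with sensitive dict values masked at any depth."""
    if isinstance(value, dict):
        return {
            key: MASK if is_sensitive(key, extra) else redact_sketch(item, extra)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_sketch(item, extra) for item in value]
    if isinstance(value, tuple):
        return tuple(redact_sketch(item, extra) for item in value)
    return value  # scalars pass through unchanged
```

Substring matching errs on the side of masking too much: a harmless key such as `tokenizer` would also be redacted because it contains `token`.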
You can extend the redaction list:
from dehelpers._redact import redact_dict
result = redact_dict(
    {"my_custom_secret": "value"},
    extra_sensitive_keys=frozenset({"my_custom_secret"}),
)

URL query parameter values are redacted, but path segments are not. Never construct URLs like:
https://api.example.com/v1/token/abc123/data # BAD: token in path
Instead, pass secrets via headers or request body.
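The difference between query parameters and path segments is easy to see with the standard library. The sketch below is an assumption about how such redaction might work, not the library's code; `redact_url`, its shortened key list, and the `REDACTED` placeholder are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_url(url, sensitive=("token", "api_key", "password", "secret"),
               mask="REDACTED"):
    """Mask query values whose parameter name contains a sensitive fragment."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [
        (name, mask if any(s in name.lower() for s in sensitive) else value)
        for name, value in pairs
    ]
    # Only the query string is rewritten; the path is returned untouched.
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


safe = redact_url("https://api.example.com/v1/data?api_key=abc123&page=2")
leaky = redact_url("https://api.example.com/v1/token/abc123/data")
```

Here `safe` has its `api_key` value masked, while `leaky` comes back unchanged: a path segment has no parameter name, so nothing marks `abc123` as a secret.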
If you use DatabaseManager in a forked environment (e.g. Airflow workers, multiprocessing), you must either:
- Create the DatabaseManager inside each worker process, or
- Call db.dispose() before forking.
SQLAlchemy connection pools are not safe to share across forked processes.
pip install -e ".[dev,dataframe]"
pytest -v --tb=short -m "not postgres"

# Start a local PostgreSQL
docker run -d --name pg-test -e POSTGRES_PASSWORD=test -p 5432:5432 postgres:16
# Run integration tests
DATABASE_URL="postgresql+psycopg://postgres:test@localhost:5432/postgres" \
pytest -m postgres -v

pytest --cov=dehelpers --cov-report=term-missing -m "not postgres"

To ensure the library remains production-grade, reliable, and easily maintainable, we enforce the following open-source standards:
- CONTRIBUTING.md: Guidelines for forking and cloning the repository, setting up a local editable environment, running unit tests, and opening PRs.
- CODE_OF_CONDUCT.md: Our pledge to foster an inclusive, welcoming, and harassment-free community.
- CHANGELOG.md: Structured history of features, bugfixes, and breaking changes.
- LICENSE: Permissive MIT License.
Distributed under the MIT License. See LICENSE for more information.

