I design and build production-grade data systems: batch ELT with dbt and Airflow/Meltano, streaming with Kafka and Spark, analytics serving on ClickHouse, and DuckDB/Polars for fast iteration. I care about data contracts, tests, lineage, cost/performance, and developer experience (pre-commit, CI, reproducibility).
- Focus: Retail analytics, event streaming, lakehouse on S3 + ClickHouse, dbt modeling, cost optimization.
- Toolbox: Python, SQL, ClickHouse, dbt, Airbyte/Meltano, Airflow, Spark, Kafka, Terraform, Docker, AWS.
- Principles: “Measure twice, cut once.” Automate quality, document decisions, ship small and observable changes.
- ETL Mobilidade: end-to-end ELT scaffold
  Python · dbt · GH Actions · Docs (MkDocs) · Tests
  ➜ https://github.com/Robso-creator/etl_mobilidade
- ELT with Meltano (ind.): pipelines & dev-prod workflows
  Meltano · dbt · Pre-commit · CI templates
  ➜ https://github.com/Robso-creator/elt_meltano_ind
- Spark Streaming (Unstructured Data): streaming blueprint
  Spark Structured Streaming · Kafka-ready · Dockerized dev
  ➜ https://github.com/Robso-creator/spark-streaming-unstructured-data
- Discord Bot (DX demo): testing, docs & CI discipline
  Pytest · Pre-commit · MIT License · Pages
  ➜ https://github.com/Robso-creator/discord_bot
Want a quick overview? See my portfolio and case studies, which cover context, architecture, and trade-offs: https://robsonsampaio.vercel.app
- Languages: Python (Typer/FastAPI), SQL
- Storage/Compute: ClickHouse (MergeTree, MVs), DuckDB/Polars, S3
- Orchestration: Airflow, Meltano, dbt (core + tests)
- Streaming: Kafka, Spark Structured Streaming
- Quality & Lineage: Pytest, Pandera/Great Expectations, OpenLineage/Marquez
- Infra/DevEx: AWS, Terraform, Docker, pre-commit, GitHub Actions
- Contracts & Tests: schema validation (Pandera/GX) + Pytest (unit/integration) with coverage gates.
- CI/CD: Ruff + mypy + Pytest on every PR, Codecov badges, Dependabot for security updates.
- Docs: `docs/` with MkDocs auto-deployed (GitHub Pages), architecture diagrams and runbooks.
- Reproducibility: `make up && make seed && make test && make demo` (Docker Compose).
- Observability: lineage (OpenLineage), data SLOs (freshness/completeness), cost/perf notes for ClickHouse (partitions, codecs, TTLs).
- Security: least-privilege IAM, secrets management, audit logs.
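
A minimal sketch of what the contract and data-SLO checks above can look like. It uses only the Python standard library rather than Pandera or Great Expectations so that it runs anywhere. The `orders` contract, the column names, and the 15-minute freshness threshold are hypothetical and are not taken from any of the repositories listed here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an illustrative "orders" feed:
# column name -> expected Python type.
ORDERS_CONTRACT = {
    "order_id": int,
    "store_id": int,
    "amount": float,
    "created_at": datetime,
}


def contract_violations(rows, contract):
    """Return readable violations: missing columns or values of the wrong type."""
    problems = []
    for i, row in enumerate(rows):
        for column, expected in contract.items():
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected):
                actual = type(row[column]).__name__
                problems.append(
                    f"row {i}: '{column}' is {actual}, expected {expected.__name__}"
                )
    return problems


def completeness(rows, column):
    """Share of rows where `column` is present and not None (1.0 for an empty batch)."""
    if not rows:
        return 1.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def is_fresh(rows, column, max_lag, now):
    """True when the newest timestamp in `column` is within `max_lag` of `now`."""
    stamps = [row[column] for row in rows if isinstance(row.get(column), datetime)]
    return bool(stamps) and now - max(stamps) <= max_lag


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    batch = [
        {"order_id": 1, "store_id": 10, "amount": 19.9,
         "created_at": now - timedelta(minutes=5)},
        # order_id arrives as a string here, which breaks the contract.
        {"order_id": "2", "store_id": 10, "amount": 5.0,
         "created_at": now - timedelta(minutes=50)},
    ]
    print(contract_violations(batch, ORDERS_CONTRACT))
    print(completeness(batch, "amount"))
    print(is_fresh(batch, "created_at", timedelta(minutes=15), now))
```

In a Pytest suite, the same functions can back plain assertions (for example, failing the run when `contract_violations` returns a non-empty list), which is one way to wire checks like these into a CI gate.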
- LinkedIn: https://www.linkedin.com/in/robson-allef
- Email: robson.sampaio@rtechs.tech
Data Engineer focused on ELT/streaming, ClickHouse, dbt, and Airflow, with quality practices, data contracts, CI, and documentation. I enjoy building reproducible stacks with observability and cost/performance addressed in practice.
Highlights:
etl_mobilidade · elt_meltano_ind · spark-streaming-unstructured-data · discord_bot
Portfolio with case studies:
Contact: https://www.linkedin.com/in/robson-allef · robson.sampaio@rtechs.tech



