Robso-creator/README.md

Robson Allef — Data Engineer (ELT/Streaming, ClickHouse, dbt, Airflow)


I design and build production-grade data systems: batch ELT with dbt and Airflow/Meltano, streaming with Kafka/Spark, analytics serving on ClickHouse, and DuckDB/Polars for fast iteration. I care about data contracts, tests, lineage, cost/performance, and developer experience (pre-commit, CI, reproducibility).

  • Focus: Retail analytics, event streaming, lakehouse on S3 + ClickHouse, dbt modeling, cost optimization.
  • Toolbox: Python, SQL, ClickHouse, dbt, Airbyte/Meltano, Airflow, Spark, Kafka, Terraform, Docker, AWS.
  • Principles: “Measure twice, cut once.” Automate quality, document decisions, ship small and observable changes.

🔭 Highlights (selected work)

Want a quick overview? Check my Portfolio / Case Studies with context, architecture and trade-offs: https://robsonsampaio.vercel.app.


🧰 Tech Stack

  • Languages: Python (Typer/FastAPI), SQL
  • Storage/Compute: ClickHouse (MergeTree, MVs), DuckDB/Polars, S3
  • Orchestration: Airflow, Meltano, dbt (core + tests)
  • Streaming: Kafka, Spark Structured Streaming
  • Quality & Lineage: Pytest, Pandera/Great Expectations, OpenLineage/Marquez
  • Infra/DevEx: AWS, Terraform, Docker, pre-commit, GitHub Actions

📈 How I work (prod-ready defaults)

  • Contracts & Tests: schema validation (Pandera/GX) + Pytest (unit/integration) with coverage gates.
  • CI/CD: Ruff + mypy + Pytest on every PR, Codecov badges, Dependabot for dependency security updates.
  • Docs: docs/ built with MkDocs and auto-deployed to GitHub Pages, plus architecture diagrams and runbooks.
  • Reproducibility: make up && make seed && make test && make demo (Docker Compose).
  • Observability: lineage (OpenLineage), data SLOs (freshness/completeness), cost/perf notes for ClickHouse (partitions, codecs, TTLs).
  • Security: least-privilege IAM, secrets management, audit logs.
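
As a rough illustration of the data SLO idea above (freshness/completeness), here is a minimal sketch using only the Python standard library. The names `SloResult`, `check_freshness`, and `check_completeness`, and the thresholds passed to them, are hypothetical and chosen for this example; it is not the code used in any of the projects listed here, and a real pipeline would more likely lean on Pandera or Great Expectations for the contract side.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SloResult:
    """Outcome of one SLO check (hypothetical helper type)."""
    name: str
    passed: bool
    detail: str


def check_freshness(latest_event: datetime, now: datetime, max_lag: timedelta) -> SloResult:
    """Pass when the newest record is no older than max_lag."""
    lag = now - latest_event
    return SloResult("freshness", lag <= max_lag, f"lag={lag}")


def check_completeness(rows: list[dict], required: tuple[str, ...], min_ratio: float) -> SloResult:
    """Pass when at least min_ratio of rows have every required field set."""
    if not rows:
        return SloResult("completeness", False, "no rows")
    # A row counts as complete only if none of the required fields is None or missing.
    complete = sum(all(r.get(col) is not None for col in required) for r in rows)
    ratio = complete / len(rows)
    return SloResult("completeness", ratio >= min_ratio, f"ratio={ratio:.2f}")
```

Checks like these can run as a final task in an orchestrated job, failing the run (or emitting an alert) when `passed` is false, so that stale or partial loads are caught before they reach dashboards.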

📬 Contact

LinkedIn: https://www.linkedin.com/in/robson-allef · Email: robson.sampaio@rtechs.tech



🇧🇷 Summary (translated from PT-BR)

Data Engineer focused on ELT/streaming, ClickHouse, dbt, and Airflow, with quality practices, data contracts, CI, and documentation. I enjoy building reproducible stacks with observability and cost/performance in practice.

Highlights:
etl_mobilidade · elt_meltano_ind · spark-streaming-unstructured-data · discord_bot
Portfolio with case studies: https://robsonsampaio.vercel.app

Contact: https://www.linkedin.com/in/robson-allef · robson.sampaio@rtechs.tech

Pinned

  1. etl_mobilidade (Public)

    Project built in 72 hours to extract, transform, and load urban mobility data from the Belo Horizonte city government.

    Python 1

  2. elt_meltano_ind (Public)

    This project is a solution for data extraction, transformation, and loading (ELT) using Airflow, Meltano, Streamlit, and PostgreSQL. It allows extracting data from different sources, loading it int…

    Python

  3. spark-streaming-unstructured-data (Public)

    Implements a scalable data pipeline architecture that combines Apache Spark's processing capabilities with AWS services for data storage, cataloging, and analysis.

    Python

  4. discord_bot (Public)

    This is a Discord bot project developed in Python that offers a variety of features to enhance the user experience on Discord servers. The bot was created to be flexible, user-friendly, and expand…

    Python